Random Clustering

Synopsis

This operator performs a random flat clustering of the given ExampleSet. Clustering is concerned with grouping objects together that are similar to each other and dissimilar to the objects belonging to other clusters.

Description

This operator performs a random flat clustering of the given ExampleSet. Please note that this algorithm does not guarantee that all clusters will be non-empty. This operator creates a cluster attribute in the resultant ExampleSet if the add cluster attribute parameter is set to true. It is important to note that this operator assigns examples to clusters randomly; if you want a meaningful clustering, please use an operator that implements an actual clustering algorithm, such as the K-Means operator.
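To make the idea of "random flat clustering" concrete, here is a minimal sketch, assuming each example independently receives a cluster id drawn uniformly from 0 to k-1. The function name random_clustering is hypothetical and is not part of RapidMiner; the operator's actual sampling scheme may differ. The usage example uses more clusters than examples, which makes empty clusters unavoidable.

```python
import random


def random_clustering(n_examples, n_clusters, seed=None):
    """Hypothetical helper: give each example a uniformly random cluster id.

    Ids range over 0..n_clusters-1. Nothing forces every cluster to be used,
    so some ids may never be drawn (i.e. some clusters stay empty).
    """
    rng = random.Random(seed)  # private generator; seed=None uses system randomness
    return [rng.randrange(n_clusters) for _ in range(n_examples)]


# 3 examples spread over 5 clusters: at least 2 clusters must end up empty.
assignments = random_clustering(n_examples=3, n_clusters=5, seed=1)
empty = set(range(5)) - set(assignments)
print(assignments, "empty clusters:", sorted(empty))
```

No similarity between examples is consulted at any point, which is exactly why this is not a substitute for a real clustering algorithm.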

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. Clustering is a technique for extracting information from unlabeled data. It can be useful in many scenarios; for example, in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Input

example set

The input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output

cluster model

This port delivers the cluster model, which contains information about the clustering performed: it records which examples belong to which cluster.

clustered set

The ExampleSet that was given as input is passed with minor changes to the output through this port. An attribute with the id role is added to the input ExampleSet to distinguish examples. An attribute with the cluster role may also be added, depending on the state of the add cluster attribute parameter.
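The relationship between the two outputs can be sketched as follows, assuming an ExampleSet is represented as a list of Python dicts and the cluster model as a plain mapping from cluster id to the ids of its member examples. The function apply_random_clustering and the attribute names "id" and "cluster" are hypothetical stand-ins, not RapidMiner's internal representation.

```python
import random
from collections import defaultdict


def apply_random_clustering(examples, n_clusters, add_cluster_attribute=True, seed=None):
    """Hypothetical sketch of the 'cluster model' and 'clustered set' outputs.

    examples: list of dicts (attribute name -> value) standing in for an ExampleSet.
    Returns (model, clustered_set); model maps cluster id -> list of example ids.
    """
    rng = random.Random(seed)
    model = defaultdict(list)
    clustered_set = []
    for i, example in enumerate(examples, start=1):
        row = dict(example)            # copy so the input rows stay unchanged
        row["id"] = i                  # stands in for the attribute with the id role
        cluster = rng.randrange(n_clusters)
        model[cluster].append(i)       # the model records membership in every case
        if add_cluster_attribute:
            row["cluster"] = cluster   # stands in for the attribute with the cluster role
        clustered_set.append(row)
    return dict(model), clustered_set


data = [{"x": 1.0}, {"x": 2.5}, {"x": 0.3}]
model, clustered = apply_random_clustering(data, n_clusters=2, seed=7)
print(model)
print(clustered)
```

Keeping the membership in the model even when no cluster attribute is written mirrors the idea that the cluster attribute can be generated later from the model.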

Parameters

Add cluster attribute

If enabled, a new attribute with the cluster role is generated directly in this operator; otherwise this operator does not add the cluster attribute. In the latter case you have to use the Apply Model operator to generate the cluster attribute.

Add as label

If true, the cluster id is stored in an attribute with the label role instead of the cluster role (see the add cluster attribute parameter).
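The interaction between add cluster attribute and add as label can be summarized in a small decision function. This is a sketch under the assumption that add as label only has an effect when add cluster attribute is enabled; the function name cluster_attribute_role is hypothetical.

```python
def cluster_attribute_role(add_cluster_attribute, add_as_label):
    """Hypothetical helper: role given to the cluster id attribute, or None.

    Assumes 'add as label' only matters when 'add cluster attribute' is enabled.
    """
    if not add_cluster_attribute:
        return None  # no attribute is added; Apply Model would be needed later
    return "label" if add_as_label else "cluster"


for add_attr in (True, False):
    for as_label in (False, True):
        print(add_attr, as_label, "->", cluster_attribute_role(add_attr, as_label))
```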

Remove unlabeled

If set to true, unlabeled examples are deleted.

Number of clusters

This parameter specifies the desired number of clusters to form. There is no hard-and-fast rule for choosing the number of clusters. Generally, however, it is preferable to have a small number of clusters, with examples spread around them in a balanced (but not too scattered) way.

Use local random seed

This parameter indicates whether a local random seed should be used for randomization.

Local random seed

This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.
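Why a local random seed matters can be shown with a short sketch: when a fixed seed is used, a fresh generator is created for each run, so repeated runs produce identical cluster assignments. The function assign_clusters, the shared _global_rng, and the default seed value 42 are all hypothetical illustrations, not RapidMiner's implementation or defaults.

```python
import random

_global_rng = random.Random()  # stands in for a shared, process-wide generator


def assign_clusters(n_examples, n_clusters, use_local_random_seed=False, local_random_seed=42):
    """Hypothetical sketch; the default seed 42 is illustrative only.

    With a local seed, a new generator is created on every call, so the
    same seed always reproduces the same assignments.
    """
    rng = random.Random(local_random_seed) if use_local_random_seed else _global_rng
    return [rng.randrange(n_clusters) for _ in range(n_examples)]


run_a = assign_clusters(10, 3, use_local_random_seed=True, local_random_seed=7)
run_b = assign_clusters(10, 3, use_local_random_seed=True, local_random_seed=7)
print(run_a == run_b)  # True: same local seed, same assignments
```

Without the local seed, consecutive calls draw from the shared generator and will generally differ from run to run.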