Flatten Clustering

Synopsis

This operator creates a flat clustering model from the given hierarchical clustering model. Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters.

Description

The Flatten Clustering operator creates a flat cluster model from the given hierarchical cluster model by expanding nodes in the order of their distance until the desired number of clusters (specified by the number of clusters parameter) is reached. In RapidMiner, operators like the Agglomerative Clustering operator provide hierarchical cluster models. The Flatten Clustering operator takes this hierarchical cluster model and an ExampleSet as input and returns a flat cluster model and the clustered ExampleSet. Please note that RapidMiner also provides operators that perform flat clustering directly, e.g. the K-Means operator.
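The expansion step described above can be sketched in a few lines of Python. This is a minimal illustration and not RapidMiner's implementation: the Node class and the flatten function are hypothetical names, and the sketch assumes that "in the order of their distance" means the node with the largest merge distance is expanded first. If the hierarchy has fewer leaves than the requested number of clusters, the sketch simply returns one cluster per leaf.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical dendrogram node (not a RapidMiner class).

    A leaf carries one example id; an inner node carries its children and
    the distance at which they were merged.
    """
    distance: float = 0.0
    children: List["Node"] = field(default_factory=list)
    example_id: Optional[int] = None

    def leaves(self) -> List[int]:
        """Return the ids of all examples below this node."""
        if not self.children:
            return [self.example_id]
        ids: List[int] = []
        for child in self.children:
            ids.extend(child.leaves())
        return ids


def flatten(root: Node, number_of_clusters: int) -> List[List[int]]:
    """Expand nodes, largest merge distance first (an assumption),
    until the requested number of clusters is reached."""
    tie = count()    # tie-breaker so the heap never compares Node objects
    frontier = []    # max-heap (negated distance) of expandable inner nodes
    finished = []    # leaves, which cannot be expanded further

    def add(node: Node) -> None:
        if node.children:
            heapq.heappush(frontier, (-node.distance, next(tie), node))
        else:
            finished.append(node)

    add(root)
    while frontier and len(frontier) + len(finished) < number_of_clusters:
        _, _, node = heapq.heappop(frontier)
        for child in node.children:
            add(child)

    clusters = [n.leaves() for _, _, n in frontier]
    clusters += [n.leaves() for n in finished]
    return sorted(sorted(c) for c in clusters)


# Toy hierarchy over five examples (ids 0..4) with made-up merge distances.
leaf = [Node(example_id=i) for i in range(5)]
a = Node(distance=1.0, children=[leaf[0], leaf[1]])
b = Node(distance=1.5, children=[leaf[3], leaf[4]])
c = Node(distance=3.0, children=[a, leaf[2]])
root = Node(distance=6.0, children=[c, b])

print(flatten(root, 3))  # [[0, 1], [2], [3, 4]]
```

With three clusters requested, the root (distance 6.0) is expanded first and then node c (distance 3.0), which leaves the groups {0, 1}, {2} and {3, 4}. Because the hierarchy is computed once, asking for a different number of clusters only changes where the expansion stops.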

Flat clustering creates a flat set of clusters without any explicit structure that would relate clusters to each other. Hierarchical clustering creates a hierarchy of clusters. Flat clustering is efficient and conceptually simple, but it has a number of drawbacks: flat clustering algorithms return an unstructured set of clusters, require a prespecified number of clusters as input, and are nondeterministic. Hierarchical clustering outputs a hierarchy, a structure that is more informative than the unstructured set of clusters returned by flat clustering. It does not require the number of clusters to be specified in advance, and most hierarchical algorithms that have been used in information retrieval are deterministic. These advantages of hierarchical clustering come at the cost of lower efficiency.

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. It is a technique for extracting information from unlabeled data and can be very useful in many different scenarios. For example, in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Input

hierarchical

This port expects the hierarchical cluster model. Hierarchical clustering operators like the Agglomerative Clustering operator generate such a model.

example set

The input port expects an ExampleSet. It is the output of the Agglomerative Clustering operator in the attached Example Process. The output of other operators can also be used as input.

Output

flat

This port delivers the flat cluster model which has information regarding the clustering performed. It tells which examples are part of which cluster.

example set

The ExampleSet that was given as input is passed with minor changes to the output through this port. An attribute with the id role is added to the input ExampleSet to distinguish examples.

Parameters

Number of clusters

This parameter specifies the desired number of clusters to form. There is no hard and fast rule for choosing the number of clusters. Generally, it is preferable to have a small number of clusters, with the examples distributed around them in a balanced way and not scattered too widely.

Add as label

If true, the cluster id is stored in an attribute with the label role instead of the cluster role.

Remove unlabeled

If set to true, unlabeled examples are deleted.