Cluster Distance Performance

Synopsis

This operator evaluates the performance of centroid-based clustering methods. It delivers a list of performance criteria values based on cluster centroids.

Description

Centroid-based clustering operators such as K-Means and K-Medoids produce a centroid cluster model and a clustered set. The centroid cluster model holds information about the clustering performed: it records which examples belong to which cluster, and it stores the centroid of each cluster. The Cluster Distance Performance operator takes this centroid cluster model and the clustered set as input and evaluates the performance of the model based on the cluster centroids. Two performance measures are supported: average within cluster distance and Davies-Bouldin index. These performance measures are explained in the parameters.

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. It is a technique for extracting information from unlabeled data and can be useful in many different scenarios. For example, in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Input

example set

This input port expects an ExampleSet. It is the output of the K-Medoids operator in the attached Example Process.

cluster model

This input port expects a centroid cluster model. It is the output of the K-Medoids operator in the attached Example Process. The centroid cluster model holds information about the clustering performed: it records which examples belong to which cluster, and it stores the centroid of each cluster.

performance

This input port expects a Performance Vector.

Output

performance

The performance of the cluster model is evaluated and the resultant Performance Vector is delivered through this port. The Performance Vector is a list of performance criteria values.

example set

The ExampleSet that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

cluster model

The centroid cluster model that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same centroid cluster model in further operators or to view it in the Results Workspace.

Parameters

Main criterion

This parameter specifies the main criterion to use for performance evaluation.

avg._within_centroid_distance: The average within cluster distance is calculated by averaging the distance between the centroid and all examples of a cluster.
davies_bouldin: Algorithms that produce clusters with low intra-cluster distances (high intra-cluster similarity) and high inter-cluster distances (low inter-cluster similarity) will have a low Davies-Bouldin index. Based on this criterion, the clustering algorithm that produces the collection of clusters with the smallest Davies-Bouldin index is considered the best. Note that empty clusters are ignored in the calculation of the Davies-Bouldin index.
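
The two criteria above can be illustrated with a minimal Python sketch. It is not the operator's implementation: it assumes Euclidean distance, averages the within cluster distance over all examples rather than per cluster, and uses the textbook form of the Davies-Bouldin index. The function names and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def avg_within_centroid_distance(points, labels, centroids):
    """Mean distance from each example to the centroid of its cluster."""
    total = sum(dist(p, centroids[c]) for p, c in zip(points, labels))
    return total / len(points)


def davies_bouldin(points, labels, centroids):
    """Textbook Davies-Bouldin index; needs at least two non-empty clusters."""
    # Group examples by cluster index. Clusters without examples never
    # appear in this dict, so empty clusters are ignored.
    members = {}
    for p, c in zip(points, labels):
        members.setdefault(c, []).append(p)

    # Scatter of a cluster: average distance of its examples to its centroid.
    scatter = {
        c: sum(dist(p, centroids[c]) for p in pts) / len(pts)
        for c, pts in members.items()
    }

    # For each cluster, take the worst (largest) ratio of summed scatter
    # to centroid separation against any other non-empty cluster.
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in scatter
            if j != i
        )
        for i in scatter
    ]
    return sum(worst) / len(worst)


# Toy data: two tight clusters far apart, plus a third, empty cluster.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
centroids = [(0, 1), (10, 1), (50, 50)]

print(avg_within_centroid_distance(points, labels, centroids))
print(davies_bouldin(points, labels, centroids))
```

In the toy data every example lies at distance 1 from its centroid and the two non-empty centroids are 10 apart, so compact, well-separated clusters give a small index, in line with the description above.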

Main criterion only

This parameter specifies if only the main criterion should be delivered by the performance vector. The main criterion is specified by the main criterion parameter.

Normalize

This parameter specifies if the results should be normalized. If set to true, the criterion is divided by the number of features.

Maximize

This parameter specifies if the results should be maximized. If set to true, the result is not multiplied by minus one.
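
As a rough sketch of how the Normalize and Maximize parameters might combine, the hypothetical helper below applies both rules to a raw criterion value exactly as they are worded above. The function name, argument names, and the order of the two steps are assumptions for illustration, not the operator's code.

```python
def reported_value(raw, n_features, normalize=False, maximize=False):
    """Apply the Normalize and Maximize rules to a raw criterion value."""
    # Normalize: divide the criterion by the number of features.
    value = raw / n_features if normalize else raw
    # Maximize: when false, the result is multiplied by minus one.
    return value if maximize else -value


print(reported_value(1.0, n_features=2))
print(reported_value(1.0, n_features=2, normalize=True))
print(reported_value(1.0, n_features=2, normalize=True, maximize=True))
```

Under this reading, a distance-based criterion is reported as a negative number unless Maximize is set to true.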