Computations for Cluster Distance Performance operator
I am having trouble replicating the computations of the "avg. within cluster distance" metrics produced by the Performance (Cluster Distance Performance) operator.
The operator documentation states: "avg._within_centroid_distance: The average within cluster distance is calculated by averaging the distance between the centroid and all examples of a cluster." The name "avg._within_centroid_distance" is confusing to me, because the definition actually describes an "avg._within_cluster_distance", and the two are different concepts altogether. It is also not clear how the overall "avg._within_centroid_distance" is computed, in addition to the metric reported for each cluster.
I have attached the sample calculations for the Iris dataset along with the RapidMiner process. I was able to replicate the Davies Bouldin index but not the "avg._within_centroid_distance". Any help would be much appreciated.
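To make the question concrete, here is a minimal Python sketch of how I read the documented definition. It is not RapidMiner's implementation. The function name `avg_within_centroid_distance`, the `squared` switch, and the toy data are all made up for illustration. The sketch assumes the centroid is the attribute-wise mean of a cluster and that the distance is Euclidean (optionally squared), and it computes two candidate overall values, because I do not know which one, if either, the operator reports.

```python
import math

def centroid(points):
    # Attribute-wise mean of the examples in one cluster (assumption).
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def avg_within_centroid_distance(clusters, squared=False):
    # Hypothetical helper, not a RapidMiner API.
    # clusters maps a cluster label to a list of numeric example vectors.
    per_cluster = {}
    total = 0.0
    count = 0
    for label, points in clusters.items():
        c = centroid(points)
        dists = []
        for p in points:
            d2 = sum((a - b) ** 2 for a, b in zip(p, c))
            # Assumption: Euclidean distance, or its square if squared=True.
            dists.append(d2 if squared else math.sqrt(d2))
        per_cluster[label] = sum(dists) / len(dists)
        total += sum(dists)
        count += len(dists)
    # Candidate 1: average over all examples (weights clusters by size).
    weighted = total / count
    # Candidate 2: plain average of the per-cluster averages.
    unweighted = sum(per_cluster.values()) / len(per_cluster)
    return per_cluster, weighted, unweighted

# Toy data, not the Iris dataset.
clusters = {
    "cluster_0": [[0.0, 0.0], [2.0, 0.0]],
    "cluster_1": [[0.0, 0.0], [0.0, 4.0], [0.0, 8.0], [0.0, 4.0]],
}

per_cluster, weighted, unweighted = avg_within_centroid_distance(clusters)
print(per_cluster, weighted, unweighted)
```

The two candidates differ whenever the clusters have different sizes, and the squared and unsquared variants also give different numbers, so comparing all four against the operator's output should show which convention, if any, matches.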
On a related note, it is also not clear to me what the Performance (Cluster Density Performance) operator is computing and how. I did read the operator documentation, but it did not make sense to me.
Answers
If someone could clarify what the Performance (Cluster Density Performance) operator is computing, that would help. Thanks.