Computations for Cluster Distance Performance operator

amitd | Member, University Professor | Posts: 49 | Maven
edited November 2021 in Help
I am having trouble replicating the computations of the "avg. within cluster distance" metrics produced by the Performance (Cluster Distance Performance) operator.

The operator documentation states: "avg._within_centroid_distance: The average within cluster distance is calculated by averaging the distance between the centroid and all examples of a cluster." The name "avg._within_centroid_distance" confuses me, because the definition actually describes an "avg_within_cluster_distance", and those are two different concepts. It is also unclear how the overall "avg._within_centroid_distance" is computed, in addition to the value reported for each cluster.
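To make the distinction concrete, here is a minimal Python sketch (not RapidMiner's code) that computes both quantities for a single cluster: the average distance from each example to the centroid, as the documentation describes, and the average distance over all pairs of examples, which is one common reading of "within cluster distance". It assumes plain Euclidean distance; the function names and toy data are hypothetical.

```python
import math
from itertools import combinations

Point = tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Plain (not squared) Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: list[Point]) -> Point:
    """Coordinate-wise mean of the points in one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def avg_distance_to_centroid(points: list[Point]) -> float:
    """Reading in the documentation: mean distance from each example to the centroid."""
    c = centroid(points)
    return sum(euclidean(p, c) for p in points) / len(points)


def avg_pairwise_distance(points: list[Point]) -> float:
    """Alternative reading of 'within cluster distance': mean over all distinct pairs."""
    pairs = list(combinations(points, 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy cluster: the four corners of a 2 x 2 square (not Iris data).
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(avg_distance_to_centroid(cluster))  # sqrt(2), about 1.414
print(avg_pairwise_distance(cluster))     # (8 + 4*sqrt(2)) / 6, about 2.276
```

The two values differ even on this tiny example, so knowing which one the operator reports matters when replicating it by hand.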

I have attached sample calculations for the Iris dataset along with the RapidMiner process. I was able to replicate the Davies-Bouldin index but not the "avg._within_centroid_distance". Any help would be much appreciated.
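For reference, here is a minimal Python sketch of the textbook Davies-Bouldin index: for each cluster it takes the worst (largest) ratio of combined scatter to centroid separation, then averages over all clusters. It assumes Euclidean distance and the usual scatter S_i (mean distance of a cluster's members to its centroid); whether RapidMiner's implementation matches this variant in every detail is an assumption, and the names and toy data are hypothetical.

```python
import math

Point = tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Plain Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: list[Point]) -> Point:
    """Coordinate-wise mean of the points in one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(clusters: list[list[Point]]) -> float:
    """Textbook Davies-Bouldin index (lower means better-separated clusters)."""
    cents = [centroid(pts) for pts in clusters]
    # S_i: average Euclidean distance from the members of cluster i to its centroid.
    scatter = [
        sum(euclidean(p, c) for p in pts) / len(pts)
        for pts, c in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # R_ij = (S_i + S_j) / d(c_i, c_j); keep the worst (largest) ratio for cluster i.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical collinear toy clusters, each with scatter 1 around its centroid.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],    # centroid (1, 0)
    [(10.0, 0.0), (12.0, 0.0)],  # centroid (11, 0)
    [(30.0, 0.0), (32.0, 0.0)],  # centroid (31, 0)
]
print(davies_bouldin(clusters))  # (0.2 + 0.2 + 0.1) / 3, about 0.1667
```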

On a related note, it is also not clear to me what the Performance (Cluster Density Performance) operator computes or how. I read the operator documentation, but it did not make sense to me.


Answers

  • amitd | Member, University Professor | Posts: 49 | Maven
    I figured out that "avg._within_centroid_distance" computes the average of the squared Euclidean distance between each observation and its cluster centroid, not the plain Euclidean distance.

    If someone could clarify what the Performance (Cluster Density Performance) operator computes, that would help. Thanks.
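Following up on the squared Euclidean finding in the answer above, here is a minimal Python sketch for replicating the metric by hand. It is not RapidMiner's code; the function names and toy data are hypothetical. It assumes the overall value averages the squared distances over all examples (so larger clusters weigh more). Since the documentation does not say how the overall value is aggregated, the sketch also prints the unweighted mean of the per-cluster values, so both candidates can be compared against the operator's output.

```python
Point = tuple[float, ...]


def squared_euclidean(a: Point, b: Point) -> float:
    """Squared Euclidean distance (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: list[Point]) -> Point:
    """Coordinate-wise mean of the points in one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def avg_within_centroid_sq(clusters: dict[str, list[Point]]) -> tuple[dict[str, float], float]:
    """Per-cluster and overall mean squared distance to the cluster centroid.

    ASSUMPTION: the overall value averages over all examples (size-weighted),
    not a plain mean of the per-cluster values.
    """
    per_cluster: dict[str, float] = {}
    total_sq, total_n = 0.0, 0
    for label, pts in clusters.items():
        c = centroid(pts)
        sq = [squared_euclidean(p, c) for p in pts]
        per_cluster[label] = sum(sq) / len(pts)
        total_sq += sum(sq)
        total_n += len(pts)
    return per_cluster, total_sq / total_n


# Hypothetical toy clusters of unequal size (not Iris data).
clusters = {
    "cluster_0": [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
    "cluster_1": [(10.0, 10.0), (12.0, 10.0)],
}
per_cluster, overall = avg_within_centroid_sq(clusters)
print(per_cluster)  # {'cluster_0': 2.0, 'cluster_1': 1.0}
print(overall)      # 10 / 6 = 1.6666666666666667 (size-weighted)
print(sum(per_cluster.values()) / len(per_cluster))  # 1.5 (unweighted alternative)
```

With clusters of equal size the two aggregations coincide; with unequal sizes, as here, they differ, which is a quick way to tell which one the operator uses.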