Davies-Bouldin index

Hi, I am using the Davies-Bouldin index and got -2. When I changed the attributes, I got -4.
Which one is better: -2 or -4?

Declined

No activity or votes since March 2019. Please comment and cc sgenzer if this should be reopened. RM-3972

Comments

  • yyhuang · Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member · Posts: 363 · RM Data Scientist
    Hi @shaimaa,

    Great question! The D-B index is multiplied by -1 internally so that it can be maximized. It is a kind of bug. You can ignore the negative sign in the performance output: the underlying values are 2 and 4, and lower is better, so the clustering model with DB index -2 is the better one.

    "The clustering algorithm that produces a collection of clusters with the smallest Davies–Bouldin index is considered the best algorithm" - Wikipedia

    The Davies-Bouldin Index evaluates intra-cluster similarity and inter-cluster differences. If you consider these to be good criteria, go for the Davies-Bouldin.

    My attached process is an optimization that picks the best k for a k-means model; it finds that k=3 has the lowest D-B index. You can also try X-means to get an optimized clustering.

    [Attached process XML; only its annotations are recoverable:]

    "Davies-Bouldin Index evaluates intra-cluster similarity and inter-cluster differences. If you consider these to be good criteria, go for the Davies-Bouldin. The Silhouette Index measures the distance between each data point, the centroid of the cluster it was assigned to and the closest centroid belonging to another cluster. If you consider that this is a good criterion, go for the silhouette index."
    From: How can we say that a clustering quality measure is good? Available from: https://www.researchgate.net/post/How_can_we_say_that_a_clustering_quality_measure_is_good

    Process steps: figure out the best k for k-means; run x-means for an optimized clustering.


    YY
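To make "intra-cluster similarity and inter-cluster differences" concrete, here is a minimal pure-Python sketch of the standard Davies-Bouldin index (Euclidean distance, scatter measured as mean distance to the centroid). It is not RapidMiner's implementation; the function name `davies_bouldin` and the toy points are made up for illustration. It returns the index in its standard, non-negated form, where lower is better.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index (lower is better).

    points: sequence of equal-length numeric tuples
    labels: cluster label for each point
    """
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Centroid and scatter (mean distance to centroid) of each cluster.
    centroids, scatter = {}, {}
    for lab, members in clusters.items():
        c = [sum(coord) / len(members) for coord in zip(*members)]
        centroids[lab] = c
        scatter[lab] = sum(math.dist(p, c) for p in members) / len(members)

    # For each cluster, keep its worst (largest) scatter-to-separation
    # ratio against any other cluster, then average those worst cases.
    # Note: two clusters with identical centroids would divide by zero.
    worst = []
    for i in clusters:
        worst.append(max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        ))
    return sum(worst) / len(worst)


# Toy data (illustrative only): two tight groups far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = davies_bouldin(points, [0, 0, 1, 1])  # clusters match the groups
bad = davies_bouldin(points, [0, 1, 0, 1])   # each cluster spans both groups
print(good < bad)
```

The labeling that matches the two groups gets the smaller index, which is why the smallest (non-negated) value marks the preferred clustering.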

  • Shaimaa · Member · Posts: 2 · Learner I
    Hi @yyhuang
    Thanks for the reply.
    But I saw other comments here on another post asking the same question, and they got a different reply: we should take the minimum, and if it is maximized (removing the multiplication by -1) we should take the greatest number. This is what confuses me.
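The two rules mentioned in this thread select the same model, as the short sketch below shows. It assumes, as described above, that the reported value is the standard index multiplied by -1; the dictionary keys and the helper name `raw_davies_bouldin` are hypothetical.

```python
def raw_davies_bouldin(reported):
    """Undo the sign flip (assumes the tool reports -DB so larger is better)."""
    return -reported


# Values from the question; the labels are illustrative.
reported = {"original attributes": -2.0, "changed attributes": -4.0}

# Rule 1: maximize the reported (negated) value.
best_by_reported = max(reported, key=reported.get)

# Rule 2: recover the standard index and minimize it.
raw = {name: raw_davies_bouldin(v) for name, v in reported.items()}
best_by_raw = min(raw, key=raw.get)

print(best_by_reported, best_by_raw)
```

Maximizing -DB and minimizing DB are the same choice, so both rules point at the model that reported -2.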
  • SGolbert · RapidMiner Certified Analyst, Member · Posts: 344 · Unicorn
    The -1 appears in several operators that are based on distances. It's quite annoying!
  • Telcontar120 · Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member · Posts: 1,635 · Unicorn
    Agreed, it would be very nice to convert these types of measures back to their "standard form", so that when we share output from RapidMiner it is comparable to the way the rest of the world expects them to work. :)
    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • IngoRM · Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor · Posts: 1,751 · RM Founder
    Well, in fact that is what is supposed to happen anyway. We have a mechanism in all those performance criteria to show the value and also to deliver a fitness (which is always to be maximized, independent of what value is shown). Unfortunately, some of the criteria (or their developers ;-) are a bit lazy, do not correctly implement this behavior, and simply return a negative value for both... You can actually help us by pointing out those cases. DB-Index is one; are there any others you have noticed and remember off the top of your head?
    Thanks,
    Ingo
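The value/fitness split described above can be sketched as follows. This is only an illustration of the idea, not RapidMiner's actual classes: `Criterion`, `lower_is_better` and `fitness` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """A performance criterion with a displayed value and a fitness."""
    name: str
    value: float            # shown to the user in the measure's standard form
    lower_is_better: bool   # True for measures such as Davies-Bouldin

    @property
    def fitness(self) -> float:
        """Score for optimizers; always maximized, whatever value is shown."""
        return -self.value if self.lower_is_better else self.value


candidates = [
    Criterion("davies_bouldin", 2.0, lower_is_better=True),
    Criterion("davies_bouldin", 4.0, lower_is_better=True),
]

# An optimizer only ever maximizes fitness; the displayed value stays positive.
best = max(candidates, key=lambda c: c.fitness)
print(best.value, best.fitness)
```

With this separation the user sees 2 and 4 in their standard form, while the optimizer compares -2 and -4 internally.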