Technical question about the combined use of clustering and classification

Belle (Member, Posts: 3, Newbie)
Hi everyone! I am new to RapidMiner and have run into a problem regarding the combined use of clustering and classification.

Basically, I want to cluster my initial dataset with k-means and then build models to perform classification and evaluate their performance for EACH of the clusters. I know how to use the operators to perform cluster analysis and classification separately, but I have no idea how to arrange the operators to combine them. I tried many approaches, such as placing the k-means operator before or inside the cross-validation, but I still fail either to run the process successfully or to get the performance result for each cluster. Can anyone help?
Any response would be greatly appreciated :)

Thank you!
Jasmine_

Answers

  • lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn)
    Hi @Belle,

    Are you using one of the performance operators dedicated to clustering (a priori, Cluster Distance Performance for k-Means)?



    Regards,

    Lionel
  • Belle (Member, Posts: 3, Newbie)
    Hi @lionelderkrikor,

    Thank you for your reply :)
    And yeah, I tried "Cluster Distance Performance" in my process, but found that it only evaluates the clustering itself (e.g. it reports the Davies-Bouldin index), while the result I want is the classification performance (say, accuracy) within each cluster. Am I misunderstanding those operators?

    Thanks!
  • lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn)
    edited May 2020
    @Belle

    I think you have to generate a "prediction" attribute from your clustering results, to establish the correspondence between the clusters and the classes of your label.

    EDIT :
    I'm using the Iris dataset. To be more precise about the methodology: I'm clustering the examples, and then labelling each cluster with the majority label of the labelled examples in that cluster.

    You can see what I mean by opening and running the process in the attached file.
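For readers without the attached process, the majority-label idea described above can be sketched in plain Python. This is only an illustration, not the RapidMiner process itself: the function names and the toy data are made up, and the cluster ids are assumed to come from an earlier clustering step.

```python
from collections import Counter, defaultdict

def majority_label_map(cluster_ids, labels):
    """Map each cluster id to the most frequent true label among its members."""
    counts_by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        counts_by_cluster[cluster][label] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in counts_by_cluster.items()}

def accuracy_by_cluster(cluster_ids, labels, mapping):
    """Fraction of examples in each cluster whose true label equals the cluster's mapped label."""
    hits, totals = Counter(), Counter()
    for cluster, label in zip(cluster_ids, labels):
        totals[cluster] += 1
        if mapping[cluster] == label:
            hits[cluster] += 1
    return {c: hits[c] / totals[c] for c in totals}

# Made-up toy data: cluster ids from some clustering step, plus the true labels.
clusters = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
labels = ["versicolor", "versicolor", "versicolor", "virginica",
          "setosa", "setosa", "setosa",
          "virginica", "virginica", "versicolor"]

mapping = majority_label_map(clusters, labels)
accuracy = accuracy_by_cluster(clusters, labels, mapping)
print(mapping)   # cluster 0 -> versicolor, 1 -> setosa, 2 -> virginica
print(accuracy)  # cluster 0 scores 0.75 because one virginica example falls in it
```

A cluster that mixes two classes still receives a single label (the majority one), and the minority examples simply count as errors in that cluster's accuracy.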

    Hope this helps,

    Regards,

    Lionel
  • Belle (Member, Posts: 3, Newbie)
    Hi @lionelderkrikor,

    Big thanks for your explanation and example! :)

    But I have two questions about the process you provided:

    1. In the training section of the cross-validation operator, only a clustering operator is used to train the model. I am wondering why no classification model (e.g. a decision tree or neural net) is needed, given that the whole dataset contains a label attribute and should therefore be used for supervised learning? (As I imagined it, if I want to do classification in each of the clusters, I should use both a clustering operator and a classification model?)

    2. In the testing section of the cross-validation operator, you use Generate Attributes to assign a label to each cluster. Does that mean that instead of assigning the label using the classification model, we should assign the label manually? (I also found some inconsistency there: e.g. cluster 0 contains both Iris-versicolor and Iris-virginica, but you assign cluster 0 only to Iris-versicolor.)
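For comparison, the setup imagined in question 1 (a separate classifier trained and scored inside each cluster) could look roughly like the sketch below. It is a plain-Python illustration under several assumptions: the cluster ids are already known, a simple 1-nearest-neighbour rule stands in for the decision tree or neural net, every third example of a cluster is held out for testing, and all names and data are made up.

```python
import math
from collections import defaultdict

def nearest_neighbor_predict(train, x):
    """Return the label of the training example closest to x (1-NN, Euclidean distance)."""
    return min(train, key=lambda row: math.dist(row[0], x))[1]

def accuracy_per_cluster(rows, test_every=3):
    """rows: list of (features, label, cluster_id).

    Within each cluster, every `test_every`-th example is held out for testing
    and the remaining examples are used for training.
    """
    groups = defaultdict(list)
    for features, label, cluster in rows:
        groups[cluster].append((features, label))

    result = {}
    for cluster, members in groups.items():
        test = members[::test_every]
        train = [m for i, m in enumerate(members) if i % test_every != 0]
        if not train or not test:
            continue  # cluster too small to evaluate
        correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
        result[cluster] = correct / len(test)
    return result

# Made-up toy data: (features, label, cluster id from an earlier clustering step).
rows = [
    ((1.0, 1.0), "A", 0), ((1.1, 0.9), "A", 0), ((3.0, 3.0), "B", 0),
    ((1.2, 1.1), "A", 0), ((3.1, 2.9), "B", 0), ((2.9, 3.2), "B", 0),
    ((10.0, 10.0), "A", 1), ((10.2, 10.1), "A", 1), ((10.1, 9.9), "B", 1),
    ((12.0, 12.0), "B", 1), ((12.1, 11.9), "B", 1), ((11.9, 12.2), "B", 1),
]

per_cluster = accuracy_per_cluster(rows)
print(per_cluster)
```

Each cluster gets its own train/test split and its own accuracy figure, which is the per-cluster result asked for in the original question.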

    Thank you so much!

    Belle
  • Telcontar120 (Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn)
    Take a look at the Map Clustering on Labels operator. It will do what you are looking for (I think), but you need to have the same number of classes in your label as you have clusters.
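The requirement of equal numbers of classes and clusters suggests a one-to-one mapping. A rough Python sketch of one way such a mapping could be computed is shown below: it tries every assignment of classes to clusters and keeps the one with the most agreeing examples. This is an assumption made for illustration, not a description of how the operator is implemented, and the function name and data are made up.

```python
from collections import Counter
from itertools import permutations

def one_to_one_label_map(cluster_ids, labels):
    """Assign each cluster a distinct class so that the number of agreeing examples is maximal."""
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(labels))
    if len(clusters) != len(classes):
        raise ValueError("need the same number of clusters and classes")

    # overlap[(cluster, label)] = number of examples with that combination
    overlap = Counter(zip(cluster_ids, labels))

    # Brute force over all assignments; fine for a handful of classes.
    best = max(
        permutations(classes),
        key=lambda perm: sum(overlap[(c, l)] for c, l in zip(clusters, perm)),
    )
    return dict(zip(clusters, best))

# Made-up toy data in which "setosa" is the majority class of two clusters.
clusters = [0, 0, 0, 0, 1, 1, 1, 2, 2]
labels = ["setosa", "setosa", "setosa", "versicolor",
          "setosa", "setosa", "virginica",
          "versicolor", "virginica"]

mapping = one_to_one_label_map(clusters, labels)
print(mapping)
```

Unlike a plain majority vote, this forces every class to be used exactly once, so cluster 1 ends up mapped to a class other than its majority one.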
    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn)
    Hi @Belle,

    To answer your question:
    "Does that mean that instead of assigning the label using the classification model, we should assign the label manually"
    That is effectively what I tried to do by hand in the process I shared in my previous post. This operation is performed automatically by the Map Clustering on Labels operator, as @Telcontar120 said, but I was not aware of this operator.
    In conclusion, I can say that I learn new things every day on RapidMiner... ;)
    Thanks for sharing this operator, Brian!

    Regards,

    Lionel
