"Set Role Operator as document Name [Solved]"

waelyafooz Member Posts: 10 Contributor II
edited June 2019 in Help
Hi all,
I am using RapidMiner for clustering with k-Means.

My question is: how can I find out which documents are in cluster 1, which are in cluster 2, and so on? I need this to calculate the quality of the clusters (using the F-measure and entropy).

FYI, my collection contains around 1000 text documents, named A1 to A1000. How can I find out which documents were assigned to each cluster?

Is there an example of how to use the Set Role operator to show the document name as a column after the clustering process?
My regards,

Wael.
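
For reference, both measures mentioned above can be computed once each document name is paired with a cluster label. The following is a minimal Python sketch, not RapidMiner code. It assumes the commonly used definitions (cluster-size-weighted entropy of the class distribution, and the class-size-weighted best F1 score per class); the argument names `clusters` and `classes` and the existence of true class labels are assumptions made for illustration.

```python
import math
from collections import Counter


def clustering_entropy(clusters, classes):
    """Size-weighted average entropy of the class distribution in each cluster.

    clusters: dict mapping document name -> cluster label
    classes:  dict mapping document name -> true class label (assumed available)
    """
    n = len(clusters)
    by_cluster = {}
    for doc, cluster in clusters.items():
        by_cluster.setdefault(cluster, []).append(classes[doc])
    total = 0.0
    for members in by_cluster.values():
        size = len(members)
        counts = Counter(members)
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total


def clustering_f_measure(clusters, classes):
    """For each true class take the best F1 over all clusters, weighted by class size."""
    n = len(clusters)
    cluster_sizes = Counter(clusters.values())
    class_sizes = Counter(classes.values())
    joint = Counter((classes[doc], clusters[doc]) for doc in clusters)
    score = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            precision, recall = n_ij / n_j, n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_i / n) * best
    return score
```

Lower entropy and higher F-measure indicate clusters that agree more closely with the true classes.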

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Wael,

    Use the Set Role operator before the clustering operator to define the column that contains the document names as the id.

    Best regards,
    Marius
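
The idea behind the id role can be illustrated outside RapidMiner. The sketch below is plain Python, not RapidMiner's API: the document name travels with each row but is excluded from the distance computation, so it can be reattached to the cluster label afterwards. The helper `assign_to_nearest`, the fixed centroids, and the sample vectors are hypothetical stand-ins for a real k-Means run.

```python
def assign_to_nearest(vector, centroids):
    """Return the index of the centroid closest to vector (squared Euclidean distance)."""
    distances = [
        sum((v - c) ** 2 for v, c in zip(vector, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances))


# Each row: (document name, feature vector). The name plays the "id" role:
# it is kept with the row but never used when computing distances.
rows = [
    ("A1", [1.0, 0.1]),
    ("A2", [0.9, 0.0]),
    ("A3", [0.0, 1.0]),
]
centroids = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, as if produced by k-Means

labelled = [(name, assign_to_nearest(features, centroids)) for name, features in rows]
print(labelled)
```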
  • waelyafooz Member Posts: 10 Contributor II
    Many thanks, Marius.
    Yes, I placed the Set Role operator before the clustering. The Set Role operator has two parameters, name and target role: for name I selected metadata_file, and for target role I selected id. Unfortunately, the document name does not appear in the results.
    My regards,
    Wael
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Can you please post your process setup as described in the link in my signature?

    Best regards,
    Marius
  • waelyafooz Member Posts: 10 Contributor II
    Many thanks, Marius.

    My steps are as follows:

    1) Process Documents from Files, with Tokenize inside it
    2) Set Role operator
    3) Clustering operator (k-Means)

    FYI: the Set Role operator comes after Process Documents from Files and before the clustering.

    <portSpacing port="source_document" spacing="0"/>

    my regards
    Wael
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Wael,

    You have connected only the cluster model to the process output, but not the clustered data. Try connecting the second output of k-Means to the process output to get the expected result.

    Best regards,
    Marius
  • waelyafooz Member Posts: 10 Contributor II
    Many thanks, Marius.
    I still have the same problem: the output does not contain the file name.

    <portSpacing port="source_document" spacing="0"/>

  • waelyafooz Member Posts: 10 Contributor II
    I did the following:

    Many thanks, Marius.

    My steps are as follows:

    1) Process Documents from Files, with Tokenize inside it
    2) Set Role operator
    3) Clustering operator (k-Means)

    Many thanks, Marius.

    My steps are as follows:

    1) Process Documents from Files, with Tokenize inside it
    2) Clustering operator (k-Means)
    3) Set Role operator

    Please have a look at this:

    <portSpacing port="source_document" spacing="0"/>

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Wael,

    What is the problem with the process you posted? What do you want to do? Why are you using the Set Role operator?

    By the way, you are using RapidMiner 5.2.8, which is more than a year old. Please update to the latest version, 5.3.13.

    Best regards,
    Marius
  • waelyafooz Member Posts: 10 Contributor II
    Hi Marius,
    Yes, I have upgraded my version now.

    I want to show the document names in the clusters.
    For example, using k-Means I produce 5 clusters, and I want to know which document is in which cluster.
    For example:
    I have documents a, b, c, d and so on; if documents b and c are in one cluster, just show me the names of those documents as an attribute.

    If you ask why I need that:
    I need to calculate the F-measure on my data set.

    my regards
    Wael.
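
The listing described above, document names grouped per cluster, can be sketched in a few lines of Python once each document name is paired with its cluster label. This is only an illustration, not RapidMiner code; the function name `documents_by_cluster` and the labels `cluster_0` and `cluster_1` are hypothetical.

```python
from collections import defaultdict


def documents_by_cluster(assignments):
    """Invert (document name, cluster label) pairs into cluster -> sorted names."""
    groups = defaultdict(list)
    for name, cluster in assignments:
        groups[cluster].append(name)
    return {cluster: sorted(names) for cluster, names in groups.items()}


# Hypothetical assignments for documents a, b, c, d
assignments = [
    ("a", "cluster_0"),
    ("b", "cluster_1"),
    ("c", "cluster_1"),
    ("d", "cluster_0"),
]
groups = documents_by_cluster(assignments)
for cluster, names in sorted(groups.items()):
    print(cluster, names)
```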
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Wael,

    But the output of the process setup posted above already contains both the cluster in one column and the filename in another column. I just checked it with the latest version of RapidMiner. Do you get a different output?

    Best regards,
    Marius
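
If a two-column result like this is needed outside RapidMiner, for example to compute the F-measure by hand, one possible approach is to export the clustered example set to CSV and read it back. The Python sketch below assumes such an export exists and that its header uses the column names `metadata_file` (the attribute mentioned earlier in this thread) and `cluster`; the export step, the exact column names, and the sample file names are assumptions.

```python
import csv
import io


def read_assignments(handle, name_column="metadata_file", cluster_column="cluster"):
    """Read a CSV with a header row and return a dict: document name -> cluster label."""
    reader = csv.DictReader(handle)
    return {row[name_column]: row[cluster_column] for row in reader}


# Hypothetical export content; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "metadata_file,cluster\n"
    "A1.txt,cluster_0\n"
    "A2.txt,cluster_1\n"
)
mapping = read_assignments(sample)
print(mapping)
```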
  • waelyafooz Member Posts: 10 Contributor II
    Many thanks, Mr. Marius, for your help. Yes, it is done now. Thanks a lot.
    Have a look at this:

    <portSpacing port="source_document" spacing="0"/>
