"Text mining (create a bag of words)"

nabilophone11 Member Posts: 11 Contributor II
edited June 2019 in Help
Hi everybody,

Could you please tell me how to insert a list of words (about 600) automatically as attributes in RapidMiner 5.1.11? I have only found a manual way, using "Generate Attributes".

Thanks for your help.


N

Answers

  • colo Member Posts: 236 Maven
    Hi,

    this all depends on how the data is available. You probably have the words in some kind of list (Excel, CSV, Database)!? You can use the import wizard or the import operators like "Read Database" to import the data into RapidMiner. If you have the words as plain text you need operators from the text processing extension like "Read Document" or "Process Documents from Data". But you will have to provide more details for further help.

    Regards
    Matthias
  • nabilophone11 Member Posts: 11 Contributor II
    Thanks Matthias,

    Actually, I have an Excel file with 30,000 rows and 3 attributes: id, a text attribute (the address), and a label (yes/no). I have about 600 words that help me say "yes" for an address, so I want to create 600 attributes automatically. I have no problem importing the data; my problem is how to create the 600 attributes (I couldn't find a way to create a list of words). :'(

    Thanks

    N
  • nabilophone11 Member Posts: 11 Contributor II
    Please, I need help! :'(
  • colo Member Posts: 236 Maven
    Hi,

    I'm not really sure about your intention. Do you need the RapidMiner wordlist format? In that case I don't know how to create one other than by creating word vectors with one of the "Process Documents" operators. What do you mean by "help me to say yes"? What classifier do you want to use?

    Regards
    Matthias
  • nabilophone11 Member Posts: 11 Contributor II
    Hi,

    I want to create a matrix with these 600 attributes: if one of them is true, my class (label) is positive, otherwise negative. Sorry about my English. I couldn't find a way to insert all these attributes automatically. I think the RapidMiner wordlist format can help, so I tried to install WVTool, but it doesn't work with RapidMiner 5.1.11?

    regards,

    N
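The rule described in this post (label positive if any listed word occurs in the address, negative otherwise) can be sketched outside RapidMiner in a few lines of Python. This is only an illustration under assumptions: the keyword set, the sample addresses, and the function names `tokenize` and `label_address` are hypothetical and do not come from the thread.

```python
import re

# Hypothetical keyword list; the real list would hold about 600 words.
KEYWORDS = {"avenue", "boulevard", "rue"}

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def label_address(address, keywords=KEYWORDS):
    """Return 'yes' if at least one keyword occurs in the address, else 'no'."""
    return "yes" if any(tok in keywords for tok in tokenize(address)) else "no"

print(label_address("12 Rue de la Paix"))
print(label_address("PO Box 42"))
```

With a pure lookup rule like this, no learned classifier is strictly needed; the 600 attributes only become useful once a model is supposed to weigh the words against each other.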

  • colo Member Posts: 236 Maven
    Hi,

    Creating a wordlist for these words should be possible by writing them into a single document (e.g. one word per line, or separated by some other whitespace), importing this into RapidMiner, and creating a word vector using "Process Documents" (with tokenization inside). The "Process Documents" operator should deliver the desired wordlist. But I doubt whether this will really help you, since your classifier seems to depend only on the word lookup. I'm not sure which approach would make sense, and my time is limited at the moment... sorry.

    Perhaps someone else may help?

    Regards
    Matthias
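As a rough analogue of the wordlist step suggested above, the following Python sketch reads a document containing one word per line (or whitespace-separated words) and produces a de-duplicated wordlist. It does not reproduce RapidMiner's wordlist format; the function name `build_wordlist` and the sample text are assumptions made for illustration.

```python
def build_wordlist(raw_text):
    """Split on any whitespace, lower-case, and drop duplicates,
    keeping the order in which words were first seen."""
    seen = {}
    for token in raw_text.split():
        seen.setdefault(token.lower(), None)
    return list(seen)

# Hypothetical document content: one word per line or space-separated.
sample = "Rue\nAvenue boulevard\nrue\n"
wordlist = build_wordlist(sample)
print(wordlist)
```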
  • nabilophone11 Member Posts: 11 Contributor II
    Thanks Matthias

    After creating the word list, I'm thinking of using an SVM model for learning. I will let you know about the results.


    Best,

    N
  • nabilophone11 Member Posts: 11 Contributor II
    Now I have my 800 attributes (bag of words). Success! But I'm not finished yet, because I have to find a way to get a matrix with 0/1 for every attribute of my bag of words.

    Do you have an idea of the best way to get this result?

    Best,


    N
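The 0/1 matrix asked for here is a binary term-occurrence matrix: one row per address, one column per word, with 1 if the word occurs in the address. A minimal Python sketch of that idea follows; the wordlist, the addresses, and the function name `binary_term_matrix` are hypothetical. In RapidMiner itself, the "Process Documents" operators should offer a binary term occurrences setting for vector creation, though that is worth checking against the installed version.

```python
import re

def binary_term_matrix(texts, wordlist):
    """Return one row per text; each cell is 1 if the word occurs in the text, else 0."""
    rows = []
    for text in texts:
        tokens = set(re.findall(r"\w+", text.lower()))
        rows.append([1 if word in tokens else 0 for word in wordlist])
    return rows

# Hypothetical data for illustration only.
wordlist = ["rue", "avenue", "boulevard"]
addresses = ["12 Rue de la Paix", "5 Avenue Foch, Boulevard Sud", "PO Box 42"]
matrix = binary_term_matrix(addresses, wordlist)
for row in matrix:
    print(row)
```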
  • nabilophone11 Member Posts: 11 Contributor II
    Hi everybody,

    I get the result with 10% error. I'm trying to improve my model. Do you have any suggestions?

    I don't know how to get a new attribute that gives me all the attributes with value = true for each row. Is that possible?

    Thank you for your help.

    Best,

    N
  • nabilophone11 Member Posts: 11 Contributor II
    Hi,

    Is it possible in RapidMiner to create a result attribute that collects the names of all the attributes with value = Yes?


    Example:
            ID  Label  AT1  AT2  AT3  AT4  ...  (what I need)
    row1:   1   Yes    Yes  No   Yes  No   ...  AT1, AT3
    row2:   2   Yes    No   No   Yes  No   ...  AT3*
    .
    .
    Please, I need your help! Thank you very much!
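The result attribute in the example above can be sketched in Python as a per-row join of the names of all attributes whose value is "Yes". This is not a RapidMiner process, only an illustration of the logic; the function name `matching_attributes`, the `Result` column name, and the dictionary-based rows are assumptions.

```python
def matching_attributes(row, attribute_names):
    """Return a comma-separated string of attribute names whose value is 'yes'
    (compared case-insensitively)."""
    hits = [
        name for name in attribute_names
        if str(row.get(name, "")).strip().lower() == "yes"
    ]
    return ", ".join(hits)

# The two rows from the example, limited to the four attributes shown.
attribute_names = ["AT1", "AT2", "AT3", "AT4"]
rows = [
    {"ID": 1, "Label": "Yes", "AT1": "YES", "AT2": "NO", "AT3": "Yes", "AT4": "NO"},
    {"ID": 2, "Label": "Yes", "AT1": "NO", "AT2": "NO", "AT3": "Yes", "AT4": "NO"},
]
for row in rows:
    row["Result"] = matching_attributes(row, attribute_names)
    print(row["ID"], row["Result"])
```

The label column is deliberately excluded from `attribute_names`, so a "Yes" label does not show up in the result.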