"How to work and use word2vec"

khazan Member Posts: 23
edited June 2019 in Help

Hello,
I want to use Word2Vec to convert sentences to vectors, but I do not know how to do it. My data comes from Twitter.
Please help me by sending an image of the operator setup.
Thanks
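
One common way to get a single vector per sentence, once word vectors exist, is to average the vectors of the words in the sentence. The thread does not prescribe this, so the sketch below is only an illustration in plain Python: the tiny `word_vectors` table and the `sentence_vector` helper are hypothetical stand-ins for a trained Word2Vec model.

```python
# Hypothetical 2-dimensional word vectors; a real Word2Vec model
# would provide vectors with many more dimensions.
word_vectors = {
    "good": [1.0, 0.0],
    "movie": [0.0, 1.0],
    "bad": [-1.0, 0.0],
}


def sentence_vector(tokens, vectors):
    """Average the vectors of all known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


vec = sentence_vector(["good", "movie"], word_vectors)
```

Words missing from the model are skipped here; other strategies (for example weighting words by TF-IDF) are possible as well.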

Answers

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 295 RM Research

    Hi and welcome to the community.

    First, you need to install the Word2Vec extension from the RapidMiner Marketplace.

    Once that is done, you'll find the Word2Vec operator via the Operators search bar (or the global search).

    You can check this excellent post by Martin on how to apply the extension:

    https://community.www.turtlecreekpls.com/t5/RapidMiner-Studio-Knowledge-Base/Synonym-Detection-with-Word2Vec/ta-p/43860

    Best,
    David

  • khazan Member Posts: 23

    Hello,
    Thank you.
    I looked at the link.
    Unfortunately, the pictures are not clear.
    Could you send an image showing how to use this operator to analyze emotions?
    Many thanks




  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,362 RM Data Scientist

    Dear @khazan,

    the mentioned post has an attached .zip file. It contains the full analysis, so you can load it into RapidMiner and use it yourself.

    Best,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • khazan Member Posts: 23

    Hello,
    Thank you for your attention.
    I downloaded the sample file for Word2Vec and loaded it into RapidMiner,
    but there is an error.
    I also do not know what the process for using this operator is.
    Please give me guidance.
    Thank you again.

    88.JPG
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,362 RM Data Scientist

    Hi,

    this operator expects a collection of tokenized documents as input, not an ExampleSet.

    ~Martin

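
To illustrate the distinction outside RapidMiner, here is a minimal Python sketch of what "a collection of tokenized documents" looks like compared with raw text rows. The example tweets and the simple regex tokenizer are assumptions made for illustration; they are not what the extension does internally.

```python
import re

# Hypothetical raw tweets, comparable to the text column of an ExampleSet.
tweets = [
    "Loving the new phone, battery lasts all day!",
    "Worst customer service ever :(",
]


def tokenize(text):
    """Lowercase the text and extract runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


# A collection of tokenized documents: one list of tokens per tweet.
documents = [tokenize(t) for t in tweets]
```

Each inner list keeps the words of one tweet in their original order, which is the shape of input the operator is described as expecting.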
  • thomas_wiedmann Member Posts: 60 Guru

    I also tried this nice example and got this error:

    RapidMiner1.JPG

    My input data:

    RapidMiner2.JPG

    My data looks like this:

    RapidMiner3.JPG

    Any hints?

    EDIT: added the RM log file.

    Thanks!

    Thomas

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,362 RM Data Scientist

    Hi,

    how many documents do you feed in? More than the number of negative samples? Can you please try with more?

    Best,

    Martin

  • thomas_wiedmann Member Posts: 60 Guru

    Only this one example from the web page in point 1:

    The data is provided in one flat file for each hotel with the following structure:

    4
    $302
    http://www.tripadvisor.com/ShowUserReviews-g60878-d100504-r22932337-Hotel_Monaco_Seattle_a_Kimpton_Hotel-Seattle_Washington.html

    selizabethm
    Wonderful time- even with the snow! What a great experience! From the goldfish in the room (which my daughter loved) to the fact that the valet parking staff who put on my chains on for me it was fabulous. The staff was attentive and went above and beyond to make our stay enjoyable. Oh, and about the parking: the charge is about what you would pay at any garage or lot- and I bet they wouldn't help you out in the snow!
    Dec 23, 2008
    -1
    -1
    5
    4
    5
    5
    5
    5
    5
    -1
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,362 RM Data Scientist

    Hi,

    you need to provide more data to be able to run Word2Vec. A single example is not enough to train the model. I tested it, and it starts to work with 5 examples.

    Best,

    Martin

  • thomas_wiedmann Member Posts: 60 Guru

    I copied the original record and changed a few things. Now the data set consists of two *.dat files.

    After this, I get the same error...

    EDIT

    OK, I will build 5 data files and try again...

    Regards,

    Thomas

  • thomas_wiedmann Member Posts: 60 Guru

    Right, with five files the process works. I will continue testing now...

    Thanks!

    Thomas

  • thomas_wiedmann Member Posts: 60 Guru

    @mschmitz


    You wrote: "how many docs do you feed in? More than number of negative samples? Can you please try more?"

    Now I tried with five small (German) text samples and the process failed as before. Is there a lower bound or a parameter for this?
    Please explain.

    Regards
    Thomas

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,362 RM Data Scientist

    Hi,

    well, the algorithm itself only makes sense if you have a lot of data. Honestly, it would make sense for me to add an error for fewer than a thousand examples. Less than that usually does not yield good results.

    I am not sure where the exact boundary of 5 comes from, though.

    Best,

    Martin

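
The two thresholds mentioned in this thread (about 5 documents before the process ran at all, and roughly a thousand for useful results) could be checked up front. The sketch below is only a hypothetical pre-check in plain Python: the constant names and the `check_corpus_size` helper are invented for illustration, and since five small text samples still failed above, document count alone is clearly not the whole story.

```python
# Assumed thresholds taken from the discussion above, not from the extension itself.
MIN_DOCS_TO_RUN = 5           # boundary observed in this thread; cause unknown
MIN_DOCS_FOR_QUALITY = 1000   # rule of thumb suggested for meaningful results


def check_corpus_size(documents):
    """Classify a list of tokenized documents by how many documents it holds."""
    n = len(documents)
    if n < MIN_DOCS_TO_RUN:
        return "too_few"
    if n < MIN_DOCS_FOR_QUALITY:
        return "runs_but_weak"
    return "ok"


status = check_corpus_size([["only", "one", "review"]])
```

A check like this only gives an early warning; it does not explain why a particular small corpus fails.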
  • khazan Member Posts: 23

    Hello,
    I have preprocessed and tokenized the data, but there is an error.
    Could you please take a look?
    Thank you

    2.JPG
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,362 RM Data Scientist

    Hi,

    Process Documents converts this into a bag of words. Please have a look at the example process. It shows you how to do this with Loop Collection.

    Best,

    Martin

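
Why a bag of words is the wrong input can be shown with a small plain-Python sketch. Word2Vec learns from which words appear near each other, while a bag of words keeps only counts. The token lists and the `context_pairs` helper below are illustrative assumptions, not part of RapidMiner.

```python
from collections import Counter

# Two hypothetical token sequences with the same words in a different order.
tokens_a = ["not", "good", "just", "bad"]
tokens_b = ["not", "bad", "just", "good"]

# As bags of words the two documents are indistinguishable.
bag_a = Counter(tokens_a)
bag_b = Counter(tokens_b)


def context_pairs(tokens, window=1):
    """List (center, neighbour) pairs within the given window size."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


pairs_a = context_pairs(tokens_a)
pairs_b = context_pairs(tokens_b)
```

Here `bag_a == bag_b`, but `pairs_a` and `pairs_b` differ, so the order information the model relies on is only available from the token sequences.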
  • khazan Member Posts: 23

    I want to use Word2Vec to analyze emotions.
    I downloaded the tutorial from the site.
    Instead of text files, I have a CSV file and an XLSX file.
    I used the Read Excel operator with the loop operator, but the output has an error.
    Please provide guidance on how to use this sample with my CSV file.
    Also, what are these paths in the sample from the site?

    C:\Users\Martin\Arbeit\Tripadvisor

    ../results/Replacement Dictionary

    Should I have a dictionary? How do I create one, and where do I get it from?
    Thanks

    w1.JPG w2.JPG w3.JPG w4.JPG
  • khazan Member Posts: 23

    Hi,

    Please, please help me...

  • khazan Member Posts: 23

    I really need help.

    Please help me.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,362 RM Data Scientist

    Hi,

    Please use Read Excel, Nominal to Text, and Data to Documents. That should create a collection which can be handled by Loop Collection.

    Best,

    Martin

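
The idea of turning spreadsheet rows into a collection of documents can be sketched in plain Python. This is only an analogy to the RapidMiner operators, under stated assumptions: the in-memory CSV content and the column name `text` are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be a file on disk.
csv_data = io.StringIO(
    "id,text\n"
    "1,Great match tonight\n"
    "2,Traffic is terrible again\n"
)

# Read the table, one dictionary per row.
rows = list(csv.DictReader(csv_data))

# Make sure the text column is treated as plain text, one document per row.
documents = [str(row["text"]) for row in rows]

# Tokenize each document separately, as a loop over the collection would.
tokenized = [doc.lower().split() for doc in documents]
```

The result is one token list per row, which is the kind of collection a per-document loop can then process.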
  • khazan Member Posts: 23

    I tried it this way, but it gives an error.
    Please help.

    ww1.JPG ww2.JPG

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @khazan that error means no data is being passed out of the Loop Collection. I would put a breakpoint on the Loop Files operator and see if data is reaching the Loop Collection.

  • khazan Member Posts: 23

    I tried that but did not get any results.
    If there is an alternative operator to Loop Files, please guide me.
    Please tell me how to use Word2Vec for Twitter data. Thanks

  • khazan Member Posts: 23

    I beg you to help me.
    I really need help with this.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @khazan, some quick recommendations for you:
    • Post your XML process here in this thread (see https://youtu.be/KkgB5QXWXJ8 and "Read Before Posting" on the right when you reply)
    • Attach your dataset if possible (use a fictionalized version if there are privacy concerns)
    • Make sure you have all necessary extensions installed (see https://youtu.be/pjBqG3xtXx4)

    Scott
