"Dictionary Based Sentiment Analysis"

Benedict_von_Ah | Member | Posts: 8 | Contributor II
edited June 2019 in Help

Hey guys,

I'm currently working on a dictionary-based sentiment analysis using the Operator Toolbox. Everything works fine so far, but in the end I cannot add a date as an attribute to my output. The model only outputs "Text", "Score", "Positivity", "Negativity" and "uncovered token".

Is there a way to add the date from my dataset? I want the output to be "Date", "Text", "Score", ...

Best Answer

  • MartinLiebig | Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor | Posts: 3,438 | RM Data Scientist
    Solution Accepted

    Hi,

    I've sent you a version of the new operator via mail. I will add it to the Marketplace toolbox if this is what you need.

    Cheers,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany

Answers

  • Benedict_von_Ah | Member | Posts: 8 | Contributor II

    Here is my process:

    [process XML omitted: corrupted in the original post]

  • Thomas_Ott | RapidMiner Certified Analyst, RapidMiner Certified Expert, Member | Posts: 1,761 | Unicorn

    Your XML process is corrupted. Please open the XML view and copy the XML from there.

  • Benedict_von_Ah | Member | Posts: 8 | Contributor II

    How do I get to the XML view?

  • Benedict_von_Ah | Member | Posts: 8 | Contributor II

    [...]
    <parameter key="value_attribute" value="Weight"/>
    [...]
    <connect from_op="Loop Collection" from_port="output 1" to_op="Apply Dictionary Based Sentiment" to_port="doc"/>
    <connect from_op="Apply Dictionary Based Sentiment" from_port="res" to_op="Write CSV" to_port="input"/>
    <connect from_op="Apply Dictionary Based Sentiment" from_port="doc" to_op="Process Documents" to_port="documents 1"/>
    [...]

  • Thomas_Ott | RapidMiner Certified Analyst, RapidMiner Certified Expert, Member | Posts: 1,761 | Unicorn
  • Benedict_von_Ah | Member | Posts: 8 | Contributor II

    Yeah, thanks.

    My code is just above your comment.

  • Thomas_Ott | RapidMiner Certified Analyst, RapidMiner Certified Expert, Member | Posts: 1,761 | Unicorn

    Have you tried using Set Role to give your Date attribute the ID role in the first subprocess? It should then flow through the process.

  • Benedict_von_Ah | Member | Posts: 8 | Contributor II

    Hey, thanks. I fixed the date problem with Excel but ran into a new problem.

    Do you know how I can get the number of covered (recognised) tokens for each of my texts?

    Say the text of day 1 contains 50 tokens and the dictionary model gives a score of, e.g., +2. I don't know whether that comes from a single word with a weight of +2 or from 100 words with a weight of 0.02 each.

    That would be very important to know.

    My code is still the same as above.

    Thanks

  • MartinLiebig | Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor | Posts: 3,438 | RM Data Scientist

    Hi,

    this is not yet implemented, but not too hard to do... I will check if I can do this tomorrow.

    Edit: As a workaround, you can simply set all weights to -1 or +1 and run it a second time. Afterwards you just rename and join the results.

    Best,

    Martin

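A minimal sketch of the workaround above, in plain Python rather than RapidMiner. Everything here is an assumption for illustration: the toy dictionary, the whitespace tokenizer, and the helper names `score_text` and `to_unit_weights` are hypothetical, and the real operator may compute "Positivity" and "Negativity" differently. The idea is that a second run with every weight replaced by its sign turns the sums into hit counts, which can then be joined to the weighted scores.

```python
def score_text(text, weights):
    """Sum dictionary weights over whitespace tokens (hypothetical scorer)."""
    tokens = text.lower().split()
    hits = [weights[t] for t in tokens if t in weights]
    positivity = sum(w for w in hits if w > 0)
    negativity = -sum(w for w in hits if w < 0)
    return {
        "score": positivity - negativity,
        "positivity": positivity,
        "negativity": negativity,
    }


def to_unit_weights(weights):
    """Replace every non-zero weight by its sign (+1 or -1)."""
    return {word: (1 if w > 0 else -1) for word, w in weights.items() if w != 0}


# Toy dictionary and text, made up for this example.
weights = {"great": 2.0, "good": 0.5, "bad": -1.0}
text = "good food but bad service and a great view"

# First run: the usual weighted score.
weighted = score_text(text, weights)

# Second run: unit weights, so positivity and negativity become hit counts.
counts = score_text(text, to_unit_weights(weights))
covered_tokens = counts["positivity"] + counts["negativity"]

print(weighted["score"], covered_tokens)
```

In this toy case the weighted score of 1.5 comes from three covered tokens (two positive, one negative). The unit-weight "score" alone is only the net count, so the total is taken from positivity plus negativity.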
  • simon_kuehne | Member | Posts: 6 | Contributor I

    Hi Martin,

    could you also send me the updated version?

    Best regards

    Simon

  • MartinLiebig | Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor | Posts: 3,438 | RM Data Scientist

    Dear @simon_kuehne,

    it's on the Marketplace. We updated Operator Toolbox on Friday.

    Best,

    Martin

  • simon_kuehne | Member | Posts: 6 | Contributor I

    Hi Martin,

    thanks. I tried it but it didn't solve my problem.

    I need some additional data, or at least the ID, and not only "text" in the results after applying the dictionary-based sentiment. The role of the ID variable is set to "id".

    Thank you!

    Simon

  • MartinLiebig | Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor | Posts: 3,438 | RM Data Scientist

    Hi,

    good point. You might be able to join back the original data by joining on the text attribute.

    Best,

    Martin

  • simon_kuehne | Member | Posts: 6 | Contributor I

    Unfortunately, this is not possible as the text is not a unique attribute. Many tweets are retweets and look the same after some "document processing" steps but differ in terms of metadata (e.g., geo coordinates).

    Is there a workaround or anything?

    Best,

    Simon

  • MartinLiebig | Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor | Posts: 3,438 | RM Data Scientist

    Hi,

    I see the point. You could add a nominal counter right before the operator to make texts unique.

    Best,

    Martin

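One way to read the counter suggestion, sketched in plain Python under assumptions: the helper name `add_counter`, the sample tweets, and the metadata values are hypothetical, and prefixing the counter to the text is only one possible way to make each text distinct. A numeric prefix should not change the sentiment score as long as the dictionary contains no numbers.

```python
def add_counter(texts):
    """Prefix each text with a running counter so duplicate texts become distinct."""
    return [f"{i} {text}" for i, text in enumerate(texts, start=1)]


# Hypothetical tweets; the first two are identical after preprocessing.
tweets = ["rt great game", "rt great game", "bad call"]
metadata = ["geo A", "geo B", "geo C"]  # made-up per-tweet metadata

unique_tweets = add_counter(tweets)

# With unique texts, a lookup keyed on the text is no longer ambiguous.
meta_by_text = dict(zip(unique_tweets, metadata))

print(unique_tweets)
```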
  • simon_kuehne | Member | Posts: 6 | Contributor I

    Hi,

    I solved it by using "Select Attributes" right before the text processing loop, where I selected only the text variable. After applying the dictionary-based sentiment analysis I added "Generate ID". I did the same for the "original" output of the "Select Attributes" operator and then joined the data on "ID".

    Thanks for your help.

    Best,
    Simon

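The ID-based join described above can be sketched in plain Python. This is an illustration under assumptions, not the RapidMiner process itself: the rows, the geo values, the scores, and the helper name `generate_id` are made up, and the approach only works if the sentiment step returns exactly one row per input text in the same order.

```python
def generate_id(rows):
    """Attach a 1-based running id to each row, in input order."""
    return [{"id": i, **row} for i, row in enumerate(rows, start=1)]


# Hypothetical original data: duplicate texts with different metadata.
original = [
    {"text": "rt great game", "geo": "52.5,13.4"},
    {"text": "rt great game", "geo": "48.1,11.6"},
    {"text": "bad call", "geo": "51.5,7.5"},
]

# Stand-in for the sentiment output: one row per input, same order (assumed).
sentiment = [{"score": 2.0}, {"score": 2.0}, {"score": -1.0}]

# Number both tables independently, then join on the generated id.
left = generate_id(original)
right = {row["id"]: row for row in generate_id(sentiment)}
joined = [{**row, **right[row["id"]]} for row in left]

for row in joined:
    print(row)
```

Because the key is the row position rather than the text, the two identical retweets keep their own metadata after the join.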
  • MartinLiebig | Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor | Posts: 3,438 | RM Data Scientist

    Hi @simon_kuehne,

    nice workaround! If you used the Merge operator from the toolbox, you would not need an ID :).

    How would you enhance the operator?

    Best,

    Martin

  • simon_kuehne | Member | Posts: 6 | Contributor I

    Hi Martin,

    I think the best way to enhance this operator would be to add options for passing metadata through and for choosing which variable is the text variable used for the sentiment analysis.

    I have another problem with the operator: I am analyzing tweets, and there are lots of hashtags that contain relevant keywords, e.g., #ILikeThatKeyword. The problem is that I did not find a way to split such a hashtag into 4 tokens, or to make the operator also search for those matches instead of word tokens only.

    Best,

    Simon
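For the hashtag question, a possible preprocessing step is to split CamelCase hashtags into words before the text reaches the dictionary lookup. The following is a hedged sketch in plain Python, not a feature of the operator: the function name `split_hashtag` and the splitting rule (capitalised words, runs of capitals, lowercase runs, digits) are assumptions, and hashtags written entirely in lowercase cannot be split this way.

```python
import re

# Alternatives, tried in order: a run of capitals not followed by a lowercase
# letter (acronyms, single-letter words), a capitalised word, a lowercase
# run, or a run of digits.
_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+")


def split_hashtag(tag):
    """Split a CamelCase hashtag into lowercase word tokens."""
    return [part.lower() for part in _PART.findall(tag.lstrip("#"))]


print(split_hashtag("#ILikeThatKeyword"))  # ['i', 'like', 'that', 'keyword']
```

The resulting tokens could replace the hashtag in the tweet text, so that a word-level dictionary lookup can match them.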
