Sentiment analysis (positive/negative words) of .txt files with a custom dictionary

mikesolvay (Member; Posts: 4; Contributor I)
Hello

I am conducting research that involves text mining a few .txt files stored on my computer. I have successfully counted the words and n-grams used in all .txt documents, which was the first part of my work. Now I would like to make a table of the positively and negatively connoted words in the same documents (resulting in, for example, "overall, the documents include 55% positive words and 45% negative words"). I would also like to use the sentiment word list by Loughran and McDonald (2018).
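For readers who want to see the arithmetic behind such a percentage, here is a minimal Python sketch of a generic word-counting approach (not RapidMiner, and not necessarily how Loughran and McDonald compute their measures): it counts how many tokens appear in a positive list and how many in a negative list, and reports each as a share of all matched sentiment words. The word lists and tokens are hypothetical toy data, and the two lists are assumed not to overlap. To get the overall figure for several documents, pool their tokens into one list first.

```python
def sentiment_shares(tokens, positive, negative):
    """Return (positive %, negative %) among tokens found in either word list.

    Tokens in neither list are ignored, so the two shares add up to 100
    whenever at least one sentiment word was found. Assumes the lists are disjoint.
    """
    n_pos = sum(1 for t in tokens if t in positive)
    n_neg = sum(1 for t in tokens if t in negative)
    total = n_pos + n_neg
    if total == 0:
        return 0.0, 0.0  # no sentiment words at all
    return 100 * n_pos / total, 100 * n_neg / total


# Hypothetical toy word lists and tokens, lowercased as after Transform Cases.
positive = {"gain", "improve", "strong"}
negative = {"loss", "decline", "weak"}
tokens = ["strong", "gain", "despite", "loss", "improve", "weak"]

pos_pct, neg_pct = sentiment_shares(tokens, positive, negative)
print(f"positive: {pos_pct:.0f}%, negative: {neg_pct:.0f}%")
# prints: positive: 60%, negative: 40%
```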

I was not able to paste my XML code successfully, so here is a screenshot of my process so far. In "Process Documents" I tokenize, filter stopwords, transform cases, and generate n-grams.
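As a rough illustration of what those four preprocessing steps do, here is a minimal Python sketch. The regular-expression tokenizer, the tiny stopword list, and joining n-grams with "_" are assumptions for illustration, not RapidMiner's exact behaviour. This sketch lowercases before filtering stopwords so that "The" matches the lowercase stopword list.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real list is much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is"}


def preprocess(text, n=2):
    """Tokenize, lowercase, drop stopwords, then add n-grams up to length n."""
    # Tokenize: split on anything that is not an ASCII letter (an assumption for this sketch).
    tokens = [t for t in re.split(r"[^A-Za-z]+", text) if t]
    # Transform cases: lowercase everything.
    tokens = [t.lower() for t in tokens]
    # Filter stopwords.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Generate n-grams: single words are kept, longer n-grams are joined with "_".
    terms = list(tokens)
    for size in range(2, n + 1):
        for i in range(len(tokens) - size + 1):
            terms.append("_".join(tokens[i:i + size]))
    return terms


print(preprocess("The outlook is Strong, and the loss narrowed."))
# ['outlook', 'strong', 'loss', 'narrowed', 'outlook_strong', 'strong_loss', 'loss_narrowed']
```

One consequence worth noting: a single-word sentiment list can only match the unigram terms. N-gram terms such as "strong_loss" will never be found in it, so they do not affect the positive/negative counts.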



I have little experience with RapidMiner, and I am eager to get a better understanding of it. Help is much appreciated.


Best Answer

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,404; RM Data Scientist)
    edited March 2020, Solution Accepted
    This is roughly what you want:

    <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
    <operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel" width="90" x="112" y="493">
    Adapt location please
    <operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="493">
    <operator activated="true" class="rename" compatibility="9.6.000" expanded="true" height="82" name="Rename" width="90" x="380" y="493">
    <operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel (2)" width="90" x="112" y="646">
    Adapt location please
    <operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="246" y="646">
    <operator activated="true" class="rename" compatibility="9.6.000" expanded="true" height="82" name="Rename (2)" width="90" x="380" y="646">
    <operator activated="true" class="append" compatibility="9.6.000" expanded="true" height="103" name="Append" width="90" x="514" y="544">
    <operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.4.000-SNAPSHOT" expanded="true" height="82" name="Dictionary-Based Sentiment (Documents)" width="90" x="648" y="544">
    <operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document" width="90" x="648" y="187">
    <operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document (2)" width="90" x="648" y="289">
    <operator activated="true" class="collect" compatibility="9.6.000" expanded="true" height="103" name="Collect" width="90" x="782" y="187">
    <operator activated="true" class="loop_collection" compatibility="9.6.000" expanded="true" height="82" name="Loop Collection" width="90" x="916" y="187">
    <operator activated="true" class="text:tokenize" compatibility="8.2.000" expanded="true" height="68" name="Tokenize" width="90" x="380" y="34">
    <operator activated="true" class="text:transform_cases" compatibility="8.2.000" expanded="true" height="68" name="Transform Cases" width="90" x="581" y="34">
    <operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.4.000-SNAPSHOT" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="1117" y="391">
    <description align="center" color="yellow" colored="false" height="294" resized="true" width="599" x="26" y="454">This generates the dictionary as needed in the &quot;Dict based Sentiment&quot; operator
    <description align="center" color="yellow" colored="false" height="251" resized="true" width="518" x="559" y="119">This creates two test documents. It also does the preprocessing of it. Note that you need to tokenize your documents before applying it! This is done in &quot;loop collection&quot;

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
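    The dictionary branch of the process above (two Read Excel operators, each followed by Generate Attributes and Rename, then Append) can be pictured with the minimal Python sketch below. It is a reading of the process, not something the XML confirms: each word list is assumed to get a constant weight column before the two are appended. The column names "word" and "weight", the +1/-1 weights, and the lowercasing (so entries match text that went through Transform Cases, in case the spreadsheet stores words in uppercase) are all assumptions for illustration.

```python
def build_dictionary(positive_words, negative_words):
    """Combine two word lists into one list of {"word", "weight"} rows.

    Hypothetical stand-in for Read Excel -> Generate Attributes -> Rename -> Append:
    each list gets a constant weight (+1 or -1) and the two are appended.
    """
    rows = []
    for word in positive_words:
        # Strip stray whitespace and lowercase so entries match lowercased tokens.
        rows.append({"word": word.strip().lower(), "weight": 1.0})
    for word in negative_words:
        rows.append({"word": word.strip().lower(), "weight": -1.0})
    return rows


# Toy input; the real lists would come from the Loughran-McDonald spreadsheet.
dictionary = build_dictionary(["GAIN", " Strong "], ["LOSS"])
for row in dictionary:
    print(row)
# {'word': 'gain', 'weight': 1.0}
# {'word': 'strong', 'weight': 1.0}
# {'word': 'loss', 'weight': -1.0}
```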

Answers

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,404; RM Data Scientist)
    The operator Dictionary Based Sentiment Analysis is what you are looking for. In fact, 'Extract Sentiment' bundles models like these for easier use.
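    To give an idea of how a dictionary-based approach works in general, here is a generic Python sketch (not the operator's actual implementation): every token is looked up in a word-to-weight table, and a document's score is the sum of the weights of the tokens it contains. The dictionary and documents below are hypothetical.

```python
def dictionary_score(tokens, weights):
    """Sum the dictionary weights of all tokens; unknown tokens contribute nothing.

    Returns (score, number_of_matched_tokens).
    """
    matched = [weights[t] for t in tokens if t in weights]
    return sum(matched), len(matched)


# Hypothetical dictionary: +1 for positive words, -1 for negative words.
weights = {"gain": 1.0, "strong": 1.0, "loss": -1.0, "weak": -1.0}

# Hypothetical documents, already tokenized and lowercased.
documents = {
    "doc1": ["strong", "gain", "this", "quarter"],
    "doc2": ["weak", "demand", "and", "a", "loss"],
}
for name, tokens in documents.items():
    score, hits = dictionary_score(tokens, weights)
    print(name, score, hits)
# doc1 2.0 2
# doc2 -2.0 2
```

    Returning the number of matched tokens alongside the score makes it easy to normalise, for example by dividing by the match count or the document length, when comparing documents of different sizes.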

    Could you elaborate on the advantage of the other dictionary, and where I can find it? Maybe it is easy to just add it to the operator.

    Best,
    Martin
  • mikesolvay (Member; Posts: 4; Contributor I)
    edited February 2020
    Thank you for your reply, @mschmitz!

    I had a look at the operator you mentioned, but I am confused by the parameters I have to set. How does the operator know which words are negative and which are positive just from the numerical values entered for the parameters?

    I am sorry for my lack of knowledge. As I said, my experience with RapidMiner is very limited so far.
    As for my preferred dictionary: I want it only because it is the basis of the methodology my research builds on. If using a custom dictionary is troublesome, I would just use a standard one from RapidMiner.

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,404; RM Data Scientist)
    Hi,
    Did you have a look at the tutorial processes in the Help panel? Those should help.

    Could you post a link to the dictionary? Then I can create an example process for you using your dictionary.

    Best,
    Martin
  • mikesolvay (Member; Posts: 4; Contributor I)
    edited February 2020
    I had a brief look at it, as well as some YouTube videos, but I still struggle a bit.

    Link: https://sraf.nd.edu/textual-analysis/resources/#LM%20Sentiment%20Word%20Lists
    You will find it there as an .xlsx file.

    Thank you so much for your help so far, and for taking the time.
  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator; Posts: 2,959; Community Manager)
    Hi @mikesolvay, welcome to the community. I just "boosted" your rank so you can now post hyperlinks.
  • mikesolvay (Member; Posts: 4; Contributor I)
    @mschmitz Would it be possible for you to share an example process of what I am trying to do? It would also be okay to try it with WordNet or another dictionary. Thank you in advance!