Sentiment analysis (positive/negative words) of txt-files with other dictionary

mikesolvay · February 2020

Hello

I am conducting some research that involves text mining of a few .txt files stored on my computer. I have managed to count the words and n-grams used in all the .txt documents, which was the first part of my work. Now I would like to make a table of positively and negatively connoted words from the same documents (resulting in, for example, "overall, the documents include 55% positive words and 45% negative words"). I also want to use the sentiment word list made by Loughran and McDonald (2018).

I was not able to paste my XML code, so here is a screenshot of my process so far. In "Process Documents" I tokenize, filter stopwords, transform cases and generate n-grams.

Image: https://us.v-cdn.net/6030995/uploads/editor/sj/1sl7dauzqyg9.jpg

I have little experience with RapidMiner, and I am eager to get a better understanding of it. Help is much appreciated.
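
For readers who want to see the target table outside RapidMiner, here is a minimal Python sketch of the calculation described above: count the tokens that appear in a positive or a negative word list, then report each as a share of all sentiment words. The two tiny word sets and the function names are hypothetical placeholders, not the Loughran and McDonald lists, and the tokenizer is a deliberately simple assumption.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the real word lists, which are far longer.
POSITIVE = {"gain", "improve", "strong"}
NEGATIVE = {"loss", "decline", "weak"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_shares(documents):
    """Count positive/negative tokens over all documents and return their shares in percent."""
    counts = Counter()
    for text in documents:
        for token in tokenize(text):
            if token in POSITIVE:
                counts["positive"] += 1
            elif token in NEGATIVE:
                counts["negative"] += 1
    total = counts["positive"] + counts["negative"]
    if total == 0:
        # No sentiment words found: avoid dividing by zero.
        return {"positive": 0, "negative": 0, "positive_pct": 0.0, "negative_pct": 0.0}
    return {
        "positive": counts["positive"],
        "negative": counts["negative"],
        "positive_pct": round(100 * counts["positive"] / total, 1),
        "negative_pct": round(100 * counts["negative"] / total, 1),
    }


if __name__ == "__main__":
    docs = [
        "Strong gain despite a weak quarter.",
        "The loss widened and sales decline.",
    ]
    print(sentiment_shares(docs))
```

To run this on files, each document string could be read with `pathlib.Path(path).read_text()`. Note that the percentages here are shares of sentiment words only, not of all words in the documents.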

MartinLiebig · March 2020

Hi @mikesolvay,

this is roughly what you want:

<operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">

<operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel" width="90" x="112" y="493">

Adapt location please

<operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="493">

<operator activated="true" class="rename" compatibility="9.6.000" expanded="true" height="82" name="Rename" width="90" x="380" y="493">

<operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel (2)" width="90" x="112" y="646">

Adapt location please

<operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="246" y="646">

<operator activated="true" class="rename" compatibility="9.6.000" expanded="true" height="82" name="Rename (2)" width="90" x="380" y="646">

<operator activated="true" class="append" compatibility="9.6.000" expanded="true" height="103" name="Append" width="90" x="514" y="544">

<operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.4.000-SNAPSHOT" expanded="true" height="82" name="Dictionary-Based Sentiment (Documents)" width="90" x="648" y="544">

<operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document" width="90" x="648" y="187">

<operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document (2)" width="90" x="648" y="289">

<operator activated="true" class="collect" compatibility="9.6.000" expanded="true" height="103" name="Collect" width="90" x="782" y="187">

<operator activated="true" class="loop_collection" compatibility="9.6.000" expanded="true" height="82" name="Loop Collection" width="90" x="916" y="187">

<operator activated="true" class="text:tokenize" compatibility="8.2.000" expanded="true" height="68" name="Tokenize" width="90" x="380" y="34">

<operator activated="true" class="text:transform_cases" compatibility="8.2.000" expanded="true" height="68" name="Transform Cases" width="90" x="581" y="34">

<operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.4.000-SNAPSHOT" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="1117" y="391">

<description align="center" color="yellow" colored="false" height="294" resized="true" width="599" x="26" y="454">This generates the dictionary as needed in the "Dict based Sentiment" operator
<description align="center" color="yellow" colored="false" height="251" resized="true" width="518" x="559" y="119">This creates two test documents. It also does the preprocessing of it. Note that you need to tokenize your documents before applying it! This is done in "loop collection"
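
The logic of the process above can be sketched in plain Python for anyone who wants to check the idea independently of RapidMiner. The sketch mirrors the operator chain only loosely: two word lists each get a weight (the "Generate Attributes" step), they are merged into one dictionary ("Append"), and each document is tokenized and lowercased before scoring. The weights of +1 and -1, the function names, and the output fields are assumptions made for illustration; the actual operator's parameters and output columns may differ.

```python
import re


def build_dictionary(positive_words, negative_words):
    """Merge two word lists into one word -> weight mapping (+1 positive, -1 negative)."""
    # Words are lowercased so they match the lowercased document tokens.
    dictionary = {word.lower(): 1.0 for word in positive_words}
    dictionary.update({word.lower(): -1.0 for word in negative_words})
    return dictionary


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_document(text, dictionary):
    """Sum the weights of all dictionary words found in the document."""
    weights = [dictionary[token] for token in tokenize(text) if token in dictionary]
    return {
        "score": sum(weights),
        "positive": sum(1 for w in weights if w > 0),
        "negative": sum(1 for w in weights if w < 0),
    }


if __name__ == "__main__":
    # Hypothetical sample entries; replace them with the full word lists.
    dictionary = build_dictionary(["ACHIEVE", "BENEFIT"], ["ABANDON", "LOSS"])
    print(score_document("We achieve a benefit but report a loss.", dictionary))
```

As in the process, tokenization has to happen before the dictionary lookup; scoring the raw string would never match a single entry.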

MartinLiebig · February 2020

Hi @mikesolvay,

the operator Dictionary Based Sentiment Analysis is what you are looking for. In fact, 'Extract sentiment' bundles models like this one for easier use.

Can you maybe elaborate on the advantage of the other dictionary, and where I can find it? It may be easy to simply add it to the operator.

Best,

Martin

mikesolvay · February 2020

Thank you for your reply, @mschmitz!

I had a look at the operator you mentioned, but I am confused by the parameters I have to set. How does the operator know which words are considered negative and positive just from entering numerical values for the parameters?

I am sorry for my lack of knowledge. As I said, my experience with RapidMiner is very limited so far.
I prefer this dictionary only because it is the basis of the methodology my research builds on. If it is troublesome to use a custom dictionary, I would just use a standard one from RapidMiner.

Image: https://us.v-cdn.net/6030995/uploads/editor/at/xmb7afav0j71.jpg

MartinLiebig · February 2020

Hi,

did you have a look at the tutorial processes in the help panel? Those should help.

Can you maybe post a link to the dictionary? That would allow me to create an example process for you on your dictionary.

Best,

Martin

mikesolvay · February 2020

I had a brief look at it, as well as some YouTube videos, but I still struggle a bit.

Link: https://sraf.nd.edu/textual-analysis/resources/#LM%20Sentiment%20Word%20Lists
You will find it there as an .xlsx file.

Thank you so much for your help so far, and for taking the time.

sgenzer · February 2020

Hi @mikesolvay, welcome to the community. I just "boosted" your rank so you can now post hyperlinks.

mikesolvay · March 2020

@mschmitz Would it be possible for you to share an example process for what I am trying to do? It would also be okay to try it with the WordNet dictionary or something else. Thank you in advance!

