Sentiment analysis (positive/negative words) of txt-files with other dictionary
mikesolvay
Member Posts: 4 Contributor I
in Help
Hello
I am conducting research that involves text mining a few .txt files stored on my computer. I have managed to count the words and n-grams used in all of the .txt documents, which was the first part of my work. Now I would like to build a table of the positively and negatively connoted words in the same documents (resulting in, for example, "overall, the documents include 55% positive words and 45% negative words"). I also want to use the sentiment word list by Loughran and McDonald (2018).
I was not able to paste my XML code, so here is a screenshot of my process so far. In "Process Documents" I tokenize, filter stopwords, transform cases and generate n-grams.
I have little experience with RapidMiner, and I am eager to get a better understanding of it. Help is much appreciated.
Best Answer
MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,404 RM Data Scientist
Hi @mikesolvay, this is roughly what you want:
<operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
<operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel" width="90" x="112" y="493">Adapt location please
<operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="493">
<operator activated="true" class="rename" compatibility="9.6.000" expanded="true" height="82" name="Rename" width="90" x="380" y="493">
<operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel (2)" width="90" x="112" y="646">Adapt location please
<operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="246" y="646">
<operator activated="true" class="rename" compatibility="9.6.000" expanded="true" height="82" name="Rename (2)" width="90" x="380" y="646">
<operator activated="true" class="append" compatibility="9.6.000" expanded="true" height="103" name="Append" width="90" x="514" y="544">
<operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.4.000-SNAPSHOT" expanded="true" height="82" name="Dictionary-Based Sentiment (Documents)" width="90" x="648" y="544">
<operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document" width="90" x="648" y="187">
<operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document (2)" width="90" x="648" y="289">
<operator activated="true" class="collect" compatibility="9.6.000" expanded="true" height="103" name="Collect" width="90" x="782" y="187">
<operator activated="true" class="loop_collection" compatibility="9.6.000" expanded="true" height="82" name="Loop Collection" width="90" x="916" y="187">
<operator activated="true" class="text:tokenize" compatibility="8.2.000" expanded="true" height="68" name="Tokenize" width="90" x="380" y="34">
<operator activated="true" class="text:transform_cases" compatibility="8.2.000" expanded="true" height="68" name="Transform Cases" width="90" x="581" y="34">
<operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.4.000-SNAPSHOT" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="1117" y="391">
<description align="center" color="yellow" colored="false" height="294" resized="true" width="599" x="26" y="454">This generates the dictionary as needed in the "Dict based Sentiment" operator
<description align="center" color="yellow" colored="false" height="251" resized="true" width="518" x="559" y="119">This creates two test documents. It also does the preprocessing of it. Note that you need to tokenize your documents before applying it! This is done in "loop collection"
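For readers who want to see the idea behind this process outside RapidMiner, here is a minimal Python sketch of dictionary-based sentiment counting. It is not the implementation of the "Dictionary-Based Sentiment (Documents)" operator. The function names, the +1/-1 weights, and the four example words are assumptions made for illustration; the example words are placeholders, not entries taken from the Loughran and McDonald list. The sketch mirrors the steps above: combine a positive and a negative word list into one dictionary, tokenize and lowercase the text, then report the share of matched words that are positive or negative.

```python
import re
from collections import Counter


def build_dictionary(positive_words, negative_words):
    """Combine two word lists into one word -> weight mapping.

    Assumption: positive words get +1.0 and negative words get -1.0.
    """
    scores = {word.lower(): 1.0 for word in positive_words}
    scores.update({word.lower(): -1.0 for word in negative_words})
    return scores


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_shares(text, scores):
    """Return the percentage of matched dictionary words that are positive/negative."""
    counts = Counter()
    for token in tokenize(text):
        if token in scores:
            counts["positive" if scores[token] > 0 else "negative"] += 1
    matched = counts["positive"] + counts["negative"]
    if matched == 0:
        # No dictionary word occurs in the text, so no share can be computed.
        return {"matched": 0, "positive_pct": 0.0, "negative_pct": 0.0}
    return {
        "matched": matched,
        "positive_pct": 100.0 * counts["positive"] / matched,
        "negative_pct": 100.0 * counts["negative"] / matched,
    }


if __name__ == "__main__":
    # Placeholder word lists; replace them with the lists you actually use.
    dictionary = build_dictionary(["good", "gain"], ["bad", "loss"])
    print(sentiment_shares("Good gain but a bad LOSS and a loss", dictionary))
```

The percentages here are relative to the words found in the dictionary, not to all words in the document, which matches the "55% positive, 45% negative" style of result asked for in the question.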
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
Answers
I had a look at the operator you mentioned, but I am confused by the parameters I have to set. How does the operator know which words are considered negative and which positive just from entering numerical values for the parameters?
I am sorry for my lack of knowledge. As I said, my experience with RapidMiner is very limited so far.
As for my preferred dictionary, I want to use it only because the methodology my research builds on is based on it. If using a custom dictionary is troublesome, I would just use a standard one from RapidMiner.
Link: https://sraf.nd.edu/textual-analysis/resources/#LM%20Sentiment%20Word%20Lists
You will find it here as an .xlsx-file.
Thank you so much for your help so far, and for taking the time.