"Dissertation advice for sentiment analysis."
Hello,
I am doing my BSc dissertation, in which I have to integrate a sentiment analysis engine into a system that gives lecturers continuous feedback from students. I am at the planning stage and am having difficulty deciding how to proceed. I am using RapidMiner and would prefer not to change, because so far I'm loving it, so I am looking for some advice.
During my planning I came to the conclusions below, but since I am not really proficient with RapidMiner I would like some advice before I invest more time in it.
1) If I use RapidMiner, it would be nice to be able to customize the algorithms:
-RapidMiner takes a set of documents that are classified as positive or negative
-It generates the rules from those documents, so if I do not have a good training dataset I will get bad results with new documents
-Keeping in mind how the classifier should work:
1-Clean data (bad characters, etc.)
2-NLP for filtering some words (stopwords, stemming?, etc.). This is the part where I don't know how to start customizing RapidMiner.
3-Extraction of classes and association rules that classify new documents as positive or negative
The consequence is that with RapidMiner I need to adjust task 2 (custom stop-words, etc.) and, if possible, add my own classification rules. That would be nice, but I do not need anything about grammars, etc., because RapidMiner infers the rules automatically: if n training documents containing "is" and "good" are classified as positive, then new documents containing "is" and "good" will be classified as positive as well. If I change the training dataset, these rules could change.
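The three steps above can be sketched outside RapidMiner to make the idea concrete. The following is only a minimal illustration in plain Python, not RapidMiner's implementation: the stop-word list, the function names and the four training sentences are all invented for this example, and a small multinomial Naive Bayes classifier stands in for whatever learner is actually used.

```python
import math
import re
from collections import Counter

# Hypothetical custom stop-word list; extend it for your own domain.
CUSTOM_STOPWORDS = {"the", "a", "an", "is", "was", "this", "of", "and", "to"}

def preprocess(text, stopwords=CUSTOM_STOPWORDS):
    """Steps 1 and 2: strip unwanted characters, lowercase, drop stop-words."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in stopwords]

def train(labelled_docs):
    """Step 3: count word frequencies per class (multinomial Naive Bayes)."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training documents
    for text, label in labelled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(preprocess(text))
    vocab = {tok for counts in word_counts.values() for tok in counts}
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest Laplace-smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in preprocess(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set, invented for illustration only.
TRAINING = [
    ("The lecture was clear and good", "positive"),
    ("Good examples, great pace!", "positive"),
    ("The slides were confusing and boring", "negative"),
    ("Boring lecture, bad audio", "negative"),
]

model = train(TRAINING)
print(classify("This is a good lecture", model))
print(classify("Boring slides", model))
```

Changing `CUSTOM_STOPWORDS` corresponds to adjusting task 2, and changing `TRAINING` changes the learned counts, which is why a different training dataset can lead to different classifications.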
2) From scratch: in this second case, if I prefer to use LingPipe, the steps are similar but I can control the natural language processing at a low level:
1-Lexical analysis of data: stopwords, stemming
2-Syntactic analysis to identify categories such as verbs, nouns, etc. (using the English grammars that are available)
3-Entity extraction
4-I can have my own list of "positive" verbs and "positive" adjectives (which can be expanded using WordNet) and create my own rules, or use an external library such as Weka or Mahout, customizing at most the input training dataset
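Step 4 of this approach, scoring text against hand-made word lists, could look roughly like the sketch below. It is plain Python rather than LingPipe code, the two word lists and the function names are invented for illustration, and the WordNet expansion is not shown.

```python
import re

# Hypothetical hand-made lexicons; in practice these could be expanded
# with synonyms from a resource such as WordNet.
POSITIVE_WORDS = {"good", "great", "clear", "helpful", "enjoy", "like"}
NEGATIVE_WORDS = {"bad", "boring", "confusing", "slow", "hate", "dislike"}

def lexicon_score(text):
    """Count positive matches minus negative matches in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positives = sum(tok in POSITIVE_WORDS for tok in tokens)
    negatives = sum(tok in NEGATIVE_WORDS for tok in tokens)
    return positives - negatives

def lexicon_label(text):
    """Map the score to a label; a score of zero is treated as neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_label("Clear and helpful lecture"))
print(lexicon_label("Boring and confusing slides"))
```

A scorer like this needs no training data, but every rule and word has to be maintained by hand.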
I think that, given the time restrictions and my experience, this custom solution is not suitable right now.
Best regards.
Answers
If you have a training set, in most cases a learning algorithm will outperform custom rules, since it can also capture interactions between words. E.g. "good" is different from "not good", which custom rules would not catch.
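To illustrate the "good" versus "not good" point, here is a small sketch in plain Python (not RapidMiner code). It assumes one simple way of exposing word interactions to a learner, namely using word pairs (bigrams) as features next to single words. The training sentences, the function names and the very simple weighting scheme are all made up for this example.

```python
import re
from collections import Counter

def features(text):
    """Single words plus adjacent word pairs (bigrams)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

def keyword_rule(text):
    """A naive hand-written rule: any occurrence of 'good' means positive."""
    return "positive" if "good" in features(text) else "negative"

def train(labelled_docs):
    """Weight of a feature = positive occurrences minus negative occurrences."""
    weights = Counter()
    for text, label in labelled_docs:
        step = 1 if label == "positive" else -1
        for feat in features(text):
            weights[feat] += step
    return weights

def learned_label(text, weights):
    """Sum the learned weights of all features found in the text."""
    score = sum(weights[feat] for feat in features(text))
    return "positive" if score > 0 else "negative"

# Toy training set, invented for illustration only.
TRAINING = [
    ("The lecture was good", "positive"),
    ("Really good examples", "positive"),
    ("The lecture was not good", "negative"),
    ("Not good at all", "negative"),
]

weights = train(TRAINING)
sentence = "The seminar was not good"
print(keyword_rule(sentence))            # the hand-written rule only sees "good"
print(learned_label(sentence, weights))  # the learned weights include "not good"
```

In this toy data the word "good" on its own ends up with a weight of zero, because it occurs equally often in both classes, while the pair "not good" gets a negative weight learned from the examples.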
To customize RapidMiner, you could create your own extension, provided that you know Java. Here is a whitepaper describing the basic steps: http://docs.rapid-i.com/files/howtoextend/How%20to%20Extend%20RapidMiner%205.pdf
You should be aware, though, that extending and customizing RapidMiner also takes time: you first need to get to grips with the internals of RapidMiner, and then actually implement your operators.
Best regards,
Marius