"context/feature based opinion mining/sentiment analysis"

alexjohnp Member Posts: 1 Contributor I
edited June 2019 in Help
Hello everybody,
I'm pretty new to Rapidminer, and I'm stuck on the following problem.
I managed to build a simple sentiment classifier following Pang's theory and the examples on the Internet (especially those on vancouverdata). Now I'd like to extend the concept by extracting specific features (n-grams) and showing their sentiment scores.
For example, take the following phrase: "the camera has a pretty good focus, but its flash lacks of speed". I would like to end up with the two features focus (positive) and flash (negative).
Could you help me get through the pain?
Thank you in advance,

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    I'm sure there are many ways to look at this.
    If you are mining English examples in which the clauses are separated by commas, then it's straightforward: you just split on the comma.
    Let's assume that you don't have that luxury. I am, however, going to assume that your posts are all on the one subject.
    So for example:
    "the camera has a pretty good focus but its flash lacks of speed"
    "The Canon Sureshot has a pretty good focus and flash, but tastes awful without ketchup."
    "I've always liked the focus on my Canon, but really think the lightmeter is poor."

    I'd suggest the following approach (others may disagree):
    First I'd add an ID so you can split up the documents in many ways, but still combine them again later.
    • 1: build a list of N-Grams (a maximum of 4-5 terms long seems about right)
    • 2.1: build a list of features of the subject (flash, focus, shutter, lens, etc.)
    • 2.2: build a list of positive & negative terms for labelling (e.g. positive: good, pretty)
    • 3.1: eliminate any N-Grams that contain more than one feature
      (this is where I think my approach is wrong:
      do you remove "pretty good focus and flash" and just keep "pretty good focus"?)
    • 3.2: eliminate any N-Grams that contain conflicting sentiment (e.g. keep "good focus but bad flash", do not keep "good focus but bad")
    • 4: build a sentiment mining model from the N-Grams
    • 5: have a look at the most positive / least positive words in the N-Grams (that aren't features) and see if they should be added to the labelling in step 2.2
    After repeating this process a few times on the sample data, it should be possible to join your N-Grams up with your list of features to show the overall sentiment balance for each individual feature,
    e.g. focus 30 / 45 / 25 (positive, negative, neutral).

      I won't put together a sample process though as I think there are probably better ideas than mine on here.
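
For readers who want to try the steps above outside RapidMiner, here is a minimal Python sketch. It is not a RapidMiner process and not JEdward's implementation. The word lists, the function names, and the 4-term limit are all assumptions made for illustration. Step 4 is replaced by a simple lexicon vote instead of a trained model, and step 5 is left out.

```python
from collections import Counter, defaultdict

# Hypothetical lists for steps 2.1 and 2.2; extend them for your own data.
FEATURES = {"focus", "flash", "lightmeter"}
POSITIVE = {"good", "pretty", "liked"}
NEGATIVE = {"lacks", "poor", "awful"}


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (t.strip(".,!?\"'") for t in text.lower().split())
    return [t for t in tokens if t]


def ngrams(tokens, max_len=4):
    """Step 1: yield every n-gram of 2..max_len tokens."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def label(gram):
    """Return (feature, sentiment) for a usable n-gram, else None."""
    feats = {t for t in gram if t in FEATURES}
    if len(feats) != 1:  # step 3.1: keep n-grams with exactly one feature
        return None
    pos = any(t in POSITIVE for t in gram)
    neg = any(t in NEGATIVE for t in gram)
    if pos and neg:  # step 3.2: drop conflicting sentiment
        return None
    sentiment = "positive" if pos else "negative" if neg else "neutral"
    return feats.pop(), sentiment


def feature_sentiment(documents, max_len=4):
    """Tally one sentiment label per feature per document.

    `documents` maps a document ID to its text, so results can be
    traced back to the source post later.
    """
    totals = defaultdict(Counter)
    for _doc_id, text in documents.items():
        votes = defaultdict(Counter)
        for gram in ngrams(tokenize(text), max_len):
            labelled = label(gram)
            if labelled:
                feature, sentiment = labelled
                votes[feature][sentiment] += 1
        # Lexicon vote standing in for the model of step 4.
        for feature, counts in votes.items():
            if counts["positive"] > counts["negative"]:
                totals[feature]["positive"] += 1
            elif counts["negative"] > counts["positive"]:
                totals[feature]["negative"] += 1
            else:
                totals[feature]["neutral"] += 1
    return totals


if __name__ == "__main__":
    docs = {
        1: "the camera has a pretty good focus but its flash lacks of speed",
        3: "I've always liked the focus on my Canon, "
           "but really think the lightmeter is poor.",
    }
    for feature, counts in feature_sentiment(docs).items():
        print(feature, dict(counts))
```

The short n-gram window is what keeps "good" attached to focus and "lacks" attached to flash in the first sentence. The same window will probably mislabel the ketchup sentence, which is the weakness flagged in step 3.1.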
  • puteri_prameswa Member Posts: 3 Contributor I
    Dear Alexjohnp,

    I am using RapidMiner for my final thesis about feature-based sentiment analysis, and I face the same problem as you. I would like to know whether you have already found a way to solve it.

    Could you explain it to me?

    Also thanks JEdward for sharing.

    Thank you so much.