Analyze 77000 tweets

vittorio_confuo Member Posts: 4 Contributor I
edited December 2018 in Help

Dear community,

I have to process a dataset of 77,000 tweets with the following attributes: post_id, username, hash_tag, sent_time, text, user_id, source, is_retweet, is_reply, lang, retweet_count, reply_count, latitude, longitude. I must do an analysis using association rules and clustering, but I'm new to RM and I hope someone can give me advice on how to proceed.

My first problem is the free license: I can read only 10000 lines. Do operators exist that generate a significant sample?

Second problem: what kind of association rules can I use? I'm thinking of a "manual" sentiment analysis (I have seen that there is the Aylien extension, but it has limitations and it doesn't work with the Italian language): is there a way to find the most important words in each tweet in order to do a positive/negative classification?

Can you suggest some association rule and/or clustering algorithms that I could use? How should I interpret their results?

I apologize for all these questions, and I would be very grateful if someone is kind enough to help me!

Regards,

Vittorio Confuorto

Answers

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi Vittorio,

    regarding the license, I would suggest requesting a demo. It may be a temporary solution, but it's the best way to get started and see if the platform brings value to you. You may also be able to apply for an educational license.

    //www.turtlecreekpls.com/contact-sales-request-demo/

    //www.turtlecreekpls.com/educational-program/

    Regarding the analysis, I also work with Twitter, and it's a very special case of text analysis. I think that clustering won't give you the results you want, because most of the words are just garbage and vary a lot from tweet to tweet. My suggestion would be to train a sentiment model on another dataset and then apply it to the tweets. You will have to somehow get your hands on labeled sentiment data in Italian.

    Regards,

    Sebastian

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @vittorio_confuo,

    As mentioned, Aylien has limitations and does not support Italian.

    So I propose using a Python script based on the "textblob" library.

    This script translates the tweet from Italian to English and then extracts the sentiment (negative, neutral, positive):

    -1 < sentiment < -0.1 ==> negative

    -0.1 < sentiment < 0.1 ==> neutral

    0.1 < sentiment < 1 ==> positive
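As an illustration of the thresholds above, here is a minimal Python sketch of the mapping from a polarity score to a label. It is not the original script: the function names are illustrative, and treating a score of exactly -0.1 or 0.1 as neutral is an assumption, since the ranges above leave those two values undefined. The second function assumes the text is already in English and that TextBlob is installed.

```python
def label_polarity(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1, 1] to negative / neutral / positive."""
    if polarity < -threshold:
        return "negative"
    if polarity > threshold:
        return "positive"
    # Assumption: scores exactly on a boundary (+/- threshold) count as neutral.
    return "neutral"


def english_sentiment(text: str) -> str:
    """Label an English text using TextBlob's polarity score."""
    # Imported here so label_polarity stays usable without TextBlob installed.
    from textblob import TextBlob

    return label_polarity(TextBlob(text).sentiment.polarity)
```

Keeping the threshold as a parameter makes it easy to widen or narrow the neutral band once you have looked at a few hundred scored tweets.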

    Spelling_Correction_5.png

    The process: [RapidMiner process XML not preserved]

    To execute this process, you have to:

    - install Python

    - install textblob (pip install textblob)

    - set your text attribute in the Set Macros parameters.
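To check the first two steps, one option is to ask the Python interpreter itself whether it can find the package. This is a minimal sketch, assuming RapidMiner is configured to use the same interpreter that runs the snippet; the helper name `is_installed` is illustrative and not part of the process above.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The path printed here should match the Python configured in RapidMiner.
    print("Interpreter:", sys.executable)
    print("textblob installed:", is_installed("textblob"))
```

If the second line prints False, running pip through that same interpreter (`python -m pip install textblob`) avoids installing the package into a different Python.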

    I hope it helps

    Regards,

    Lionel

  • vittorio_confuo Member Posts: 4 Contributor I

    Hi @SGolbert, thank you very much. It is a good idea, but I don't actually have much time to get my hands on a dataset. If I do it (in the next week), I will post the result; maybe it will be useful to someone.

    For the association rules, I have thought of discretizing sent_time in order to find the most important topic for each time step (for example, every hour).
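To illustrate the idea outside RapidMiner, here is a rough Python sketch that buckets tweets by hour and picks the most frequent hashtag in each bucket. It assumes that sent_time is an ISO 8601 string and that hash_tag holds a single tag per row; neither is confirmed for this dataset, and the function name is illustrative.

```python
from collections import Counter, defaultdict
from datetime import datetime


def top_hashtag_per_hour(tweets):
    """Return {hour: most frequent hash_tag} for a list of tweet dicts."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        # Assumption: sent_time looks like "2018-12-01T10:05:00".
        sent = datetime.fromisoformat(tweet["sent_time"])
        hour = sent.replace(minute=0, second=0, microsecond=0)
        counts[hour][tweet["hash_tag"]] += 1
    return {hour: tags.most_common(1)[0][0] for hour, tags in counts.items()}
```

The same hourly bucket can then serve as an item in the association rules, next to the hashtags or words of each tweet.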

    As for the size of the dataset, do you know if the operator "Sample (Stratified)" is a good choice?
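For reference, stratified sampling draws from each group in proportion to its size, so the sample keeps roughly the same mix as the full data. The sketch below shows the principle in plain Python; it is not how the RapidMiner operator is implemented, the choice of `lang` as the grouping attribute is only an example, and because of rounding the result can differ from the requested size by a few rows.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, n, seed=0):
    """Draw about n rows, keeping each group's share roughly proportional."""
    rows = list(rows)
    total = len(rows)
    if n >= total:
        return rows
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        # Proportional allocation, rounded to the nearest whole row.
        k = min(round(n * len(members) / total), len(members))
        sample.extend(rng.sample(members, k))
    return sample
```

For example, `stratified_sample(tweets, "lang", 10000)` would reduce the 77,000 rows to about 10,000 while keeping the language mix.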

    Thank you for your help,

    Vittorio Confuorto

  • vittorio_confuo Member Posts: 4 Contributor I

    Hi @lionelderkrikor, sorry, I've only just seen your answer.

    I have a problem installing textblob. Can you tell me how to do it?

    Thank you for your time,

    Vittorio Confuorto

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @lionelderkrikor thank you for that awesome Python script! You just gave me so many ideas for applications here! I have to work with this textblob library more!

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @vittorio_confuo,

    "I have some problem with textblob installation"

    So that I can help you, can you be more precise?

    Regards,

    Lionel

  • vittorio_confuo Member Posts: 4 Contributor I

    Hi @lionelderkrikor,

    I solved this problem, but I currently have another one.

    The process returns the following error:

    Pic.jpeg

    Do you know how I can solve it?

    You are very kind

    Regards,

    Vittorio Confuorto

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @vittorio_confuo,

    Can you share your dataset and your process so that I can reproduce the bug?

    Regards,

    Lionel

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @Thomas_Ott,

    You're welcome,

    Happy sentiment analysis!

    Regards,

    Lionel

  • student_compute Member Posts: 73 Contributor II
    Hello
    Can the sentiment analysis be aspect-based?
    Thanks
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Hi @student_compute, I think the TextBlob library can't do aspect-based sentiment analysis; maybe @lionelderkrikor can confirm?

    I'm pretty sure the NLTK python library CAN do that, you'd just have to build it into RapidMiner.

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi all,

    yes, I confirm that the Python library "textblob" can't do aspect-based sentiment analysis.

    Regards,

    Lionel

  • student_compute Member Posts: 73 Contributor II

    Hello
    Thank you for the help, dear friends.
    Is there an NLTK sample for RapidMiner on Twitter data?
    Sorry
    Thanks a lot

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi,

    thanks for the textblob tip!

    I also wanted to mention the Stanford CoreNLP library:

    https://stanfordnlp.github.io/CoreNLP/index.html

    It's available both as a web service and as a Java library. This is surely your best option for production use, as long as it has the required functionality. It is a great candidate for adding functionality to the text processing extension!

    Edit: I have seen that it does not support Italian and that the .jar file is 500 MB. This limits its use as an add-on to the RapidMiner .jar, at least without modularization.

    Best regards,

    Sebastian

  • student_compute Member Posts: 73 Contributor II

    Hello. Thank you very much.
    I went to the site and downloaded the English file. I copied it to the RapidMiner plugins folder. What should I do now so that I can use it for aspect-based sentiment analysis?
    Thank you
    have a nice day

  • jozeftomas_2020 Member Posts: 40

    Hello.
    Mr. @lionelderkrikor
    I ran your code, but it gives an error.
    What's wrong?
    Thanks a lot

    py sa.JPG

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @jozeftomas_2020,

    Here the process works fine:

    Spelling_Correction_6.png

    Some questions:

    - Have you successfully installed TextBlob?

    - Are you running Python 2.x or Python 3.x?

    - Have you modified the dataset in the parameters of the Create ExampleSet operator?

    Can you send me your process so that I can try to reproduce your bug?

    Regards,

    Lionel

  • jozeftomas_2020 Member Posts: 40

    Hello
    My process is exactly your code.
    Yes, I have Python and textblob installed.
    But I do not know why it does not run. :smileymad:
    Could you check this data? Sorry, sorry.
    https://community.www.turtlecreekpls.com/t5/RapidMiner-Studio-Forum/How-to-correct-the-wrong-words/td-p/51027/page/3
    This is the same Twitter data.
    Thank you very much :heart:
    Have a great day

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @jozeftomas_2020,

    OK, I think I understand:

    The process in this thread is designed to analyze the sentiment of text that is not in English.

    Your input dataset is already in English, so the Python script raises an error when it tries to translate it into English.
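One way to guard against this kind of error is to translate only the rows that need it. The sketch below is an illustration, not the script used in the process: it assumes each tweet carries a language code such as the `lang` attribute from the original dataset, and `translate` stands for whatever translation function you have available (here it is just a parameter).

```python
from typing import Callable


def to_english(text: str, lang: str, translate: Callable[[str], str]) -> str:
    """Return text unchanged if it is already English, else translate it."""
    # Codes such as "en", "EN" or "en-GB" are all treated as English.
    if lang.strip().lower().startswith("en"):
        return text
    return translate(text)
```

With this in place, English and non-English tweets can go through the same sentiment step without the translation call failing on English input.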

    If you want to analyze the sentiment of English text (in your case, tweets in English), use this process: [RapidMiner process XML not preserved]

    I hope it helps,

    Regards,

    Lionel

  • jozeftomas_2020 Member Posts: 40

    Hello
    Thank you so much for the time you spent
    and for your valuable answer :heart:
    I will check.
