Operator for removing emoticons from Twitter data
Which operator should I use to remove emoticons (emoji) from Twitter data before running sentiment analysis? I am using RapidMiner version 9.3.0.
Best Answers
varunm1 Moderator, Member Posts: 1,207
Unicorn
Hello @Arupriya_Sen,
I am not sure, but I think tokenization removes these emoticons, since they are made up of symbols and punctuation. Give it a try.
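To illustrate why tokenizing on non-letter characters tends to drop emoticons, here is a minimal Python sketch. It is not RapidMiner code: the function name `tokenize_non_letters` is made up for this example, and the regular expression only approximates what a non-letters tokenizer does.

```python
import re

def tokenize_non_letters(text):
    """Split text on runs of non-letter characters and drop empty pieces.

    Hypothetical helper for illustration; it approximates a tokenizer
    that treats everything except letters as a separator.
    """
    # [\W\d_]+ matches anything that is not a Unicode letter:
    # punctuation, symbols (including emoji), whitespace, digits, underscores.
    return [token for token in re.split(r"[\W\d_]+", text) if token]

print(tokenize_non_letters("I love this 😂 :-) so much"))
print(tokenize_non_letters("great :D"))
```

Note the second call: letter-based emoticons such as `:D` leave a stray letter token behind, so tokenization alone may not be enough for every emoticon style.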
@kaymanor @sgenzer, any suggestions here?
Thanks and regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959
Community Manager
I would convert the text to URL-encoded UTF-8 (use the Encode URL operator) like this:
Tweet:
Before encoding:
After encoding:
Then look up the UTF-8 Unicode chart (e.g. https://apps.timwhitlock.info/emoji/tables/unicode):
So the 'face with tears of joy' emoji is %F0%9F%98%82, which makes sense, as you can see it in the encoded text:
and so on. So then it's just a matter of using Replace to remove every %F0%9F%98%xx sequence from the encoded text, then decoding back. Something like this:
<?xml version="1.0" encoding="UTF-8"?>
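
For anyone who wants to check the encode / replace / decode idea outside RapidMiner, here is a rough Python sketch of the same three steps. It is not the process above: the name `strip_emoticons` and the constant `EMOTICON_BYTES` are invented for this example, and the pattern only covers the %F0%9F%98%xx range (U+1F600 to U+1F63F), so other emoji would need additional patterns.

```python
import re
from urllib.parse import quote, unquote

# Percent-encoded UTF-8 sequences of the form %F0%9F%98%xx, where xx is a
# continuation byte (80..BF). This covers U+1F600..U+1F63F only.
EMOTICON_BYTES = re.compile(r"%F0%9F%98%[89AB][0-9A-F]")

def strip_emoticons(text):
    """Hypothetical helper: URL-encode, drop %F0%9F%98%xx, decode back."""
    encoded = quote(text, safe="")            # step 1: percent-encode as UTF-8
    cleaned = EMOTICON_BYTES.sub("", encoded)  # step 2: remove the emoji bytes
    return unquote(cleaned)                    # step 3: decode back to text

print(quote("😂", safe=""))
print(strip_emoticons("so funny 😂😂 lol"))
```

Because the literal `%` character is itself encoded as `%25` in step 1, text that merely looks like an encoded emoji is left alone. Removing an emoji can leave a double space behind, so a whitespace cleanup step afterwards may be worthwhile.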
Scott