I have a problem removing URLs and hashtags from my data (from Excel)

fangirl96 (Member, Posts: 2, Contributor I)
edited December 2018 in Help
I'm having a problem removing URLs and hashtags from my data (from Excel). I read the data (tweets) with three Read Excel operators and then appended them. After that, I connected the Append operator to Replace and entered regexes for the URLs and hashtags in the parameters named "regular expression" and "replace what". Then I connected it to Data to Documents and then to Process Documents, where I have Transform Cases, Tokenize and Filter Stopwords (Dictionary), in that order. The results were tokenized and the stopwords I created were removed. But for the hashtags, only the # symbol is removed: for example, the original text #vscocam becomes vscocam. The URLs are not removed at all; they were just tokenized too.
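What the question describes fits the Replace step not matching anything, so the raw tweets reach Tokenize unchanged. A minimal Python sketch, assuming Transform Cases lower-cases the text and Tokenize is left in a mode that splits on non-letter characters (an assumption, since the full process XML did not survive), reproduces both symptoms: `#vscocam` loses only its `#`, and the link is cut into fragments. The function name `tokenize_non_letters` is hypothetical.

```python
import re

def tokenize_non_letters(text: str) -> list:
    """Rough stand-in for Transform Cases + Tokenize: lower-case the text,
    then keep only runs of ASCII letters (everything else acts as a separator)."""
    return re.findall(r"[a-z]+", text.lower())

tweet = "#vscocam https://t.co/AbC123 love it"
print(tokenize_non_letters(tweet))
# ['vscocam', 'https', 't', 'co', 'abc', 'love', 'it']
```

Because such a tokenizer treats `#`, `:`, `/` and `.` as separators, any hashtag or URL that gets past the Replace step still turns into ordinary-looking tokens, so the fix has to happen in the regex before tokenization.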

Answers

  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator; Posts: 2,959; Community Manager)

    Hello @fangirl96, welcome to the community. I think I understand, and I believe you just need to adjust your regex. Can you give some examples and the process you're using (see the "Read Before Posting" instructions on the right)?


    Scott
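One way to adjust the regex, sketched in Python: a single alternation that matches a whole link (`http` or `https` up to the next whitespace) or a whole hashtag, applied before tokenizing. The pattern `https?://\S+|#\w+` and the helper name `strip_urls_and_hashtags` are suggestions, not taken from the thread; the pattern only uses constructs that Java's regex engine (which RapidMiner runs on) also supports, but it should still be checked against real tweets in the Replace operator.

```python
import re

# Whole URL (http or https up to the next whitespace) OR a whole hashtag.
URL_OR_HASHTAG = re.compile(r"https?://\S+|#\w+")

def strip_urls_and_hashtags(text: str) -> str:
    # Replace each match with a space so neighbouring words don't fuse,
    # then collapse the leftover runs of whitespace.
    return " ".join(URL_OR_HASHTAG.sub(" ", text).split())

print(strip_urls_and_hashtags("#vscocam https://t.co/AbC123 love it"))
# love it
```

One difference to keep in mind: `\w` is Unicode-aware in Python 3 but ASCII-only by default in Java, so hashtags containing accented letters may be cut short in RapidMiner.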

  • fangirl96 (Member, Posts: 2, Contributor I)

    This is the full XML of my process.

    <list key="replace_dictionary">
    <parameter key="@[a-zA-Z]*" value=" "/>

    <operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases" width="90" x="112" y="136"/>
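The only regex that survived in the posted XML is `@[a-zA-Z]*`, which targets @mentions rather than links or hashtags, so on its own it leaves URLs untouched. Because of the `*` it also matches a bare `@`, and it stops at the first digit or underscore. A short Python comparison with `@\w+` (a hypothetical alternative, not part of the original process):

```python
import re

POSTED = re.compile(r"@[a-zA-Z]*")  # pattern visible in the posted XML
ALTERNATIVE = re.compile(r"@\w+")   # hypothetical alternative: letters, digits, underscore

tweet = "@jane_doe99 loved it #vscocam https://t.co/AbC123"
print(repr(POSTED.sub(" ", tweet)))       # ' _doe99 loved it #vscocam https://t.co/AbC123'
print(repr(ALTERNATIVE.sub(" ", tweet)))  # '  loved it #vscocam https://t.co/AbC123'
```

In both cases the hashtag and the link pass through unchanged, which is consistent with the links surviving into the tokenized output.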

    The links are not removed, but the hashtags are.

    PS: The links in my data start with https.
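Since the links all begin with https, the pattern has to consume the rest of the link as well as the scheme. A pattern that stops at `https://` (a hypothetical mistake; the URL regex actually used isn't visible in the post) leaves `t.co/...` behind for the tokenizer to split. A minimal Python comparison:

```python
import re

tweet = "new pic https://t.co/AbC123 #vscocam"

# Matches only the scheme, so the rest of the link is left for the tokenizer.
scheme_only = re.sub(r"https://", " ", tweet)

# Matches the scheme plus every non-whitespace character after it.
whole_link = re.sub(r"https://\S+", " ", tweet)

print(repr(scheme_only))  # 'new pic  t.co/AbC123 #vscocam'
print(repr(whole_link))   # 'new pic   #vscocam'
```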

  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator; Posts: 2,959; Community Manager)

    Thank you, @fangirl96. Can you share one of those Excel sheets as well?

    Scott

  • Thomas_Ott (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 1,761; Unicorn)

    @fangirl96 take a look at my tutorial process here: http://www.neuralmarkettrends.com/blog/entry/use-rapidminer-discover-twitter-content

    I extract the hashtags and replace the https: links with a generic word, 'link'.
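The idea described here, pulling the hashtags out into their own field and replacing each link with the placeholder word `link`, can be sketched in Python. This illustrates the approach only; it is not the tutorial's actual RapidMiner process, and the helper `split_tweet` and its returned dictionary are hypothetical.

```python
import re

HASHTAG = re.compile(r"#(\w+)")     # captures the tag text without '#'
LINK = re.compile(r"https?://\S+")  # http or https link up to the next whitespace

def split_tweet(text: str) -> dict:
    """Hypothetical helper: return the hashtags and the tweet body with
    every link replaced by the placeholder word 'link'."""
    # Replace links first so a '#' inside a URL is not mistaken for a hashtag.
    no_links = LINK.sub("link", text)
    hashtags = HASHTAG.findall(no_links)
    # Remove the hashtags from the body and tidy up the leftover whitespace.
    body = " ".join(HASHTAG.sub(" ", no_links).split())
    return {"hashtags": hashtags, "text": body}

print(split_tweet("new pic #vscocam https://t.co/AbC123 #summer"))
# {'hashtags': ['vscocam', 'summer'], 'text': 'new pic link'}
```

Keeping a single `link` token instead of deleting URLs outright preserves the fact that a tweet contained a link, which can be useful as a feature later.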
