List of words that are filtered with Stopwords, Stemming and Tokenizing?

Jonas97 Member Posts: 2 Newbie
Hello,

is there a function in RapidMiner that I can use to create a list of the words, or a count of the words, that the process steps Filter Stopwords, Stemming and Tokenizing have identified and excluded from the analysis of the text corpus?

Thank you in advance!

Jonas

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    I am not sure if there is a direct way to view this, but you could accomplish it with two passes. First, run your documents through with tokenizing only. Then run them through a second time with tokenizing plus the other text processing options you want (stopwords, stemming, etc.). Finally, take the two resulting wordlist datasets and use Set Minus (join type) to get the non-matches, i.e. the words that appear in the first wordlist but not in the second.
    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
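
The two-pass idea above can be sketched outside RapidMiner in plain Python. This is only an illustration of the set-minus logic, not RapidMiner's own operators: the stopword list, the `naive_stem` helper and the sample sentence are all hypothetical stand-ins, and a real process would use a proper stopword dictionary and stemmer.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "is", "are", "of", "and", "in"}


def tokenize(text):
    """Lowercase the text and split it on any run of non-letter characters."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def naive_stem(word):
    """Toy suffix stripper; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def wordlist_plain(text):
    """Pass 1: tokenize only."""
    return set(tokenize(text))


def wordlist_processed(text):
    """Pass 2: tokenize, drop stopwords, then stem."""
    return {naive_stem(t) for t in tokenize(text) if t not in STOPWORDS}


def set_minus(text):
    """Words present after pass 1 but absent after pass 2."""
    return wordlist_plain(text) - wordlist_processed(text)


text = "The miners are mining the text of the documents"
removed = set_minus(text)
print(sorted(removed))  # the words that did not survive unchanged
print(len(removed))     # how many there are
```

Note that the difference contains both the stopwords that were dropped and the original forms of words that stemming changed (for example "miners", which became "miner"). If only the removed stopwords are of interest, the second pass would need to skip the stemming step.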