Question about stopword list and word stemming (German)

thomas_wiedmann Member Posts: 60 Guru
edited December 2018 in Help

This is my first attempt at using Stopword Filter (German) and word stemming (German), and I am trying to understand what is going on. I fed in some German text, but the output looks almost identical to the input. So I have some questions:

Input:

Dies ist ein Text mit einigen Worten und einem Punkt. Gestern bin ich gegangen, morgen werde ich gehen.

Output:

dies ist ein text mit einigen worten und einem punkt. gestern bin ich gegangen, morgen werde ich gehen.

a) Is there a list of which words are filtered by the stopword filter operator?

b) What does Stem (German) do?

RapidMiner.JPG

Process: [process XML not preserved]

Thanks!

Thomas

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    I believe you need to use these operators inside a larger "Process Documents" operator, where you tokenize first so that they have discrete word tokens to operate on. At the moment the operators do nothing because they are applied to the entire document text at once, and neither stemming nor stopword removal can work on that.

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @Telcontar120 beat me to it! Here's a sample process, @thomas_wiedmann





    [sample process XML not preserved]

  • thomas_wiedmann Member Posts: 60 Guru

    OK, I tried this one...

    RapidMiner4.JPG

    Result:

    tex wor punk gegang

    Uuuh. True, but first I have to think about this result... ;-)

    I still have a lot to learn...

    Thanks!

    Thomas
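
For readers without the sample process at hand, the tokenize-first advice from the answers can be sketched in plain Python. This is only an illustration of the idea, not RapidMiner's implementation: the stopword set, the suffix list, and the helper names (`tokenize`, `filter_stopwords`, `toy_stem`) are all hypothetical, and the toy stemmer is not the algorithm behind Stem (German), so its output is not expected to match the result posted above.

```python
import re

# Hypothetical, deliberately tiny stopword set (NOT RapidMiner's built-in German list).
GERMAN_STOPWORDS = {"dies", "ist", "ein", "mit", "einigen", "und",
                    "einem", "bin", "ich", "werde"}

# Toy suffixes for illustration only; a real German stemmer uses far more rules.
SUFFIXES = ("en", "e")


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def filter_stopwords(tokens):
    """Drop every token that exactly matches an entry in the stopword set."""
    return [t for t in tokens if t not in GERMAN_STOPWORDS]


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


text = ("Dies ist ein Text mit einigen Worten und einem Punkt. "
        "Gestern bin ich gegangen, morgen werde ich gehen.")

# Without tokenizing, the whole document is a single "token": it matches no
# stopword and ends in ".", so no suffix is stripped and nothing changes.
whole = [text.lower()]
untouched = [toy_stem(t) for t in filter_stopwords(whole)]
print(untouched == whole)  # True

# With tokenizing first, both steps have individual words to work on.
tokens = tokenize(text)
stems = [toy_stem(t) for t in filter_stopwords(tokens)]
print(stems)  # ['text', 'wort', 'punkt', 'gestern', 'gegang', 'morg', 'geh']
```

The first case mirrors the original post: the filter and the stemmer compare against one long string, so the text comes back only lowercased. Once the text is split into tokens, exact matching against the stopword set and suffix stripping both start to have an effect.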
