"[SOLVED] Stemming: Keep Information {original word, stem}"

Urselinho, Member, Posts: 10, Contributor II
edited June 2019 in Help
Hi there,
I'm currently doing some text processing using the different stemming operators. Right now I'm wondering whether there is a way to keep or show which words are conflated to which stem. Without any adjustment, the results of stemming (wordlist, example set) only contain the stems and the associated information such as occurrences.

What I primarily need is something like {original word, stem}.

I'm sure this is quite an easy task, but as I'm not that familiar with RM yet, I don't see it. Any idea how to do this?

Many thanks in advance,
Regards,
Urs

Answers

  • MariusHelf, RapidMiner Certified Expert, Member, Posts: 1,869, Unicorn
    Hi Urs,

    actually, the stemming operators discard the original tokens, so it is not possible to see which stem results from which token. The only solution may be to compare the stemmed document with the original document token by token in a rather complex process and write the mapping manually into an example set.

    Best, Marius
  • Urselinho, Member, Posts: 10, Contributor II
    Hi Marius,
    that's quite unpleasant. But OK, I do see the workaround. Thanks for your help.

    Best,
    Urs
  • Urselinho, Member, Posts: 10, Contributor II
    Hi Marius,
    it's me once again. I really have to ask, otherwise it will take me a long time to find the right operators/functions.

    How can I use the stemming operator so that words are "replaced" within a given document rather than "conflated"? Right now, if I have a document with the words "Autos" and "Auto", for example, the wordlist will only contain the stem "auto".

    Thanks in advance,
    Urs
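
For readers who want to see the logic of the token-wise comparison and the in-place replacement discussed above, here is a minimal sketch in plain Python. It is not a RapidMiner process and does not use any RapidMiner operator. The stemmer `toy_stem` is a hypothetical stand-in that only lowercases and strips a trailing "s", and the helper names `stem_mapping` and `replace_with_stems` are made up for illustration; a real stemmer would be swapped in for `toy_stem`.

```python
import re
from collections import defaultdict


def toy_stem(token):
    """Hypothetical stand-in stemmer: lowercase and strip a trailing 's'."""
    word = token.lower()
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def stem_mapping(text, stem=toy_stem):
    """Collect, for every stem, the original tokens that were conflated to it."""
    mapping = defaultdict(set)
    for token in re.findall(r"\w+", text):
        mapping[stem(token)].add(token)
    # Sort the originals so the result is deterministic.
    return {s: sorted(tokens) for s, tokens in mapping.items()}


def replace_with_stems(text, stem=toy_stem):
    """Replace each token by its stem in place, keeping every occurrence."""
    return re.sub(r"\w+", lambda match: stem(match.group(0)), text)


document = "Autos und Auto"
pairs = stem_mapping(document)          # stem -> original words
stemmed = replace_with_stems(document)  # document with tokens replaced

for stem_form, originals in pairs.items():
    print(stem_form, originals)
print(stemmed)
```

The mapping keeps the {original word, stem} information that the stemming step would otherwise discard, while the replacement keeps one stem per token occurrence instead of collapsing them into a single wordlist entry.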