Stem (Dictionary) for Greek language

slimik · December 2016

Hello RapidMiner community,

I'm trying to create a stemmer for the Greek language, but I can't implement a more general rule for removing suffixes. For example, I want words like "fishes", "fished", "fishing" and "fishery" to be reduced to "fish". Because Greek has such a wide range of suffixes, it is too difficult to map every possible ending to the root of the word. So I tried a rule like this:

fish:fish.*

But it didn't work. Is there any way to do that?

Thank you in advance.
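To make the intended behaviour of a "fish:fish.*" line concrete, here is a small Python sketch that mimics a dictionary stemmer outside RapidMiner. It assumes each line of the file has the form "stem:regex" and that a token is replaced by the stem when the regex matches the whole token. I have not confirmed that this is exactly how Stem (Dictionary) matches. The names parse_rules and apply_rules are made up for illustration, and the Greek rule "δρομ:δρόμ.*" is likewise a hypothetical example.

```python
import re


def parse_rules(lines):
    """Parse hypothetical 'stem:regex' lines into (stem, compiled pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # partition on the first ':' only, so the regex itself may contain ':'
        stem, _, pattern = line.partition(":")
        rules.append((stem, re.compile(pattern)))
    return rules


def apply_rules(token, rules):
    """Return the stem of the first rule whose pattern matches the whole token."""
    for stem, pattern in rules:
        if pattern.fullmatch(token):
            return stem
    return token  # no rule matched: keep the token unchanged


rules = parse_rules(["fish:fish.*", "δρομ:δρόμ.*"])
print([apply_rules(w, rules) for w in ["fishes", "fished", "fishing", "fishery", "cat"]])
```

Matching the whole token (fullmatch) instead of searching anywhere in it keeps "fish.*" from also turning "catfish" into "fish". Greek forms whose accent moves, such as άνθρωπος and ανθρώπου, would need a pattern that allows for both the accented and unaccented letter.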

Thomas_Ott · December 2016

That should work. Can you post your process?

slimik · December 2016

<operator activated="true" class="text:filter_by_length" compatibility="7.3.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="112" y="238">

<parameter key="file" value="C:\Users\klimi\Desktop\Thesis\stemmer.txt"/>

<operator activated="true" class="text:filter_stopwords_dictionary" compatibility="7.3.000" expanded="true" height="82" name="Filter Stopwords (Dictionary)" width="90" x="514" y="238">
<parameter key="file" value="C:\Users\klimi\Desktop\Thesis\gr_stopwords.txt"/>

<operator activated="true" class="text:filter_stopwords_english" compatibility="7.3.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="514" y="34"/>

<operator activated="true" class="support_vector_machine" compatibility="7.3.001" expanded="true" height="124" name="SVM (2)" width="90" x="179" y="34">

<operator activated="true" class="text:filter_by_length" compatibility="7.3.000" expanded="true" height="68" name="Filter Tokens (2)" width="90" x="112" y="238">

<parameter key="file" value="C:\Users\klimi\Desktop\Thesis\stemmer.txt"/>

<operator activated="true" class="text:filter_stopwords_dictionary" compatibility="7.3.000" expanded="true" height="82" name="Filter Stopwords (2)" width="90" x="514" y="187">
<parameter key="file" value="C:\Users\klimi\Desktop\Thesis\gr_stopwords.txt"/>

<operator activated="true" class="text:filter_stopwords_english" compatibility="7.3.000" expanded="true" height="68" name="Filter Stopwords (3)" width="90" x="514" y="34"/>

<portSpacing port="source_input 1" spacing="0"/>

After Process Documents from Data, the WordList (Process Documents from Data) result window is empty, so the process can't continue to the Validation step because the data has no attributes. I've tried the same process with and without Stem (Dictionary), and the problem is the stemmer for Greek words.
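Since the WordList only comes out empty when the stemmer is switched on, one thing worth ruling out (a guess, not a diagnosis) is the dictionary file itself. The hypothetical helpers below read the file strictly as UTF-8, so a file saved in a legacy Greek code page fails loudly, and they report lines that lack the ':' separator, have an empty stem, start with a byte-order mark, or contain an invalid regular expression. They only check the assumed "stem:regex" format and say nothing about how RapidMiner itself reads the file.

```python
import re


def read_rule_file(path):
    """Read the dictionary strictly as UTF-8; raises UnicodeDecodeError otherwise."""
    with open(path, encoding="utf-8") as handle:
        return handle.readlines()


def check_rules(lines):
    """Return (line number, problem) pairs for lines not in the assumed 'stem:regex' form."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are ignored
        if line.startswith("\ufeff"):
            # some Windows editors start UTF-8 files with an invisible BOM character
            problems.append((lineno, "starts with a UTF-8 byte-order mark"))
            continue
        stem, sep, pattern = line.partition(":")
        if not sep:
            problems.append((lineno, "missing ':' between stem and pattern"))
            continue
        if not stem.strip():
            problems.append((lineno, "empty stem"))
            continue
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append((lineno, f"invalid regular expression: {exc}"))
    return problems


# Example (path taken from the process above):
# for lineno, problem in check_rules(read_rule_file(r"C:\Users\klimi\Desktop\Thesis\stemmer.txt")):
#     print(f"line {lineno}: {problem}")
```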

Thomas_Ott · December 2016

The way you have it should work. I wonder if there's a bug in those stemmer/stopword dictionary operators, because they ask you to enter the file path and name of the .txt file directly.

Try it with an Open File operator attached to them and let's see if that works.

<operator activated="true" class="text:filter_by_length" compatibility="7.3.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="313" y="34">

<parameter key="filename" value="C:\Users\klimi\Desktop\Thesis\stemmer.txt"/>

<parameter key="file" value="C:\Users\klimi\Desktop\Thesis\stemmer.txt"/>

<parameter key="filename" value="C:\Users\klimi\Desktop\Thesis\gr_stopwords.txt"/>

<operator activated="true" class="text:filter_stopwords_dictionary" compatibility="7.3.000" expanded="true" height="82" name="Filter Stopwords (Dictionary)" width="90" x="648" y="289">
<parameter key="file" value="C:\Users\klimi\Desktop\Thesis\gr_stopwords.txt"/>

<operator activated="true" class="text:filter_stopwords_english" compatibility="7.3.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="782" y="187"/>

<operator activated="true" class="support_vector_machine" compatibility="7.3.001" expanded="true" height="124" name="SVM (2)" width="90" x="179" y="34">

<operator activated="true" class="text:filter_by_length" compatibility="7.3.000" expanded="true" height="68" name="Filter Tokens (2)" width="90" x="112" y="238">

<parameter key="filename" value="C:\Users\klimi\Desktop\Thesis\stemmer.txt"/>

<parameter key="file" value="C:\Users\klimi\Desktop\Thesis\stemmer.txt"/>

<parameter key="filename" value="C:\Users\klimi\Desktop\Thesis\gr_stopwords.txt"/>

<operator activated="true" class="text:filter_stopwords_dictionary" compatibility="7.3.000" expanded="true" height="82" name="Filter Stopwords (2)" width="90" x="648" y="391">
<parameter key="file" value="C:\Users\klimi\Desktop\Thesis\gr_stopwords.txt"/>

<operator activated="true" class="text:filter_stopwords_english" compatibility="7.3.000" expanded="true" height="68" name="Filter Stopwords (3)" width="90" x="514" y="34"/>

<portSpacing port="source_input 1" spacing="0"/>

slimik · December 2016

With your solution, the process gets past the error I mentioned earlier. But the stemmer still doesn't work, no matter what rule I give it. Is there any way to implement a Python-based stemmer as a RapidMiner operator?

IngoRM · December 2016

It should be possible, but it will of course require some work on your side. You need to install the Python Scripting extension from the Marketplace (https://marketplace.www.turtlecreekpls.com/UpdateServer/faces/product_details.xhtml?productId=rmx_python_scripting) and then implement the function yourself.

Cheers,

Ingo
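Along the lines Ingo describes, a rough sketch of a suffix-stripping stemmer that could run inside the Execute Python operator of the Python Scripting extension might look like this. The operator calls a function named rm_main with the input example set as a pandas DataFrame and delivers what it returns. The column name "text", the suffix list and the minimum stem length are hypothetical placeholders, and the handful of endings shown is nowhere near a complete treatment of Greek morphology.

```python
import unicodedata

# Hypothetical, deliberately tiny list of Greek inflectional endings (accents removed).
# A real Greek stemmer needs far more rules than this.
SUFFIXES = sorted(
    ["ους", "ος", "ου", "οι", "ων", "ες", "ης", "ας", "α", "η", "ο", "ε"],
    key=len,
    reverse=True,  # try the longest ending first
)
MIN_STEM = 3  # hypothetical: never cut a word down to fewer than 3 letters


def strip_accents(word):
    """Lower-case and drop combining marks such as the tonos (ό -> ο)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem_greek_token(token):
    """Strip the longest matching ending, keeping at least MIN_STEM letters."""
    word = strip_accents(token)
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def stem_text(text):
    """Stem every whitespace-separated token of a document."""
    return " ".join(stem_greek_token(t) for t in text.split())


def rm_main(data):
    """Entry point for Execute Python; 'text' is a hypothetical column name."""
    data["text"] = data["text"].map(lambda v: stem_text(v) if isinstance(v, str) else v)
    return data
```

Stripping accents before matching means δρόμος, δρόμου and δρόμους all reduce to the same stem, at the cost of merging words that differ only in accent (such as πότε and ποτέ). In this setup the Execute Python operator would sit before Process Documents from Data, so the tokenizer already sees stemmed text.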

slimik · December 2016

OK, thank you. I'll give it a try!

todimary · November 2017

Hello, any news on your work?


Altair RapidMiner Community
