Text mining in utf-8

i_anicka · November 2016

Hello all,

I need to use RapidMiner for text mining in Cyrillic.
I tried setting the encoding to UTF-8, but the results are displayed as garbled characters instead of Cyrillic words.

Thanks,

i_anicka · November 2016

Hi guys,

I have solved my problem.

I had set the UTF-8 encoding everywhere except at the process level.

I changed this and it works!

Thank you all for your replies.

Ana

MartinLiebig · November 2016

Hi,

Could you maybe post an example?
~Martin

JEdward · November 2016

It could be that your original document isn't in UTF-8, but in another encoding.

One way to be absolutely sure is to create a loop that changes the encoding parameter in your Process Documents operator using macros, and then look at all the resulting outputs. The one that looks 'right' is the encoding your document actually uses.
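
For anyone who wants to try the same idea outside RapidMiner, here is a minimal Python sketch of looping over encodings and comparing the outputs. It is not a RapidMiner process; the list of candidate encodings, the function name `decode_with_candidates`, and the sample bytes are all assumptions made for illustration.

```python
# Candidate encodings often seen with Cyrillic text (an assumed list,
# adjust it to whatever your documents might plausibly use).
CANDIDATES = ["utf-8", "cp1251", "koi8_r", "iso8859_5", "cp866"]


def decode_with_candidates(raw, candidates=CANDIDATES):
    """Decode the same bytes with every candidate encoding.

    Returns a dict mapping encoding name to the decoded text,
    or to None when the bytes are not valid in that encoding.
    """
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Stand-in for the bytes of a real document: Cyrillic text that was
# saved as Windows-1251 rather than UTF-8.
sample = "Привет, мир".encode("cp1251")

results = decode_with_candidates(sample)
for enc, text in results.items():
    # Inspect the output by eye and pick the line that looks 'right'.
    print(f"{enc:10} -> {text!r}")
```

With a real file you would read the bytes with `open(path, "rb").read()` and pass them in instead of `sample`. Note that several single-byte encodings will decode without any error and still produce the wrong letters, which is why the outputs have to be checked by eye.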

sgenzer · November 2016

Agreed. I just did a quick check and there's no problem with Cyrillic in UTF-8.

Scott

arunasethupathy · December 2017

I want to use the Tamil language for text mining.

Where do I have to change the UTF-8 option for this?

I have tried at the process level, but I could not get it to work.

Could anybody please give me the answer?

arunasethupathy · December 2017

To change the Unicode option to UTF-8 (for processing the Tamil language):

I have changed the encoding to UTF-8 in the RapidMiner Studio preferences.

I have simply read the document using the Read Document operator from the Text Mining extension.

But it is not working; a screenshot is attached (doc7.docx).

Please help me sort out this problem.

Thank you

sgenzer · December 2017

Hello @arunasethupathy, Tamil is not a language I have worked with before. Could you please post your XML process AND your text document (in Tamil) so I can take a look?

Thank you.

Scott

arunasethupathy · December 2017

Sir,

Please find attached the sample Tamil text document.

sgenzer · December 2017

Thank you @arunasethupathy. Can you please also post your XML process?

Scott
