Text mining in UTF-8
Hello all,
I need to use RapidMiner for text mining in Cyrillic.
I tried setting the encoding to UTF-8, but the results are displayed as unreadable characters instead of Cyrillic words.
Thanks,
Answers
Hi,
could you maybe post an example?
~Martin
Dortmund, Germany
It could be that your original document isn't in UTF-8, but in another encoding.
One way to be absolutely sure is to create a loop that uses macros to change the encoding parameter of your Process Documents operator, and then to look at all the resulting outputs. The one that looks 'right' tells you the correct encoding.
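The same "try every encoding and look at the output" idea can be sketched outside RapidMiner. This is a minimal Python sketch, not a RapidMiner process: it decodes one set of raw bytes with several candidate encodings. The candidate list and the `cyrillic_ratio` helper are assumptions chosen for illustration. The ratio only flags obviously wrong candidates. A wrong encoding such as koi8-r can still map the same bytes to Cyrillic letters, so you still have to eyeball which output reads correctly.

```python
# Sketch: decode the same raw bytes with several candidate encodings,
# mirroring the "loop over the encoding parameter" idea.
# The candidate list is an assumption; extend it for your own data.
CANDIDATES = ["utf-8", "cp1251", "koi8-r", "iso-8859-5", "cp866"]


def try_encodings(raw, candidates=CANDIDATES):
    """Return {encoding: decoded text, or None if the bytes are invalid in it}."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None  # bytes are not valid in this encoding
    return results


def cyrillic_ratio(text):
    """Fraction of alphabetic characters in the basic Cyrillic block U+0400..U+04FF."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0400" <= c <= "\u04ff" for c in letters) / len(letters)


if __name__ == "__main__":
    # Simulate a document that was saved as Windows-1251 rather than UTF-8.
    raw = "Привет, мир".encode("cp1251")
    for enc, text in try_encodings(raw).items():
        if text is None:
            print(f"{enc:>10}: <cannot decode>")
        else:
            print(f"{enc:>10}: {text!r} (Cyrillic ratio {cyrillic_ratio(text):.2f})")
```

In this simulated case the UTF-8 attempt fails outright, which matches the suspicion above that the source file is not actually UTF-8.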
Agreed. I just did a quick check, and there's no problem with Cyrillic in UTF-8.
Scott
I want to use text mining on Tamil-language text.
Where do I change the UTF-8 option for this?
I have tried at the process level but could not get it to work.
Could anybody please help?
To change the encoding option to UTF-8 (for processing Tamil text), I changed the encoding in the RapidMiner Studio preferences to UTF-8.
I then simply read the document using the Read Document operator from the Text Mining extension.
But it is not working; the screenshot is attached (doc7.docx).
Kindly help me sort out this problem.
Thank you
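Before debugging the process itself, it can help to confirm that the input really is UTF-8 text containing Tamil script. This is a minimal Python sketch under that assumption. It checks raw bytes for valid UTF-8 and counts code points in the Tamil Unicode block (U+0B80 to U+0BFF). It expects a plain-text file, for example one exported as .txt. A .docx file is a ZIP-compressed package rather than plain text, so its raw bytes are not meant to be read this way. The function name `check_tamil_utf8` is hypothetical.

```python
# Sketch: check that raw bytes are valid UTF-8 and contain Tamil script
# (Unicode block U+0B80..U+0BFF). Names here are hypothetical helpers.
import sys
from pathlib import Path
from typing import Tuple

TAMIL_START, TAMIL_END = 0x0B80, 0x0BFF


def check_tamil_utf8(raw: bytes) -> Tuple[bool, int]:
    """Return (is_valid_utf8, number_of_code_points_in_the_Tamil_block)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False, 0  # not valid UTF-8 at all
    count = sum(TAMIL_START <= ord(c) <= TAMIL_END for c in text)
    return True, count


if __name__ == "__main__":
    # Usage: python check_tamil.py file1.txt file2.txt ...
    for path in sys.argv[1:]:
        ok, n = check_tamil_utf8(Path(path).read_bytes())
        print(f"{path}: valid UTF-8={ok}, Tamil code points={n}")
```

If a file reports valid UTF-8 with a non-zero Tamil count, the encoding of the input is probably fine, and the problem is more likely in how the process reads or displays the text.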
Hello @arunasethupathy, Tamil is not a language I have worked with before. Could you please post your XML process AND your text document (in Tamil) so I can take a look?
Thank you.
Scott
Sir,
Please find attached a sample Tamil text document.
Thank you, @arunasethupathy. Can you please also post your XML process?
Scott