Split a single xml file into several docs or example set
mohammadreza
Member, Posts: 23, Contributor I
Hi. I am new to RapidMiner text plugin.
I have an XML file consisting of document elements. Each document tag contains one document as follows:
1 ...............
1 ...............
...
I think I have to split them first and extract the documents to be able to construct the word vector. Is there any way to do that?
Answers
Dortmund, Germany
I think the Read XML operator is a sensible choice, but I need to do some text classification after that. That's why I wanted to work with documents through the text plugin. Assuming I use Read XML as you explained, is there any way to work with the text plugin? I mean, how should I connect the output of Read XML to an operator like "Process Document", or any other operator that lets me do the tokenization and stemming and build the word vector?
Thanks
Thanks in advance.
It looks to me like an XPath expression can solve this.
Have you tried the import wizard?
Sadly, I have no time to try it myself, but I guess it works.
Best,
Martin
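To make the XPath idea concrete outside RapidMiner, here is a minimal Python sketch using only the standard library. It assumes each document sits in a `document` element with `label` and `text` children; those child names and the sample data are hypothetical, since the original XML was not preserved in the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the child tag names "label" and "text" are assumptions.
SAMPLE = """<corpus>
  <document><label>1</label><text>first example text</text></document>
  <document><label>0</label><text>second example text</text></document>
</corpus>"""


def split_documents(xml_string):
    """Return one (label, text) pair per <document> element."""
    root = ET.fromstring(xml_string)
    rows = []
    # ".//document" is an XPath-style path that matches every document
    # element anywhere below the root.
    for doc in root.findall(".//document"):
        label = doc.findtext("label", default="").strip()
        text = doc.findtext("text", default="").strip()
        rows.append((label, text))
    return rows


rows = split_documents(SAMPLE)
for label, text in rows:
    print(label, text)
```

Each pair corresponds to one row of an example set (a label attribute plus a text attribute), which is roughly the shape a text-processing step would expect as input.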
The wizard might get slow because it caches the file at some point, but it still works.
http://stackoverflow.com/questions/700213/xml-split-of-a-large-file/7823719#7823719
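If the file is too large for the wizard to handle comfortably, splitting it before import is another option. The following is a rough Python sketch of that idea, not a reproduction of the linked answer. It assumes the repeating element is called `document`; the function name `split_to_files` and the output file naming are made up for illustration.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET


def split_to_files(source, out_dir, tag="document"):
    """Write each <tag> element found in source to its own XML file.

    source may be a file path or a binary file-like object.
    Returns the list of paths that were written.
    """
    paths = []
    # iterparse hands over each element as soon as its end tag is read,
    # so the documents can be processed one at a time.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            path = os.path.join(out_dir, f"doc_{len(paths):05d}.xml")
            ET.ElementTree(elem).write(path, encoding="utf-8")
            paths.append(path)
            # Drop the element's children and text once it is written,
            # which keeps memory use low on large inputs.
            elem.clear()
    return paths


# Small in-memory demo; a real run would pass the path of the XML file.
demo = io.BytesIO(
    b"<corpus><document><text>a</text></document>"
    b"<document><text>b</text></document></corpus>"
)
out_dir = tempfile.mkdtemp()
paths = split_to_files(demo, out_dir)
print(paths)
```

The resulting directory of small files could then be read one document per file, which avoids loading the whole original file at once.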