"TextMining using LibSVMLearner -- does sort order of Excel input file matter?"

wotsiznamiz · March 2009

I am using the following process to text-mine a ~10,000-row Excel record set. The Excel file has three columns: (1) the label, (2) the text, and (3) the ID.

I have noticed something peculiar -- when I sort the Excel file differently, the model that is produced is dramatically different. For example, if I sort on the label column, RapidMiner produces much better results than if I sort on ID. Should I always be sorting on the label column? I would have thought that RapidMiner would produce the same results on inputs sorted in any manner. Is this a bug? Can I rely on my results after seeing this behavior?

<parameter key="resultfile" value="C:\RapidMiner\NPS_PaymentStatus\Result_file.res"/>
<parameter key="example_set_file" value="C:\RapidMiner\NPS_PaymentStatus\EXAMPLE_SET_FILE.dat"/>
<parameter key="example_set_file" value="C:\RapidMiner\NPS_PaymentStatus\EXAMPLE_SET_FILE_MODEL.dat"/>
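One way row order alone can change a cross-validation estimate is fold assignment. If folds are cut from consecutive rows without shuffling (a "linear" split), a file sorted by label can produce folds that each hold only one class, so the model is trained on one class and tested on the other. Whether this applies here depends on the sampling setting of the cross-validation operator, which the posted XML fragment does not show. The following is a minimal pure-Python sketch, not RapidMiner code; the helper names and the ten-row label list are hypothetical, chosen only to contrast a contiguous split with a shuffled, stratified one.

```python
import random

def contiguous_folds(n_rows, k):
    """Linear split: row i goes to fold i * k // n_rows, so consecutive rows share a fold."""
    return [i * k // n_rows for i in range(n_rows)]

def stratified_folds(labels, k, seed=0):
    """Shuffle the rows of each class separately, then deal them round-robin into k folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    folds = [0] * len(labels)
    for rows in by_class.values():
        rng.shuffle(rows)
        for pos, row in enumerate(rows):
            folds[row] = pos % k
    return folds

def labels_per_fold(labels, folds, k):
    """List the distinct labels that ended up in each fold."""
    return [sorted({lab for lab, f in zip(labels, folds) if f == j}) for j in range(k)]

# Hypothetical label column, sorted by label as in the Excel file.
labels = ["neg"] * 5 + ["pos"] * 5

print(labels_per_fold(labels, contiguous_folds(len(labels), 2), 2))
# -> [['neg'], ['pos']]: each fold contains a single class
print(labels_per_fold(labels, stratified_folds(labels, 2), 2))
# -> [['neg', 'pos'], ['neg', 'pos']]
```

With stratified assignment, the class mix of every fold depends only on the class counts, so sorting the file by label, ID, or anything else cannot produce single-class folds (as long as each class has at least k rows).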

IngoRM · March 2009

Hi,

Your process generally looks good to me (at least judging from the XML code alone).

"I would have thought that RapidMiner would produce the same results on inputs sorted in any manner."

Not necessarily; this depends entirely on the learning scheme. However, a 2-fold cross-validation alone probably does not allow you to make any definite statement about the performance of the models. If the dramatic change in prediction performance persists under a 10 times repeated 10-fold cross-validation, I would be more worried.

Cheers,
Ingo
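A 10 times 10-fold cross-validation runs ten 10-fold validations, each with a fresh shuffle, and looks at the spread of the results rather than at a single number. The following pure-Python sketch illustrates that evaluation scheme only; it is not RapidMiner's implementation. The 1-D nearest-centroid learner and the synthetic data are hypothetical stand-ins for the LibSVM text model, and stratified fold assignment is an assumption of the sketch.

```python
import random
import statistics

def stratified_folds(labels, k, rng):
    """Shuffle the rows of each class, then deal them round-robin into k folds."""
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    folds = [0] * len(labels)
    for rows in by_class.values():
        rng.shuffle(rows)
        for pos, row in enumerate(rows):
            folds[row] = pos % k
    return folds

def nearest_centroid_accuracy(train, test):
    """Hypothetical stand-in learner: assign each test x to the nearest class mean."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    correct = sum(
        1 for x, y in test
        if min(centroids, key=lambda c: abs(x - centroids[c])) == y
    )
    return correct / len(test)

def repeated_cv(xs, ys, k=10, repeats=10, seed=0):
    """k-fold CV repeated `repeats` times, reshuffling each time.
    Returns one mean accuracy per repetition."""
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(repeats):
        folds = stratified_folds(ys, k, rng)
        fold_scores = []
        for f in range(k):
            train = [(x, y) for x, y, g in zip(xs, ys, folds) if g != f]
            test = [(x, y) for x, y, g in zip(xs, ys, folds) if g == f]
            if test:  # skip empty folds (only possible with very small data)
                fold_scores.append(nearest_centroid_accuracy(train, test))
        per_repeat.append(statistics.mean(fold_scores))
    return per_repeat

# Hypothetical noisy 1-D data, sorted by label like the Excel file.
data_rng = random.Random(42)
ys = ["neg"] * 50 + ["pos"] * 50
xs = [data_rng.gauss(0.0 if y == "neg" else 1.0, 1.0) for y in ys]

scores = repeated_cv(xs, ys, k=10, repeats=10)
print(f"mean={statistics.mean(scores):.3f} stdev={statistics.stdev(scores):.3f}")
```

If the spread across repetitions is comparable to the gap between two sort orders, that gap is more likely evaluation noise than a real difference between the models.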

wotsiznamiz · April 2009

THX!


Altair RapidMiner Community

