"Different results for X-Validation (libSVM) in version 4.6
Dear community,
I am upgrading from RapidMiner version 4.6 to 5 and am running into some difficulties that I hope someone can help me with.
I am using a data set consisting of 40 example set rows with 73 attributes (72 numerical + 1 numerical label). If anyone wants to reproduce the steps, here is the data in Excel format: http://jump.fm/PFMGS.
In RapidMiner 4.6 I start the wizard, open X-Validation with SVM, import my data, and start the process. The result is 100% accuracy. Here are some screenshots: http://img696.imageshack.us/img696/2939/rapidminer4results.png
I tried to reconstruct this in RapidMiner 5:
- I imported the data into my repository and created a new process
- Since the imported data was marked nominal by RM, I use the Nominal to Numerical converter for the complete data set
- The output goes into the X-Validation module (default parameters, as in RM 4.6); from there the ave output goes to the results
- Inside the Validation module it looks like this:
-- In the training module there is the libSVM module (C-SVC, rbf kernel, gamma=0, C=32, epsilon=0.0010, same as in RM 4.6)
-- In the testing module I use Apply Model and then the Performance module (same default values as in RM 4.6)
Executing the process results in 90% accuracy. Screenshots: http://img42.imageshack.us/img42/9720/rapidminer5results.png
Did I make a mistake? Thanks for your help.
Alex
Answers
Did you ever try to set gamma != 0? If I understand correctly, gamma=0 means that it will effectively be set to 1 / num_attributes. I would recommend fixing it to the same value in both versions (1/72) to get comparable results. I also noticed a difference in the random_seed parameter of the X-Validation operator, which could affect the process.
I'm curious if this changes anything!
Just my two cents
Greetings, Harald
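As a small illustration of the suggestion above (plain Python, not RapidMiner), the following sketch computes the gamma that would be in effect if gamma=0 falls back to 1 / num_attributes. The function name `effective_gamma` is hypothetical, and the fallback rule is the assumption stated in the post, not something verified against either RapidMiner version.

```python
def effective_gamma(gamma: float, num_attributes: int) -> float:
    """Return gamma unchanged, or 1 / num_attributes when gamma is 0.

    Assumption: a gamma of 0 means "use 1 / num_attributes", as
    described in the post above. This helper is illustrative only.
    """
    if num_attributes <= 0:
        raise ValueError("num_attributes must be positive")
    return gamma if gamma != 0 else 1.0 / num_attributes


# 72 numerical attributes, as in the data set from the first post.
print(effective_gamma(0, 72))
# An explicitly set gamma is passed through untouched.
print(effective_gamma(0.5, 72))
```

Setting this value explicitly in both versions removes one possible source of difference between the two runs.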
I used different values for gamma and played around with the random seed settings. Still, the accuracy results from version 4.6 and 5 differ a lot using the same input. Does anyone know why?
If you don't do an XValidation and just build one model, does it differ? That would implicate the learner (as opposed to the applier).
Stefan
Thanks for your input. As I understand it, if I use the same random seed parameters on both versions, I should get the same results. In any case, the results do not differ just slightly (100% in RM 4 vs 90% in RM 5).
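To make the random seed point concrete, here is a minimal Python sketch (not RapidMiner code) of seeded fold assignment. Within one implementation the same seed reproduces the same folds; two different implementations or versions need not shuffle the same way even with equal seeds. The helper `kfold_indices`, the 10 folds, and the seed values are all assumptions chosen for illustration.

```python
import random


def kfold_indices(n_examples: int, n_folds: int, seed: int) -> list:
    """Shuffle example indices with a seeded generator and deal them into folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Fold i receives every n_folds-th index, starting at position i.
    return [indices[i::n_folds] for i in range(n_folds)]


# 40 examples as in this thread; 10 folds and the seeds are hypothetical.
folds_a = kfold_indices(40, 10, seed=42)
folds_b = kfold_indices(40, 10, seed=42)
folds_c = kfold_indices(40, 10, seed=7)

print(folds_a == folds_b)  # same seed, same implementation
print(folds_a == folds_c)  # different seed
```

With 40 examples, a drop from 100% to 90% corresponds to 4 misclassified examples, which is more than a different fold split alone would normally explain on data that was otherwise perfectly separable.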
If you import the xls and run the following you'll see what the problem is... The operator "Nominal to Numerical" has replaced the values in each attribute column with 0-39. The fact that it still produces 90% satisfies our gullibility.
PS Rather ironically, if you replace the offending operator with a "Guess Types" operator all is well, like this....
Is there any way I can fix the "nominal to numerical" operator in rm5? Or any other workaround?
What exactly is the problem with the Nominal to Numerical operator? Its behavior is exactly as it was in 4.x if you don't change the default parameter settings. Please remember that in 4.x you had to wrap the Nominal to Numerical operator in an AttributeSubsetPreprocessing operator to restrict the attributes it was working on. You can now either use the equivalent Select Subset operator or simply use the built-in filter.
Greetings,
Sebastian
thank you for your answer. I imported values from a csv file that looked like this. Unfortunately the real values were recognized as nominal so I wanted to use the nominal to numerical operator to mark them as numerical. But that operator simply converted the values to numerical 1, 2, 3 and so on. So I guess I just misunderstood the intention of the operator. I needed a 'real' converter.
My problem still remains. I cannot import the data as numerical, but at least I could figure out why. My data is in scientific notation (Matlab standard). A value with the exp != 000 is correctly imported as numerical (real), whereas a value with the exponent == 000 is imported as nominal.
so is correctly imported as numerical
and is incorrectly imported as nominal.
I would really appreciate if anyone has a solution for me. Again, RM4 correctly imports those values as numerical
Thanks!
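The distinction in the post above, between coding nominal values as indices and actually parsing them as numbers, can be sketched in Python (not RapidMiner). The helper names and the first-appearance numbering are assumptions for illustration; the thread only states that the operator replaced the values with consecutive integers.

```python
def index_code(column):
    """Replace each distinct string by an integer index (order of first appearance).

    Assumption: this mimics the index coding described in this thread;
    the exact numbering RapidMiner uses is not confirmed here.
    """
    mapping = {}
    return [mapping.setdefault(value, len(mapping)) for value in column]


def parse_values(column):
    """Parse each string as a floating-point number."""
    return [float(value) for value in column]


# Strings in Matlab-style scientific notation with a three-digit exponent.
column = ["2.6855298e-001", "2.3647619e+000", "1.5e+001"]

print(index_code(column))    # positions only, the magnitudes are lost
print(parse_values(column))  # the actual real values
```

Index coding keeps only which distinct string appeared, so a learner trained on it sees row positions rather than measurements.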
please replace the nominal to numerical operator by the parse numbers operator. That will help you solve your problem.
Greetings,
Sebastian
thanks for your help. Unfortunately that did not solve the problem. The Parse Numbers operator still labels numbers like 2.3647619e+000 as nominal, but I want them to be numerical/real.
See screenshot: http://img684.imageshack.us/img684/7505/nominalnumericalproblem.png
Any idea how I can achieve that?
http://rapid-i.com/rapidforum/index.php/topic,1791.msg7012.html#msg7012
Using the solution so darkly hidden therein on this csv data..
2.6855298e-001,2.3647619e+000
2.3647619e+000,2.6855298e-001
I find that the numbers are read as reals by the following code...
thank you for your help. Your solution works partially... I'm getting weird behavior here:
In your example, the values are labeled as real in the results workspace (screenshot: http://img140.imageshack.us/img140/6470/88436391.png)
but I need to work with the values in the process. THERE the same values in that example are labeled nominal (screenshot: http://img179.imageshack.us/img179/6517/18861165.png)
So in the process I cannot use the values as input for libSVM etc. I really don't understand this, maybe someone can explain/post a solution?
The reason is quite simple: everything is fine, and this is just the way "Guess Types" behaves. It guesses the types from the real data (which is not available during the meta data transformation) and not from the meta data. That means the meta data cannot be correctly updated during process design. I would recommend running Haddock's process and storing the data in the RM repository. There, you will easily see that the type is correct. Then just use the data from the repository and feed it into the learner, and everything will be fine.
Alternatively, you could simply feed the data into the LibSVM after the transformation process. It will complain, but you can disable those complaints in the preferences: simply activate "general.capabilities.warn". However, the best way is to use the repository here.
Cheers,
Ingo
An import wizard like the one used in RM4 would make it a lot easier. I hope something like that will find its way into the new release. I'm very much looking forward to that.
-Gagi
sure I could do that. But IMO rapidminer should have no problem reading scientific notation. I'll stick with the headache solution until there is an improved import utility in RM.
Thanks to everyone for helping me out with that one.
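For readers who hit the same import problem, one possible workaround outside RapidMiner is to rewrite the CSV before importing it, so that the fields no longer carry zero-padded three-digit exponents. The sketch below is plain Python using only the standard library. The function name `normalize_csv` is hypothetical, it assumes every field is numeric, and whether RapidMiner 5 then imports the rewritten file as real-valued has not been tested here.

```python
import csv
import io


def normalize_csv(text: str) -> str:
    """Re-emit every numeric field in Python's default float notation.

    Assumption: all fields are numeric; a non-numeric field raises ValueError.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([repr(float(field)) for field in row])
    return out.getvalue()


# The two-line CSV sample quoted earlier in this thread.
sample = "2.6855298e-001,2.3647619e+000\n2.3647619e+000,2.6855298e-001\n"
print(normalize_csv(sample))
```

Python's `float` accepts both exponent forms, so the values themselves are unchanged; only their textual representation differs.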