"Append-Operator in Testing Phase of X-Validation changes confusion matrix"
Hi,
I am working on a classification problem with three classes: good (180 examples), mediocre (4535 examples) and bad (183 examples).
In my RapidMiner process I only learn a model for "good" and "bad". In the testing phase I want to modify the prediction depending on the confidence of my classifier: I filter out all examples with low confidence and assign them to the default class "mediocre".
To do this reassignment, I use a "Filter Examples" operator together with a "Replace" operator.
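The filter-and-replace idea can be sketched in plain Python outside RapidMiner. This is a minimal sketch under stated assumptions: each example is a dict with hypothetical keys `label`, `prediction` and `confidence`, where `confidence` is the confidence of the predicted class (in RapidMiner, the larger of the `confidence(good)` and `confidence(bad)` attributes), and the 0.7 threshold is a hypothetical value. It splits the data into two complementary subsets (like Filter Examples and its inverse), relabels the low-confidence subset (like Replace) and appends both back together, checking that no example was lost or duplicated.

```python
from typing import Any, Dict, List

Example = Dict[str, Any]

DEFAULT_CLASS = "mediocre"
THRESHOLD = 0.7  # hypothetical cut-off; tune for your data


def reassign_low_confidence(examples: List[Example],
                            threshold: float = THRESHOLD) -> List[Example]:
    """Split, relabel and re-append, mirroring Filter Examples + Replace + Append."""
    # Complementary filters: every (non-NaN) example lands in exactly one subset.
    low = [e for e in examples if e["confidence"] < threshold]
    high = [e for e in examples if e["confidence"] >= threshold]

    # "Replace": copy each low-confidence example and overwrite its prediction,
    # leaving the input dicts untouched.
    relabeled = [{**e, "prediction": DEFAULT_CLASS} for e in low]

    # "Append": concatenate the two subsets again.
    result = high + relabeled
    assert len(result) == len(examples), "examples were lost or duplicated"
    return result


if __name__ == "__main__":
    data = [
        {"label": "good", "prediction": "good", "confidence": 0.95},
        {"label": "mediocre", "prediction": "bad", "confidence": 0.55},
        {"label": "bad", "prediction": "bad", "confidence": 0.80},
    ]
    for e in reassign_low_confidence(data):
        print(e["label"], e["prediction"])
```

Note that the order of the examples changes after the append; a confusion matrix does not depend on order, only on the (label, prediction) pairs.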
My problem is:
If I run my process without the reassignment step (i.e. without filtering and replacing), the confusion matrix shows the expected totals for true good (180), true mediocre (4535) and true bad (183). However, with the reassignment, the confusion matrix shows unexpected totals for true good, mediocre and bad.
Why is that happening?
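The reasoning behind the expectation can be made explicit: reassignment only changes predicted labels, so the number of examples per true class (180, 4535 and 183 here) must stay the same in the confusion matrix. A minimal pure-Python sketch of that invariant, with hypothetical helper names and toy data:

```python
from collections import Counter
from typing import Dict, List


def confusion_matrix(labels: List[str], predictions: List[str]) -> Counter:
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(labels, predictions))


def true_class_totals(matrix: Counter) -> Dict[str, int]:
    """Sum over the predicted labels: number of examples per true class."""
    totals: Counter = Counter()
    for (true_label, _predicted), count in matrix.items():
        totals[true_label] += count
    return dict(totals)


if __name__ == "__main__":
    labels = ["good", "mediocre", "mediocre", "bad"]
    before = ["good", "bad", "good", "bad"]          # raw two-class predictions
    after = ["good", "mediocre", "mediocre", "bad"]  # low-confidence ones reassigned
    same = (true_class_totals(confusion_matrix(labels, before))
            == true_class_totals(confusion_matrix(labels, after)))
    print(same)  # True: only predictions changed, so the totals must not
```

If this check fails after a reassignment step, that step has dropped or duplicated examples instead of only changing their predictions.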
My process is as follows:
By debugging the operators, I found out that simply adding an "Append" operator with only one input (the output of "Apply Model" and nothing else) in the testing phase of the X-Validation makes the confusion matrix show wrong values for true good, mediocre and bad.
<operator activated="true" class="naive_bayes" compatibility="5.3.015" expanded="true" height="76" name="Naive Bayes" width="90" x="179" y="30">
<operator activated="true" class="union" compatibility="5.3.015" expanded="true" height="76" name="Union" width="90" x="447" y="210"/>
In the above process I first used "Append" and then replaced it with the "Union" operator, but I still have the same problem.
Am I doing anything wrong?
Thanks in advance for your help!!!
Answers
I've created an example process with the Iris data set in which I learn on two classes and assign the "unsure" predictions (confidence between 0.3 and 0.7) to the third class. This works quite well for me. I hope you can use it as a template.
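The rule described here can also be written as a single per-example expression, without splitting and re-appending the data. A minimal sketch, assuming the two-class model gives a confidence for one of the learned classes; the class names are hypothetical placeholders, and treating the 0.3/0.7 bounds as inclusive is an assumption.

```python
def assign_class(conf_first: float,
                 first: str = "class_a",
                 second: str = "class_b",
                 third: str = "class_c",
                 low: float = 0.3,
                 high: float = 0.7) -> str:
    """Map a two-class confidence onto three classes.

    conf_first is the model's confidence for `first`; with two classes the
    confidence for `second` is typically 1 - conf_first.
    """
    if low <= conf_first <= high:
        # Unsure: neither learned class is clearly preferred.
        return third
    return first if conf_first > high else second


if __name__ == "__main__":
    for conf in (0.9, 0.5, 0.1):
        print(conf, assign_class(conf))
```

Because each example is mapped independently, the number of examples cannot change, so no append or union step is needed at all.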
Best,
Martin
Dortmund, Germany
Thanks for your reply. Could you please explain your process in more detail, e.g. why is it necessary to rename the attributes that were generated by RapidMiner itself?
Also, I tried to adapt your approach to my problem, but I get the same issue.
I found out that it is somehow related to the "Append" operator.
I created an example using the Weighting data set. If you look at this process, you will see an "Append" operator in the training phase that has only one input, so it shouldn't do anything. However, if you compare the confusion matrix of the process with and without the "Append" operator, you will notice a difference.
The correct confusion matrix (in terms of the number of true positives and true negatives) is the one from the process without the "Append" operator. The other one shows a wrong total of true positives and true negatives.
Any idea why? Also, what do I need to do to use the Append operator on a data set with about 5000 data points in total?
Thanks,
Muhammad
The Append operator modifies the meta data, so there are some changes, but I am currently not sure how this affects the Performance operator.
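One possible way a meta data change could distort a confusion matrix is sketched below. This is only an assumption for illustration, not something confirmed in this thread or a description of RapidMiner internals: if nominal values are stored as integer codes and the label and prediction attributes end up with different value-to-index mappings, comparing codes instead of values miscounts correct predictions. All mappings are hypothetical.

```python
from typing import Dict, List

# Hypothetical value-to-index mappings: same three values, different order.
LABEL_MAP: Dict[int, str] = {0: "good", 1: "mediocre", 2: "bad"}
PRED_MAP: Dict[int, str] = {0: "mediocre", 1: "good", 2: "bad"}


def count_correct_by_code(label_codes: List[int], pred_codes: List[int]) -> int:
    """Flawed: compares raw indices that come from two different mappings."""
    return sum(lab == pred for lab, pred in zip(label_codes, pred_codes))


def count_correct_by_value(label_codes: List[int], pred_codes: List[int]) -> int:
    """Correct: decodes each attribute with its own mapping before comparing."""
    labels = [LABEL_MAP[c] for c in label_codes]
    preds = [PRED_MAP[c] for c in pred_codes]
    return sum(lab == pred for lab, pred in zip(labels, preds))


if __name__ == "__main__":
    # Three examples, all predicted correctly: good, mediocre, bad.
    label_codes = [0, 1, 2]  # encoded with LABEL_MAP
    pred_codes = [1, 0, 2]   # the same values encoded with PRED_MAP
    print(count_correct_by_code(label_codes, pred_codes))   # 1
    print(count_correct_by_value(label_codes, pred_codes))  # 3
```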
Regarding my process:
"Generate Attributes" cannot handle attribute names containing brackets, minus signs, plus signs or whitespace, because these characters are interpreted as part of the formula, so I needed to replace them.
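The renaming step can be sketched as a small function that turns a generated name such as `confidence(good)` into one that a formula parser will not misread. This is a minimal sketch: the set of unsafe characters follows the list above (brackets, minus, plus, whitespace), and the underscore replacement scheme is a hypothetical choice, not necessarily what the example process does.

```python
import re

# Characters reported as problematic in Generate Attributes formulas:
# round/square/curly brackets, minus, plus and whitespace.
_UNSAFE = re.compile(r"[()\[\]{}+\-\s]+")


def safe_name(name: str) -> str:
    """Replace each run of unsafe characters with '_' and trim edge underscores."""
    return _UNSAFE.sub("_", name).strip("_")


if __name__ == "__main__":
    for original in ("confidence(good)", "confidence(Iris-setosa)"):
        print(original, "->", safe_name(original))
```

Different names can collapse to the same safe name (for example `a-b` and `a+b` both become `a_b`), so a real process would need to check for collisions.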