"Append-Operator in Testing Phase of X-Validation changes confusion matrix"

Muhammad Member Posts: 2 Contributor I
edited June 2019 in Help
Hi,

I am working on a classification problem with 3 classes: good (180), mediocre (4535) and bad (183). The numbers in parentheses are the example counts per class.

In my RapidMiner process I only learn a model for "good" and "bad". In the testing phase I want to modify the prediction depending on the confidence of my classifier: I filter out all examples with low confidence and assign them to the default class "mediocre".
To do this reassignment I use a "Filter Examples" operator together with a "Replace" operator.
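
For readers who want to see the reassignment logic outside of RapidMiner, here is a minimal Python sketch of the idea described above. It is not the RapidMiner process itself: the function name `reassign_low_confidence`, the threshold of 0.7 and the confidence values are assumptions chosen only for illustration.

```python
def reassign_low_confidence(confidences, threshold=0.7, default="mediocre"):
    """Return one predicted class per example.

    confidences: list of dicts mapping class name -> confidence.
    Examples whose best confidence is below `threshold` get `default`.
    The threshold value is an assumption, not taken from the thread.
    """
    predictions = []
    for conf in confidences:
        best_class = max(conf, key=conf.get)
        if conf[best_class] >= threshold:
            predictions.append(best_class)
        else:
            predictions.append(default)
    return predictions


# Made-up confidences for a model trained only on "good" and "bad".
confidences = [
    {"good": 0.95, "bad": 0.05},
    {"good": 0.55, "bad": 0.45},
    {"good": 0.10, "bad": 0.90},
    {"good": 0.40, "bad": 0.60},
]
predictions = reassign_low_confidence(confidences)
print(predictions)  # ['good', 'mediocre', 'bad', 'mediocre']
```

Note that this step only changes the predicted class. The number of examples and their true labels stay untouched.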

My problem is:
If I run my process without the reassignment step (i.e. filtering and replacing), I get the expected values for true good (180), true mediocre (4535) and true bad (183) in my confusion matrix. However, if I do the reassignment, my confusion matrix yields unexpected values for true good, mediocre and bad.
Why is that happening?
My process is as follows:

<operator activated="true" class="naive_bayes" compatibility="5.3.015" expanded="true" height="76" name="Naive Bayes" width="90" x="179" y="30">
<operator activated="true" class="union" compatibility="5.3.015" expanded="true" height="76" name="Union" width="90" x="447" y="210"/>

Through a bit of debugging I found out that if you just add an "Append" operator with only one input (the actual output of "Apply Model", nothing else) in the testing phase of X-Validation, the confusion matrix yields wrong values for the true class counts.
In the above process I first used "Append" and then changed it to the "Union" operator; however, I still have the same problem.
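
One way to narrow this down is to remember that the per-class "true" totals of a confusion matrix depend only on the label column, never on the predictions. The Python sketch below illustrates this with made-up data. The helper names `confusion_matrix` and `true_totals` are hypothetical, and the sketch makes no claim about what Append or Union do internally.

```python
from collections import Counter


def confusion_matrix(labels, predictions):
    """Count (true class, predicted class) pairs."""
    return Counter(zip(labels, predictions))


def true_totals(matrix):
    """Sum the matrix per true class, ignoring the predicted class."""
    totals = Counter()
    for (true_class, _predicted), count in matrix.items():
        totals[true_class] += count
    return totals


# Made-up labels and two different sets of predictions for the same examples.
labels = ["good", "good", "mediocre", "mediocre", "mediocre", "bad"]
before = ["good", "bad", "good", "bad", "good", "bad"]
after = ["good", "mediocre", "mediocre", "mediocre", "good", "bad"]

totals_before = true_totals(confusion_matrix(labels, before))
totals_after = true_totals(confusion_matrix(labels, after))
print(totals_before == totals_after)  # True
```

If the true totals differ between two runs, the examples or their labels reaching the performance step must differ (rows dropped, duplicated or relabelled, or label values mapped differently), because a change in predictions alone cannot produce that effect.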

Am I doing anything wrong?

Thanks in advance for your help!!!

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,381 RM Data Scientist
    Hello Muhammad,

    I've created an example process with the Iris data set where I learn on two classes and assign the "unsure" predictions (confidence between 0.3 and 0.7) to the third class.

    A cross-validation evaluating a decision tree model.

    <parameter key="replace_by" value="prediction(label)"/>

    <operator activated="true" class="performance" compatibility="5.0.000" expanded="true" height="76" name="Performance" width="90" x="581" y="30"/>

    This works quite well for me. I hope you can use it as a template.


    Best,

    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • Muhammad Member Posts: 2 Contributor I
    Hi Martin,

    thanks for your reply. Could you please elaborate on your process, i.e. why is it necessary to rename the attributes which were generated by RapidMiner itself?

    Also, I tried to adapt your approach to my problem. However, I get the same issue.

    I found out that it is somehow related to the "Append" operator.

    I created an example using the Weighting data. Please have a look at this process:

    <operator activated="true" class="naive_bayes" compatibility="5.3.015" expanded="true" height="76" name="Naive Bayes" width="90" x="45" y="30"/>

    You will see an "Append" operator in the Training-Phase which has only one input, so it shouldn't do anything. However, if you compare the confusion matrix of the process with and without the "Append" operator, you will notice a difference.
    The correct confusion matrix (in terms of the number of true positives and true negatives) is the one from the process without the "Append" operator. The other one yields a wrong total number of true positives and true negatives.

    Any idea why? Also, what do I need to do to use the Append operator on a data set with about 5000 data points in total?

    Thanks,
    Muhammad
  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University ProfessorPosts:3,381RM Data Scientist
    Hi,

    the Append operator modifies the meta data, so there are some changes. However, I am currently not sure how this affects the Performance operator.

    Regarding my process:
    Generate Attributes cannot handle attribute names containing brackets, minus signs, plus signs or whitespace, because these are interpreted as part of the formula. That is why I needed to rename them.
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
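
Martin's point about renaming can be illustrated with a small Python sketch: characters such as brackets, plus, minus and whitespace are replaced before a name is used in a formula. The function name `sanitize_attribute_name` and the underscore replacement are assumptions for illustration. In RapidMiner itself the renaming is done with a rename operator, not with this code.

```python
import re


def sanitize_attribute_name(name):
    """Replace brackets, plus, minus and whitespace with underscores.

    Runs of such characters collapse into one underscore, and
    leading or trailing underscores are removed.
    """
    cleaned = re.sub(r"[()\[\]+\-\s]+", "_", name)
    return cleaned.strip("_")


# Attribute names in the style produced by applying a model (illustrative).
names = ["prediction(label)", "confidence(Iris-setosa)", "confidence(Iris-versicolor)"]
sanitized = [sanitize_attribute_name(n) for n in names]
print(sanitized)
# ['prediction_label', 'confidence_Iris_setosa', 'confidence_Iris_versicolor']
```

After the formula step, the prediction attribute may have to be renamed back to its original name so that a downstream performance step can recognise it.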