Process help: Extract ID wise prediction performance after Cross Validation

varunm1 Moderator, Member Posts: 1,207 Unicorn
edited October 2019 in Help
Hello,

I currently have multiple observation-level predictions for each subject from a cross-validation (binary classification). I am trying to extract subject-wise prediction performance from these CV predictions. To do this, I count the prediction labels per subject based on the ID and create two attributes: the number of predictions for label 1 and the number of predictions for label 2. The prediction for each subject is then assigned using a threshold of 0.5. For example, if more than 50 percent of subject 1's samples are labeled as label 1, that subject is assigned label 1. The same rule is applied to all subjects. Once I have the subject-wise predictions, I calculate performance with the Performance operator.
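The counting-and-threshold scheme described above can be sketched outside RapidMiner. This is a minimal Python sketch, not the RapidMiner process itself. The row layout (subject ID, true label, predicted label), the label names `label1`/`label2`, and the helper names are all hypothetical, and it assumes every observation of a subject carries the same true label. Because `Counter` returns 0 for a label that was never predicted, a subject whose observations all received one label still gets a count of 0 for the other label.

```python
from collections import Counter, defaultdict


def subject_predictions(rows, label_1="label1", label_2="label2", threshold=0.5):
    """Aggregate observation-level CV predictions into one prediction per subject.

    rows: iterable of (subject_id, true_label, predicted_label) tuples.
    Assumes binary labels and a single true label per subject (hypothetical layout).
    """
    counts = defaultdict(Counter)  # subject_id -> Counter of predicted labels
    truth = {}                     # subject_id -> true label
    for subject_id, true_label, predicted in rows:
        counts[subject_id][predicted] += 1
        truth[subject_id] = true_label

    predictions = {}
    for subject_id, c in counts.items():
        # A label that was never predicted simply counts as 0 here
        n1, n2 = c[label_1], c[label_2]
        share_1 = n1 / (n1 + n2)
        # Assign label_1 only if MORE than `threshold` of the observations got label_1
        predictions[subject_id] = label_1 if share_1 > threshold else label_2
    return predictions, truth


def accuracy(predictions, truth):
    """Subject-wise accuracy: fraction of subjects whose aggregated label is correct."""
    hits = sum(predictions[s] == truth[s] for s in predictions)
    return hits / len(predictions)


rows = [
    ("s1", "label1", "label1"),
    ("s1", "label1", "label1"),
    ("s1", "label1", "label2"),
    ("s2", "label2", "label2"),
    ("s2", "label2", "label1"),
]
preds, truth = subject_predictions(rows)
print(preds, accuracy(preds, truth))
```

With the strict "more than" comparison, a 50/50 tie (subject `s2` above) falls to `label2`; whether ties should go the other way is a design choice to make explicitly.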

Issue: Everything works well when the predictions contain both labels. But when a less accurate algorithm predicts only a single label for all subjects, my process fails, because the part that calculates performance expects both classes. I am missing the logic to handle this case and create an attribute with zero values for the missing label for all subjects.

I have attached the repository files to this thread; you can run the process to reproduce this error. Any help would be much appreciated.

@mschmitz @lionelderkrikor @yyhuang @kayman

Regards,
Varun
https://www.varunmandalapu.com/

Be Safe. Follow precautions and Maintain Social Distancing

Best Answer

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Solution Accepted
    Varun,

    Bug fixed !!!

    An error was raised because in your Generate Attributes operator you calculate the final prediction from other attributes,
    but for the 2 datasets mentioned above, these attributes have missing values, so the calculation is impossible and RapidMiner raises an error.
    So I added a Replace Missing Values operator (with replacement value = 0) before the Generate Attributes operator.
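    To illustrate why the missing values break the calculation and why replacing them with 0 fixes it, here is a small Python analogy. It assumes the per-subject counts were pivoted into hypothetical columns `count_label1` and `count_label2`, where a label that never occurred for a subject shows up as a missing value (`None`). `replace_missing` only imitates what the Replace Missing Values operator does; it is not RapidMiner code.

```python
def replace_missing(table, columns, value=0):
    """Imitate Replace Missing Values: absent or None entries in `columns` become `value`."""
    cleaned = []
    for row in table:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in columns:
            if new_row.get(col) is None:
                new_row[col] = value
        cleaned.append(new_row)
    return cleaned


def final_prediction(row, threshold=0.5):
    """Imitate the Generate Attributes formula: label from the share of label1 counts."""
    n1, n2 = row["count_label1"], row["count_label2"]
    # With None in n1 or n2 this arithmetic raises TypeError, the failure being fixed
    return "label1" if n1 / (n1 + n2) > threshold else "label2"


pivoted = [
    {"id": 1, "count_label1": 3, "count_label2": None},
    {"id": 2, "count_label1": None, "count_label2": 4},
]
cleaned = replace_missing(pivoted, ["count_label1", "count_label2"])
print([final_prediction(r) for r in cleaned])
```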

    Here is the working process in the attached file.

    If this process answers your requirements...

    Regards,

    Lionel



Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,404 RM Data Scientist
    Hi,
    Maybe you can use the Handle Exception operator to handle the case where one class is not present?
    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
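    Martin's suggestion follows a try/fallback pattern: run the two-class calculation and, only if it fails, take an alternative path. Below is a rough Python analogy of that idea, not the Handle Exception operator itself. It assumes the same hypothetical `count_label1`/`count_label2` columns, where the second column is simply absent when that label was never predicted.

```python
def label1_share(row):
    """Two-class calculation; raises KeyError if a count column is absent."""
    return row["count_label1"] / (row["count_label1"] + row["count_label2"])


def label1_share_with_fallback(row):
    """Try the normal path first; on a missing column, retry with that count set to 0."""
    try:
        return label1_share(row)
    except KeyError:
        # Fallback path: treat an absent class column as zero predictions
        patched = {
            "count_label1": row.get("count_label1", 0),
            "count_label2": row.get("count_label2", 0),
        }
        return label1_share(patched)


print(label1_share_with_fallback({"count_label1": 1, "count_label2": 3}))  # both classes present
print(label1_share_with_fallback({"count_label1": 4}))                     # label2 never predicted
```

    The sketch only covers the one-class case discussed here; if both columns were absent, the fallback would still fail with a division by zero.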
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited October 2019
    Hello Martin,

    I am looking for a way to add the column with zero values in case that label column is not present. Any pointers on how to check whether the column is present and generate the attribute based on that? If the column is present, the process should bypass this step; if not, the column should be created with zero values. My mind is stuck; maybe I am missing something simple here.

    In the image below, the left part shows the case where the error occurs, because the following operator expects two columns. The right side shows the case where my process works fine. So, as mentioned earlier, I need to check whether the second column is present and create it with zero values if it is not.


    Regards,
    Varun

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi Varun,

    Maybe you can try the Branch (IF/ELSE statement) operator with:
    - condition type = attribute_available
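    The Branch logic with `attribute_available` amounts to: if the column exists, pass the data through; otherwise, generate it as a constant zero column. A minimal Python sketch of that check, assuming a hypothetical column-oriented table (a dict mapping column name to a list of values) and the same hypothetical count column names. Calling it once per label mirrors checking each label's column separately.

```python
def ensure_attribute(table, name, fill=0):
    """table: dict mapping column name -> list of values (all the same length)."""
    if name in table:  # condition: attribute_available
        return table   # "then" branch: pass the data through unchanged
    n_rows = len(next(iter(table.values()), []))
    # "else" branch: add the missing column filled with a constant (0 by default)
    return {**table, name: [fill] * n_rows}


pivot = {"id": [1, 2, 3], "count_label1": [5, 2, 4]}  # label2 was never predicted
for column in ("count_label1", "count_label2"):
    pivot = ensure_attribute(pivot, column)
print(pivot["count_label2"])
```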

    Hope this helps,

    Regards,

    Lionel
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Thanks @lionelderkrikor, I think this is the one I am missing. I will try it and let you know if there is an issue.
    Regards,
    Varun

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Varun,

    Here is a version of your process that works, using 3 Branch operators to handle all the cases.
    But now I am facing an unknown bug.
    After investigating, it seems to be linked to the following 2 datasets:
    - DT_Test_Data_Multi_Class_10
    - SVM_Test_Data_Multi_Class_10

    (When only the 3 other datasets are in the repository, everything works fine.)

    I will continue to investigate...

    The process is in the attached file.

    Regards,

    Lionel
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Varun,

    !!! Erratum !!!

    The 2 datasets that are raising errors are:

    - DT_Test_Data_Multi_Class_10
    - RF_Test_Data_Multi_Class_10


  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited October 2019
    Hello @lionelderkrikor,

    Thanks a lot for spending time on this. It works, and I am building nested Branch operators instead of connecting them in series as you did. This is my first time working with this operator.
    Regards,
    Varun

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    You're welcome Varun !

    Regards,

    Lionel