Inconsistency of ROC curves
bernardo_pagnon
Hello,
I generated a ROC curve for a logistic regression on a data set by using the Performance operator and selecting AUC as the criterion. Fine.
Then I used the same data set with the Compare ROCs operator, picking logistic regression and decision tree as the models. The ROC curves appear, but the ROC curve for the logistic regression is different from the one I obtained before! How can this be?
Best,
Bernardo
Answers
Can you share the process here? You can download it using FILE --> Export Process and attach the .rmp file here. Please also attach the data. I suspect a change in some samples of the test data. Are you using the same type of validation (with the same random seed) for both the Compare ROCs process and the regular model with the performance metric? I will check and let you know if provided with the details of the process and data.
If you can't share it here, you can send me a PM with the requested files.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Thanks for sharing your process files and data.
I used the complete datasheet in the attached Excel file; I believe it's the correct file. Now coming to the problem.
Case 1 process: In the Case 1 process, I can see that you are training and testing on the same data. This is not correct, as you need to test on data that is independent of the training data; otherwise the ROC/AUC comes out optimistically biased (see the sketch below). If you are purposefully doing this for your requirement, then it's fine.
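Not RapidMiner itself, but a minimal Python/scikit-learn sketch (synthetic data; all names and parameters are illustrative assumptions, not taken from your process) of what Case 1 does: the model is scored on the very rows it was fit on, so the curve is inflated.

```python
# Sketch of Case 1: train and test on the same data (optimistic bias).
# Synthetic stand-in data; not the Excel dataset from the thread.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=500, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Confidences come from the training rows themselves, so the
# resulting ROC curve/AUC is biased upward.
train_scores = model.predict_proba(X)[:, 1]
print("AUC on training data:", roc_auc_score(y, train_scores))
```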
Case 2 process: In Case 2, you are using the Compare ROCs operator. Based on its parameter settings, it uses 10-fold cross-validation: your dataset is divided into 10 subsets, and the operator trains on 9 subsets and tests on the remaining 1, repeating until every subset has been tested; the final performance is an aggregate over all subsets.
This is why you are getting different ROC curves: the test data and the processes differ between the two cases, so the results (AUC and ROC) differ as well. A sketch of the cross-validated version follows.
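A companion sketch (same synthetic data and assumptions as above) of what Compare ROCs effectively does: with 10-fold cross-validation, every row is scored by a model that never saw it during training, so the resulting ROC reflects out-of-sample performance and will generally sit below the Case 1 curve.

```python
# Sketch of Case 2: 10-fold cross-validated scores, as Compare ROCs uses.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = make_classification(n_samples=500, random_state=42)

# Each row's score comes from the fold in which it was held out,
# i.e. from a model fit on the other 9 folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
cv_scores = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                              cv=cv, method="predict_proba")[:, 1]
print("AUC from 10-fold CV:", roc_auc_score(y, cv_scores))
```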
I modified your Case 1 process to use 10-fold cross-validation, and you can now see in the attached image that the ROC curves of Case 1 and Case 2 are similar; the left side is Case 1 and the right side is Case 2. I attached the modified processes; you can open them in your RapidMiner using FILE --> Import Process.
Modified Case 1 process: added 10-fold cross-validation with a local random seed in the parameters. I also set a local random seed for the Compare ROCs operator in the Case 2 process, with ROC bias set to neutral.
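And a quick sketch (the seed values are arbitrary) of why fixing the local random seed matters: the same seed reproduces the same folds, and therefore the same ROC/AUC, on every run, while a different seed shuffles the folds and shifts the AUC slightly.

```python
# Sketch: a fixed random seed makes the CV splits, and hence the AUC,
# reproducible across runs.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = make_classification(n_samples=500, random_state=42)

def cv_auc(seed):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                               cv=cv, method="predict_proba")[:, 1]
    return roc_auc_score(y, scores)

print(cv_auc(1992) == cv_auc(1992))  # True: same folds, same AUC
print(cv_auc(1992) == cv_auc(7))     # typically False: different folds
```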
Hope this helps. Please let us know if you need more information.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing