Show prevalence of largest class in Performance (Classification) and similar operators
When doing classification tasks, I normally use the prevalence (frequency) of the largest (modal) class as a naïve benchmark for judging whether a model is useful. For example, if my label is binary (yes/no), with yes comprising 9% of the dataset and no comprising 91%, then I would expect the accuracy of a model to be at least 91%. If it is not, the model is no better than naively assigning every prediction to the larger class. The same logic applies to multiple categories (e.g. three or four classes). For example, if three classes A, B and C were distributed 30%, 40% and 30%, the prevalence of the largest class (B) would be 40%.
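As a minimal sketch of the calculation described above (outside RapidMiner, assuming the labels are simply available as a Python list), the prevalence of the largest class is the count of the modal label divided by the total number of labels. The function name `largest_class_prevalence` is hypothetical and only for illustration:

```python
from collections import Counter

def largest_class_prevalence(labels):
    """Return (modal class, share of all labels that belong to it)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = sum(counts.values())
    # most_common(1) returns [(label, count)] for the most frequent label
    label, n = counts.most_common(1)[0]
    return label, n / total

# Binary example from the post: 9% "yes", 91% "no"
binary = ["yes"] * 9 + ["no"] * 91
print(largest_class_prevalence(binary))  # ('no', 0.91)

# Three-class example from the post: A 30%, B 40%, C 30%
multi = ["A"] * 3 + ["B"] * 4 + ["C"] * 3
print(largest_class_prevalence(multi))   # ('B', 0.4)
```

If two classes tie for the largest count, `Counter.most_common` returns the one encountered first, but the prevalence value is the same either way.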
My request is that the Performance (Classification) and Performance (Binominal Classification) operators add this as an option among the criteria they output. I am not sure, but I think the formal name for this measure is "prevalence of largest class" (cf. https://en.wikipedia.org/wiki/Prevalence and https://en.wikipedia.org/wiki/Confusion_matrix#Table_of_confusion). Because the calculation is so simple, I hope it would be easy to implement. Still, having it available as an output option would be more convenient than pulling out a calculator each time, which is what I have to do now.
Best Answer
MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; RM Data Scientist)
Hi Chitu, it is of course possible to add new performance measures to the operators. I can open a ticket for this feature request, but please do not expect it to be solved in the coming weeks. As you know, RapidMiner has release schedules, and this is unlikely to be a top priority for us.
I also have to ask: is this really a performance measure? Isn't it a constant value for each data set? Don't you want something like accuracy minus prevalence, i.e. how many percentage points your model's accuracy is above the prevalence?
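The "accuracy minus prevalence" idea can be sketched in plain Python, assuming true and predicted labels are given as equal-length lists and that the prevalence is measured on the same labels the accuracy is computed on. The names `accuracy` and `lift_over_prevalence` are hypothetical, not RapidMiner criteria:

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def lift_over_prevalence(y_true, y_pred):
    """Accuracy minus prevalence of the largest class, in percentage points."""
    prevalence = Counter(y_true).most_common(1)[0][1] / len(y_true)
    return 100 * (accuracy(y_true, y_pred) - prevalence)

# Hypothetical data: 91 "no" and 9 "yes"; the model gets 94 of 100 right
y_true = ["no"] * 91 + ["yes"] * 9
y_pred = ["no"] * 91 + ["yes"] * 3 + ["no"] * 6
print(round(lift_over_prevalence(y_true, y_pred), 2))  # 3.0
```

A value at or below zero means the model does no better than always predicting the majority class.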
In any case, you can easily use custom operators to build your own operator that calculates prevalence, without any coding.
Best,
Martin
- Head of Data Science Services at RapidMiner -
Dortmund, Germany
Answers
The other thing I frequently do is calculate the accuracy/ROI of a default model. The default model may be the 'naive' prediction of the majority class. Have a look at the Default Model operator for this.
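A rough Python analogue of a majority-class default model is sketched below. It is not the Default Model operator's implementation, only the same idea under simple assumptions: the majority label is learned from a training split and then scored on a separate, hypothetical test split. The class `MajorityClassModel` is a hypothetical name:

```python
from collections import Counter

class MajorityClassModel:
    """Naive baseline: always predicts the most frequent training label."""

    def fit(self, labels):
        counts = Counter(labels)
        if not counts:
            raise ValueError("need at least one training label")
        self.majority_ = counts.most_common(1)[0][0]
        return self

    def predict(self, n_rows):
        # Features are ignored; every row gets the majority label
        return [self.majority_] * n_rows

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

train = ["no"] * 90 + ["yes"] * 10  # hypothetical training labels
test = ["no"] * 8 + ["yes"] * 2     # hypothetical held-out labels
baseline = MajorityClassModel().fit(train)
print(baseline.majority_, accuracy(test, baseline.predict(len(test))))  # no 0.8
```

Evaluating the default model on held-out data gives a baseline accuracy measured the same way as the real model's, which can differ from the prevalence on the training data (0.8 versus 0.9 here).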
<parameter key="momentum_stable" value="0.0"/>