Measuring accuracy of a decision tree with a non-binomial target label column
How can I measure the accuracy of my model if my target label column is not a binomial attribute? It is not a yes/no type; it consists of crime types such as burglary, robbery, fraud, assault, etc.
Best Answers
lionelderkrikor (Moderator, RapidMiner Certified Analyst)
Hi @koknayaya,
There is nothing special to do apart from putting a Performance (Classification) operator in your process. RapidMiner calculates the accuracy even when your target attribute has N values.
In this case, the dimensions of your confusion matrix are N x N.
For example, here is a confusion matrix for a target attribute with 3 values:
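As a stand-in illustration, the following Python sketch (not RapidMiner output; the labels and predictions are made up) builds a 3 x 3 confusion matrix and derives accuracy from its diagonal. The row/column orientation used here is a choice made for this sketch and may differ from how a given tool displays it.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return an N x N matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Accuracy is the diagonal (correct predictions) divided by all examples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical data: three crime types, six examples.
labels = ["burglary", "robbery", "fraud"]
y_true = ["burglary", "robbery", "fraud", "fraud", "robbery", "burglary"]
y_pred = ["burglary", "fraud", "fraud", "fraud", "robbery", "robbery"]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)

for label, row in zip(labels, cm):
    print(f"{label:>10}: {row}")
print(f"accuracy = {acc:.3f}")
```

The same calculation works for any number of classes, which is why nothing changes when the label is not binomial.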
I hope it helps,
Regards,
Lionel
IngoRM (Administrator, RM Founder)
Hi,
Whether a given accuracy is good enough cannot be said in general; it depends on your business problem and how you solve things today. Think about predicting the outcome of a coin flip. If you make random guesses, your accuracy would be 50%. If by using machine learning you can predict the outcome with 51% accuracy, this would be sufficient. Why? Doesn't a model with only 51% accuracy sound bad? Wrong! You can now start betting against people without this model (who only have 50% accuracy) and will become rich over time.
It looks like you have multiple classes, let's say 5 for the sake of the argument. If the classes were equally distributed, a random guess would lead to 20% accuracy or an 80% error rate. Getting 62% accuracy (or a 38% error rate) might be a fantastic result already - you have just cut your error rate roughly in half! Or not. Again, without understanding the business problem you want to solve, this is impossible to say.
If, however, you have your 5 classes and one of the classes is the correct class in 62% of all cases, then a model with 62% accuracy is not very impressive in any case, since always predicting that class (and never anything else) would lead to 62% accuracy already.
You see there is no easy answer to this, and only you or the owner of the business problem can decide if that is good or not. But comparing the value to the distribution of the classes is at least a first step to determine whether the model learned anything at all.
Hope this helps,
Ingo
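The comparison against random guessing and against always predicting the largest class can be sketched in a few lines of Python. This is only an illustration: the function names and the skewed class counts are made up, and the 62% figure is the one from the discussion.

```python
def random_guess_accuracy(n_classes):
    """Expected accuracy of uniform random guessing on equally distributed classes."""
    return 1.0 / n_classes


def majority_class_accuracy(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


def relative_error_reduction(model_accuracy, baseline_accuracy):
    """Fraction of the baseline's error rate that the model removes."""
    baseline_error = 1.0 - baseline_accuracy
    model_error = 1.0 - model_accuracy
    return (baseline_error - model_error) / baseline_error


# Case 1: five equally distributed classes, model accuracy 62%.
uniform = random_guess_accuracy(5)
reduction_uniform = relative_error_reduction(0.62, uniform)

# Case 2: hypothetical skewed distribution where one class covers 62% of cases.
skewed_counts = {"A": 62, "B": 10, "C": 10, "D": 9, "E": 9}
majority = majority_class_accuracy(skewed_counts)
reduction_skewed = relative_error_reduction(0.62, majority)

print(f"uniform baseline {uniform:.2f}, error reduction {reduction_uniform:.3f}")
print(f"majority baseline {majority:.2f}, error reduction {reduction_skewed:.3f}")
```

In the first case the model removes a bit more than half of the baseline error; in the second it removes none, even though the accuracy is the same 62%.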
IngoRM (Administrator, RM Founder)
Again, nobody can tell you whether this is good or not. Only you can decide (or whoever the business owner is).
A couple of observations though: the classes RAPE and KIDNAPPING are (fortunately!) very rare events. There is only one "kidnapping" case in the whole test set, and I would assume it is extremely rare in the training set as well. It is very unlikely that any model will ever be able to pick up this pattern if it is that rare. I would consider removing the class altogether.
Although the class "rape" is more frequent, the problem here is similar, and you may again decide to remove the class from the predictions altogether. If you do that, you would end up with only four classes: ROBBERY, VEHICLE (something), BURGLARY, and DANGEROUS DRUGS. There is less chance that models are confused if the tiny classes are removed, although it will likely not move the needle a lot. Anyway, every little bit may help.
Now I would try a couple of different model types (starting with Auto Model) and see where this gets you. You can then try to improve the performance of the best model(s) further with additional parameter optimization, feature engineering, or ensemble learning, opening the processes generated by Auto Model as a starting point for those optimizations.
Finally, out of the roughly 9,000 examples in your test set above, about 4,000 have the true class DANGEROUS DRUGS. So always predicting this class already delivers roughly 44% accuracy. A model with 62% is obviously much better than that, but, again, whether it is good enough depends on the underlying problem and its owners and is not a data science question per se. Also keep in mind that some prediction errors may be more costly than others, so accuracy is not the only thing which may be of importance here.
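Outside RapidMiner, removing very rare classes before training could look roughly like the Python sketch below. The helper name `drop_rare_classes`, the `crime` column name, the threshold, and the example counts are all hypothetical.

```python
from collections import Counter


def drop_rare_classes(rows, label_key, min_count):
    """Keep only rows whose class occurs at least min_count times.

    Returns the kept rows and the sorted list of dropped class names.
    """
    counts = Counter(row[label_key] for row in rows)
    kept = [row for row in rows if counts[row[label_key]] >= min_count]
    dropped = sorted(label for label, n in counts.items() if n < min_count)
    return kept, dropped


# Made-up miniature data set: one class appears only once.
rows = (
    [{"crime": "ROBBERY"} for _ in range(5)]
    + [{"crime": "BURGLARY"} for _ in range(4)]
    + [{"crime": "KIDNAPPING"}]
)

kept, dropped = drop_rare_classes(rows, label_key="crime", min_count=2)
print(f"kept {len(kept)} of {len(rows)} rows; dropped classes: {dropped}")
```

What counts as "too rare" is a judgment call; the threshold of 2 here only serves the toy example.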
Welcome to the world of data science - this is where the fun begins now.
Best,
Ingo
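To illustrate the point that some prediction errors may be more costly than others, here is a small Python sketch in which two models have identical accuracy but very different total cost. The confusion matrices and the cost matrix are invented for the example; real costs would have to come from the owner of the business problem.

```python
def accuracy(confusion):
    """Share of examples on the diagonal (rows: true class, columns: predicted)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def total_cost(confusion, cost):
    """Sum of count * cost over every (true class, predicted class) cell."""
    n = len(confusion)
    return sum(confusion[i][j] * cost[i][j] for i in range(n) for j in range(n))


# Hypothetical cost matrix: mistaking class 1 for class 0 is five times
# as expensive as any other error; correct predictions cost nothing.
cost = [
    [0, 1, 1],
    [5, 0, 1],
    [1, 1, 0],
]

# Two made-up models, each with 28 of 30 examples classified correctly.
confusion_a = [[8, 2, 0], [0, 10, 0], [0, 0, 10]]
confusion_b = [[10, 0, 0], [2, 8, 0], [0, 0, 10]]

acc_a, acc_b = accuracy(confusion_a), accuracy(confusion_b)
cost_a, cost_b = total_cost(confusion_a, cost), total_cost(confusion_b, cost)
print(f"model A: accuracy {acc_a:.3f}, cost {cost_a}")
print(f"model B: accuracy {acc_b:.3f}, cost {cost_b}")
```

Accuracy alone cannot separate these two models, while the cost view clearly prefers model A.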