Classification model problem
happy_neid
Hello everyone.
I built a classification model using a decision tree. But when I apply it, it gives the same prediction with the same confidence for every example, as in the picture I posted. Can anyone tell me which mistake could cause this to happen? Thank you.
Answers
Hi,
can you have a look at your tree? Could it be that it simply does not split? What happens if you deactivate pruning and prepruning in the tree?
~Martin
Dortmund, Germany
I can't tell from your snapshot but does your label column contain all missing values?
Thank you for your answer. There are no missing values.
Thank you for your answer. I will try that.
Here you can see what my process looks like, and the decision tree as well. I'm trying to do text mining, but I am a beginner, so I don't know much about it.
First I tried to run the process without connecting the word list from the first Process Documents from Data operator to the second one, and I got an error message saying that the attributes don't match. Then I connected those two, and now I have the problem that brought me here.
[Screenshots: process without connection; process with connection; tree description; tree graph]
From what you show, your Decision Tree doesn't split. Wrap that Decision Tree in a Cross Validation operator and measure the performance. My guess is that it'll classify the majority of your "0" class incorrectly.
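To illustrate why a tree without splits behaves this way, here is a minimal Python sketch (not a RapidMiner process). It assumes made-up labels with 70 examples of class "1" and 30 of class "0"; `fit_leaf` and `cross_validate` are hypothetical helper names. A tree with no splits is a single leaf, so it returns the majority class with the same confidence for every example, and cross-validation shows that the minority class is never predicted.

```python
from collections import Counter

def fit_leaf(labels):
    """A tree with no splits is one leaf: the majority class and its share."""
    counts = Counter(labels)
    majority, n = counts.most_common(1)[0]
    return majority, n / len(labels)

def cross_validate(labels, k=5):
    """k-fold cross-validation of the single-leaf model; returns per-class recall."""
    hits = Counter()
    totals = Counter()
    for fold in range(k):
        train = [y for i, y in enumerate(labels) if i % k != fold]
        test = [y for i, y in enumerate(labels) if i % k == fold]
        prediction, _ = fit_leaf(train)
        for y in test:
            totals[y] += 1
            hits[y] += (y == prediction)
    return {cls: hits[cls] / totals[cls] for cls in totals}

labels = ["1"] * 70 + ["0"] * 30   # made-up toy labels, not the poster's data
leaf = fit_leaf(labels)            # same prediction and confidence for every row
recall = cross_validate(labels)

print("prediction and confidence:", leaf)
print("per-class recall:", recall)
```

In this toy setup the recall for class "0" is zero, which is the pattern a per-class performance table would reveal.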
That's true. But can you tell me why that happens, and is there a way to fix it?
Thank you very much.
Hi,
just google for decision tree and pruning. Your tree simply got pruned too much. Most likely you need to reduce the min_gain to 0.001.
~Martin
Dortmund, Germany
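As a rough illustration of how a minimal-gain threshold can block every split, here is a small Python sketch. It is an assumption-laden simplification, not RapidMiner's actual implementation: it uses plain information gain on made-up class counts, and `accepts_split` is a hypothetical helper that only accepts a split when its gain reaches the threshold.

```python
from math import log2

def entropy(pos, neg):
    """Shannon entropy (in bits) of a two-class node with the given counts."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * log2(p)
    return result

def information_gain(left, right):
    """left and right are (pos, neg) counts in the two branches of a candidate split."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = entropy(left[0] + right[0], left[1] + right[1])
    children = (n_left / n) * entropy(*left) + (n_right / n) * entropy(*right)
    return parent - children

def accepts_split(gain, min_gain):
    """Prepruning rule assumed here: split only if the gain reaches the threshold."""
    return gain >= min_gain

# Made-up counts for a weakly informative attribute (70/30 class balance overall).
gain = information_gain(left=(37, 13), right=(33, 17))

print("gain:", round(gain, 4))
print("split at min_gain=0.01: ", accepts_split(gain, 0.01))
print("split at min_gain=0.001:", accepts_split(gain, 0.001))
```

With these toy counts the best split is rejected at a threshold of 0.01 but accepted at 0.001, so the tree stays a single leaf until the threshold is lowered far enough. If no attribute carries any gain at all, lowering the threshold will not help.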
Thank you very much!
Regards,
Nada
Hi, me again.
I tried setting min. gain to 0.01, but it still won't split. I also tried turning off pruning and prepruning, but it still doesn't work. I have no idea what I can do about that.
Regards,
Nada
Now I see that I misunderstood you. Should my unlabeled dataset contain a label column with empty cells, or not?
It sounds like your tree is failing to find any attributes that provide a meaningful split to separate the labels. Did you try any of the other criteria (information gain, gini index) and also the confidence parameter?
You might want to check whether your attributes have any predictive relationship with your label. Try a simpler approach first, such as one of the "weight" operators, like Weight by Information Gain or Weight by Gini Index. That will show you whether you have attributes that can separate your classes at all. You can also run a simple Naive Bayes model and look at the output, which will show the class distributions. If they are not distinct, then your decision tree is not going to find anything to use for a split.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
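The attribute check suggested above can be sketched outside RapidMiner in a few lines of Python. This is only an analogue of the weight operators under stated assumptions: the data is a made-up set of binary word-occurrence attributes, and `gini_weight` and `rate_by_class` are hypothetical helpers. The first scores each attribute by its reduction in Gini impurity; the second shows how often a word occurs in each class, similar in spirit to inspecting the class distributions of a Naive Bayes model.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_weight(values, labels):
    """Reduction in Gini impurity obtained by splitting on the attribute's values."""
    n = len(labels)
    weighted = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        weighted += len(subset) / n * gini(subset)
    return gini(labels) - weighted

def rate_by_class(values, labels):
    """Share of examples in each class where the word is present (value 1)."""
    return {
        c: sum(x for x, y in zip(values, labels) if y == c) / labels.count(c)
        for c in sorted(set(labels))
    }

# Made-up toy data: 1 means the word occurs in the document.
labels = ["0"] * 4 + ["1"] * 4
attributes = {
    "refund": [1, 1, 1, 0, 0, 0, 0, 0],  # occurs mostly in class "0"
    "the":    [1, 0, 1, 0, 1, 0, 1, 0],  # occurs equally in both classes
}

weights = {name: gini_weight(vals, labels) for name, vals in attributes.items()}
rates = {name: rate_by_class(vals, labels) for name, vals in attributes.items()}

for name in attributes:
    print(name, "weight:", round(weights[name], 3), "rate by class:", rates[name])
```

An attribute like "the" in this toy data gets a weight of zero and identical per-class rates, so no tree could split on it. If every attribute in the real data looks like that, the problem lies in the features rather than in the tree parameters.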