Help with correctly understanding classification results
Hi, I have a table with classification results:
I have 4 algorithms. Classification was run on 16 different training sets:
- all => all 15 predictors were used
- 1-15 => each set contains 14 predictors; in each set a different type of predictor was removed
An example set is attached.
Type of excluded predictor | column name in csv
1 - characters_number
2 - sentences_number
3 - words_number
4 - average_sentence_length
5 - average_sentence_words_number
6 - ratio_unique_words
7 - average_word_length
8 - ratio_word_length_[1-16]
9 - ratio_special_characters
10 - ratio_numbers
11 - ratio_punctuation_characters
12 - most_used_word_[1-4]
13 - ratio_letter_[a-z]
14 - ratio_questions
15 - ratio_exclamations
I have to somehow explain why the results for sets 1-15, for each algorithm, are better or worse than the results in column "ALL".
But I have no idea why. I know that in most cases, when the difference between column ALL and a column [1-15] is very small (like < 1%), it is just luck and randomness. But when the difference is larger, it is probably caused by something.
The most important thing: I don't know why the results for the k-NN algorithm are the same for columns 9-15...
I would also like to know why Naive Bayes is the best (54%) and k-NN is a bad algorithm for this task (20%).
Can someone help me with that?
Best Answer
BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert; Posts: 949; Unicorn)
Hi!
Some partial answers:
k-NN might mainly learn from a single attribute, or only a few, if you don't normalize the data. It compares the values of different attributes directly, so an attribute on a large scale (like 1000) will dominate attributes on small scales (like nominal attributes encoded as 0 or 1). As long as this dominant attribute is still in your data, the result will stay more or less the same.
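To make the scale effect concrete, here is a minimal sketch in plain Python (not a RapidMiner process). The rows, labels and helper names are made up for illustration; only the two column names come from the question. It assumes a 1-nearest-neighbour rule with Euclidean distance and min-max normalization, which may differ from the settings actually used.

```python
import math

def nearest_label(train, labels, query, columns):
    """1-NN with Euclidean distance, using only the given column indices."""
    def dist(row):
        return math.sqrt(sum((row[c] - query[c]) ** 2 for c in columns))
    best = min(range(len(train)), key=lambda i: dist(train[i]))
    return labels[best]

def fit_min_max(train):
    """Per-column (min, max) taken from the training rows."""
    return [(min(col), max(col)) for col in zip(*train)]

def apply_min_max(row, bounds):
    """Rescale one row to the [0, 1] range defined by the training bounds."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, bounds)]

# Hypothetical rows: [characters_number, ratio_questions]
train = [[1000.0, 0.9], [1050.0, 0.1], [2000.0, 0.5], [500.0, 0.5]]
labels = ["x", "y", "x", "y"]
query = [1040.0, 0.85]

# Raw data: characters_number dominates the distance, so dropping
# ratio_questions does not change the prediction.
raw_all = nearest_label(train, labels, query, columns=[0, 1])
raw_without_ratio = nearest_label(train, labels, query, columns=[0])

# Normalized data: both attributes contribute, and the neighbour changes.
bounds = fit_min_max(train)
scaled_train = [apply_min_max(r, bounds) for r in train]
scaled_query = apply_min_max(query, bounds)
scaled_all = nearest_label(scaled_train, labels, scaled_query, columns=[0, 1])

print(raw_all, raw_without_ratio, scaled_all)
```

On the raw values the prediction is the same with and without ratio_questions, which is the kind of behaviour that could explain identical results for sets 9-15.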
Naive Bayes is frequently a good algorithm without tuning. On the other hand, there's not a lot to tune, so it's seldom the best.
If you try different pruning settings in the decision tree, you might even get a better result. You can use a building block to do it:
https://community.www.turtlecreekpls.com/discussion/33910/optimize-decision-tree-and-optimize-svm
Regards,
Balázs
Answers
There is a concept in machine learning known as the interaction effect. When you analyze your predictors (features), it is not only each feature on its own that affects what the algorithm learns, but also how the features interact. For example, suppose there are two features, A and B. If you run your machine learning model on only A or only B, you might get average performance. If you run the algorithm on A and B in combination, you might get a much better or a much worse result, which means that A and B together behave differently than each of them does on its own.
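A toy illustration of the interaction effect, as a sketch in plain Python with made-up data: the label is the XOR of two binary features, so neither feature carries information alone, but together they determine the label. The `lookup_accuracy` helper is a hypothetical majority-vote table and reports training accuracy only.

```python
from collections import Counter, defaultdict

def lookup_accuracy(rows, labels, columns):
    """Training accuracy of a majority-vote lookup table on the chosen columns."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[tuple(row[c] for c in columns)][label] += 1
    correct = 0
    for row, label in zip(rows, labels):
        predicted = votes[tuple(row[c] for c in columns)].most_common(1)[0][0]
        correct += predicted == label
    return correct / len(rows)

# Made-up data: features A and B, label = A XOR B
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
labels = [a ^ b for a, b in rows]

acc_a = lookup_accuracy(rows, labels, columns=[0])        # A alone
acc_b = lookup_accuracy(rows, labels, columns=[1])        # B alone
acc_ab = lookup_accuracy(rows, labels, columns=[0, 1])    # A and B together

print(acc_a, acc_b, acc_ab)
```

Removing either A or B here would drop the accuracy to chance level, even though each feature looks useless when evaluated alone.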
This is one reason to check your features with feature selection methods such as forward selection or backward elimination. You can also use automatic feature engineering to check this. In your approach you removed one feature at a time, but what if feature 3 and feature 6 work better in combination than 1, 2, 3, 4, 5 and 6 together? This is one important reason we use feature selection. The interaction effect plays a major role in traditional algorithms.
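Backward elimination can be sketched roughly as follows in plain Python. This is a greedy variant under stated assumptions, not the RapidMiner operator: `score` stands for whatever validation accuracy you measure for a feature subset, and `toy_score` with its feature names is invented purely so the example runs.

```python
def backward_elimination(features, score):
    """Greedy backward elimination.

    `score` is a caller-supplied function that maps a tuple of feature names
    to a validation score (higher is better). Features are dropped one at a
    time for as long as dropping one strictly improves the score.
    """
    current = tuple(features)
    best = score(current)
    while len(current) > 1:
        candidates = [tuple(f for f in current if f != drop) for drop in current]
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score <= best:
            break  # no single removal helps any more
        current, best = top, top_score
    return current, best

def toy_score(subset):
    """Hypothetical score in percent: 'a' and 'b' only help together, noise hurts."""
    value = 50
    if "a" in subset and "b" in subset:
        value += 30
    value -= 5 * sum(1 for f in subset if f.startswith("noise"))
    return value

selected, selected_score = backward_elimination(
    ["a", "b", "noise1", "noise2"], toy_score
)
print(selected, selected_score)
```

In practice `score` would train and validate the classifier on the given subset, so each call is expensive; the greedy loop keeps the number of calls manageable compared with trying every subset.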
Also, did you tune the hyperparameters of these algorithms? For example, in KNN, how did you choose the value of K? There is an elbow technique that can be used to determine a good K value. As @BalazsBarany mentioned, it is also important to check the hyperparameters of decision trees, such as the criterion and pruning (pre and post).
KNN is a lazy algorithm and its results depend on the value of K. If your labels or data cannot be separated in feature space, KNN misclassifies a lot. So you need to check which value of K works best.
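One simple way to compare K values is to scan a few candidates and measure a validation accuracy for each. The sketch below uses leave-one-out validation in plain Python (Python 3.8+ for `math.dist`); it is a plain scan rather than the elbow technique itself, and the points, labels and helper names are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(rows, labels, k):
    """Leave-one-out accuracy: predict each row from all the other rows."""
    correct = 0
    for i in range(len(rows)):
        train = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        correct += knn_predict(train, train_labels, rows[i], k) == labels[i]
    return correct / len(rows)

# Made-up data: two clusters plus one mislabeled point inside cluster "a"
rows = [(0, 0), (0, 1), (1, 0), (1, 1),
        (5, 5), (5, 6), (6, 5), (6, 6),
        (0.5, 0.5)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

scores = {k: loo_accuracy(rows, labels, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)  # first K with the highest accuracy
print(scores, best_k)
```

With K = 1 the single mislabeled point misleads all of its neighbours, while a slightly larger K votes it down; on real data the curve is usually less clear-cut.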
Hope this helps.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
https://stats.stackexchange.com/questions/287425/why-do-you-need-to-scale-data-in-knn
In my experience, normalization does not make much difference for decision trees, as they calculate an impurity index for each attribute and branch on it.
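A small sketch of why that is, in plain Python with made-up values: a threshold split chosen by Gini impurity depends only on the ordering of an attribute's values, and min-max scaling preserves that ordering. The `best_split` helper is a simplified single-attribute split search, not a full decision tree.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (weighted Gini, left indices, right indices) of the best threshold split."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best = None
    for cut in range(1, len(order)):
        if values[order[cut - 1]] == values[order[cut]]:
            continue  # cannot place a threshold between equal values
        left, right = order[:cut], order[cut:]
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(values)
        if best is None or score < best[0]:
            best = (score, left, right)
    return best

# Hypothetical characters_number values and labels
values = [1200.0, 300.0, 950.0, 4000.0, 700.0, 2500.0]
labels = ["long", "short", "short", "long", "short", "long"]

lo, hi = min(values), max(values)
scaled = [(v - lo) / (hi - lo) for v in values]

raw_split = best_split(values, labels)
scaled_split = best_split(scaled, labels)
print(raw_split[1:], scaled_split[1:])
```

Both calls pick the same partition of the rows; only the numeric threshold would be expressed on a different scale.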