DT : different attribute weights with/without cross-validation
Hi,
I created 2 processes including a decision tree model from the "Golf" dataset.
1. First, a classic DT model:
In this case, for the attribute weights, I get:
Here is the process:
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
2. A DT model with a cross-validation:
In this case, for the attribute weights, I get:
Here is the process:
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
In both cases, as expected, the two DT models are strictly the same. Why are the attribute
weights not equal?
NB: With split validation, I get the same attribute weights as in case 1.
Thank you for your feedback,
Regards,
Lionel
Best Answer
Pavithra_Rao (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 123; RM Data Scientist)
Hi Lionel,
Cross-validation builds k + 1 models (k = number of folds), and the attribute weights that are output come from the last iteration. As you know, in each iteration the training and testing sets contain different subsets of the data. Hence the weights output by the classic DT model (where the entire dataset is used by the model at once) and by the CV DT model are not the same.
Also, it's always good to generate the attribute weights using the entire dataset (i.e. the classic model) rather than a subset of the data (i.e. via cross-validation/split validation).
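To illustrate why weights computed on a fold's training subset differ from weights computed on the whole dataset, here is a minimal Python sketch. It is not RapidMiner code: it assumes the nominal "play tennis" variant of the Golf data (RapidMiner's Golf sample has numeric Temperature and Humidity), uses plain information gain as a stand-in for whatever weighting the Decision Tree operator computes, and splits folds by row index rather than by RapidMiner's sampling. The names `information_gain_weights` and `fold_weights` are hypothetical helpers.

```python
from collections import Counter
from math import log2

# Nominal "play tennis" variant of the Golf data (illustrative stand-in;
# RapidMiner's Golf sample uses numeric Temperature and Humidity).
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain_weights(rows):
    """Return {attribute: information gain} computed on the given rows only."""
    labels = [row[-1] for row in rows]
    base = entropy(labels)
    weights = {}
    for idx, name in enumerate(ATTRIBUTES):
        # Group the class labels by the value this attribute takes.
        groups = {}
        for row in rows:
            groups.setdefault(row[idx], []).append(row[-1])
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        weights[name] = base - remainder
    return weights


def fold_weights(rows, k):
    """Weights computed on the training part of each of k index-based folds."""
    result = []
    for fold in range(k):
        train = [row for i, row in enumerate(rows) if i % k != fold]
        result.append(information_gain_weights(train))
    return result


full = information_gain_weights(ROWS)   # "classic" case: all rows
per_fold = fold_weights(ROWS, k=2)      # one weight set per fold
last = per_fold[-1]                     # weights from the last iteration

print("full data :", {a: round(w, 3) for a, w in full.items()})
print("last fold :", {a: round(w, 3) for a, w in last.items()})
```

The two printed dictionaries differ because the last fold's weights are computed from only part of the rows, which is the effect described above.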
Cheers,
Answers
Hi Pavithra,
Thank you for this clear explanation. Now I better understand these differences in results.
So, in practice, I have to duplicate my model outside the cross-validation operator to generate
the "good weights".
Thank you,
Best regards,
Lionel