DT : different attribute weights with/without cross-validation

lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
edited December 2018 in Help

Hi,

I created 2 processes including a decision tree model from the "Golf" dataset.

1. First, a classic DT model:

In this case, I get the following attribute weights:

DT_weight_1.png

Here is the process:

<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">

2. A DT model with cross-validation:

In this case, I get the following attribute weights:

DT_weight_2.png

Here is the process:

<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">

In both cases, as expected, the two DT models are strictly identical. Why are the attribute weights not equal?

NB: With split validation, I get the same attribute weights as in case 1.

Thank you for your feedback,

Regards,

Lionel

Best Answer

  • Pavithra_Rao Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 123 RM Data Scientist
    Solution Accepted

    Hi Lionel,

    Cross-validation builds k + 1 models (k = number of folds), and the attribute weights that are output come from the last iteration. As you know, the training and test sets contain different subsets of the data in each iteration. Hence the weights from the classic DT model (where the model sees the entire dataset at once) and from the CV DT model are not the same.

    Also, it is always better to generate the attribute weights from the entire dataset (i.e. the classic model) rather than from a subset of the data (i.e. via cross-validation or split validation).
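
As a rough illustration of the explanation above, the sketch below computes a simple information-gain weight per attribute, once on all rows and once on the training part of each fold. Everything in it is an assumption made for illustration: the toy rows are only modeled on a golf-style table and are not the RapidMiner "Golf" sample, information gain stands in for whatever weighting the Decision Tree operator actually reports, and the interleaved fold split is not RapidMiner's sampling.

```python
import math
from collections import Counter

# Toy rows (outlook, wind, play). Illustrative values only,
# NOT the actual RapidMiner "Golf" sample data.
ROWS = [
    ("sunny", "false", "no"), ("sunny", "true", "no"),
    ("overcast", "false", "yes"), ("rain", "false", "yes"),
    ("rain", "false", "yes"), ("rain", "true", "no"),
    ("overcast", "true", "yes"), ("sunny", "false", "no"),
    ("sunny", "false", "yes"), ("rain", "false", "yes"),
    ("sunny", "true", "yes"), ("overcast", "true", "yes"),
    ("overcast", "false", "yes"), ("rain", "true", "no"),
]
ATTRIBUTES = {"outlook": 0, "wind": 1}  # attribute name -> column index
LABEL = 2                               # column index of the label
K = 2                                   # number of folds (hypothetical choice)


def entropy(rows):
    """Shannon entropy of the label column."""
    counts = Counter(r[LABEL] for r in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def info_gain(rows, col):
    """Entropy reduction obtained by splitting the rows on one column."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


def weights(rows):
    """Information gain per attribute, used here as a stand-in weight."""
    return {name: round(info_gain(rows, col), 4)
            for name, col in ATTRIBUTES.items()}


def fold_training_sets(rows, k):
    """Yield the training rows of each fold (simple interleaved split)."""
    for i in range(k):
        yield [r for j, r in enumerate(rows) if j % k != i]


full_weights = weights(ROWS)
fold_weights = [weights(train) for train in fold_training_sets(ROWS, K)]

print("all rows:", full_weights)
for i, w in enumerate(fold_weights, start=1):
    print(f"fold {i} training rows:", w)
```

Each fold sees a different subset of the rows, so its weights differ from those computed on all rows, which is why weights taken from inside a validation loop need not match those of a model trained on the full dataset.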

    Cheers,


Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi Pavithra,

    Thank you for this clear explanation. Now I better understand these differences in the results.

    So, in practice, I have to duplicate my model outside the cross-validation operator to generate the "good weights".

    Thank you,

    Best regards,

    Lionel
