Entropy and Gain for Decision Tree more than 1

kinglaplace, Member, Posts: 3, Contributor I
edited December 2018 in Help

Hi,

I am new to data mining and would like to use a decision tree for prediction. My target has 9 output classes. When I calculate entropy and gain manually, I get values greater than 1. How can I solve this? Also, where can I see the entropy and gain results in RapidMiner, so that I can compare them with my manual calculation?

Thank you.

Answers

  • sgenzer, Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator, Posts: 2,959, Community Manager

    Hello @kinglaplace, welcome to the community. Can you please post your XML process (see "Read before Posting" on the right when you reply)? And have you looked at the videos on decision tree modeling (see "Creating a Decision Tree Model" here)?

    Scott

  • kinglaplace, Member, Posts: 3, Contributor I

    Thank you for your help. Here is the training data. How do I choose the best model for my data?

  • lionelderkrikor, Moderator, RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn

    Hi @kinglaplace,

    To choose the best model for your data, I recommend the tool "Automatic model selection and optimization" (Pavithra_Rao).

    This tool helps to choose the best model (the one with the best performance) among several optimized models.

    I ran this tool on your data to benchmark 3 models (Decision Tree, Random Forest, Gradient Boosted Tree).

    It seems that Gradient Boosted Tree is the best: accuracy = correct predictions / total predictions = 89.60% (mean), but this is very close to the performance of the Decision Tree.

    Note: you should also consider other performance metrics such as recall and precision.
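
    As a rough illustration of these metrics, here is a minimal Python sketch. It is not part of the RapidMiner process; the function name `classification_metrics` and the toy labels are made up for the example. It computes accuracy as correct predictions / total predictions, plus precision and recall per class.

```python
from collections import Counter

def classification_metrics(y_true, y_pred):
    """Return overall accuracy and a {label: (precision, recall)} dict."""
    pairs = list(zip(y_true, y_pred))
    # Accuracy = correct predictions / total predictions
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    true_pos = Counter(t for t, p in pairs if t == p)  # correct hits per class
    predicted = Counter(y_pred)                        # times each class was predicted
    actual = Counter(y_true)                           # times each class really occurs
    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / actual[label] if actual[label] else 0.0
        per_class[label] = (precision, recall)
    return accuracy, per_class

# Hypothetical labels, only to show the calculation
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
accuracy, per_class = classification_metrics(y_true, y_pred)
print(accuracy)
print(per_class)
```

    With several classes, a model can have a high accuracy and still a poor recall on a rare class, which is why looking at the per-class values is worthwhile.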

    Here is the process:

    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">

    <parameter key="0" value="gas 1.true.real.attribute"/>

    <operator activated="true" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="145" name="Optimize Parameters (Grid)" width="90" x="581" y="136">

    Cross-validation subprocess to build the learner model and validate its performance

    <operator activated="true" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="103" name="Optimize Parameters DT" width="90" x="112" y="34">

    Optimize the parameters of the model and performance parameters

    Picks up the optimized parameters and applies a set of parameters to the specified operators

    <operator activated="true" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="103" name="Optimize Parameters RF" width="90" x="179" y="34">

    <operator activated="true" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="103" name="Optimize Parameters GBT" width="90" x="179" y="34">

    Subprocess to optimize the number of models and their performance

    Automatically picks the process which produces the optimized model

    This process automatically picks the optimized model out of the models built inside the Select Subprocess operator. The outer Optimize operator optimizes the Select Subprocess parameter to pick the process (inside the Select Subprocess operator) that gives the best model results for the given input data.

    Now you can experiment by yourself with other models and/or other optimization settings of the actual models.

    Regards,

    Lionel

  • kinglaplace, Member, Posts: 3, Contributor I

    Thank you for the information. For the decision tree, I tried to calculate entropy and gain manually, but the values are greater than 1. In every reference I have read, the maximum value for both is 1. How can I display the entropy and gain in RapidMiner, so that I can compare them with my manual results? Also, most decision tree examples I find have only two output classes, but in my case there are 8 output classes. Can a decision tree be used with more than two output classes?

    Thank you.

  • lionelderkrikor, Moderator, RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn

    Hi @kinglaplace,

    It seems to me that RapidMiner does not display the entropy and the gain in the results. There is the "cross-entropy", which is calculated by the Performance (Classification) operator, but it is a measure of the model's performance and, in my opinion, different from what you are looking for.

    A decision tree can of course be used with 8 output classes.
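
    On the values greater than 1: with the base-2 logarithm, the entropy of k classes can reach log2(k), so 1 is the maximum only for two classes. With 8 classes it can be as high as 3, and the information gain of a split can be as large as the entropy of the parent node. The small Python sketch below reproduces the manual calculation. It is only an illustration, not RapidMiner's implementation, and the toy labels and the split are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits (log base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Hypothetical toy data: 8 equally frequent classes, 2 examples each
labels = [c for c in "ABCDEFGH" for _ in range(2)]
# Hypothetical split separating classes A-D from classes E-H
left = [y for y in labels if y in "ABCD"]
right = [y for y in labels if y in "EFGH"]

parent_entropy = entropy(labels)                    # log2(8) for 8 balanced classes
gain = information_gain(labels, [left, right])
normalized_entropy = parent_entropy / math.log2(8)  # rescaled to the range [0, 1]
print(parent_entropy, gain, normalized_entropy)
```

    If you want values between 0 and 1 for comparison with two-class references, you can divide the entropy by log2(k), as in the last line of the sketch.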

    Regards,

    Lionel
