Attribute weight help

页 Member Posts: 9 Contributor II
edited November 2018 in Help
Hi,
I'm working with decision trees and trying to understand which factors are most important in leading to a sewer pipe failure. My records have attributes such as diameter, length, etc. I also have an attribute with the number of failures in each pipe, and I think this information could be used as a weight. The data is unbalanced: most of the examples have a value of 0. If I use “Set Role” to set this attribute as the weight, the tree becomes completely trained and useless, so I have to leave this attribute out to get results. So my question is: is there a way to use this information without overtraining the tree?

Thanks for your attention,
Paulo Praça

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,404 RM Data Scientist
    Have you considered setting the role of that attribute to weight?

    ~Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • 页 Member Posts: 9 Contributor II
    Yes, I did, but the tree gets overtrained: when I did that, the tree had only two leaves. The failure frequency ranges from 1 to 6: at most I have 6 failures in one pipe, but almost every pipe has only one failure. If I set this attribute as ‘weight’, it overshadows the other attributes.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,404 RM Data Scientist
    I think we have a different understanding of overtraining. I guess your tree simply gets worse when you do this.

    Have you tried changing the minimal gain?
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
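
For readers unfamiliar with the minimal gain setting mentioned above, here is a rough sketch of the idea in plain Python rather than RapidMiner. It assumes information gain as the split criterion and assumes a split is kept only when its gain is strictly greater than the threshold; the function names (`entropy`, `information_gain`, `split_allowed`) and the sample labels are made up for illustration, and RapidMiner's actual calculation may differ.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    # p * log2(1/p) summed over the classes present in the list.
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    children = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(parent) - children


def split_allowed(parent, left, right, minimal_gain):
    """Assumed rule: keep the split only if its gain exceeds minimal_gain."""
    return information_gain(parent, left, right) > minimal_gain


# Hypothetical, unbalanced data: 2 failing pipes out of 10.
parent = ["fail", "fail"] + ["ok"] * 8
left = ["fail", "fail", "ok"]   # e.g. pipes on one side of a candidate split
right = ["ok"] * 7              # pipes on the other side

gain = information_gain(parent, left, right)
print(f"gain = {gain:.4f}")
print("split at minimal_gain=0.1:", split_allowed(parent, left, right, 0.1))
print("split at minimal_gain=0.5:", split_allowed(parent, left, right, 0.5))
```

Under these assumptions, a lower minimal gain lets more splits through and gives a larger tree, while a higher value stops splitting earlier.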
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    What if you discretize the field and use it as the label?

    For example:
    Faults
    0
    1
    2-3
    4-5
    6+

    Or even more simply Faults: Low, Medium, High.
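
The binning suggested above could be sketched roughly as follows. This is plain Python, not a RapidMiner process; the bin edges follow the example list in the post, while the function name `discretize_faults` and the sample fault counts are hypothetical.

```python
from bisect import bisect_right

# Lower bound of each bin (ascending) and the matching labels from the post.
BIN_STARTS = [0, 1, 2, 4, 6]
BIN_LABELS = ["0", "1", "2-3", "4-5", "6+"]


def discretize_faults(count):
    """Map a non-negative fault count to its bin label."""
    if count < 0:
        raise ValueError("fault count must be non-negative")
    # bisect_right gives the number of bin starts <= count;
    # subtracting 1 gives the index of the bin that contains count.
    return BIN_LABELS[bisect_right(BIN_STARTS, count) - 1]


# Made-up fault counts, one per pipe.
fault_counts = [0, 0, 1, 3, 6, 0, 2, 5]
labels = [discretize_faults(c) for c in fault_counts]
print(labels)
```

The resulting labels would then serve as the target the tree predicts, instead of using the raw count as a weight. A coarser Low/Medium/High scheme would only need different `BIN_STARTS` and `BIN_LABELS`.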