"MetaCost vs Performance(Costs) operator"

Telcontar120, Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn
edited May 2019 in Help

I am wondering whether there is any difference in the implementation of the loss optimization function of the MetaCost operator vs the Performance (Costs) operator. I would not expect there to be. However, I am also seeing significant differences in outcomes when comparing a single DT learner using the Performance (Costs) operator with a cost matrix vs using the MetaCost operator with 1 iteration and an inner DT using the same cost matrix. There are wide divergences not only in the cost outcome but also in other performance metrics such as accuracy and AUC, as well as in the resulting models. See the attached example process:

(Attached example process XML not reproduced here; only fragments remain, among them:)

<parameter key="use_local_random_seed" value="true"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (Cost DT)" to_port="example set"/>

@mschmitz any ideas on the underlying algorithms that would be relevant here, or other reasons these might be so different?

Brian T.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts

Answers

  • MartinLiebig, Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, Posts: 3,404, RM Data Scientist

    Hi @Telcontar120,

    there is a significant difference. Performance (Costs) is "just" a performance measure. MetaCost is an ensemble learner which, I think, tunes itself to work better on the cost metric.

    From the docu:

    The MetaCost operator makes its base classifier cost-sensitive by using the cost matrix specified in the cost matrix parameter. The method used by this operator is similar to the MetaCost method described by Pedro Domingos (1999).

    The code for it is available here: https://github.com/rapidminer/rapidminer-studio/blob/master/src/main/java/com/rapidminer/operator/learner/meta/MetaCost.java
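
    For intuition, here is a rough Java sketch of the two ideas. This is not RapidMiner's code; the class and method names are made up for illustration. Performance (Costs) only scores a fixed set of predictions against the cost matrix, whereas the Domingos-style MetaCost method uses the cost matrix to change which class gets predicted in the first place:

        // Hypothetical illustration only; not taken from MetaCost.java.
        // Convention used throughout: costMatrix[trueClass][predictedClass].
        public class CostIllustration {

            // What a cost-based performance measure does: average the cost of the
            // predictions it is handed. The model itself is never changed.
            static double averageMisclassificationCost(int[] trueLabels, int[] predictedLabels,
                                                       double[][] costMatrix) {
                double total = 0.0;
                for (int i = 0; i < trueLabels.length; i++) {
                    total += costMatrix[trueLabels[i]][predictedLabels[i]];
                }
                return total / trueLabels.length;
            }

            // The conditional-risk rule from Domingos (1999): pick the class with the
            // lowest expected cost given estimated class probabilities. Domingos' MetaCost
            // uses this rule to relabel training examples before the final model is built.
            static int minimumExpectedCostClass(double[] classProbabilities, double[][] costMatrix) {
                int bestClass = 0;
                double bestCost = Double.POSITIVE_INFINITY;
                for (int predicted = 0; predicted < costMatrix.length; predicted++) {
                    double expectedCost = 0.0;
                    for (int actual = 0; actual < costMatrix.length; actual++) {
                        expectedCost += classProbabilities[actual] * costMatrix[actual][predicted];
                    }
                    if (expectedCost < bestCost) {
                        bestCost = expectedCost;
                        bestClass = predicted;
                    }
                }
                return bestClass;
            }
        }

    With a skewed cost matrix, the second rule can shift predictions away from the most probable class, which a plain DT scored afterwards with Performance (Costs) never does.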

    Btw, @hhomburg is the author :)

    BR,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Telcontar120, Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn

    Yes, I understand that is the case and I know the difference between an ensemble and a base learner :-).

    However, if you set the iterations of MetaCost to 1, it should be using only one instance of the inner learner. In the example process I supplied, that is a DT with the same parameters as the second model, which uses the same DT learner and the same cost matrix via Performance (Costs). In that case, why would the results be so different?
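
    One thing that might be worth checking against the linked MetaCost.java: in Domingos' (1999) description, the meta learner (a) trains each iteration's model on a resample of the training data rather than the full set, and (b) relabels the training data with the minimum-expected-cost class before the final model is built. If the operator follows either of those steps, a single iteration could still yield a different tree than the standalone DT. A minimal sketch of the resampling point, assuming bootstrap sampling; the names are hypothetical and this is not the operator's code:

        import java.util.Random;

        // Hypothetical sketch: even with iterations = 1, a learner that draws a
        // bootstrap sample is trained on different rows than the plain learner.
        public class SingleIterationSketch {

            static int[] bootstrapIndices(int numExamples, long seed) {
                Random random = new Random(seed);
                int[] indices = new int[numExamples];
                for (int i = 0; i < numExamples; i++) {
                    indices[i] = random.nextInt(numExamples); // sampling with replacement
                }
                return indices;
            }
        }

    A tree grown on such a resample will generally differ from one grown on the full data, even before any cost-based relabeling comes into play.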

    @hhomburg any ideas here?

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • Telcontar120, Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn

    @mschmitz @hhomburg @sgenzer @Ingo Any ideas about this one? I'm still puzzling over why the differences are so great when the iterations for MetaCost = 1. Thanks for taking a look at it!

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts