Optimize Parameters (Grid) with X-Validation and Performance (Costs) finds the wrong optimum

Steffen_w, Member, Posts: 3, Contributor I
edited November 2018 in Help

Hi!

I'm trying to optimize a k-nearest-neighbour model inside an X-Validation. The performance is measured by the Performance (Costs) operator, and the X-Validation delivers an average value for the misclassification costs.

When I put the whole process inside the Optimize Parameters (Grid) operator, the performance seems to have no impact on the selection of the optimal parameters. By logging every run of the process, I can identify much better average cost results than the one the optimizer returns. Can anyone give me a hint on what I'm doing wrong?

Thanks in advance

Steffen

Best Answer

  • MartinLiebig, Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, Posts: 3,404, RM Data Scientist
    Solution Accepted

    Dear Steffen,

    You need to set the class ordering. The behaviour without it is odd: I would have expected an error rather than a silently wrong result. Once you set the class ordering in Performance (Costs), it works. See the attached process.
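
To illustrate why the class ordering matters, here is a minimal Python sketch. It is not RapidMiner's implementation: the cost values, the row/column convention (rows = predicted class, columns = true class), and the helper `average_cost` are assumptions made for the example. The point is only that the same predictions produce a different average cost when the labels are mapped to matrix indices in a different order.

```python
# Hypothetical cost matrix (assumed convention: rows = predicted class index,
# columns = true class index). The values are made up for illustration.
COST_MATRIX = [
    [0.0, 5.0],  # predicted index 0
    [1.0, 0.0],  # predicted index 1
]


def average_cost(true_labels, predicted_labels, class_order):
    """Average misclassification cost for a given label-to-index ordering."""
    index = {label: i for i, label in enumerate(class_order)}
    total = sum(
        COST_MATRIX[index[pred]][index[true]]
        for true, pred in zip(true_labels, predicted_labels)
    )
    return total / len(true_labels)


true_labels = ["yes", "yes", "no", "no", "yes"]
predicted = ["no", "no", "no", "yes", "yes"]

# Same predictions, two different class orderings.
cost_yes_first = average_cost(true_labels, predicted, ["yes", "no"])
cost_no_first = average_cost(true_labels, predicted, ["no", "yes"])

print(cost_yes_first, cost_no_first)
```

With an asymmetric cost matrix the two orderings give different averages, so an optimizer that reads the matrix under an unintended ordering can rank the parameter settings differently from what you expect.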

    Another comment: the minimum found by Log and the minimum found by Optimize Parameters differ. The reason is that one reports the macro average and the other the micro average of the performance over the k folds (the unweighted mean of the per-fold values versus the average weighted by fold size). On a bigger data set, this should not make a difference.
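
The difference between the two averages can be sketched in a few lines of Python. The fold contents below are invented, and the function names are only illustrative; the sketch uses the usual convention that the macro average is the unweighted mean of the per-fold means, while the micro average pools all examples before averaging.

```python
def macro_average(fold_costs):
    """Unweighted mean of the per-fold mean costs."""
    per_fold = [sum(costs) / len(costs) for costs in fold_costs]
    return sum(per_fold) / len(per_fold)


def micro_average(fold_costs):
    """Mean cost over all examples pooled across folds."""
    pooled = [cost for costs in fold_costs for cost in costs]
    return sum(pooled) / len(pooled)


# Made-up per-example costs for three folds of unequal size, as can
# happen when a small data set is split into many folds.
folds = [
    [0.0, 1.0],  # fold mean 0.5
    [0.0, 0.0],  # fold mean 0.0
    [5.0],       # fold mean 5.0
]

macro = macro_average(folds)
micro = micro_average(folds)

print(macro, micro)
```

On a small data set a single expensive error in a small fold pulls the macro average up much more than the micro average. When all folds have the same size the two values coincide, and with many examples per fold the gap becomes small.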

    ~Martin

    <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 2"/>

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany

Answers

  • yyhuang, Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 363, RM Data Scientist

    Could you post your process here so that we can investigate? Thanks.

  • Steffen_w, Member, Posts: 3, Contributor I
    I created a test process on the Golf data set that reproduces the situation. In the log, the optimum cost is 0.350, while the optimizer shows 10.2.

    <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 2"/>

  • Steffen_w, Member, Posts: 3, Contributor I

    Dear Martin,

    thank you for your help! Now it's working!

    -Steffen
