Optimise Parameters

k_vishnu772 Member · Posts: 34 · Contributor I
edited November 2018 in Help

Hi All,

I am new to RapidMiner. I am using Optimize Parameters to find the best parameters for my Gradient Boosted Trees model, tuning maximum depth and number of trees, and I got a maximum depth of 2 and 220 trees. I am wondering how I would know if my model is overfitting.

Can I trust that the result of Optimize Parameters takes care of overfitting as well?

Best Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member · Posts: 1,761 · Unicorn
    Solution Accepted

    @Telcontar120 @k_vishnu772 If I remember how GBT works, it creates multiple trees, somewhat like Random Forest, and combines them (some trees may overfit and some trees may underfit). This is done to minimize overfitting. In conjunction with cross-validation, I think the probability of overfitting is greatly reduced.

    Additionally, you're using a max depth of 2 so that really generalizes the tree too.
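    As a rough sketch of the setup being discussed (using scikit-learn's GradientBoostingClassifier as a stand-in for RapidMiner's GBT operator; the synthetic dataset and seeds are made up for illustration), shallow trees with max depth 2 act as weak learners, and cross-validation gives the out-of-sample estimate:

    ```python
    # Illustrative stand-in for the RapidMiner setup, not the actual process.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic data just for the sketch.
    X, y = make_classification(n_samples=500, n_features=10, random_state=42)

    # The parameters from the original question: depth 2, 220 trees.
    # Shallow trees are weak learners; boosting combines many of them,
    # which keeps any single tree from memorizing the data.
    gbt = GradientBoostingClassifier(max_depth=2, n_estimators=220,
                                     random_state=42)

    # 5-fold cross-validation estimates performance on unseen folds.
    scores = cross_val_score(gbt, X, y, cv=5)
    print("CV accuracy per fold:", scores.round(3))
    print("Mean CV accuracy:", round(scores.mean(), 3))
    ```

    If the fold scores are close to each other, that is at least weak evidence the model is stable rather than overfit to any one slice of the data.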

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member · Posts: 578 · Unicorn
    Solution Accepted

    One way you can also help to check for overfitting is to have a separate test dataset & use this to validate your model predictions.

    This should not be part of the modelling or optimization stages. You can also sample this test set & use averages to estimate model accuracy, but it's not strictly necessary... I just liked adding an extra loop in this example.

    With a significance test between training and testing performance you can then see how much the models differ.

    Is your optimized model performing significantly better than your test data performance? If so, maybe your model is overfitting and will degrade faster in real-world usage.

    Have a play with the example below. Note, you do need more data to have a viable hold-out test set so it might not be practical for every use case.
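    A minimal sketch of this idea outside RapidMiner (scikit-learn and SciPy here; the dataset, subset sizes, and seeds are all illustrative assumptions, not the posted process): hold out a test set, score the model on random subsets of it with iteration-based seeds, and run a t-test of those test scores against the training score.

    ```python
    # Hypothetical sketch of the hold-out + significance-test idea.
    import numpy as np
    from scipy import stats
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    # The hold-out set takes no part in modelling or optimization.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

    model = GradientBoostingClassifier(max_depth=2, n_estimators=220,
                                       random_state=0)
    model.fit(X_tr, y_tr)
    train_acc = model.score(X_tr, y_tr)

    # Random subsets of the test data; the seed is based on the iteration,
    # mirroring the loop in the posted process.
    test_accs = []
    for i in range(10):
        idx = np.random.RandomState(i).choice(len(X_te), size=200,
                                              replace=False)
        test_accs.append(model.score(X_te[idx], y_te[idx]))

    # One-sample t-test: do the test-subset accuracies differ from training?
    t_stat, p_val = stats.ttest_1samp(test_accs, train_acc)
    print(f"train acc: {train_acc:.3f}, mean test acc: {np.mean(test_accs):.3f}")
    print(f"t = {t_stat:.3f}, p = {p_val:.4f}")
    ```

    A training accuracy significantly above the test accuracies is the warning sign JEdward describes: the model may degrade faster in real-world usage.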

    (The process XML attached here did not survive the copy intact: the tags are garbled and only fragments are recoverable. It appears to wrap the learner in an Optimize Parameters (Grid) operator, loop over random subsets of a held-out test set, and connect a performance comparison to the result port. The surviving operator annotations read:

    • "More data means less chance of overfitting. Especially if having a separate holdout. Try using a smaller value & watch the results."
    • "Only a 5-fold because I want it to finish quickly."
    • "Random Subsets of test data. Random Seed is based on iteration."
    • "There are much better ways to do this loop, but I haven't had enough caffeine yet."
    • "A significant difference between training & test data indicates the model might be overfitted.")

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member · Posts: 1,635 · Unicorn

    Parameter optimization is definitely not a preventative measure against overfitting. In fact, it may arguably be more likely to find an overfit model, depending on the complexity of the algorithm you are using and the number of parameters being tuned. The best defense against overfitting is the robust and thorough practice of cross-validation, as detailed in many blog posts and articles on the RapidMiner website and in the community posts.
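    One way to see this point in code (a scikit-learn sketch, not RapidMiner; the grid and dataset are assumptions for illustration) is nested cross-validation: the inner grid search picks parameters, while an outer cross-validation scores the whole tuning procedure on folds the search never saw. The inner best score is often optimistic compared to the nested estimate.

    ```python
    # Nested CV sketch: tuning inside, honest evaluation outside.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV, cross_val_score

    X, y = make_classification(n_samples=400, n_features=10, random_state=1)

    # Inner CV: grid search selects parameters (an illustrative small grid).
    param_grid = {"max_depth": [2, 3], "n_estimators": [50, 100]}
    search = GridSearchCV(GradientBoostingClassifier(random_state=1),
                          param_grid, cv=3)

    # Outer CV: evaluates the *entire* optimization procedure on unseen folds.
    nested_scores = cross_val_score(search, X, y, cv=3)

    # For comparison, the inner-CV score of the best parameters found
    # on the full data, which can overstate real performance.
    search.fit(X, y)
    print("best inner-CV score:", round(search.best_score_, 3))
    print("nested-CV estimate:", round(nested_scores.mean(), 3))
    ```

    If the optimized inner score is well above the nested estimate, the parameter search itself has overfit to the validation folds, which is exactly why cross-validation has to sit around the optimization, not only inside it.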

    Brian T.
    Lindon Ventures
    Data Science Consulting from Certified RapidMiner Experts
  • k_vishnu772 Member · Posts: 34 · Contributor I

    I did use cross-validation inside Optimize Parameters to get the best set of parameters, so in this case am I safe from overfitting?
