"[Solved] Choosing best parameters resulted by Cross validation"

njasaj Member Posts: 18 Contributor II
edited June 2019 in Help
Hi all,
I need to build an SVM model. I used grid search and cross validation (k = 2 to 20) to find the best parameters. The problem is that when I log the cross-validation accuracy, many parameter combinations have the same accuracy and the same confusion matrix, but when I apply those parameters to the test data set I get very different accuracies (from 90 down to 60). In real-world problems we have no access to a test data set, so how should I select the best combination?
Thanks.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    what do you mean by "k"? The SVM does not have a "k" parameter. Please post your process setup - to see how to do that, please have a look at the post linked in my signature.

    Best, Marius
  • njasaj Member Posts: 18 Contributor II
    Dear Marius,
    Thank you for your attention. k is the number of folds of the cross validation. I choose the parameters that lead to the _best correlation coefficient_, but unfortunately this combination of C and gamma does not give good results on unseen data: there is a difference of about 20 between the correlation coefficient on the training data and on unseen data. If I choose another combination from the parameter optimization log (one with the same or only a slightly lower correlation coefficient), the model performs much better on unseen data. How should I choose the best parameters from cross validation and parameter optimization? Should the parameters that lead to the best performance be selected, or is there a rule for selecting the best parameters?
    Thanks.

    [code]
    <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 2"/>
    [/code]
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    the default of 10 folds is usually a good choice.

    Optimizing the parameters C and gamma is correct, but you should set the range from 1e-6 to 1e3 (1*10^-6 - 1*10^3) on a logarithmic scale for both parameters.
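For readers working outside RapidMiner, a logarithmic range like the one suggested above can be sketched in plain Python. The function name `log_grid` and the choice of one value per decade are assumptions made for illustration; the post only gives the end points (1e-6 and 1e3) and the logarithmic scale.

```python
# Illustrative sketch: a logarithmic grid from 1e-6 to 1e3.
# `log_grid` and `steps_per_decade` are hypothetical names, not RapidMiner settings.
def log_grid(min_exp=-6, max_exp=3, steps_per_decade=1):
    """Return values from 10**min_exp to 10**max_exp, evenly spaced in log10."""
    n_steps = (max_exp - min_exp) * steps_per_decade
    return [10.0 ** (min_exp + i / steps_per_decade) for i in range(n_steps + 1)]

c_values = log_grid()
gamma_values = log_grid()

# Every (C, gamma) pair that a grid search over both ranges would evaluate.
grid = [(c, g) for c in c_values for g in gamma_values]
print(len(c_values), len(grid))  # 10 100
```

A linear scale over the same range would put almost all grid points at large values, which is why a logarithmic scale is the usual choice for C and gamma.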

    You chose to log the values of the Performance operator - that will give you only the performance of the last fold. Instead, you want to log the performance of the complete X-Validation, so change the Log operator to log the value "performance" of the Validation. Additionally you may want to log the deviation value, which will give the standard deviation of the performance over the 10 folds.
    The Validation provides 4 performance values: "performance" is the criterion which you selected as "main_criterion" in the Performance operator. "performance1,2,3" deliver the first 3 criteria (from top to bottom) activated in the performance operator. The deviation always refers to the main criterion.
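The difference between logging the last fold and logging the whole validation can be illustrated with a few lines of Python. The fold accuracies below are made-up numbers used only for illustration, and the use of the sample standard deviation is an assumption; the post does not say which estimator RapidMiner reports.

```python
from statistics import mean, stdev

# Hypothetical per-fold accuracies from a single 10-fold cross validation.
fold_accuracies = [0.82, 0.79, 0.85, 0.80, 0.77, 0.84, 0.81, 0.78, 0.83, 0.91]

# What you get when logging the inner Performance operator: only the last fold.
last_fold = fold_accuracies[-1]

# What you want instead: the average over all folds and its deviation.
cv_mean = mean(fold_accuracies)
cv_deviation = stdev(fold_accuracies)  # sample standard deviation (assumption)

print(f"last fold: {last_fold:.3f}")
print(f"cross validation: {cv_mean:.3f} +/- {cv_deviation:.3f}")
```

In this made-up run the last fold happens to be the luckiest one, so logging it alone would overstate the estimated performance.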

    When interpreting the results and choosing parameters, the parameters that lead to the highest performance are not always the best. You should also consider the deviation. If the deviation is high, there is a high probability that the performance on new data will differ significantly from the estimated performance. So if the second-best parameter combination has a much lower standard deviation, you should consider using that one instead.
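One possible way to turn this advice into a rule is sketched below: among all combinations whose mean performance is within a small tolerance of the best, pick the one with the lowest deviation. This is only one heuristic, not a rule stated in this thread; the function name `select_parameters`, the `tolerance` value, and the logged results are all hypothetical.

```python
def select_parameters(results, tolerance=0.01):
    """Pick a parameter combination from logged cross-validation results.

    results: list of dicts with keys 'params', 'mean' and 'std'.
    Among the candidates whose mean is within `tolerance` of the best mean,
    return the one with the lowest std (ties broken by the higher mean).
    """
    best_mean = max(r["mean"] for r in results)
    candidates = [r for r in results if best_mean - r["mean"] <= tolerance]
    return min(candidates, key=lambda r: (r["std"], -r["mean"]))

# Hypothetical log entries (mean and deviation of the main criterion).
results = [
    {"params": {"C": 100.0, "gamma": 0.1}, "mean": 0.90, "std": 0.08},
    {"params": {"C": 10.0, "gamma": 0.01}, "mean": 0.895, "std": 0.02},
    {"params": {"C": 1.0, "gamma": 0.001}, "mean": 0.80, "std": 0.01},
]

chosen = select_parameters(results)
print(chosen["params"])
```

With these numbers the second combination is chosen: its mean is only slightly below the best, but its deviation is four times smaller. Setting `tolerance=0.0` falls back to simply taking the highest mean.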

    Happy Mining!
    ~Marius
  • njasaj Member Posts: 18 Contributor II
    Hi Marius,
    That was a complete and helpful answer, and it actually solved my problem. Thank you very much.
  • mafern76 Member Posts: 45 Contributor II
    Hi,

    I'm reviving this because it came up in my search and I think it is relevant to my question.

    I have already been using logging to select parameters based on high performance and low deviation, but what if the deviation itself varies a lot? For example, you run ten 10-fold X-Validations and get deviation values from 0.003 to 0.3. I ran into this when looping over 100 neural networks with parameters obtained from an X-Validation with a deviation of 0.003: the AUCs ranged from 0.7 to 0.73.

    I did run 5 folds for that parameter search instead of 10. Would you attribute the issue to that alone, or are there other common, overlooked factors?

    I'm thinking sample size, but assuming that is "OK", could it be possible to blame something else? Algorithm related maybe, regardless of parameters?

    Thanks for your insight and the parameters range suggestion for SVM!


  • mafern76 Member Posts: 45 Contributor II
    Well, I just tested how the deviation changed with the number of folds, and I observed that fewer folds drastically reduced the deviation.

    Regarding sample size: fewer folds = a smaller sample for training. Is it reasonable to rule out sample size as a problem, or could increasing the test size actually be part of the solution? Regarding this, I'm wondering: when I establish a minimum sample size, shouldn't that be multiplied by the number of folds my X-Validation has? In my particular case I'm using a very basic rule of thumb: number of attributes x 10 x 2 (2 for a binominal label). I'm thinking about this in order to minimize the deviation caused by a small test sample.
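The arithmetic behind this question can be written out in a short sketch. It only shows how many examples land in the training and test part of each fold for a data set of the rule-of-thumb size; it does not settle whether the minimum should be multiplied by the number of folds. The attribute count of 20 and both function names are hypothetical.

```python
def rule_of_thumb_minimum(n_attributes, n_classes=2, per_attribute=10):
    """Minimum sample size from the rule above: attributes x 10 x classes."""
    return n_attributes * per_attribute * n_classes

def fold_sizes(n_examples, k):
    """Approximate (training size, test size) of one fold in k-fold validation."""
    test = n_examples // k
    return n_examples - test, test

n_attributes = 20  # hypothetical data set
minimum = rule_of_thumb_minimum(n_attributes)

sizes = {k: fold_sizes(minimum, k) for k in (5, 10)}
for k, (train, test) in sizes.items():
    print(f"k={k}: train={train}, test={test}")
```

For 400 examples, 5 folds give 320 training and 80 test examples per fold, while 10 folds give 360 and 40. Fewer folds therefore mean less training data but a larger test part per fold, which is consistent with the lower deviation observed above.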

    As you can see I'm exploring, any help would be much appreciated, thanks.