"How do I split the data into training, validation and testing subsets?"

Curious, Member, Posts: 12, Newbie
edited June 2019 in Help
How do I split the data into training, validation and testing subsets? (Not just training and testing)

Answers

  • varunm1, Moderator, Member, Posts: 1,207, Unicorn
    Hi @Curious

    As @mschmitz mentioned, you can split the data with the Split Data operator. You provide the split ratios, for example 0.7 for training, 0.1 for validation and 0.2 for testing. The order in which you give the ratios also defines the order of the outputs. See the sample code below.

    <?xml version="1.0" encoding="utf-8"?><process version="9.1.000">

    Thanks,
    Varun
    https://www.varunmandalapu.com/
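
For readers working outside RapidMiner, the ratio-based split described in the post above can be sketched in plain Python. This is only an illustration of the idea, not the Split Data operator's implementation: the function name `split_by_ratios`, the shuffling step and the `seed` parameter are assumptions made for this example.

```python
import random


def split_by_ratios(rows, ratios, seed=42):
    """Shuffle rows and cut them into consecutive partitions, one per ratio.

    The partitions are returned in the same order as the ratios.
    (Hypothetical helper, not part of RapidMiner.)
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    parts, start, cumulative = [], 0, 0.0
    for i, ratio in enumerate(ratios):
        cumulative += ratio
        # The last partition takes whatever remains, so rounding loses no rows.
        end = n if i == len(ratios) - 1 else round(cumulative * n)
        parts.append(shuffled[start:end])
        start = end
    return parts


# Ratios from the post: 0.7 training, 0.1 validation, 0.2 testing.
train, validation, test = split_by_ratios(range(100), [0.7, 0.1, 0.2])
print(len(train), len(validation), len(test))
```

Because the partitions follow the order of the ratios, swapping two ratios also swaps which output receives which share of the data.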
  • varunm1varunm1 Moderator, MemberPosts:1,207Unicorn
    edited January 2019
    Hi @Telcontar120

    I just want to clarify whether a validation set is of any use when we apply cross validation. I get this question a lot in deep learning, where I skip the validation set during training because I apply cross validation most of the time. The main use of a validation set is to avoid overfitting during training, but I think cross validation reduces overfitting as well.

    Thanks,
    Varun
  • IngoRM, Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, Posts: 1,751, RM Founder
    edited January 2019
    Hi,
    If cross validation is the outermost step and includes all preprocessing and modeling, then an additional validation set would indeed not be necessary. However, this is not always feasible: most often for runtime reasons, and sometimes because the complexity of the process gets a bit out of control.
    In those cases, I would still keep some fraction of the original data (before I do anything to it!) as a validation set to make sure that I did not accidentally leak any information as part of my data processing.
    Hope this helps,
    Ingo
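
The advice in the post above (set a fraction aside before touching the data, then cross-validate on the rest) could look roughly like this in plain Python. It is a sketch under stated assumptions: `holdout_then_folds` is a hypothetical name, and the 20% hold-out fraction and `k=5` are illustrative values that do not come from the thread.

```python
import random


def holdout_then_folds(rows, holdout_fraction=0.2, k=5, seed=0):
    """Set aside a hold-out set first, then build k cross-validation folds
    from the remaining rows only. (Hypothetical helper for illustration.)
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    # Take the hold-out set before any preprocessing, so nothing can leak into it.
    n_holdout = round(len(shuffled) * holdout_fraction)
    holdout, working = shuffled[:n_holdout], shuffled[n_holdout:]

    folds = []
    for i in range(k):
        # Every k-th row, starting at offset i, is the test part of fold i.
        test_fold = working[i::k]
        train_fold = [row for j, row in enumerate(working) if j % k != i]
        folds.append((train_fold, test_fold))
    return holdout, folds


holdout, folds = holdout_then_folds(range(50))
for train_fold, test_fold in folds:
    # Preprocessing and model fitting would use train_fold only;
    # test_fold scores the fold, and holdout stays unused until the very end.
    print(len(train_fold), len(test_fold))
```

The hold-out rows never appear in any fold, so a final evaluation on them can reveal leakage that the cross-validation itself would not show.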
  • Curious, Member, Posts: 12, Newbie
    Thank you so much everyone!