Cross validation
I have a question about the output of cross validation. If we take 90% of the data for training and 10% for testing, why does the result show the whole data set instead of just the 10% test part?
I'll be thankful if someone answers my question.
Yasmin
Best Answers
lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member; Posts: 1,195; Unicorn)
Hi @Yasmin,
Legitimate question!
Here is a possible answer:
In practice, for a 10-fold cross validation RapidMiner performs 11 iterations.
During the last iteration, RapidMiner applies the model to the whole training data set, so the training set and the test set have the same length.
Regards,
Lionel
NB: You can visualize this behaviour by setting a "Breakpoint After" on the Apply Model operator (inside the Cross Validation operator).
tftemme (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member; Posts: 164; RM Research)
Hi @Yasmin,
While it is true that the Cross Validation operator builds the final model on the whole data set (and thus performs an 11th iteration of the Training subprocess if the model port is connected), the Test subprocess is only performed 10 times. That is also why all of your input data appears at the test result port: in every iteration, 10% of your input data is used as the test set, so over the whole Cross Validation every Example of your input data is used exactly once for testing.
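To make the bookkeeping concrete, here is a minimal sketch in plain Python (not RapidMiner). The function names and the strided fold assignment are assumptions for illustration only; RapidMiner's own sampling may shuffle or stratify the data. It shows that 10 disjoint test folds of 10% each, appended together, give back every example exactly once.

```python
def k_fold_test_sets(n_examples, k=10):
    """Split the indices 0..n_examples-1 into k disjoint test folds."""
    indices = list(range(n_examples))
    # Assumption: simple strided assignment, chosen only for illustration.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_examples, k=10):
    """Return (example index, iteration) pairs for all appended test folds."""
    appended = []
    folds = k_fold_test_sets(n_examples, k)
    for iteration, test_idx in enumerate(folds, start=1):
        held_out = set(test_idx)
        train_idx = [i for i in range(n_examples) if i not in held_out]
        # A model would be trained on train_idx and applied to test_idx here.
        assert len(train_idx) + len(test_idx) == n_examples
        # Tag each tested example with the iteration in which it was tested.
        appended.extend((i, iteration) for i in test_idx)
    return appended


results = cross_validate(100, k=10)
tested = sorted(i for i, _ in results)
print(len(results))                # 100
print(tested == list(range(100)))  # True
```

Each example carries the number of the iteration in which it was part of the test set, so the appended result has as many rows as the input even though each single test pass only saw 10% of it.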
For the outer result port, all test sets are appended together, so you get your whole input data set again. You can visualize this by adding a Generate Attributes operator in the Test subprocess of the Cross Validation and generating an attribute iteration with the value eval(%{a}) (the macro %{a} contains the number of times the current operator was applied).
Best regards,
Fabian
Yasmin (Member; Posts: 5; Contributor II)