"Customized X-fold cross-validation"

_paul_ Member Posts: 14 Contributor II
edited May 2019 in Help
Hi,

I want to perform an X-fold cross-validation which however does not
operate on sets that are defined by RapidMiner's XValidation "sampling_type"
parameter but on sets which are constructed using a "marker" in the
examples provided by an ExampleSource operator.

To be more accurate, my input examples (pairs of feature vectors and
labels) used for classification contain an attribute that defines the
application this particular example was extracted from. Let's say the
examples come from three applications "A", "B", and "C" and each
example contains an attribute holding one of the three characters.

Based on this, I want to perform a three-fold cross-validation where in
a first run, examples from "A" are excluded and tested on examples from
"B" and "C" ...

Is there an operator for that in RapidMiner?

Regards,
Paul

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Paul,
    there is a special variant of XValidation called BatchXValidation, which uses an attribute with the special role "batch" to define the splits. I post a process below that makes use of this operator.

    Greetings,
    Sebastian
  • _paul_ Member Posts: 14 Contributor II
    Hi Sebastian,

    sorry for the late answer. ;-)

    I didn't really get the idea of your model. How does the BatchXValidation operator
    work? I assume that it relies on the operator ChangeAttributeRole (also on
    AttributeSubsetProcessing?), but it is not clear to me how the operators
    communicate.

    Let's say I've this example set:

    att1;att2;att3;label
    1; 2; A; YES
    2; 2; A; NO
    3; 4; B; YES
    1; 4; C; NO
    2; 4; C; NO
    4; 4; C; YES

    and I would like to have a 3-fold cross-validation where in each
    run of the validation I want to exclude the examples belonging
    to the class (A,B,C) specified by attribute "att3".

    Thus, the cross-validation would look something like:
    1. step: Exclude examples from class A, learn model for examples
    from class B and C, and apply this model to examples from class A
    2. step: Exclude B, learn for A and C, apply to B
    3. step: Exclude C, learn for A and B, apply to C

    How can I model this type of validation?
    And is there a way to figure out within the BatchXValidation operator
    which examples are currently excluded (like att3=A in 1. step)?

    Thank you.

    Regards,
    Paul
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Paul,
    the BatchXValidation never splits examples of the same batch across folds. Instead, each batch is always placed completely into one fold.
    So, if you define your attribute att3 as the batch attribute and set the number of validations of the BatchXValidation to the number of distinct values in att3, this should do the trick.
    In the first round, the first fold, containing all As, is held out, and learning is carried out on the remaining folds. And so on...

    I hope this clarifies it?

    Greetings,
    Sebastian
  • _paul_ Member Posts: 14 Contributor II
    Hi Sebastian,

    yes, the x-validation is now clear. Thank you.

    Best,
    Paul
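
For readers who want to see the splitting logic outside RapidMiner, the batch-wise folds discussed in this thread can be sketched in plain Python. This is only an illustration of the idea, not how BatchXValidation is implemented. The function name `batch_folds` and the constant `EXAMPLES` are made up for this sketch, and the three rows for "C" assume the example set is read as four columns (att1; att2; att3; label).

```python
from collections import defaultdict

# Paul's example set as (att1, att2, att3, label) tuples.
# att3 plays the role of the batch attribute.
# Assumption: the rows for "C" are read as att1=1/2/4 and att2=4.
EXAMPLES = [
    (1, 2, "A", "YES"),
    (2, 2, "A", "NO"),
    (3, 4, "B", "YES"),
    (1, 4, "C", "NO"),
    (2, 4, "C", "NO"),
    (4, 4, "C", "YES"),
]


def batch_folds(examples, batch_index):
    """Yield (held_out_batch, train, test), one fold per distinct batch value.

    Hypothetical helper: every example of a batch lands in the same fold,
    so no batch is ever split between training and test data.
    """
    by_batch = defaultdict(list)
    for row in examples:
        by_batch[row[batch_index]].append(row)

    for held_out in sorted(by_batch):
        test = by_batch[held_out]
        train = [row for row in examples if row[batch_index] != held_out]
        yield held_out, train, test


for held_out, train, test in batch_folds(EXAMPLES, batch_index=2):
    # held_out names the batch that is excluded in this round.
    print(f"held out {held_out}: train on {len(train)}, test on {len(test)}")
```

Because each fold carries the name of its held-out batch, the sketch also shows one way to tell which examples are excluded in a given round.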