Wrapper Split Validation

Synopsis

A simple validation method to check the performance of a feature weighting or selection wrapper.

Description

This operator evaluates the performance of feature weighting algorithms, including feature selection. The first inner operator is the weighting algorithm to be evaluated. It must return an attribute weights vector, which is then applied to the data. A new model is then created using the second inner operator, and its performance is retrieved using the third inner operator. This performance vector serves as a performance indicator for the actual algorithm. This implementation is described for the RandomSplitValidationChain.
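The three inner steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the operator's actual implementation: the function names (`weight_attributes`, `train_model`, `evaluate`, `wrapper_split_validation`), the mean-difference weighting, the nearest-centroid model, and the toy data are all hypothetical stand-ins for whatever is placed in the subprocesses.

```python
import random

def split(examples, split_ratio, seed):
    """Shuffle a copy of the examples and cut it at split_ratio."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * split_ratio))
    return shuffled[:cut], shuffled[cut:]

def weight_attributes(examples):
    """Hypothetical weighting step: absolute difference of the per-class
    attribute means, normalized so the largest weight is 1.0."""
    weights = []
    for j in range(len(examples[0][0])):
        pos = [x[j] for x, y in examples if y == 1]
        neg = [x[j] for x, y in examples if y == 0]
        weights.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    top = max(weights) or 1.0
    return [w / top for w in weights]

def apply_weights(examples, weights):
    """Scale every attribute value by its weight."""
    return [([v * w for v, w in zip(x, weights)], y) for x, y in examples]

def train_model(train):
    """Hypothetical learner: one centroid per class."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(model, x):
    """Return the label of the closest centroid."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))

def evaluate(model, test):
    """Hypothetical performance step: plain accuracy."""
    return sum(1 for x, y in test if predict(model, x) == y) / len(test)

def wrapper_split_validation(examples, split_ratio=0.7, seed=1992):
    train, test = split(examples, split_ratio, seed)
    weights = weight_attributes(train)                   # first inner operator
    model = train_model(apply_weights(train, weights))   # second inner operator
    performance = evaluate(model, apply_weights(test, weights))  # third inner operator
    # The delivered weights are built on the complete input set.
    final_weights = weight_attributes(examples)
    return performance, final_weights

# Toy data: attribute 0 separates the classes, attribute 1 carries no signal.
data = [((i * 0.1, i % 3), 0) for i in range(10)] + \
       [((5 + i * 0.1, i % 3), 1) for i in range(10)]
performance, final_weights = wrapper_split_validation(data)
print(performance, final_weights)
```

In this sketch the performance is measured only on the held-out part, while the returned weights are recomputed on all examples, mirroring the behavior of the two output ports described below.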

Input

example set in

This input port expects an ExampleSet. Subsets of this ExampleSet will be used as training and testing data sets.

Output

performance vector out

The Model Evaluation subprocess must return a Performance Vector in each iteration. This is usually generated by applying the model and measuring its performance. Please note that the statistical performance calculated by this estimation scheme is only an estimate (instead of an exact calculation) of the performance which would be achieved with the model built on the complete delivered data set.

attribute weights out

The Attribute Weighting subprocess must return an attribute weights vector in each iteration. Please note that the attribute weights vector built on the complete input ExampleSet is delivered from this port.

Parameters

Split ratio

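Relative size of the training set, given as a fraction of the ExampleSet. For example, a value of 0.7 uses 70% of the examples for training and the remaining 30% for testing.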

Sampling type

The Wrapper Split Validation operator can use several types of sampling for building the subsets. The following options are available:

linear_sampling: The linear sampling simply divides the ExampleSet into partitions without changing the order of the examples, i.e. subsets with consecutive examples are created.
shuffled_sampling: The shuffled sampling builds random subsets of the ExampleSet. Examples are chosen randomly for making subsets.
stratified_sampling: The stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole ExampleSet. For example, in the case of a binominal classification, stratified sampling builds random subsets such that each subset contains roughly the same proportions of the two class label values.
automatic: The automatic mode uses stratified sampling by default. If it isn't applicable, e.g. if the ExampleSet doesn't contain a nominal label, shuffled sampling is used instead.
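The four sampling types listed above can be illustrated with a small Python sketch. It is not the operator's code: the function names, the `label_of` accessor, and the rule that string labels count as nominal are assumptions made for this example only.

```python
import random
from collections import defaultdict

def linear_split(examples, split_ratio):
    """Keep the original order: the first part trains, the rest tests."""
    cut = int(round(len(examples) * split_ratio))
    return examples[:cut], examples[cut:]

def shuffled_split(examples, split_ratio, rng):
    """Shuffle a copy, then cut it like the linear split."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    return linear_split(shuffled, split_ratio)

def stratified_split(examples, split_ratio, rng, label_of):
    """Split each class separately so both subsets keep the class proportions."""
    by_label = defaultdict(list)
    for ex in examples:
        by_label[label_of(ex)].append(ex)
    train, test = [], []
    for group in by_label.values():
        part_train, part_test = shuffled_split(group, split_ratio, rng)
        train.extend(part_train)
        test.extend(part_test)
    return train, test

def automatic_split(examples, split_ratio, rng, label_of):
    """Stratify when the label looks nominal (assumed here: all strings),
    otherwise fall back to shuffled sampling."""
    if all(isinstance(label_of(ex), str) for ex in examples):
        return stratified_split(examples, split_ratio, rng, label_of)
    return shuffled_split(examples, split_ratio, rng)

# 20 examples: 5 labelled "yes", 15 labelled "no".
examples = [(i, "yes" if i % 4 == 0 else "no") for i in range(20)]
label_of = lambda ex: ex[1]

lin_train, lin_test = linear_split(examples, 0.8)
strat_train, strat_test = stratified_split(examples, 0.8, random.Random(7), label_of)

# Numeric labels are not nominal, so automatic mode falls back to shuffling.
numeric = [(i, float(i)) for i in range(20)]
auto_train, auto_test = automatic_split(numeric, 0.8, random.Random(7), label_of)

print(len(lin_train), len(strat_train), len(auto_train))
```

With a split ratio of 0.8, the stratified split in this sketch places 4 of the 5 "yes" examples in the training subset and 1 in the test subset, which matches the 80/20 proportion applied to the whole set.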

Use local random seed

This parameter indicates whether a local random seed should be used for randomizing the examples of a subset. Using the same value of the local random seed will produce the same subsets. Changing the seed value changes the way examples are randomized, so the subsets will contain a different set of examples. This parameter is available only if shuffled, stratified or automatic sampling is selected. It is not available for linear sampling, which requires no randomization because examples are selected in sequence.

Local random seed

This parameter specifies the local random seed. This parameter is available only if the use local random seed parameter is set to true.
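The effect of a local random seed can be shown with a short Python sketch. The function `shuffled_subsets` and its `local_random_seed` argument are hypothetical names chosen for this illustration; the point is only that a dedicated, seeded generator reproduces the same subsets on every run.

```python
import random

def shuffled_subsets(examples, split_ratio, local_random_seed=None):
    """Split after shuffling. With a seed, a private generator is used so the
    result is reproducible; without one, the shared global generator is used."""
    if local_random_seed is not None:
        rng = random.Random(local_random_seed)
    else:
        rng = random
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * split_ratio))
    return shuffled[:cut], shuffled[cut:]

examples = list(range(50))

first = shuffled_subsets(examples, 0.7, local_random_seed=1992)
second = shuffled_subsets(examples, 0.7, local_random_seed=1992)

# Same seed, same subsets.
print(first == second)
```

Passing a different seed value would normally yield a different arrangement of examples in the two subsets, while the subset sizes stay the same.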