
Forward Selection

Synopsis

This operator selects the most relevant attributes of the given ExampleSet through a highly efficient implementation of the forward selection scheme.

Description

The Forward Selection operator is a nested operator, i.e. it has a subprocess. The subprocess of the Forward Selection operator must always return a performance vector. For more information regarding subprocesses, please study the Subprocess operator.

The Forward Selection operator starts with an empty selection of attributes and, in each round, tentatively adds each unused attribute of the given ExampleSet. For each added attribute, the performance is estimated using the inner operators, e.g. a cross-validation. Only the attribute giving the highest increase in performance is added to the selection. Then a new round is started with the modified selection. This implementation avoids any additional memory consumption besides the memory originally used for storing the data and the memory that might be needed for applying the inner operators. The stopping behavior parameter specifies when the iteration should be aborted. There are three different options:

  • without increase: The iteration runs as long as there is any increase in performance.
  • without increase of at least: The iteration runs as long as the increase is at least as high as specified, either relative or absolute. The minimal relative increase parameter specifies the minimal relative increase if the use relative increase parameter is set to true. Otherwise, the minimal absolute increase parameter specifies the minimal absolute increase.
  • without significant increase: The iteration stops as soon as the increase is not significant at the level specified by the alpha parameter.
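The greedy search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not RapidMiner's implementation: the hypothetical `evaluate` callback stands in for the subprocess (for example a cross-validation) and returns a single performance value where higher is better, the empty selection is assumed never to be preferred (so the first round always adds an attribute), and only the without increase option is modeled. `forward_selection`, `toy_evaluate`, and the `GAINS` scores are made up for illustration.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def forward_selection(
    attributes: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
    max_attributes: Optional[int] = None,
) -> Tuple[List[str], float]:
    """Greedy forward selection with the 'without increase' stopping rule (sketch)."""
    selected: List[str] = []
    best_perf = float("-inf")  # assumption: the empty selection is never preferred
    limit = len(attributes) if max_attributes is None else max_attributes
    while len(selected) < limit:
        # Tentatively add each unused attribute and keep the best candidate of this round.
        round_attr: Optional[str] = None
        round_perf = float("-inf")
        for attr in attributes:
            if attr in selected:
                continue
            perf = evaluate(frozenset(selected) | {attr})
            if perf > round_perf:
                round_attr, round_perf = attr, perf
        # 'without increase': stop once the best addition no longer improves performance.
        if round_attr is None or round_perf <= best_perf:
            break
        selected.append(round_attr)
        best_perf = round_perf
    return selected, best_perf


# Hypothetical stand-in for the subprocess: each useful attribute adds its gain,
# and every selected attribute costs 0.1.
GAINS = {"a": 0.5, "b": 0.3, "c": 0.05}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    return sum(GAINS.get(x, 0.0) for x in subset) - 0.1 * len(subset)


if __name__ == "__main__":
    print(forward_selection(["a", "b", "c", "d"], toy_evaluate))
```

With these toy scores the search adds a, then b, and stops in the third round because adding c or d lowers the score. Passing `max_attributes=1` caps the selection after the first round, in the spirit of the maximal number of attributes parameter.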

The speculative rounds parameter defines how many rounds will be performed in a row after the first time the stopping criterion is fulfilled. If the performance increases again during the speculative rounds, the selection is continued. Otherwise, all additionally selected attributes are removed, as if no speculative rounds had been executed. This may help avoid getting stuck in local optima.
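A sketch of how speculative rounds can extend the greedy loop, again in Python and with hypothetical names (`forward_selection_speculative`, `synergy_evaluate`). It assumes that "the performance increases again" means exceeding the best performance of the last accepted selection, and that the count of remaining speculative rounds is reset after such an increase; RapidMiner's exact bookkeeping may differ.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def forward_selection_speculative(
    attributes: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
    speculative_rounds: int = 0,
) -> Tuple[List[str], float]:
    """Forward selection that may ignore the stopping criterion a few rounds in a row."""
    selected: List[str] = []
    best_perf = float("-inf")  # performance of the last accepted selection
    accepted = 0  # selected[:accepted] is the last accepted selection
    remaining = speculative_rounds
    while len(selected) < len(attributes):
        round_attr: Optional[str] = None
        round_perf = float("-inf")
        for attr in attributes:
            if attr in selected:
                continue
            perf = evaluate(frozenset(selected) | {attr})
            if perf > round_perf:
                round_attr, round_perf = attr, perf
        if round_attr is None:
            break
        selected.append(round_attr)
        if round_perf > best_perf:
            # Performance increased again: accept everything added so far.
            best_perf = round_perf
            accepted = len(selected)
            remaining = speculative_rounds
        elif remaining > 0:
            # Stopping criterion fulfilled, but continue speculatively.
            remaining -= 1
        else:
            break
    # Drop speculatively added attributes that never paid off.
    return selected[:accepted], best_perf


# Hypothetical scores: x and y are useless alone and only help together.
def synergy_evaluate(subset: FrozenSet[str]) -> float:
    score = 0.4 if "a" in subset else 0.0
    if {"x", "y"} <= subset:
        score += 1.0
    return score - 0.05 * len(subset)
```

Without speculative rounds the search stops after selecting a, because adding x or y alone lowers the score. With one speculative round it keeps x anyway, then finds that adding y raises the score well above the previous best, and returns all three attributes.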

Feature selection, i.e. the search for the most relevant features for classification or regression problems, is one of the main data mining tasks. A wide range of search methods has been integrated into RapidMiner, including evolutionary algorithms. All search methods need a performance measurement that indicates how well a search point (a feature subset) will probably perform on the given data set.

Differentiation

Backward Elimination

The Backward Elimination operator starts with the full set of attributes and, in each round, tentatively removes each remaining attribute of the given ExampleSet. For each removed attribute, the performance is estimated using the inner operators, e.g. a cross-validation. Only the attribute whose removal causes the least decrease in performance is finally removed from the selection. Then a new round is started with the modified selection.
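For comparison, the backward scheme can be sketched the same way. This is an illustrative Python sketch with hypothetical names (`backward_elimination`, `toy_evaluate`, `GAINS`); the stopping rule (stop as soon as even the least harmful removal lowers performance) and the `min_attributes` floor are assumptions, since the description above does not state them.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def backward_elimination(
    attributes: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
    min_attributes: int = 1,
) -> Tuple[List[str], float]:
    """Greedy backward elimination (sketch): drop the least useful attribute each round."""
    selected = list(attributes)
    current_perf = evaluate(frozenset(selected))
    while len(selected) > min_attributes:
        # Tentatively remove each remaining attribute; keep the removal that hurts least.
        round_attr: Optional[str] = None
        round_perf = float("-inf")
        for attr in selected:
            perf = evaluate(frozenset(selected) - {attr})
            if perf > round_perf:
                round_attr, round_perf = attr, perf
        # Assumed stopping rule: stop once even the best removal lowers performance.
        if round_attr is None or round_perf < current_perf:
            break
        selected.remove(round_attr)
        current_perf = round_perf
    return selected, current_perf


# Hypothetical stand-in for the subprocess: useful attributes add their gain,
# and every selected attribute costs 0.1.
GAINS = {"a": 0.5, "b": 0.3, "c": 0.05}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    return sum(GAINS.get(x, 0.0) for x in subset) - 0.1 * len(subset)
```

Starting from all four toy attributes, the sketch removes d, then c, and keeps a and b because removing either of them would lower the score.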

Input

example set

This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.

Output

example set

The feature selection algorithm is applied to the input ExampleSet. The resulting ExampleSet, reduced to the selected attributes, is delivered through this port.

attribute weights

The attribute weights are delivered through this port.

performance

This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.

Parameters

Maximal number of attributes

This parameter specifies the maximal number of attributes to be selected by the Forward Selection operator.

Speculative rounds

This parameter specifies the number of times the stopping criterion may be consecutively ignored before the selection is actually stopped. A number higher than one may help avoid getting stuck in local optima.

Stopping behavior

The stopping behavior parameter specifies when the iteration should be aborted. There are three different options:

  • without_increase: The iteration runs as long as there is any increase in performance.
  • without_increase_of_at_least: The iteration runs as long as the increase is at least as high as specified, either relative or absolute. The minimal relative increase parameter specifies the minimal relative increase if the use relative increase parameter is set to true. Otherwise, the minimal absolute increase parameter specifies the minimal absolute increase.
  • without_significant_increase: The iteration stops as soon as the increase is not significant at the level specified by the alpha parameter.
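The first two options can be expressed as a small decision function. This is a hypothetical sketch, not RapidMiner's code: `should_stop` and its argument names are illustrative, and the relative increase is assumed to be the gain divided by the absolute value of the previous performance. The without_significant_increase option needs a statistical test and is deliberately not handled here.

```python
def should_stop(
    behavior: str,
    previous: float,
    current: float,
    use_relative_increase: bool = True,
    minimal_absolute_increase: float = 0.0,
    minimal_relative_increase: float = 0.0,
) -> bool:
    """Return True if the selection should stop after moving from `previous` to `current`."""
    increase = current - previous
    if behavior == "without_increase":
        # Continue only while there is any increase at all.
        return increase <= 0.0
    if behavior == "without_increase_of_at_least":
        if use_relative_increase:
            if previous == 0.0:
                # Relative increase is undefined here; assume any positive gain suffices.
                return increase <= 0.0
            # Assumed definition: relative increase = gain / |previous performance|.
            return increase / abs(previous) < minimal_relative_increase
        return increase < minimal_absolute_increase
    raise ValueError(f"unsupported stopping behavior: {behavior}")
```

For example, moving from 0.80 to 0.82 is a relative increase of about 2.5 %, so the sketch stops with a minimal relative increase of 0.05 but continues with 0.01.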

Use relative increase

This parameter is only available when the stopping behavior parameter is set to 'without increase of at least'. If the use relative increase parameter is set to true, the minimal relative increase parameter is used; otherwise, the minimal absolute increase parameter is used.

Minimal absolute increase

This parameter is only available when the stopping behavior parameter is set to 'without increase of at least' and the use relative increase parameter is set to false. If the absolute performance increase over the last step drops below this threshold, the selection is stopped.

Minimal relative increase

This parameter is only available when the stopping behavior parameter is set to 'without increase of at least' and the use relative increase parameter is set to true. If the relative performance increase over the last step drops below this threshold, the selection is stopped.

Alpha

This parameter is only available when the stopping behavior parameter is set to 'without significant increase'. It specifies the probability threshold that determines whether differences are considered significant.
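One way to decide whether an increase is significant at level alpha is a one-sided test on paired per-fold scores, for example from a cross-validation. The sketch below is an assumption about how such a check could work, not the test RapidMiner uses: it applies a normal approximation to the mean of the paired differences using only Python's standard library, which is optimistic for a small number of folds, where a t-test would be more appropriate. `increase_is_significant` is a hypothetical name.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence


def increase_is_significant(
    previous_scores: Sequence[float],
    current_scores: Sequence[float],
    alpha: float = 0.05,
) -> bool:
    """One-sided paired test (normal approximation) that current scores beat previous ones."""
    if len(previous_scores) != len(current_scores) or len(previous_scores) < 2:
        raise ValueError("need two equally long score lists with at least two folds")
    diffs = [c - p for p, c in zip(previous_scores, current_scores)]
    sd = stdev(diffs)
    if sd == 0.0:
        # Every fold changed by the same amount: treat any positive change as significant.
        return mean(diffs) > 0.0
    z = mean(diffs) / (sd / math.sqrt(len(diffs)))
    p_value = 1.0 - NormalDist().cdf(z)  # H1: current performance > previous performance
    return p_value < alpha
```

Under the without significant increase option, the iteration would stop as soon as this check returns False for the best candidate of a round. A smaller alpha demands stronger evidence before another attribute is accepted.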
