Backward Elimination

Synopsis

This operator selects the most relevant attributes of the given ExampleSet through an efficient implementation of the backward elimination scheme.

Description

The Backward Elimination operator is a nested operator, i.e. it has a subprocess. The subprocess of the Backward Elimination operator must always return a performance vector. For more information regarding subprocesses, please study the Subprocess operator.

The Backward Elimination operator starts with the full set of attributes and, in each round, tentatively removes each remaining attribute of the given ExampleSet in turn. For each removed attribute, the performance is estimated using the inner operators, e.g. a cross-validation. Only the attribute whose removal gives the least decrease of performance is finally removed from the selection. Then a new round is started with the modified selection. This implementation avoids any additional memory consumption besides the memory used originally for storing the data and the memory which might be needed for applying the inner operators. The stopping behavior parameter specifies when the iteration should be aborted. There are three different options:

  • with decrease: The iteration runs as long as there is any increase in performance.
  • with decrease of more than: The iteration runs as long as the decrease is less than the specified threshold, either relative or absolute. The maximal relative decrease parameter is used for specifying the maximal relative decrease if the use relative decrease parameter is set to true. Otherwise, the maximal absolute decrease parameter is used for specifying the maximal absolute decrease.
  • with significant decrease: The iteration stops as soon as the decrease is significant to the level specified by the alpha parameter.

The speculative rounds parameter defines how many rounds will be performed in a row after the stopping criterion is first fulfilled. If the performance increases again during the speculative rounds, the elimination is continued. Otherwise, all additionally eliminated attributes are restored, as if no speculative rounds had been executed. This might help to avoid getting stuck in local optima.
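The elimination loop and the effect of speculative rounds can be illustrated with a minimal Python sketch. This is not RapidMiner's implementation: the function `backward_elimination` and its `evaluate` callback (a stand-in for the subprocess that returns a single performance value, higher being better) are hypothetical names. The sketch also assumes that a round is accepted when performance does not drop below the last accepted selection, and that at least one attribute is always kept. The maximal number of eliminations is left out for brevity.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def backward_elimination(
    attributes: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
    speculative_rounds: int = 0,
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination (illustrative sketch, not RapidMiner code).

    `evaluate` is a hypothetical stand-in for the subprocess: it scores an
    attribute subset and returns a performance value (higher is better).
    """
    current = frozenset(attributes)
    best, best_perf = current, evaluate(current)  # last accepted selection
    ignored = 0  # consecutive rounds in which the stopping criterion was ignored

    while len(current) > 1:  # assumption: never evaluate an empty selection
        # Tentatively drop each remaining attribute and score what is left;
        # the highest score corresponds to the least decrease of performance.
        round_perf, dropped = max(
            (evaluate(current - {a}), a) for a in sorted(current)
        )
        current = current - {dropped}

        if round_perf >= best_perf:
            # No decrease: accept this selection and reset the speculation.
            best, best_perf = current, round_perf
            ignored = 0
        else:
            # Stopping criterion fulfilled: continue only speculatively.
            ignored += 1
            if ignored > speculative_rounds:
                break

    # Returning `best` discards all speculatively removed attributes.
    return best, best_perf


def toy_score(subset: FrozenSet[str]) -> int:
    """Made-up performance: useful attributes add, noise attributes subtract."""
    weights = {"a": 30, "b": 20, "c": 10, "noise1": -5, "noise2": -5}
    return sum(weights[name] for name in subset)


selection, performance = backward_elimination(
    ["a", "b", "c", "noise1", "noise2"], toy_score
)
```

With the made-up `toy_score`, the two noise attributes are removed first; removing any further attribute lowers the score, so the loop stops and the last accepted selection is returned.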

Feature selection, i.e. the search for the most relevant features for classification or regression problems, is one of the main data mining tasks. A wide range of search methods has been integrated into RapidMiner, including evolutionary algorithms. All search methods need a performance measurement which indicates how well a search point (a feature subset) will probably perform on the given data set.

Differentiation

Forward Selection

The Forward Selection operator starts with an empty selection of attributes and, in each round, tentatively adds each unused attribute of the given ExampleSet in turn. For each added attribute, the performance is estimated using the inner operators, e.g. a cross-validation. Only the attribute giving the highest increase of performance is added to the selection. Then a new round is started with the modified selection.
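For contrast with backward elimination, the forward scheme can be sketched in a few lines of Python. The names `forward_selection` and `evaluate` are hypothetical, and the stopping rule (stop as soon as no candidate improves the performance) is an assumption made for illustration; the text above does not specify how Forward Selection stops.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def forward_selection(
    attributes: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy forward selection (illustrative sketch, not RapidMiner code)."""
    remaining = set(attributes)
    selected: FrozenSet[str] = frozenset()
    best_perf = float("-inf")  # nothing has been evaluated yet

    while remaining:
        # Tentatively add each unused attribute and keep the best candidate.
        round_perf, added = max(
            (evaluate(selected | {a}), a) for a in sorted(remaining)
        )
        if round_perf <= best_perf:
            break  # assumption: stop when no candidate improves performance
        selected = selected | {added}
        remaining.discard(added)
        best_perf = round_perf

    return selected, best_perf


def example_score(subset: FrozenSet[str]) -> int:
    """Made-up performance: useful attributes add, noise attributes subtract."""
    weights = {"a": 30, "b": 20, "c": 10, "noise1": -5, "noise2": -5}
    return sum(weights[name] for name in subset)


chosen, chosen_perf = forward_selection(
    ["a", "b", "c", "noise1", "noise2"], example_score
)
```

On this made-up scoring function, forward selection builds the subset up from nothing, while backward elimination would reach the same subset by removing the noise attributes from the full set.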

Input

example set

This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.

Output

example set

The feature selection algorithm is applied on the input ExampleSet. The resultant ExampleSet with reduced attributes is delivered through this port.

attribute weights

The attribute weights are delivered through this port.

performance

This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.

Parameters

Maximal number of eliminations

This parameter specifies the maximal number of backward eliminations.

Speculative rounds

This parameter specifies the number of times the stopping criterion may be consecutively ignored before the elimination is actually stopped. A number higher than one might help to avoid getting stuck in local optima.

Stopping behavior

The stopping behavior parameter specifies when the iteration should be aborted. There are three different options:

  • with_decrease: The iteration runs as long as there is any increase in performance.
  • with_decrease_of_more_than: The iteration runs as long as the decrease is less than the specified threshold, either relative or absolute. The maximal relative decrease parameter is used for specifying the maximal relative decrease if the use relative decrease parameter is set to true. Otherwise, the maximal absolute decrease parameter is used for specifying the maximal absolute decrease.
  • with_significant_decrease: The iteration stops as soon as the decrease is significant to the level specified by the alpha parameter.

Use relative decrease

This parameter is only available when the stopping behavior parameter is set to 'with decrease of more than'. If the use relative decrease parameter is set to true, the maximal relative decrease parameter is used; otherwise, the maximal absolute decrease parameter is used.

Maximal absolute decrease

This parameter is only available when the stopping behavior parameter is set to 'with decrease of more than' and the use relative decrease parameter is set to false. If the absolute performance decrease compared to the last step exceeds this threshold, the elimination will be stopped.

Maximal relative decrease

This parameter is only available when the stopping behavior parameter is set to 'with decrease of more than' and the use relative decrease parameter is set to true. If the relative performance decrease compared to the last step exceeds this threshold, the elimination will be stopped.
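The interplay of the use relative decrease, maximal absolute decrease and maximal relative decrease parameters can be sketched as a single threshold check. The function `exceeds_threshold` is a hypothetical helper, not part of RapidMiner. It assumes that the relative decrease is the absolute decrease divided by the magnitude of the previous step's performance, which this page does not state explicitly.

```python
def exceeds_threshold(
    previous: float,
    current: float,
    max_decrease: float,
    use_relative: bool,
) -> bool:
    """Return True if the drop from `previous` to `current` exceeds the threshold.

    Illustrative sketch only: the relative decrease is assumed to be
    (previous - current) / |previous|.
    """
    decrease = previous - current
    if use_relative:
        if previous == 0:
            raise ValueError("relative decrease is undefined for previous == 0")
        decrease = decrease / abs(previous)
    return decrease > max_decrease


# Accuracy drops from 0.80 to 0.78 after removing an attribute.
stop_relative = exceeds_threshold(0.80, 0.78, max_decrease=0.05, use_relative=True)
stop_absolute = exceeds_threshold(0.80, 0.78, max_decrease=0.01, use_relative=False)
```

In this example the drop of 0.02 is a relative decrease of about 2.5 %, which stays below a relative threshold of 0.05, while the same drop exceeds an absolute threshold of 0.01.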

Alpha

This parameter is only available when the stopping behavior parameter is set to 'with significant decrease'. This parameter specifies the probability threshold which determines if differences are considered as significant.
