
Optimize Selection (Brute Force)

Synopsis

This operator selects the most relevant attributes of the given ExampleSet by trying all possible combinations of attribute selections.

Description

The Optimize Selection (Brute Force) operator is a nested operator, i.e. it has a subprocess. This subprocess must always return a performance vector. The operator selects the feature set with the best performance vector. You need a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses.

Feature selection, i.e. the search for the most relevant features for classification or regression problems, is one of the main data mining tasks. A wide range of search methods has been integrated into RapidMiner, including evolutionary algorithms. All search methods need a performance measurement which indicates how well a search point (a feature subset) will probably perform on the given data set.

This feature selection operator selects the best attribute set by trying all possible combinations of attribute selections. It returns the ExampleSet containing the subset of attributes which produced the best performance. As this operator works on the power set of the attribute set, its runtime grows exponentially with the number of attributes: n attributes yield 2^n possible subsets.
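The exhaustive search described above can be illustrated with a minimal Python sketch. It is not RapidMiner's implementation: the function `brute_force_selection`, the `evaluate` callback (which stands in for the subprocess that returns a performance value) and the toy scoring function are all hypothetical names chosen for illustration, and the sketch assumes that only non-empty subsets are tested and that a higher score is better.

```python
from itertools import combinations


def brute_force_selection(attributes, evaluate):
    """Return (best_subset, best_score) over all non-empty attribute subsets."""
    best_subset, best_score = None, float("-inf")
    # Walk through subset sizes 1..n, i.e. the power set without the empty set.
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            score = evaluate(subset)  # stands in for the nested subprocess
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical scoring function: "a" and "c" are useful, every extra
# attribute costs a small penalty.
def toy_evaluate(subset):
    useful = {"a": 0.5, "c": 0.3}
    return sum(useful.get(name, 0.0) for name in subset) - 0.05 * len(subset)


best, score = brute_force_selection(["a", "b", "c", "d"], toy_evaluate)
print(best, score)
```

With four attributes the loop evaluates 15 subsets and, for this toy scoring function, selects `("a", "c")`. Every additional attribute doubles the number of evaluations.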

Differentiation

Optimize Selection (Evolutionary)

This is also an attribute set reduction operator but it uses a genetic algorithm for this purpose.

Input

example set in

This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.

through

This operator can have multiple through ports. When one input is connected to the through port, another through port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through port of this operator is available at the first through port of the nested chain (inside the subprocess). Do not forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output

example set out

The feature selection algorithm is applied on the input ExampleSet. The resultant ExampleSet with reduced attributes is delivered through this port.

weights

The attribute weights are delivered through this port.

performance

This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.

Parameters

Use exact number of attributes

This parameter determines if only combinations containing an exact number of attributes should be tested. The exact number is specified by the exact number of attributes parameter.

Exact number of attributes

This parameter is only available when the use exact number of attributes parameter is set to true. Only combinations containing this number of attributes are generated and tested.

Restrict maximum

If set to true, the maximum number of attributes whose combinations are generated and tested can be restricted. Otherwise, all combinations of all attributes are generated and tested. This parameter is only available when the use exact number of attributes parameter is set to false.

Min number of attributes

This parameter determines the minimum number of features used for the combinations to be generated and tested.

Max number of attributes

This parameter determines the maximum number of features used for the combinations to be generated and tested. This parameter is only available when the restrict maximum parameter is set to true.
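To get a feeling for how the attribute-count parameters above shrink the search space, the number of candidate subsets can be estimated before running the operator. The following is a rough sketch, not part of RapidMiner: `count_candidates` and its argument names are hypothetical, and it assumes that the minimum defaults to 1 and that each candidate subset is evaluated exactly once.

```python
from math import comb


def count_candidates(n_attributes, min_k=1, max_k=None, exact_k=None):
    """Number of attribute subsets that would be tested (illustrative only)."""
    if exact_k is not None:
        # Exact number of attributes: only subsets of that size are tested.
        return comb(n_attributes, exact_k)
    # Otherwise all sizes from the minimum up to the (optional) maximum.
    upper = n_attributes if max_k is None else min(max_k, n_attributes)
    return sum(comb(n_attributes, k) for k in range(min_k, upper + 1))


print(count_candidates(10))                    # all non-empty subsets
print(count_candidates(10, exact_k=3))         # subsets of size 3 only
print(count_candidates(10, min_k=2, max_k=4))  # sizes 2 through 4
```

For 10 attributes this gives 1023 unrestricted candidates, 120 candidates with exactly 3 attributes, and 375 candidates for sizes 2 to 4.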

Normalize weights

This parameter indicates if the final weights should be normalized. If set to true, the final weights are normalized such that the maximum weight is 1 and the minimum weight is 0.
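The normalization described above corresponds to a min-max rescaling, which can be sketched as follows. The function name `normalize_weights` is hypothetical, and the handling of the degenerate case where all weights are equal is an assumption of this sketch rather than documented operator behavior.

```python
def normalize_weights(weights):
    """Rescale weights so that the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # Assumption for this sketch: identical weights all map to 1.0.
        return {name: 1.0 for name in weights}
    return {name: (w - lo) / (hi - lo) for name, w in weights.items()}


normalized = normalize_weights({"a": 2.0, "b": 4.0, "c": 3.0})
print(normalized)
```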

Use local random seed

This parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization.

Local random seed

This parameter specifies the local random seed. It is only available if the use local random seed parameter is set to true.
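The effect of a local random seed can be illustrated outside RapidMiner with Python's standard `random` module. This is only an analogy: the helper `shuffled_with_seed` is a hypothetical name, and the seed value is arbitrary. A generator created from a fixed seed produces the same sequence on every run, independent of any global random state.

```python
import random


def shuffled_with_seed(items, seed):
    """Return a shuffled copy of items using a local, seeded generator."""
    rng = random.Random(seed)  # local generator, does not touch global state
    result = list(items)
    rng.shuffle(result)
    return result


first = shuffled_with_seed(range(10), seed=1992)
second = shuffled_with_seed(range(10), seed=1992)
print(first == second)  # same seed, same order
```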

Show stop dialog

This parameter determines if a dialog with a stop button should be displayed which stops the search for the best feature space. If the search for the best feature space is stopped, the best individual found up to that point is returned.

User result individual selection

If this parameter is set to true, it allows the user to select the final result individual from the last population.

Show population plotter

This parameter determines if the current population should be displayed in the performance space.

Plot generations

This parameter is only available when the show population plotter parameter is set to true. It specifies the generation interval at which the population plotter is updated.

Constraint draw range

This parameter is only available when the show population plotter parameter is set to true. This parameter determines if the draw range of the population plotter should be constrained between 0 and 1.

Draw dominated points

This parameter is only available when the show population plotter parameter is set to true. This parameter determines if only points which are not Pareto dominated should be drawn on the population plotter.
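Pareto dominance, as used in the description above, can be sketched in a few lines. The functions `dominates` and `non_dominated` are hypothetical helpers, and the sketch assumes that each point is a tuple of performance criteria where a higher value is better for every criterion.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9), (0.9, 0.1)]
front = non_dominated(points)
print(front)
```

In this example `(0.4, 0.4)` is dominated by `(0.5, 0.5)` and `(0.9, 0.1)` is dominated by `(0.9, 0.2)`, so only the remaining three points would be drawn.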

Population criteria data file

This parameter specifies the path to the file in which the criteria data of the final population should be saved.

Maximal fitness

This parameter specifies the maximal fitness. The optimization will stop if the fitness reaches this value.
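The stopping rule described above can be sketched as an early exit from the exhaustive search. This is an illustrative sketch only: `search_until`, its `evaluate` callback and the toy scoring function are hypothetical, and the sketch assumes that higher fitness is better and that the search stops as soon as the best score reaches the threshold.

```python
from itertools import combinations


def search_until(attributes, evaluate, maximal_fitness=float("inf")):
    """Exhaustive search that stops once the best score reaches maximal_fitness."""
    best_subset, best_score, evaluated = None, float("-inf"), 0
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            evaluated += 1
            score = evaluate(subset)
            if score > best_score:
                best_subset, best_score = subset, score
            if best_score >= maximal_fitness:
                # Threshold reached: return without testing further subsets.
                return best_subset, best_score, evaluated
    return best_subset, best_score, evaluated


# Hypothetical fitness: perfect as soon as attribute "b" is included.
def has_b(subset):
    return 1.0 if "b" in subset else 0.0


print(search_until(["a", "b", "c"], has_b, maximal_fitness=1.0))
print(search_until(["a", "b", "c"], has_b))
```

With the threshold set to 1.0 the search stops after two evaluations; without a threshold all seven non-empty subsets are evaluated and the same subset is returned.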
