Weight by Tree Importance

Synopsis

This operator calculates the weights of the attributes by analyzing the split points of a Random Forest model. Attributes with higher weights are considered more relevant.

Description

This weighting schema uses a given random forest to extract the implicit importance of the used attributes. To do so, each node of each tree is visited and the benefit created by the respective split is retrieved. This benefit is summed per attribute that was used for the split. The mean benefit over all trees is used as the importance.
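The aggregation described above can be sketched in a few lines of Python. This is only an illustration, not the operator's actual implementation: the forest representation (a list of trees, each a list of (attribute, benefit) pairs, one per split node) and the function name tree_importance are assumptions made for this example.

```python
from collections import defaultdict


def tree_importance(forest):
    """Sum the split benefit per attribute, then average over all trees.

    `forest` is a hypothetical representation: a list of trees, where each
    tree is a list of (attribute_name, benefit) pairs, one per split node.
    """
    totals = defaultdict(float)
    for tree in forest:
        for attribute, benefit in tree:
            # Accumulate the benefit for the attribute used in this split.
            totals[attribute] += benefit
    n_trees = len(forest)
    # Mean benefit over all trees; an empty forest yields an empty result.
    return {attribute: total / n_trees for attribute, total in totals.items()}


# Two toy trees with made-up benefit values.
forest = [
    [("a1", 0.5), ("a2", 0.25)],
    [("a1", 0.25), ("a3", 0.125), ("a1", 0.125)],
]
weights = tree_importance(forest)
```

An attribute that is used in several splits of the same tree, such as a1 in the second toy tree, collects the benefit of every one of those splits before the average is taken.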

This algorithm follows the idea from "A comparison of random forest and its gini importance with standard chemometric methods for the feature selection and classification of spectral data" by Menze, Bjoern H. et al. (2009). It has been extended by additional criteria for computing the benefit created by a given split. The original paper only mentioned the Gini Index; this operator additionally supports the more reliable criteria Information Gain and Information Gain Ratio.
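To make the notion of split benefit concrete, the following sketch computes it from class counts using the textbook definitions of Gini Index, Information Gain and Information Gain Ratio. The function names, the criterion strings and the class-count representation are assumptions made for this example; the operator's internal computation may differ in detail, and the accuracy criterion is not covered here.

```python
import math


def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy (base 2) of a node given its class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def split_benefit(parent, children, criterion="gini_index"):
    """Benefit of splitting `parent` into `children` (lists of class counts).

    The criterion names are hypothetical labels chosen for this sketch.
    """
    n = sum(parent)
    sizes = [sum(child) for child in children]
    if criterion == "gini_index":
        weighted = sum(s / n * gini(ch) for s, ch in zip(sizes, children))
        return gini(parent) - weighted
    gain = entropy(parent) - sum(
        s / n * entropy(ch) for s, ch in zip(sizes, children)
    )
    if criterion == "information_gain":
        return gain
    if criterion == "gain_ratio":
        # Split information: entropy of the partition sizes themselves.
        split_info = entropy(sizes)
        return gain / split_info if split_info > 0 else 0.0
    raise ValueError(f"unsupported criterion: {criterion}")


# A perfectly separating split of 8 examples (4 per class).
parent = [4, 4]
children = [[4, 0], [0, 4]]
gini_benefit = split_benefit(parent, children, "gini_index")
info_gain = split_benefit(parent, children, "information_gain")
gain_ratio = split_benefit(parent, children, "gain_ratio")
```

The gain ratio divides the information gain by the entropy of the partition sizes, which penalizes splits that fragment the data into many small branches.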

Input

random forest

The input port expects a Random Forest model, which is a voting model of random trees. In the attached Example Process, it is the output of the Random Forest operator.

Output

weights

This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.

random forest

The Random Forest model that was given as input is passed through this port to the output unchanged. This is usually used to reuse the same model in further operators or to view the model in the Results Workspace.

Parameters

Criterion

This parameter specifies the criterion to be used for weighting the attributes. It can have one of the following values: information gain, gain ratio, gini index or accuracy.

Normalize weights

This parameter indicates whether the calculated weights should be normalized. If set to true, all weights are normalized to the range from 0 to 1.
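One common way to map weights to the range from 0 to 1 is min-max scaling, sketched below. The text above does not state which formula the operator applies, so the scaling method, the function name normalize_weights and the handling of identical weights are all assumptions made for this example.

```python
def normalize_weights(weights):
    """Min-max scale a dict of attribute weights to the range [0, 1].

    Assumption: min-max scaling. The operator may use a different formula.
    """
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All weights are equal; mapping them to 1.0 is an arbitrary choice.
        return {name: 1.0 for name in weights}
    return {name: (w - lo) / (hi - lo) for name, w in weights.items()}


normalized = normalize_weights({"a1": 0.4375, "a2": 0.125, "a3": 0.0625})
```

Under this scheme the attribute with the largest weight is mapped to 1 and the one with the smallest weight to 0, so only the relative spacing of the weights is preserved.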