
Performance (Classification)

Synopsis

This operator is used for the statistical performance evaluation of classification tasks. It delivers a list of performance criteria values for the classification task.

Description

This operator should be used only for the performance evaluation of classification tasks. Many other performance evaluation operators are also available in RapidMiner, e.g. the Performance operator, the Performance (Binominal Classification) operator and the Performance (Regression) operator. The Performance (Classification) operator is used with classification tasks only. In contrast, the Performance operator automatically determines the learning task type and calculates the most common criteria for that type. You can use the Performance (User-Based) operator if you want to write your own performance measure.

Classification is a technique used to predict group membership for data instances. For example, you may wish to use classification to predict whether the train on a particular day will be 'on time', 'late' or 'very late'. Predicting whether the number of people at a particular event will be 'below-average', 'average' or 'above-average' is another example. For evaluating the statistical performance of a classification model, the data set should be labeled, i.e. it should have an attribute with the label role and an attribute with the prediction role. The label attribute stores the actual observed values, whereas the prediction attribute stores the values of the label predicted by the classification model under discussion.

Input

labeled data

This input port expects a labeled ExampleSet. The Apply Model operator is a good example of an operator that provides labeled data. Make sure that the ExampleSet has a label attribute and a prediction attribute. See the Set Role operator for more details regarding the label and prediction roles of attributes.

performance

This is an optional input port. It expects a Performance Vector.

Output

performance

This port delivers a Performance Vector (we call it output-performance-vector for now). A Performance Vector is a list of performance criteria values. The Performance Vector is calculated on the basis of the label attribute and the prediction attribute of the input ExampleSet. The output-performance-vector contains the performance criteria calculated by this Performance operator (we call it calculated-performance-vector here). If a Performance Vector was also fed at the performance input port (we call it input-performance-vector here), the criteria of the input-performance-vector are also added to the output-performance-vector. If the input-performance-vector and the calculated-performance-vector both have the same criterion but with different values, the value of the calculated-performance-vector is delivered through the output port. This concept can be easily understood by studying the attached Example Process.
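The merging rule above can be sketched in a few lines of Python. This is purely illustrative (RapidMiner does not expose performance vectors as Python dicts); the dict-based representation and the function name are assumptions for the sake of the example.

```python
# Illustrative sketch of the merging rule: criteria from the input
# performance vector are carried over, but where both vectors define the
# same criterion, the freshly calculated value wins.
def merge_performance_vectors(input_pv, calculated_pv):
    """Both vectors are modeled as plain dicts of criterion -> value."""
    merged = dict(input_pv)       # start with the input criteria
    merged.update(calculated_pv)  # calculated values override on conflict
    return merged

input_pv = {"accuracy": 0.80, "kappa": 0.55}
calculated_pv = {"accuracy": 0.90, "classification_error": 0.10}
print(merge_performance_vectors(input_pv, calculated_pv))
# accuracy comes from the calculated vector (0.90); kappa is carried over
```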

example set

The ExampleSet that was given as input is passed through this port without any change. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

Main criterion

The main criterion is used for comparisons and needs to be specified only for processes where performance vectors are compared, e.g. attribute selection or other meta optimization process setups. If no main criterion is selected, the first criterion in the resulting performance vector is assumed to be the main criterion.

Accuracy

Relative number of correctly classified examples, i.e. the percentage of correct predictions.

Classification error

Relative number of misclassified examples, i.e. the percentage of incorrect predictions.
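The accuracy and classification error criteria above can be sketched as follows. This is an illustrative computation over plain lists of label and prediction values, not RapidMiner code; the train-delay classes come from the example in the Description.

```python
# Sketch: accuracy as the fraction of examples where the prediction
# matches the label; classification error is its complement.
def accuracy(labels, predictions):
    correct = sum(1 for l, p in zip(labels, predictions) if l == p)
    return correct / len(labels)

labels      = ["late", "on time", "on time", "very late"]
predictions = ["late", "on time", "late",    "very late"]

acc = accuracy(labels, predictions)
err = 1.0 - acc  # classification error is the complement of accuracy
print(acc, err)  # 0.75 0.25
```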

Kappa

The kappa statistic for the classification. It is generally considered a more robust measure than simple accuracy because it takes into account correct predictions that occur by chance.
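A minimal sketch of Cohen's kappa illustrates how chance agreement is factored out: the observed agreement is corrected by the agreement expected from the marginal class frequencies. This is the standard textbook formula, not necessarily the exact variant RapidMiner uses internally.

```python
from collections import Counter

# Sketch of Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the
# observed agreement and p_e the agreement expected by chance.
def cohen_kappa(labels, predictions):
    n = len(labels)
    p_o = sum(1 for l, p in zip(labels, predictions) if l == p) / n
    label_counts = Counter(labels)
    pred_counts = Counter(predictions)
    # expected chance agreement from the marginal class frequencies
    p_e = sum(label_counts[c] * pred_counts.get(c, 0)
              for c in label_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)

labels      = ["a", "a", "b", "b"]
predictions = ["a", "a", "b", "a"]
print(cohen_kappa(labels, predictions))  # 0.5
```

Here the observed agreement is 0.75 and the chance agreement 0.5, so kappa is 0.5, i.e. the classifier performs halfway between chance and perfect agreement.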

Weighted mean recall

The weighted mean of all per-class recall measurements. It is calculated from the recalls of the individual classes. Class recalls are shown in the last row of the confusion matrix displayed in the Results Workspace.

Weighted mean precision

The weighted mean of all per-class precision measurements. It is calculated from the precisions of the individual classes. Class precisions are shown in the last column of the confusion matrix displayed in the Results Workspace.
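The two criteria above can be sketched from raw counts. The example below computes per-class recall and precision and then averages them with equal class weights of 1 (which is how "weighted" behaves when the class weights parameter is left at its default); the function name is an assumption for illustration.

```python
from collections import defaultdict

# Sketch: per-class recall (true positives / actual class size) and
# per-class precision (true positives / predicted class size).
def class_recalls_precisions(labels, predictions):
    tp = defaultdict(int)
    label_total = defaultdict(int)
    pred_total = defaultdict(int)
    for l, p in zip(labels, predictions):
        label_total[l] += 1
        pred_total[p] += 1
        if l == p:
            tp[l] += 1
    classes = sorted(label_total)
    recalls = {c: tp[c] / label_total[c] for c in classes}
    precisions = {c: tp[c] / pred_total[c] if pred_total[c] else 0.0
                  for c in classes}
    return recalls, precisions

labels      = ["a", "a", "b", "b"]
predictions = ["a", "b", "b", "b"]
recalls, precisions = class_recalls_precisions(labels, predictions)
mean_recall = sum(recalls.values()) / len(recalls)  # equal class weights
print(recalls, precisions, mean_recall)
```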

Spearman rho

The rank correlation between the actual and predicted labels, using Spearman's rho. Spearman's rho is a measure of the monotonic relationship between two variables, computed on their ranks. The two variables in this case are the label attribute and the prediction attribute.

Kendall tau

The rank correlation between the actual and predicted labels, using Kendall's tau. Kendall's tau is a rank-based measure of the strength of the relationship between two variables. The two variables in this case are the label attribute and the prediction attribute.
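Spearman's rho can be sketched as the Pearson correlation of the ranks of the two series, which makes the "rank correlation" idea concrete. This simplified version assigns plain ordinal ranks and omits the average-rank handling of ties that a full implementation would include.

```python
# Sketch of Spearman's rho: Pearson correlation computed on ranks.
def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    return pearson(ranks(x), ranks(y))

print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0 (monotonic)
```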

Absolute error

Average absolute deviation of the prediction from the actual value. The values of the label attribute are the actual values.

Relative error

Average relative error is the average of the absolute deviation of the prediction from the actual value divided by the actual value. The values of the label attribute are the actual values.

Relative error lenient

Average lenient relative error is the average of the absolute deviation of the prediction from the actual value divided by the maximum of the actual value and the prediction. The values of the label attribute are the actual values.

Relative error strict

Average strict relative error is the average of the absolute deviation of the prediction from the actual value divided by the minimum of the actual value and the prediction. The values of the label attribute are the actual values.

Normalized absolute error

The absolute error divided by the error that would have been made if the average value had been predicted.

Root mean squared error

The averaged root-mean-squared error.

Root relative squared error

The averaged root-relative-squared error.

Squared error

The averaged squared error.
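The averaged error criteria above can be sketched for numeric label/prediction pairs. This is an illustrative computation using the formulas as described; the lenient and strict relative-error variants would divide by the maximum or minimum of actual and prediction instead of the actual value.

```python
# Sketch of the averaged error criteria for numeric label (actual) and
# prediction values.
def error_criteria(actual, predicted):
    n = len(actual)
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    rel_err = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / n
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    rmse = sq_err ** 0.5
    return {"absolute_error": abs_err,
            "relative_error": rel_err,
            "squared_error": sq_err,
            "root_mean_squared_error": rmse}

print(error_criteria([2.0, 4.0], [1.0, 6.0]))
```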

Correlation

Returns the correlation coefficient between the label and prediction attributes.

Squared correlation

Returns the squared correlation coefficient between the label and prediction attributes.

Cross entropy

The cross-entropy of a classifier, defined as the sum of the logarithms of the true label's confidences, divided by the number of examples.

Margin

The margin of a classifier, defined as the minimal confidence for the correct label.

Soft margin loss

The average soft margin loss of a classifier, defined as the average of all 1 - confidence values for the correct label.

Logistic loss

The logistic loss of a classifier, defined as the average of ln(1+exp(-[conf(CC)])) where 'conf(CC)' is the confidence of the correct class.
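The four confidence-based criteria above can be sketched from the predicted confidence of the correct class for each example. The negated logarithm in the cross-entropy follows the common convention of reporting it as a positive loss; sign conventions may differ from RapidMiner's internal output.

```python
import math

# Sketch of the confidence-based criteria, given conf(CC) -- the
# confidence the model assigned to the correct class of each example.
def confidence_criteria(correct_confidences):
    n = len(correct_confidences)
    cross_entropy = -sum(math.log(c) for c in correct_confidences) / n
    margin = min(correct_confidences)  # minimal correct-class confidence
    soft_margin_loss = sum(1 - c for c in correct_confidences) / n
    logistic_loss = sum(math.log(1 + math.exp(-c))
                        for c in correct_confidences) / n
    return cross_entropy, margin, soft_margin_loss, logistic_loss

print(confidence_criteria([0.9, 0.8, 0.5]))
```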

Skip undefined labels

If set to true, examples with undefined labels are skipped.

Comparator class

This is an expert parameter. The fully qualified class name of the PerformanceComparator implementation is specified here.

Use example weights

This parameter allows example weights to be used for statistical performance calculations if possible. It has no effect if no attribute has the weight role. In order to consider example weights, the ExampleSet should have an attribute with the weight role. Several operators are available that assign weights, e.g. the Generate Weights operator. Study the Set Role operator for more information regarding the weight role.

Class weights

This is an expert parameter. It specifies the weights 'w' for all classes. The Edit List button opens a new window with two columns. The first column specifies the class name and the second column specifies the weight for that class. If the weight of a class is not specified, that class is assigned a weight of 1.