Skip to main content

MetaCost

Synopsis

This metaclassifier makes its base classifier cost-sensitive by using the given cost matrix to compute label predictions according to classification costs.

Description

The MetaCost operator makes its base classifier cost-sensitive by using the cost matrix specified in thecost matrixparameter. The method used by this operator is similar to the MetaCost method described by Pedro Domingos (1999).

The MetaCost operator is a nested operator i.e. it has a subprocess. The subprocess must have a learner i.e. an operator that expects an ExampleSet and generates a model. This operator tries to build a better model using the learner provided in its subprocess. You need to have basic understanding of subprocesses in order to apply this operator. Please study the documentation of theSubprocessoperator for basic understanding of subprocesses.

Most classification algorithms assume that all errors have the same cost, which is seldom the case. For example, in database marketing the cost of mailing to a non-respondent is very small, but the cost of not mailing to someone who would respond is the entire profit lost. In general, misclassification costs may be described by an arbitrary cost matrixC, withC(i,j)being the cost of predicting that an example belongs to classiwhen in fact it belongs to classj. Individually making each classification learner cost-sensitive is laborious, and often non-trivial. MetaCost is a principled method for making an arbitrary classifier cost-sensitive by wrapping a cost-minimizing procedure around it. This procedure treats the underlying classifier as a black box, requiring no knowledge of its functioning or change to it. MetaCost is applicable to any number of classes and to arbitrary cost matrices.

Input

training set

预计一个ExampleSet这个输入端口。这是阿utput of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output

model

The meta model is delivered from this output port which can now be applied on unseen data sets for prediction of thelabelattribute.

example set

The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

Cost matrix

This parameter is used for specifying the cost matrix. The cost matrix is similar in structure to a confusion matrix because it has predicted classes in one dimension and actual classes on the other dimension. Therefore the cost matrix can denote the costs for every possible classification outcome: predicted label vs. actual label. Actually this matrix is a matrix of misclassification costs because you can specify different weights for certain classes misclassified as other classes. Weights can also be assigned to correct classifications but usually they are set to 0. The classes in the matrix are labeled as Class 1, Class 2 etc where classes are numbered according to their order in the internal mapping.

Use subset for training

This parameter specifies the fraction of examples to be used for training. Its value must be greater than 0 (i.e. zero examples) and should be lower than or equal to 1 (i.e. entire data set).

Iterations

This parameter specifies the maximum number of iterations of the MetaCost algorithm.

Sampling with replacement

This parameter indicates if sampling with replacement should be used. In sampling with replacement, at every step all examples have equal probability of being selected. Once an example has been selected for the sample, it remains candidate for selection and it can be selected again in any other coming steps. Thus a sample with replacement can have the same example multiple number of times.

Use local random seed

This parameter indicates if alocal random seedshould be used for randomization. Using the same value oflocal random seedwill produce the same sample. Changing the value of this parameter changes the way examples are randomized, thus the sample will have a different set of values.

Local random seed

This parameter specifies thelocal random seed. This parameter is only available if theuse local random seedparameter is set to true.