Cost-Sensitive Scoring

Synopsis

A new approach to cost-sensitive learning, similar in idea to MetaCost (Domingos 1999), that does not require building multiple models and also works for more than two classes.

Description

This operator implements a novel algorithm for cost-sensitive learning. It shares the advantages of MetaCost (Domingos 1999) while avoiding its main disadvantage: MetaCost needs bagged models to work properly. This makes the resulting model more complex (users need to understand 10 or more models) and increases training time. This operator uses a similar idea without this drawback.

The basic idea of this approach is that only one model is trained (instead of the multiple models trained during bagging). This model is the input of this operator. Having only one model also makes it easier for users to understand. However, a distribution of confidence values is still needed for each data point to be predicted. The original MetaCost obtains this distribution from the multiple bagged models.

This algorithm, however, follows a different approach: it generates a number of artificial data points close to the one to be predicted. This idea has recently become more popular as part of the LIME algorithm for explaining model predictions. The confidences for those similar, generated data points are then used as the distribution of confidences for the point to be predicted.
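The variant generation described above can be sketched roughly as follows. This is a minimal illustration and not the operator's actual implementation: the function names make_variants and average_confidences, the Gaussian jitter, the noise parameter, and the representation of a model as a plain function returning a dictionary of confidences are all assumptions made for this example. The attribute bounds stand in for the statistics derived from the training set.

```python
import random


def make_variants(point, bounds, n_variants=10, noise=0.05, seed=0):
    """Create n_variants noisy copies of a numeric data point.

    bounds holds one (min, max) pair per attribute, e.g. taken from the
    training set. The jitter scale (noise times the attribute range) is an
    assumption of this sketch, not a documented setting of the operator.
    """
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        variant = []
        for value, (low, high) in zip(point, bounds):
            jitter = rng.gauss(0.0, noise * (high - low))
            # Clip to the range observed in the training data.
            variant.append(min(high, max(low, value + jitter)))
        variants.append(variant)
    return variants


def average_confidences(model, variants):
    """Average the per-class confidences of the model over all variants."""
    totals = {}
    for variant in variants:
        for label, confidence in model(variant).items():
            totals[label] = totals.get(label, 0.0) + confidence
    return {label: total / len(variants) for label, total in totals.items()}


def toy_model(example):
    """Hypothetical stand-in for a trained model: confidence rises with x0."""
    p_yes = min(1.0, max(0.0, example[0] / 10.0))
    return {"yes": p_yes, "no": 1.0 - p_yes}


bounds = [(0.0, 10.0), (0.0, 1.0)]
variants = make_variants([5.0, 0.5], bounds, n_variants=10)
averaged = average_confidences(toy_model, variants)
```

The averaged confidences still sum to one here because each individual confidence vector does. Only numeric attributes are handled in this sketch; nominal attributes would need a different sampling scheme.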

Those confidences are then averaged and used to calculate the expected cost, exactly as in the original MetaCost. The expected cost of a prediction is the sum, over all possible true classes, of the averaged confidence for that class times the cost or benefit of making this prediction when that class is the true one. The prediction with the lowest expected cost is chosen as the final prediction. This is not necessarily the class with the highest confidence, but that is the whole point of cost-sensitive learning.
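As a small worked example of this decision rule, the sketch below computes the expected cost of each possible prediction and picks the cheapest one. The function names, the example numbers, and the matrix orientation (rows are predicted classes, columns are true classes) are assumptions made for illustration; check the operator's cost matrix editor for the orientation it actually uses.

```python
def expected_costs(avg_confidences, classes, cost_matrix):
    """Expected cost of each possible prediction.

    Assumption of this sketch: cost_matrix[i][j] is the cost of predicting
    classes[i] when the true class is classes[j]. Costs are positive,
    gains / benefits are negative.
    """
    costs = {}
    for i, predicted in enumerate(classes):
        costs[predicted] = sum(
            avg_confidences[true_class] * cost_matrix[i][j]
            for j, true_class in enumerate(classes)
        )
    return costs


def cost_sensitive_prediction(avg_confidences, classes, cost_matrix):
    """Return the class with the lowest expected cost."""
    costs = expected_costs(avg_confidences, classes, cost_matrix)
    return min(costs, key=costs.get)


classes = ["no", "yes"]
cost_matrix = [
    [0.0, 10.0],  # predict "no": correct costs nothing, a missed "yes" costs 10
    [1.0, -5.0],  # predict "yes": a false alarm costs 1, a correct "yes" gains 5
]
avg_confidences = {"no": 0.7, "yes": 0.3}

costs = expected_costs(avg_confidences, classes, cost_matrix)
prediction = cost_sensitive_prediction(avg_confidences, classes, cost_matrix)
# "yes" is chosen even though "no" has the higher confidence, because
# missing a "yes" is expensive in this hypothetical cost matrix.
```

With these numbers the expected cost of predicting "no" is about 3.0 and that of predicting "yes" is about -0.8, so the cost-sensitive prediction differs from the most confident class.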

Please note that for best results the confidences should be as close to probabilities as possible. Since this is not the case for most machine learning models, you can use the operator Rescale Confidences (Logistic) to achieve this in a post-processing step. Please also note that while the training time is not increased, scoring is slowed down. The slowdown factor depends on the number of artificial data points created. In our experience, 10 points per prediction work well, which slows down scoring by a factor of 10. However, since scoring is typically very fast, this is not a problem for most use cases.

Input

model

This port expects a model which should be adapted for cost-sensitive scoring.

training set

This port expects an ExampleSet. It should be the same ExampleSet that was used to create the model. It is used to generate statistics which define the possible values and boundaries for the artificially generated data points.

Output

model

The model, which will now produce predictions based on the confidences and the supplied cost matrix.

optimal data

The original training data.

Parameters

Classes

A list of possible classes for the model to predict. The order of the classes in this list also defines the order of classes in the cost matrix.

Cost matrix

The costs and benefits for errors and correct predictions. Costs should be positive numbers and gains / benefits should be negative numbers.

Number of variants

The number of variants created for each data point to be scored. You should use at least 5 variants; in our experience, values around 10 work well. More variants give better results, but please be aware that the scoring time increases by the same factor.