Generate Outlier Flag

Synopsis

This operator turns an outlier score attribute into a flag (Outlier, No Outlier).

Description

Most outlier detection algorithms provide a score that indicates outlierness: the higher the score, the more likely the example is an outlier. The threshold above which a value is considered an outlier is often not determined by the outlier detection algorithm itself.

This operator provides different methods to determine that threshold. It creates an additional column, "outlier flag", which contains the string "Outlier" or "No Outlier" for each example. See the documentation of the method parameter for a description of the different methods.

Input

example set

The input ExampleSet, which must contain a score attribute.

Output

example set

The resulting output ExampleSet, with the scores and the additional "outlier flag" column.

model

The flag model, which can be applied to other data sets.
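To illustrate what reusing the flag model on other data could look like, here is a minimal Python sketch. It is not the operator's implementation: the `FlagModel` class, its `apply` method, and the strict "greater than" comparison are all hypothetical, and the sketch assumes that the model simply stores a threshold determined once on the training data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagModel:
    """Hypothetical stand-in for the flag model: it only stores a fixed threshold."""

    threshold: float

    def apply(self, scores):
        # Assumption: a score strictly above the threshold is flagged as an outlier.
        return ["Outlier" if s > self.threshold else "No Outlier" for s in scores]


# The threshold is determined once (here it is just given) and then reused unchanged.
model = FlagModel(threshold=0.6)
new_flags = model.apply([0.2, 0.7, 3.1])
```

The point of the sketch is that the new data does not influence the threshold; it is only compared against the stored value.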

Parameters

Method

The method used to determine the threshold.

contamination: Assumes that the training set contains a given proportion p of outliers. The (1 - p) quantile of the score is then calculated and used as the threshold. This is equivalent to saying that the top p share of the data set is considered to be outliers.

manual: The user defines the value of the threshold directly.

z-score: The score is transformed into a z-score by determining the mean and the standard deviation of the score and calculating z = (score - mean) / standard_deviation. The user then defines a threshold on this new metric.
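The three methods can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the operator's actual code: the function names are hypothetical, p is treated as a fraction between 0 and 1, the quantile uses the nearest-rank rule, the population standard deviation is used for the z-score, and a value is flagged only when it is strictly above the threshold. The operator may differ on any of these points.

```python
import math
import statistics


def contamination_threshold(scores, p):
    """Return the (1 - p) quantile of the scores (nearest-rank rule, an assumption)."""
    ordered = sorted(scores)
    rank = math.ceil((1.0 - p) * len(ordered))
    # Clamp the rank to a valid 1-based position, then convert to a 0-based index.
    index = min(max(rank, 1), len(ordered)) - 1
    return ordered[index]


def z_scores(scores):
    """Transform scores with z = (score - mean) / standard_deviation."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # population standard deviation (an assumption)
    return [(s - mean) / std for s in scores]


def flag(values, threshold):
    """Flag every value strictly above the threshold (the comparison is an assumption)."""
    return ["Outlier" if v > threshold else "No Outlier" for v in values]


scores = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.6, 5.0]

# contamination: assume 10 % of the data are outliers
contamination_flags = flag(scores, contamination_threshold(scores, 0.1))

# manual: the user picks the threshold on the raw score
manual_flags = flag(scores, 1.0)

# z-score: the user picks the threshold on the standardized score
zscore_flags = flag(z_scores(scores), 2.0)
```

On this small sample all three settings flag only the last score, but in general the methods disagree: contamination always flags a fixed share of the data, while the manual and z-score thresholds can flag any number of examples, including none.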

Define score column

If set to true, the first column with the role confidence is used as the score column. If set to false, the user can specify the score column manually.

Score column

The column that contains the score. Only used if define_score_column is set to false.

Contamination threshold

Threshold for the contamination method.

Manual threshold

Threshold for the manual method.

Zscore threshold

Threshold for the z-score method.