Univariate Anomaly Detection

Synopsis

This operator calculates univariate (i.e., one attribute at a time) outlier scores for each attribute in your ExampleSet and provides an aggregated outlier score for each row of data.

Description

This operator calculates univariate (i.e., one attribute at a time) outlier scores for each attribute in your ExampleSet. In a second step, it aggregates the individual outlier scores into one score per row. Note: this method assumes that your attributes are statistically independent of one another. If they are not, the aggregated score will not necessarily reflect a true outlier score.

Input

example set

The input ExampleSet.

Output

example set output

The resulting output ExampleSet with the anomaly score(s).

preprocessing model

A preprocessing model which allows you to apply the same method on a different ExampleSet.
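
The idea of a reusable preprocessing model can be illustrated with a minimal Python sketch. It assumes the Quartiles method described under Parameters. The class QuartileScoreModel, its fit and apply methods, and the list-of-dicts row format are hypothetical names chosen for illustration; they are not the operator's actual API, and the operator's internal quartile convention may differ from the "inclusive" one used here.

```python
import statistics


class QuartileScoreModel:
    """Hypothetical stand-in for the preprocessing model (not the operator's API)."""

    def fit(self, rows):
        # Learn the median and IQR of each attribute from the training rows.
        self.stats = {}
        for name in rows[0]:
            column = [row[name] for row in rows]
            q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
            self.stats[name] = (median, q3 - q1)
        return self

    def apply(self, rows):
        # Score new rows with the stored statistics; nothing is re-estimated.
        return [
            {name: (row[name] - median) / iqr
             for name, (median, iqr) in self.stats.items()}
            for row in rows
        ]


training = [{"x": v} for v in [1, 2, 3, 4, 5]]
model = QuartileScoreModel().fit(training)
scores = model.apply([{"x": 7}])
```

Because the statistics come from the first ExampleSet only, scores on the second set are directly comparable with the scores on the first. The sketch assumes every attribute has a non-zero IQR.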

Parameters

Method

This parameter allows you to select the method you want to use to calculate univariate outlier scores.

Quartiles: The Quartiles method calculates the anomaly score as: score = (value - median)/IQR, where IQR is the interquartile range (difference between the 25th and the 75th percentile). The Quartiles method can be seen as a more robust version of the z-Score method (see below).
Histogram: The Histogram method constructs a histogram for each attribute. The number of bins is determined automatically via the Freedman-Diaconis rule. For each bin in the histogram it calculates the "probability" as: (frequency+1)/size (+1 is used to avoid division by zero). The anomaly score for a given bin is then calculated as 1/probability.
z-Score: The z-Score method calculates the anomaly score as: score = (value - mean)/standard deviation. This can be interpreted as the distance of the current value from the mean, measured in standard deviations.
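
The three formulas above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the operator's implementation: the function names are hypothetical, the population standard deviation and the "inclusive" quartile convention are assumed, the Freedman-Diaconis bin width is taken as 2 * IQR / n^(1/3), and scores are left signed exactly as the formulas are written.

```python
import math
import statistics


def z_scores(values):
    """score = (value - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / std for v in values]


def quartile_scores(values):
    """score = (value - median) / IQR."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


def histogram_scores(values):
    """score = 1 / probability, with probability = (frequency + 1) / size."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Freedman-Diaconis rule (assumed form): bin width = 2 * IQR / n^(1/3)
    width = 2 * (q3 - q1) / n ** (1 / 3)
    low = min(values)
    if width > 0:
        bins = max(1, math.ceil((max(values) - low) / width))
    else:
        bins = 1  # degenerate case: IQR is zero, so use a single bin

    def bin_index(v):
        if width <= 0:
            return 0
        # Clamp so the maximum value falls into the last bin.
        return min(int((v - low) / width), bins - 1)

    counts = [0] * bins
    for v in values:
        counts[bin_index(v)] += 1
    # 1 / ((frequency + 1) / size) == size / (frequency + 1)
    return [n / (counts[bin_index(v)] + 1) for v in values]


data = [1, 2, 3, 4, 100]
print(quartile_scores(data))
print(z_scores(data))
print(histogram_scores(data))
```

On this sample the value 100 inflates the mean and standard deviation, so its z-Score stays near 2, while its Quartiles score is 48.5 because the median and IQR are barely affected by it. The z-Score and Quartiles sketches assume a non-constant attribute (non-zero standard deviation and IQR).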

Aggregation method

This parameter allows you to select how you want to aggregate (combine) the different outlier scores from the individual attributes.

Average: Calculates the average (arithmetic mean) of all univariate outlier scores for one row of data.
Maximum: Finds the maximum of all univariate outlier scores for one row of data.
Product: Calculates a normalized product of all univariate outlier scores for one row of data (the product of the outlier scores for one row divided by the number of scores). Note: for numerical stability, the operator uses a sum of logarithms internally.
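
The three aggregation options can be sketched as follows. The function name aggregate and its method argument are hypothetical. The Product branch follows the description above literally (product divided by the number of scores) and computes the product through a sum of logarithms; it therefore assumes strictly positive scores, such as those produced by the Histogram method, and how the operator treats zero or negative scores is not covered here.

```python
import math


def aggregate(scores, method="average"):
    """Combine the univariate outlier scores of one row into a single score."""
    if method == "average":
        return sum(scores) / len(scores)
    if method == "maximum":
        return max(scores)
    if method == "product":
        # Sum of logs instead of a direct product, to avoid overflow or
        # underflow when many scores are multiplied. Assumes scores > 0.
        log_product = sum(math.log(s) for s in scores)
        return math.exp(log_product) / len(scores)
    raise ValueError(f"unknown aggregation method: {method}")


row_scores = [1.25, 2.5, 4.0]  # one score per attribute for a single row
for name in ("average", "maximum", "product"):
    print(name, aggregate(row_scores, name))
```

Maximum flags a row as soon as one attribute is extreme, whereas Average dilutes a single extreme attribute across all attributes of the row.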

Show individual scores

If the show individual scores parameter is set to true, the operator creates a new outlier score attribute for each attribute selected. If set to false, only the aggregated outlier score is shown.