Univariate Anomaly Detection
Synopsis
This operator calculates univariate (i.e., one attribute at a time) outlier scores for each attribute in your ExampleSet and provides an aggregated outlier score for each row of data.
Description
This operator calculates univariate (i.e., one attribute at a time) outlier scores for each attribute in your ExampleSet. In a second step, it aggregates the individual outlier scores into one score per row. Note: this method assumes that your attributes are statistically independent of one another. If they are not, the aggregated score will not necessarily reflect a true outlier score.
Input
example set
The input ExampleSet.
Output
example set output
The resulting output ExampleSet with the anomaly score(s).
preprocessing model
A preprocessing model which allows you to apply the same method on a different ExampleSet.
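The idea of reusing a model on a different ExampleSet can be sketched as follows, using the z-Score method described under Parameters. This is a minimal illustration, not the operator's actual implementation: the class name `ZScoreModel`, its `apply` method, and the choice of the population standard deviation are all assumptions made for the example.

```python
import statistics


class ZScoreModel:
    """Hypothetical stand-in for a preprocessing model (one attribute only)."""

    def __init__(self, training_values):
        # Statistics are learned once from the original data.
        self.mean = statistics.mean(training_values)
        # Assumption: population standard deviation; the operator may differ.
        self.std = statistics.pstdev(training_values)

    def apply(self, values):
        # New data is scored with the stored statistics, not its own.
        return [(v - self.mean) / self.std for v in values]


model = ZScoreModel([10.0, 12.0, 14.0])
new_scores = model.apply([12.0, 20.0])
```

The point of the sketch is that the second data set is scored against the mean and standard deviation of the first, so scores stay comparable across both.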
Parameters
Method
This parameter allows you to select the method you want to use to calculate univariate outlier scores.
- Quartiles: The Quartiles method calculates the anomaly score as: score = (value - median)/IQR, where IQR is the interquartile range (difference between the 25th and the 75th percentile). The Quartiles method can be seen as a more robust version of the z-Score method (see below).
- Histogram: The Histogram method constructs a histogram for each attribute. The number of bins is determined automatically via the Freedman-Diaconis rule. For each bin in the histogram it calculates the "probability" as: (frequency+1)/size (+1 is used to avoid division by zero). The anomaly score for a given bin is then calculated as 1/probability.
- z-Score: The z-Score method calculates the anomaly score as: score = (value - mean)/standard deviation. This can be interpreted as the distance of the current value to the mean, measured in standard deviations.
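The three formulas above can be sketched for a single attribute as follows. This is an illustration under stated assumptions, not the operator's code: the function names are hypothetical, quartiles use Python's "inclusive" quantile method, the z-Score uses the population standard deviation, "size" is taken to be the number of examples, and bins are equal-width over the observed range. Constant attributes (zero IQR or zero standard deviation) are not handled.

```python
import math
import statistics


def z_scores(values):
    """score = (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population std
    return [(v - mean) / std for v in values]


def quartile_scores(values):
    """score = (value - median) / IQR."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


def histogram_scores(values):
    """score = 1 / probability, probability = (frequency + 1) / size."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3).
    width = 2 * (q3 - q1) / n ** (1 / 3)
    lo, hi = min(values), max(values)
    bins = max(1, math.ceil((hi - lo) / width))
    # Assign each value to an equal-width bin; the maximum goes in the last bin.
    index = [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]
    counts = [0] * bins
    for i in index:
        counts[i] += 1
    # 1 / ((frequency + 1) / size) simplifies to size / (frequency + 1).
    return [n / (counts[i] + 1) for i in index]


values = [1.0, 2.0, 3.0, 4.0, 100.0]
print(quartile_scores(values))
print(z_scores(values))
print(histogram_scores(values))
```

Note that the Quartiles and z-Score formulas as written produce signed scores: values below the median or mean receive negative scores.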
Aggregation method
This parameter allows you to select how you want to aggregate (combine) the different outlier scores from the individual attributes.
- Average: Calculates the average (arithmetic mean) of all univariate outlier scores for one row of data.
- Maximum: Finds the maximum of all univariate outlier scores for one row of data.
- Product: Calculates a normalized product of all univariate outlier scores for one row of data (the product of the outlier scores for one row divided by the number of scores). Note: for numerical stability, the operator internally uses a sum of logarithms instead of multiplying the scores directly.
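A minimal sketch of the three aggregation options for one row of scores is shown below. The function name `aggregate` and its `method` argument are hypothetical, and the Product branch assumes strictly positive scores (such as Histogram scores), since the logarithm is undefined otherwise; how the operator itself treats zero or negative scores is not stated here.

```python
import math


def aggregate(scores, method="average"):
    """Combine the univariate outlier scores of one row into a single score."""
    if method == "average":
        return sum(scores) / len(scores)
    if method == "maximum":
        return max(scores)
    if method == "product":
        # Sum of logs instead of a direct product, to avoid overflow/underflow.
        # Assumption: every score is > 0.
        log_product = sum(math.log(s) for s in scores)
        # Dividing by the number of scores is a subtraction in log space.
        return math.exp(log_product - math.log(len(scores)))
    raise ValueError(f"unknown aggregation method: {method}")


row = [1.25, 2.5, 4.0]
print(aggregate(row, "average"))
print(aggregate(row, "maximum"))
print(aggregate(row, "product"))
```

Working in log space gives the same result as multiplying the scores and dividing by their count, but remains stable when a row has many attributes or very large scores.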
Show individual scores
If the show individual scores parameter is set to true, the operator creates a new outlier score attribute for each attribute selected. If set to false, only the aggregated outlier score is shown.