Detect Outliers (Time Series)

Synopsis

This operator detects outliers in time series data. So far only univariate outlier techniques are supported.

Description

Time series data is special with respect to anomalies, because each data point has to be judged in its temporal context. In particular, the question "Is this data point an anomaly with respect to the previous k data points?" is specific to time series.

This operator works in one of two ways, depending on whether the ref port is connected. If the ref port is not connected, the operator applies a sliding window approach: it uses training_size data points to train the selected algorithm and then applies it to the test_size data points that follow. The algorithm is selected with the 'method' parameter.

If the ref port is connected, the training data set is always the one given at the ref port.
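
The sliding window mode can be sketched as follows. This is a minimal illustration, not the operator's implementation: the function name sliding_windows is hypothetical, and the sketch assumes that the window advances by test_size rows per step, which this page does not state.

```python
def sliding_windows(n_rows, training_size, test_size):
    """Yield (train_indices, test_indices) pairs over a series of n_rows.

    Assumption: the window advances by test_size rows each step, so every
    row after the first training window is scored exactly once.
    """
    start = 0
    while start + training_size < n_rows:
        train_end = start + training_size
        # The last test window may be shorter than test_size.
        test_end = min(train_end + test_size, n_rows)
        yield range(start, train_end), range(train_end, test_end)
        start += test_size


# Example: 7 rows, train on 3, score the next 2.
windows = [(list(tr), list(te)) for tr, te in sliding_windows(7, 3, 2)]
for train_idx, test_idx in windows:
    print("train", train_idx, "-> test", test_idx)
```

Under this assumption the first training_size rows never receive a score, because no earlier data is available to train on. With the ref port connected, the training indices would instead always refer to the reference data set.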

All currently implemented algorithms are univariate. They create one score per column, so the individual scores need to be aggregated to obtain a single score for the complete example. The aggregation function is selected with the aggregation_method parameter.

Input

example set

The input ExampleSet.

ref

The reference data set. If the port is connected, the data given at this port is used as the training data set. Otherwise, the operator uses a sliding window approach.

Output

sco

The resulting output ExampleSet with the anomaly score(s).

ori

The original data set.

Parameters

Method

This parameter allows you to select the method you want to use to calculate outlier scores.

  • z-score: In this method we calculate the mean and the standard deviation of the training set. We then calculate the z-score as (value-mean)/std_dev for each value in the testing set. The higher the absolute value, the higher the likelihood of an outlier.
  • Standard Deviation: In this method we calculate the standard deviation of the training set and the standard deviation of the test set. The delivered score is the ratio std_dev_testing/std_dev_training. This score can only be evaluated for a test_size greater than 1. Also note that the anomaly score is the same over the whole test window.
  • Linear Regression: In this method we fit a line through the training data points. We then extrapolate the line and evaluate it at the next data points. The forecasted value is then compared to the real value. The score is either the relative or the absolute difference between the two values, depending on how normalize_regression_scores is set.
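
The three methods above can be sketched in plain Python as follows. This is a hedged illustration rather than the operator's code: the function names are hypothetical, the sample standard deviation (n-1 denominator) is an assumption, the regression uses the row index as the x-axis (equally spaced points), and the relative difference is assumed to be divided by the forecasted value. None of these details are specified on this page.

```python
from statistics import mean, stdev


def z_scores(train, test):
    """z-score of each test value relative to the training window."""
    mu = mean(train)
    sigma = stdev(train)  # assumption: sample standard deviation
    return [(value - mu) / sigma for value in test]


def std_dev_ratio(train, test):
    """One score for the whole test window; needs at least 2 test values."""
    return stdev(test) / stdev(train)


def regression_scores(train, test, normalize=False):
    """Fit a line to the training window and compare forecasts to real values."""
    n = len(train)
    xs = range(n)  # assumption: equally spaced points indexed 0..n-1
    x_mean, y_mean = mean(xs), mean(train)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, train))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    scores = []
    for i, actual in enumerate(test):
        forecast = intercept + slope * (n + i)  # extrapolate past the window
        diff = actual - forecast
        # assumption: "relative" means divided by the forecasted value
        scores.append(diff / forecast if normalize else diff)
    return scores


train, test = [1.0, 2.0, 3.0, 4.0], [5.0, 10.0]
print(z_scores(train, test))
print(std_dev_ratio(train, test))
print(regression_scores(train, test))
print(regression_scores(train, test, normalize=True))
```

In this example the value 5.0 continues the training trend exactly, so its regression score is zero, while 10.0 deviates from the forecast of 6.0. The sketch does not guard against a zero standard deviation or a zero forecast, both of which would cause a division by zero.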

Aggregation method

This parameter allows you to select how you want to aggregate (combine) the different outlier scores from the individual attributes.

  • Average: Calculates the average (arithmetic mean) of all univariate outlier scores for one row of data.
  • Maximum: Finds the maximum of all univariate outlier scores for one row of data.
  • Product: Calculates a normalized product of all univariate outlier scores for one row of data (the product of outlier scores for one row divided by the number of scores). Note: for stability reasons we use sum of logs internally.
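
A minimal sketch of the three aggregation options for one row of univariate scores is shown below. The function aggregate and its arguments are hypothetical names. The product branch follows the description above literally (product divided by the number of scores, computed through a sum of logs), and it assumes strictly positive inputs, since this page does not say how the operator treats zero or negative scores.

```python
import math


def aggregate(scores, method="average", use_absolutes=True):
    """Combine the univariate outlier scores of one row into one score."""
    values = [abs(s) for s in scores] if use_absolutes else list(scores)
    if method == "average":
        return sum(values) / len(values)
    if method == "maximum":
        return max(values)
    if method == "product":
        # Sum of logs avoids overflow/underflow for long products.
        # Assumption: all values are strictly positive.
        log_sum = sum(math.log(v) for v in values)
        return math.exp(log_sum) / len(values)
    raise ValueError(f"unknown aggregation method: {method}")


row_scores = [1.0, -2.0, 4.0]
print(aggregate(row_scores, "average"))
print(aggregate(row_scores, "maximum"))
print(aggregate(row_scores, "product"))
print(aggregate(row_scores, "average", use_absolutes=False))
```

Taking absolute values first matters for signed scores such as z-scores: without it, a large negative and a large positive score in the same row can cancel each other in the average.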

Training size

Size of the training window. Only used if the reference port is not connected.

Test size

Size of the test window.

Show individual scores

If the show individual scores parameter is set to true, the operator creates a new outlier score attribute for each attribute selected. If set to false, only the aggregated outlier score is shown.

Use absolutes in aggregation

If set to true, the absolute values of the scores are used in the aggregation.

Normalize regression scores

If set to true, the Linear Regression method calculates relative differences. Otherwise, absolute differences are used.