Detect Outlier (Densities)
Synopsis
This operator identifies outliers in the given ExampleSet based on the data density. All objects that have at least
proportion p of all objects farther away than distance D are considered outliers.
Description
The Detect Outlier (Densities) operator is an outlier detection algorithm that calculates the DB(p,D)-outliers for the given ExampleSet. A DB(p,D)-outlier is an object which is at least distance D away from at least proportion p of all objects. The two real-valued parameters p and D can be specified through the proportion and distance parameters respectively. The DB(p,D)-outliers are distance-based outliers according to Knorr and Ng. This operator implements a global homogeneous outlier search.
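The DB(p,D) definition above can be illustrated with a short Python sketch. This is not the operator's actual implementation; it is a brute-force illustration under stated assumptions: the function name db_outliers is hypothetical, Euclidean distance is used, "at least D away" is read as distance >= D, and the proportion is taken over all objects including the object itself.

```python
import math

def db_outliers(points, p, d):
    """Return one boolean per point: True if the point is a DB(p, D)-outlier.

    Assumptions (not confirmed by the operator documentation):
    Euclidean distance, comparison with >=, proportion over all n points.
    """
    n = len(points)
    flags = []
    for a in points:
        # Count the objects that lie at least distance d away from a.
        far = sum(1 for b in points if math.dist(a, b) >= d)
        # a is an outlier if those objects make up at least proportion p.
        flags.append(far >= p * n)
    return flags

# Four tightly clustered points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
flags = db_outliers(points, p=0.75, d=1.0)
print(flags)  # [False, False, False, False, True]
```

In this example the isolated point has 4 of the 5 objects at least 1.0 away, which meets the threshold of 0.75 * 5 = 3.75. Each clustered point has only 1 such object, so it is not flagged. The sketch compares every pair of points, so its cost grows quadratically with the number of examples.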
This operator adds a new boolean attribute named 'outlier' to the given ExampleSet. If the value of this attribute is true, the example is an outlier; if it is false, the example is not an outlier. Different distance functions are supported by this operator. The desired distance function can be selected by the distance function parameter.
An outlier is an example that is numerically distant from the rest of the examples of the ExampleSet. An outlying example is one that appears to deviate markedly from other examples of the ExampleSet. Outliers are often (but not always) indicative of measurement error. When an outlier does result from measurement error, the example should be discarded.
Input
example set input
This input port expects an ExampleSet. It is the output of the Generate Data operator in the attached Example Process. The output of other operators can also be used as input.
Output
example set output
A new boolean attribute 'outlier' is added to the given ExampleSet and the ExampleSet is delivered through this output port.
original
The ExampleSet that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
Parameters
Distance
This parameter specifies the distance D parameter for calculation of the DB(p,D)-outliers.
Proportion
This parameter specifies the proportion p parameter for calculation of the DB(p,D)-outliers.
Distance function
This parameter specifies the distance function that will be used for calculating the distance between two examples.
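To show why the choice of distance function matters, the following Python sketch computes the distance between the same two examples under three common measures. This page does not list which distance functions the operator supports, so the three functions below (euclidean, manhattan, chebyshev) are illustrative assumptions, not the operator's actual options.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of the absolute differences along each attribute.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute difference along any single attribute.
    return max(abs(x - y) for x, y in zip(a, b))

a, b = (0.0, 0.0), (3.0, 4.0)
distances = {f.__name__: f(a, b) for f in (euclidean, manhattan, chebyshev)}
print(distances)  # {'euclidean': 5.0, 'manhattan': 7.0, 'chebyshev': 4.0}

# With D = 6.0, only one of the three measures treats b as "far" from a.
far_at_6 = [name for name, value in distances.items() if value >= 6.0]
print(far_at_6)  # ['manhattan']
```

Because the same pair of examples can fall on either side of D depending on the measure, the distance and distance function parameters should be chosen together.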