
Covariance Matrix

Synopsis

This operator calculates the covariance between all attributes of the input ExampleSet and returns a covariance matrix giving a measure of how much two attributes change together.

Description

Covariance is a measure of how much two attributes change together. If the greater values of one attribute mainly correspond with the greater values of the other attribute, and the same holds for the smaller values, i.e. the attributes tend to show similar behavior, the covariance is a positive number. In the opposite case, when the greater values of one attribute mainly correspond to the smaller values of the other, i.e. the attributes tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. For two attributes x and y having means E(x) and E(y), the covariance is defined as:

Cov(x,y) = E{[ x - E(x) ][ y - E(y) ]}
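For a finite ExampleSet, the expectation is replaced by an average over the data points, which is what the next paragraph describes as the mean value of the products. The following is a sketch that assumes n examples and an n denominator; whether the operator divides by n or by n - 1 is not stated on this page:

```latex
\operatorname{Cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \bigl(x(i) - \bar{x}\bigr)\bigl(y(i) - \bar{y}\bigr),
\qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x(i), \quad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y(i)
```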

The covariance calculation begins with pairs of x and y, takes their differences from their mean values and multiplies these differences together. For instance, if this product is positive for x(1) and y(1), then for that pair of data points the values of x and y have varied together in the same direction from their means. If the product is negative, they have varied in opposite directions. The larger the magnitude of the product, the more that pair contributes to the covariance. The covariance is defined as the mean value of this product, calculated over each pair of data points x(i) and y(i). If the covariance is zero, then the cases in which the product was positive were offset by those in which it was negative, and there is no linear relationship between the two attributes.
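The procedure described above can be sketched in plain Python. This is an illustration of the definition, not the operator's implementation; the function name `covariance` and the choice to divide by n (rather than n - 1) are assumptions made for this sketch.

```python
def covariance(xs, ys):
    """Mean of the products of deviations from the means (divides by n).

    A sample estimate would divide by n - 1 instead; this sketch follows the
    'mean value of this product' definition literally.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # One product per pair (x(i), y(i)): positive when both values lie on the
    # same side of their means, negative when they lie on opposite sides.
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / n


print(covariance([1, 2, 3, 4], [2, 4, 6, 8]))  # 2.5: attributes move together
print(covariance([1, 2, 3, 4], [8, 6, 4, 2]))  # -2.5: attributes move oppositely
```

Both calls use the same spread of values, so the results differ only in sign, which reflects the direction of the relationship.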

The value of the covariance is interpreted as follows:

  • Positive covariance: indicates that higher than average values of one attribute tend to be paired with higher than average values of the other attribute.
  • Negative covariance: indicates that higher than average values of one attribute tend to be paired with lower than average values of the other attribute.
  • Zero covariance: if the two attributes are independent, the covariance will be zero. However, a covariance of zero does not necessarily mean that the attributes are independent: a nonlinear relationship can still result in a covariance of zero.
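The last point can be checked with a small, made-up data set in which y is completely determined by x (y = x * x) but the relationship is not linear. The helper below is an assumed n-denominator sketch of the definition, not the operator's code.

```python
def covariance(xs, ys):
    # Mean of the products of deviations from the means (divides by n).
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # fully dependent on x, but not linearly

print(covariance(xs, ys))  # 0.0
```

The positive products from the outer points cancel the negative ones from the inner points, so the covariance is zero even though the attributes are clearly dependent.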

Because the number representing covariance depends on the units of the data, it is difficult to compare covariances among data sets having different scales. A value that might represent a strong linear relationship for one data set might represent a very weak one in another.
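The unit dependence can be illustrated by expressing the same made-up measurements in two different units. The data values and the `covariance` helper (n denominator) are assumptions for this sketch only.

```python
import math


def covariance(xs, ys):
    # Mean of the products of deviations from the means (divides by n).
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Made-up measurements: the same heights expressed in meters and centimeters.
heights_m = [1.60, 1.70, 1.80]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [55.0, 65.0, 80.0]

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)

# Same relationship, but the covariance is 100 times larger in centimeters.
print(cov_m, cov_cm)
print(math.isclose(cov_cm, 100 * cov_m))  # True
```

Nothing about the relationship between height and weight changed, yet the covariance grew by the conversion factor, which is why raw covariance values from differently scaled data sets are hard to compare.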

Input

example set

This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output

example set

The ExampleSet that was given as input is passed to the output through this port without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

covariance

The covariances between all attributes of the input ExampleSet are calculated, and the resulting covariance matrix is returned from this port.
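To illustrate what the matrix contains, the following sketch computes the covariance for every pair of attributes. A plain dictionary of hypothetical attributes `a1`, `a2`, `a3` stands in for the ExampleSet; this is not the RapidMiner API, and the n denominator is an assumption.

```python
def covariance(xs, ys):
    # Mean of the products of deviations from the means (divides by n).
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def covariance_matrix(columns):
    """Return (names, matrix) where matrix[i][j] = Cov(names[i], names[j]).

    `columns` is a plain dict mapping an attribute name to its numeric values;
    it only stands in for the numerical attributes of an ExampleSet.
    """
    names = list(columns)
    matrix = [[covariance(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical attributes a1, a2, a3 with four examples each.
columns = {"a1": [1, 2, 3, 4], "a2": [2, 4, 6, 8], "a3": [8, 6, 4, 2]}
names, matrix = covariance_matrix(columns)
for name, row in zip(names, matrix):
    print(name, row)
# a1 [1.25, 2.5, -2.5]
# a2 [2.5, 5.0, -5.0]
# a3 [-2.5, -5.0, 5.0]
```

The diagonal entries Cov(x, x) are the variances of the individual attributes, and the matrix is symmetric because Cov(x, y) = Cov(y, x). With NumPy, `numpy.cov(data, rowvar=False, bias=True)` computes the same n-denominator matrix for a 2-D array whose columns are the attributes.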