Weight by Correlation
Synopsis
This operator calculates the relevance of the attributes by computing the value of correlation for each attribute of the input ExampleSet with respect to the label attribute. This weighting scheme is based upon correlation and it returns the absolute or squared value of correlation as attribute weight.
Description
The Weight by Correlation operator calculates the weight of attributes with respect to the label attribute by using correlation. The higher the weight of an attribute, the more relevant it is considered. Please note that the Weight by Correlation operator can be applied only on ExampleSets with a numerical or binominal label. It cannot be applied on polynominal labels because the polynominal classes provide no information about their ordering; the weights would therefore be more or less random, depending on the internal numerical representation of the classes. Binominal labels work because they are represented as 0 and 1, and numerical labels work as they are.
A correlation is a number between -1 and +1 that measures the degree of association between two attributes (call them X and Y). A positive value for the correlation implies a positive association. In this case large values of X tend to be associated with large values of Y and small values of X tend to be associated with small values of Y. A negative value for the correlation implies a negative or inverse association. In this case large values of X tend to be associated with small values of Y and vice versa.
Suppose we have two attributes X and Y, with means X' and Y' and standard deviations S(X) and S(Y) respectively. The correlation is computed as the summation from 1 to n of the product (X(i)-X').(Y(i)-Y'), and then dividing this summation by the product (n-1).S(X).S(Y), where n is the total number of examples and i is the index variable of the summation. There can be other formulas and definitions, but let us stick to this one for simplicity.
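The formula above can be sketched in a few lines of Python. This is only an illustration of the stated definition, not the operator's actual implementation; the function name pearson_correlation and the sample data are made up for this example.

```python
import math


def pearson_correlation(xs, ys):
    """Correlation of two equally long numeric sequences.

    Follows the definition in the text: the sum of (X(i)-X').(Y(i)-Y')
    divided by (n-1).S(X).S(Y), with sample standard deviations.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sample standard deviations S(X) and S(Y), using the n-1 denominator.
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")

    # Sum of the products of the deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


# Hypothetical data: the second attribute rises with the first,
# the third falls as the first rises.
x = [1, 2, 3, 4]
rising = [2, 4, 6, 8]
falling = [8, 6, 4, 2]
print(pearson_correlation(x, rising))
print(pearson_correlation(x, falling))
```

For perfectly linear data such as this, the result is +1 or -1 up to floating-point rounding.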
As discussed earlier, a positive value for the correlation implies a positive association. Suppose that an X value was above average, and that the associated Y value was also above average. Then the product (X(i)-X').(Y(i)-Y') would be the product of two positive numbers, which would be positive. If the X value and the Y value were both below average, then the product above would be of two negative numbers, which would also be positive. Therefore, a positive correlation is evidence of a general tendency that large values of X are associated with large values of Y and small values of X are associated with small values of Y.
As discussed earlier, a negative value for the correlation implies a negative or inverse association. Suppose that an X value was above average, and that the associated Y value was instead below average. Then the product (X(i)-X').(Y(i)-Y') would be the product of a positive and a negative number, which would make the product negative. If the X value was below average and the Y value was above average, then the product above would also be negative. Therefore, a negative correlation is evidence of a general tendency that large values of X are associated with small values of Y and small values of X are associated with large values of Y.
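The sign argument in the two paragraphs above can be checked directly by listing the individual products of deviations. The helper name deviation_products and the data below are hypothetical and serve only as an illustration.

```python
def deviation_products(xs, ys):
    """Return the per-example products (X(i)-X').(Y(i)-Y')."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]


x = [1, 2, 3, 4]

# Y rises with X: every product is positive, so their sum
# (the numerator of the correlation) is positive.
same_direction = deviation_products(x, [2, 4, 6, 8])

# Y falls as X rises: every product is negative, so the sum is negative.
opposite_direction = deviation_products(x, [8, 6, 4, 2])

print(same_direction)
print(opposite_direction)
```

In real data the products usually have mixed signs; the sign of the correlation reflects which kind dominates the sum.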
Input
example set
This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.
Output
weights
This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.
example set
The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
Parameters
Normalize weights
This parameter indicates whether the calculated weights should be normalized. If set to true, all weights are normalized to the range from 0 to 1.
Sort weights
This parameter indicates whether the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified using the sort direction parameter.
Sort direction
This parameter is only available when the sort weights parameter is set to true. This parameter specifies the sorting order of the attributes according to their weights.
Squared correlation
This parameter indicates if the squared correlation should be calculated instead of simple correlation. If set to true, the attribute weights are calculated as squares of correlations instead of simple correlations.
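As a rough illustration of how the parameters above interact, the following Python sketch computes attribute weights from correlations with a binominal label encoded as 0 and 1. It is not the operator's implementation. All function names and data are hypothetical, and the normalization shown (min-max scaling) is an assumption about how the weights are brought into the range from 0 to 1.

```python
import math


def correlation(xs, ys):
    """Correlation using sample standard deviations (n-1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


def correlation_weights(attributes, label, squared=False, normalize=False):
    """Weight each attribute by its absolute or squared correlation."""
    weights = {}
    for name, values in attributes.items():
        r = correlation(values, label)
        weights[name] = r * r if squared else abs(r)

    if normalize:
        # Assumption: min-max scaling, so the smallest weight becomes 0
        # and the largest becomes 1. Left unchanged if all weights are equal.
        lo, hi = min(weights.values()), max(weights.values())
        if hi > lo:
            weights = {k: (w - lo) / (hi - lo) for k, w in weights.items()}
    return weights


def sort_weights(weights, descending=False):
    """Return (attribute, weight) pairs ordered by weight."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=descending)


# Hypothetical ExampleSet with three numerical attributes and a
# binominal label represented as 0 and 1.
attributes = {
    "a": [1, 2, 3, 4],
    "b": [4, 3, 2, 1],
    "c": [1, 3, 2, 4],
}
label = [0, 0, 1, 1]

plain = correlation_weights(attributes, label)
squared = correlation_weights(attributes, label, squared=True)
normalized = correlation_weights(attributes, label, normalize=True)

for name, weight in sort_weights(plain, descending=True):
    print(name, round(weight, 3))
```

In this sketch attributes "a" and "b" receive the same weight even though their correlations have opposite signs, because both the absolute and the squared value discard the sign.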