Weight by Chi Squared Statistic
Synopsis
This operator calculates the relevance of the attributes by computing for each attribute of the input ExampleSet the value of the chi-squared statistic with respect to the class attribute.
Description
The Weight by Chi Squared Statistic operator calculates the weight of attributes with respect to the class attribute by using the chi-squared statistic. The higher the weight of an attribute, the more relevant it is considered. Please note that the chi-squared statistic can only be calculated for nominal labels. Thus this operator can be applied only to ExampleSets with a nominal label.
The chi-squared statistic is a nonparametric statistical technique used to determine if a distribution of observed frequencies differs from the theoretical expected frequencies. Because chi-squared statistics use nominal data, the test works with frequencies instead of means and variances. The value of the chi-squared statistic is given by
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where $\chi^2$ is the chi-squared statistic, $O$ is the observed frequency, and $E$ is the expected frequency. Generally the chi-squared statistic summarizes the discrepancies between the expected number of times each outcome occurs (assuming that the model is true) and the observed number of times each outcome occurs, by summing the squares of the discrepancies, normalized by the expected numbers, over all the categories.
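The formula can be illustrated with a minimal Python sketch that computes the statistic for one nominal attribute against a nominal label. This is not the operator's implementation. The function name chi_squared and the sample data are hypothetical, and the expected frequency of each cell is assumed to be (attribute value total × label total) / number of examples, as in a standard test of independence.

```python
from collections import Counter


def chi_squared(attribute_values, labels):
    """Chi-squared statistic of a nominal attribute against a nominal label.

    Illustrative helper (hypothetical name), not the operator's own code.
    """
    n = len(labels)
    observed = Counter(zip(attribute_values, labels))  # O for each cell
    value_totals = Counter(attribute_values)
    label_totals = Counter(labels)

    statistic = 0.0
    for value, value_count in value_totals.items():
        for label, label_count in label_totals.items():
            # Assumed expected frequency under independence
            expected = value_count * label_count / n
            o = observed.get((value, label), 0)
            statistic += (o - expected) ** 2 / expected
    return statistic


# Made-up sample data: the attribute fully determines the label
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
print(chi_squared(outlook, play))  # 4.0
```

An attribute that is independent of the label yields a statistic of 0, so it would receive the lowest weight.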
Input
example set
This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.
Output
weights
This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.
example set
The ExampleSet that was given as input is passed through this port to the output without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
Parameters
Normalize weights
This parameter indicates if the calculated weights should be normalized or not. If set to true, all weights are normalized to the range from 0 to 1.
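As a rough illustration of what normalizing to the range from 0 to 1 can look like, the sketch below applies min-max scaling. The source does not state which scaling the operator uses, so min-max scaling is an assumption. The function name normalize_weights, the handling of the all-equal case, and the sample weights are hypothetical as well.

```python
def normalize_weights(weights):
    """Rescale weights to [0, 1] with min-max scaling (assumed scheme)."""
    low = min(weights.values())
    high = max(weights.values())
    span = high - low
    if span == 0:
        # Assumption: when all weights are equal, map each of them to 1.0
        return {name: 1.0 for name in weights}
    return {name: (w - low) / span for name, w in weights.items()}


# Made-up raw chi-squared weights
raw = {"Outlook": 4.0, "Humidity": 2.5, "Wind": 1.0}
print(normalize_weights(raw))
# {'Outlook': 1.0, 'Humidity': 0.5, 'Wind': 0.0}
```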
Sort weights
This parameter indicates if the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified using the sort direction parameter.
Sort direction
This parameter is available only when the sort weights parameter is set to true. This parameter specifies the sorting order of the attributes according to their weights.
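The effect of the two sorting parameters can be sketched in Python as follows. The function name sort_weights, its ascending flag, and the sample weights are hypothetical stand-ins for the sort weights and sort direction parameters, not the operator's API.

```python
def sort_weights(weights, ascending=True):
    """Return (attribute, weight) pairs ordered by weight."""
    return sorted(weights.items(), key=lambda item: item[1],
                  reverse=not ascending)


# Made-up normalized weights
weights = {"Outlook": 1.0, "Humidity": 0.5, "Wind": 0.0}
print(sort_weights(weights, ascending=False))
# [('Outlook', 1.0), ('Humidity', 0.5), ('Wind', 0.0)]
```

Sorting in descending order places the most relevant attributes first.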
Number of bins
This parameter specifies the number of bins used for discretization of numerical attributes before the chi-squared test can be performed.
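The sketch below shows one way a numerical attribute could be discretized into a fixed number of bins so that its values can be counted as nominal categories. The source does not say which discretization scheme the operator applies. Equal-width binning is assumed here, and the function name discretize and the sample values are hypothetical.

```python
def discretize(values, number_of_bins):
    """Map numerical values to bin indices using equal-width bins (assumed)."""
    low, high = min(values), max(values)
    width = (high - low) / number_of_bins
    if width == 0:
        # All values are identical, so they share a single bin
        return [0] * len(values)
    # The maximum value would land in bin number_of_bins, so clamp it
    return [min(int((v - low) / width), number_of_bins - 1) for v in values]


# Made-up numerical attribute values
temperature = [1.0, 2.0, 3.0, 4.0, 5.0]
print(discretize(temperature, 2))  # [0, 0, 1, 1, 1]
```

The resulting bin indices can then be treated like nominal values when counting observed frequencies.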