ANOVA
Synopsis
This operator compares performance vectors. It performs an analysis of variance (ANOVA) test to compute the p-value for the null hypothesis, i.e. 'the actual means are the same'.
Description
ANalysis Of VAriance (ANOVA) is a statistical model in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. Doing multiple two-sample t-tests would result in an increased chance of committing a Type I error. For this reason, ANOVA is useful in comparing two, three, or more means. A Type I error, or 'false positive', is a decision to reject the null hypothesis when it is in fact true and should not have been rejected. RapidMiner provides the T-Test operator for performing the t-test. The paired t-test is a test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero.
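The partitioning of variance described above can be illustrated with a minimal sketch of the one-way ANOVA F statistic. This is not RapidMiner's implementation; the function name `one_way_anova_f` and the accuracy values are hypothetical, and the sketch stops at the F statistic rather than converting it to a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups.

    Hypothetical helper for illustration only.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation explained by differences between the group means.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left over inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up accuracies of three models over three validation folds.
model_a = [0.81, 0.82, 0.83]
model_b = [0.82, 0.83, 0.84]
model_c = [0.85, 0.86, 0.87]

f_stat, df_b, df_w = one_way_anova_f([model_a, model_b, model_c])
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F value means the group means differ by more than the within-group scatter would suggest. Turning F into a p-value requires the F distribution with the two degrees of freedom returned here.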
Differentiation
T-Test
Doing multiple two-sample t-tests would result in an increased chance of committing a type I error. For this reason, ANOVA is useful in comparing two, three, or more means.
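To see how quickly the Type I error risk grows with repeated t-tests, the following sketch computes the chance of at least one false positive across all pairwise comparisons. It assumes the tests are independent, which is a simplification, and the function name `familywise_error_rate` and the default level of 0.05 are assumptions made for illustration.

```python
from math import comb


def familywise_error_rate(num_groups, alpha=0.05):
    """Chance of at least one false positive over all pairwise tests.

    Assumes independent tests, each run at significance level alpha.
    """
    num_tests = comb(num_groups, 2)  # number of distinct pairs of groups
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (2, 3, 5):
    print(k, "groups:", comb(k, 2), "tests, error rate",
          round(familywise_error_rate(k), 4))
```

With two groups there is a single test and the rate equals alpha. With more groups the number of pairs, and with it the error rate, rises quickly, which is the motivation for running one ANOVA test instead.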
Input
performance
This operator expects performance vectors as input and can have multiple inputs. When one input is connected, another performance input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The performance vector supplied at the first input port of this operator is available at the first performance output port of the operator.
Output
significance
The given performance vectors are compared and the result of the significance test is delivered through this port.
performance
This operator can have multiple performance output ports. When one output is connected, another performance output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The performance vector delivered at the first performance input port of this operator is delivered at the first performance output port of the operator.
Parameters
Alpha
This parameter specifies the probability threshold which determines whether differences are considered significant. If a test of significance gives a p-value lower than the significance level alpha, the null hypothesis is rejected. It is important to understand that the null hypothesis can never be proven. A set of data can only reject a null hypothesis or fail to reject it. For example, if a comparison of two groups reveals no statistically significant difference between the two, it does not mean that there is no difference in reality. It only means that there is not enough evidence to reject the null hypothesis (in other words, the experiment fails to reject the null hypothesis).
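The decision rule for alpha can be written as a short sketch. The function name `significance_decision` and the default of 0.05 are assumptions for illustration and are not taken from the operator itself. Note that the second outcome is worded as "fail to reject", never as "accept".

```python
def significance_decision(p_value, alpha=0.05):
    """Map a p-value to the outcome of a significance test (hypothetical helper)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if p_value < alpha:
        return "reject null hypothesis"
    # Not enough evidence; this does not prove the means are equal.
    return "fail to reject null hypothesis"


print(significance_decision(0.003))
print(significance_decision(0.20))
```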