Independent Component Analysis
Synopsis
This operator performs Independent Component Analysis (ICA) on the given ExampleSet using the FastICA algorithm of Hyvärinen and Oja.
Description
Independent component analysis (ICA) is a general-purpose statistical technique in which observed random data are linearly transformed into components that are maximally independent of each other and that also have "interesting" (in practice, non-Gaussian) distributions. Such a representation seems to capture the essential structure of the data in many applications, including feature extraction. ICA is used to reveal hidden factors that underlie sets of random variables or measurements. ICA is superficially related to principal component analysis (PCA) and factor analysis, but it can find the underlying factors or sources in cases where these classic methods fail. This operator implements the FastICA algorithm of A. Hyvärinen and E. Oja. FastICA has most of the advantages of neural algorithms: it is parallel, distributed, computationally simple, and requires little memory.
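The linear model behind this description can be sketched in a few lines of Python. This is an illustration only: the sources, the mixing matrix A, and the helper names are hypothetical, and the un-mixing matrix is computed here by inverting a known A, whereas ICA has to estimate it from the observed data alone (and can only do so up to the order, sign, and scale of the components).

```python
# Hypothetical independent sources: each tuple is one sample (s1, s2).
sources = [(1.0, -2.0), (0.5, 3.0), (-1.5, 0.25)]

# Hypothetical 2x2 mixing matrix A; in practice it is unknown.
A = [[2.0, 1.0],
     [1.0, 3.0]]

def matvec(M, v):
    """Multiply a 2x2 matrix by a 2-element vector."""
    return tuple(M[i][0] * v[0] + M[i][1] * v[1] for i in range(2))

def inverse_2x2(M):
    """Closed-form inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

observed = [matvec(A, s) for s in sources]     # x = A s: what is measured
W = inverse_2x2(A)                             # ideal un-mixing matrix
recovered = [matvec(W, x) for x in observed]   # s = W x: the components
```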
Input
example set input
This input port expects an ExampleSet. In the attached Example Process it is the output of the Retrieve operator, but the output of other operators can also be used as input. Meta data must be attached to the input data, because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. Please note that this operator cannot handle nominal attributes; it works only on numerical attributes.
Output
example set output
The Independent Component Analysis is performed on the input ExampleSet and the resultant ExampleSet is delivered through this port.
original
The ExampleSet that was given as input is passed through this port unchanged. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
preprocessing model
This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.
Parameters
Dimensionality reduction
This parameter indicates which type of dimensionality reduction (reduction in number of attributes) should be applied.
- none: if this option is selected, dimensionality reduction is not performed.
- fixed_number: if this option is selected, only a fixed number of components are kept. The number of components to keep is specified by the number of components parameter.
Number of components
This parameter specifies the number of components to keep. It is only available when the dimensionality reduction parameter is set to 'fixed_number'.
Algorithm type
This parameter specifies the type of algorithm to be used.
- parallel: if this option is selected, the components are extracted simultaneously.
- deflation: if this option is selected, the components are extracted one at a time.
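The deflation scheme can be sketched in pure Python as follows. This is a minimal sketch, not this operator's implementation. It assumes the data are already centered and whitened and uses the log-cosh nonlinearity g(u) = tanh(alpha * u); the names fastica_deflation, orthonormalize, max_iter, tol, and seed are hypothetical. Each new vector is kept orthogonal to the vectors already found, which is what "one at a time" means here. The parallel scheme would instead update all vectors together and decorrelate them symmetrically.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormalize(w, found):
    """Remove the parts of w along already-found vectors, then scale to unit length."""
    for v in found:
        p = dot(w, v)
        w = [wi - p * vi for wi, vi in zip(w, v)]
    norm = math.sqrt(dot(w, w))
    return [wi / norm for wi in w]

def fastica_deflation(X, n_components, alpha=1.0, max_iter=200, tol=1e-4, seed=0):
    """Estimate un-mixing vectors one at a time from whitened samples X."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    found = []
    for _component in range(n_components):
        w = orthonormalize([rng.gauss(0.0, 1.0) for _ in range(dim)], found)
        for _iteration in range(max_iter):
            mean_xg = [0.0] * dim   # accumulates E[x * g(w.x)]
            mean_dg = 0.0           # accumulates E[g'(w.x)]
            for x in X:
                t = math.tanh(alpha * dot(w, x))
                for j in range(dim):
                    mean_xg[j] += x[j] * t / n
                mean_dg += alpha * (1.0 - t * t) / n
            # Fixed-point update: w <- E[x g(w.x)] - E[g'(w.x)] w
            w_new = orthonormalize(
                [mean_xg[j] - mean_dg * w[j] for j in range(dim)], found)
            # Converged when the direction stops changing (sign is irrelevant).
            change = abs(abs(dot(w_new, w)) - 1.0)
            w = w_new
            if change < tol:
                break
        found.append(w)
    return found

# Hypothetical test data: two unit-variance uniform sources, mixed by a rotation
# so that the mixture is already (approximately) white.
rng = random.Random(0)
half_width = math.sqrt(3.0)
S = [(rng.uniform(-half_width, half_width), rng.uniform(-half_width, half_width))
     for _ in range(5000)]
cos_t, sin_t = math.cos(0.6), math.sin(0.6)
X = [(cos_t * a - sin_t * b, sin_t * a + cos_t * b) for a, b in S]

W = fastica_deflation(X, n_components=2)
```

Each row of W should come out close to one column of the rotation, up to sign, so that dot(w, x) approximately recovers one source per row.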
Function
This parameter specifies the functional form of the G function used in the approximation to negentropy.
Alpha
This parameter specifies the alpha constant, in the range [1, 2], which is used in the approximation to negentropy.
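To make the roles of the G function and alpha concrete, here is a hedged sketch of the negentropy approximation J(y) ≈ (E[G(y)] - E[G(v)])^2, where v is a standard Gaussian variable. The two forms shown, log-cosh (which is where alpha enters) and exp, are the ones commonly offered by FastICA implementations; that this operator offers exactly these is an assumption, and all function names are hypothetical.

```python
import math
import random

def G_logcosh(u, alpha=1.0):
    """G(u) = (1 / alpha) * log(cosh(alpha * u))."""
    return math.log(math.cosh(alpha * u)) / alpha

def G_exp(u):
    """G(u) = -exp(-u^2 / 2)."""
    return -math.exp(-u * u / 2.0)

def negentropy_estimate(y, G, reference):
    """(E[G(y)] - E[G(v)])^2, with both expectations replaced by sample means."""
    mean_y = sum(G(v) for v in y) / len(y)
    mean_ref = sum(G(v) for v in reference) / len(reference)
    return (mean_y - mean_ref) ** 2

rng = random.Random(0)
n = 20000
half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has unit variance

gaussian_reference = [rng.gauss(0.0, 1.0) for _ in range(n)]
gaussian_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
uniform_sample = [rng.uniform(-half_width, half_width) for _ in range(n)]

# A Gaussian sample scores near zero; a non-Gaussian one scores higher.
j_gauss = negentropy_estimate(gaussian_sample, G_logcosh, gaussian_reference)
j_uniform = negentropy_estimate(uniform_sample, G_logcosh, gaussian_reference)
j_uniform_exp = negentropy_estimate(uniform_sample, G_exp, gaussian_reference)
j_gauss_exp = negentropy_estimate(gaussian_sample, G_exp, gaussian_reference)
```

FastICA looks for projections of the data that maximize this quantity, which is why the choice of G and alpha affects which components are found and how robustly.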
Row norm
This parameter indicates whether rows of the data matrix should be standardized beforehand.
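What "standardized" means is not spelled out here. One common reading, assumed in the sketch below, is that each row is shifted to zero mean and scaled to unit sample standard deviation. The function name standardize_rows and the example values are hypothetical, and the operator's actual behavior may differ.

```python
import math

def standardize_rows(matrix):
    """Return a copy in which every row has mean 0 and sample std 1."""
    result = []
    for row in matrix:
        mean = sum(row) / len(row)
        variance = sum((v - mean) ** 2 for v in row) / (len(row) - 1)
        std = math.sqrt(variance)
        if std == 0.0:
            # A constant row cannot be scaled; it is only centered.
            result.append([v - mean for v in row])
        else:
            result.append([(v - mean) / std for v in row])
    return result

data = [[1.0, 2.0, 3.0],
        [10.0, 20.0, 60.0]]
standardized = standardize_rows(data)
```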
Max iteration
This parameter specifies the maximum number of iterations to perform.
Tolerance
This parameter specifies a positive scalar giving the tolerance at which the un-mixing matrix is considered to have converged.
Use local random seed
This parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization.
Local random seed
This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.
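The effect of a fixed seed can be illustrated with Python's standard random module. The sketch assumes that the randomization in question is the random starting point of the un-mixing weights, as is usual for FastICA; the function name initial_weights is hypothetical.

```python
import random

def initial_weights(dim, seed=None):
    """Draw a random starting vector; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

first_run = initial_weights(3, seed=42)
second_run = initial_weights(3, seed=42)   # same seed, same starting point
other_seed = initial_weights(3, seed=7)    # different seed, different start
```

With the same seed, repeated runs start from the same point and therefore produce the same components; without one, the order and sign of the components can differ between runs.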