Data to Similarity
Synopsis
This operator measures the similarity of each example of the given ExampleSet with every other example of the same ExampleSet.
Description
The Data to Similarity operator calculates the similarity among the examples of an ExampleSet. Comparisons are not repeated: if example x has been compared with example y to compute similarity, then example y is not compared again with example x, because the result would be the same. Thus if there are n examples in the ExampleSet, this operator does not return n^2 similarity comparisons. Instead it returns n(n-1)/2 similarity comparisons. This operator provides many different measures for similarity computation. The measure to use for calculating the similarity can be specified through the parameters. Four types of measures are provided: mixed measures, nominal measures, numerical measures and Bregman divergences.
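The pair counting described above can be sketched in a few lines of Python. This is only an illustration, not the operator's implementation: the function names and the negative Euclidean distance used as a similarity are assumptions made for the example.

```python
from itertools import combinations
import math


def pairwise_similarities(examples, similarity):
    """Compare each unordered pair of examples exactly once."""
    return {
        (i, j): similarity(examples[i], examples[j])
        for i, j in combinations(range(len(examples)), 2)
    }


def negative_euclidean(x, y):
    # Hypothetical similarity chosen for illustration:
    # values closer to zero mean the examples are more alike.
    return -math.dist(x, y)


examples = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
pairs = pairwise_similarities(examples, negative_euclidean)
n = len(examples)
print(len(pairs), n * (n - 1) // 2)
```

Because only pairs (i, j) with i < j are generated, neither the reversed pair (j, i) nor the self pair (i, i) appears in the result.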
The behavior of this operator is close to one particular use of the Cross Distances operator: when the same ExampleSet is provided at both inputs of the Cross Distances operator and the compute similarities parameter is set to true. In this case the Cross Distances operator behaves similarly to the Data to Similarity operator. There are a few differences, though. In this scenario examples are also compared with themselves, and the signs (positive or negative) of the results differ.
Differentiation
Data to Similarity Data
The Data to Similarity Data operator calculates the similarity among all examples of an ExampleSet. Even examples are compared to themselves. Thus if there are n examples in the ExampleSet, this operator returns n^2 similarity comparisons. The Data to Similarity Data operator returns an ExampleSet which is merely a view, so there should be no memory problems.
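One way to picture why a view over n^2 comparisons need not cause memory problems is a lazy container that computes each value only when it is requested. The sketch below is a hypothetical illustration in Python. The class name, the row layout, and the similarity function are assumptions, and it does not describe how RapidMiner implements the view.

```python
import math


class SimilarityView:
    """Hypothetical lazy view over all n^2 ordered example pairs."""

    def __init__(self, examples, similarity):
        self.examples = examples
        self.similarity = similarity

    def __len__(self):
        n = len(self.examples)
        return n * n  # every example is paired with every example, itself included

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        i, j = divmod(index, len(self.examples))
        # Nothing is stored: the value is computed when the row is requested.
        return i, j, self.similarity(self.examples[i], self.examples[j])


def negative_euclidean(x, y):
    # Hypothetical similarity chosen for illustration.
    return -math.dist(x, y)


view = SimilarityView([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)], negative_euclidean)
print(len(view), view[1])
```

Only the original examples are held in memory. Each of the n^2 rows is derived from them on access.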
Input
example set
This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.
Output
similarity
This port delivers a similarity measure object that contains the calculated similarity between each example of the given ExampleSet and every other example of the same ExampleSet.
example set
The ExampleSet that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
Parameters
Measure types
This parameter selects the type of measure to be used for calculating similarity. The following options are available: mixed measures, nominal measures, numerical measures and Bregman divergences.
Mixed measure
This parameter is available if the measure type parameter is set to 'mixed measures'. The only available option is the 'Mixed Euclidean Distance'.
Nominal measure
This parameter is available if the measure type parameter is set to 'nominal measures'. This option cannot be applied if the input ExampleSet has numerical attributes. In this case the 'numerical measure' option should be selected.
Numerical measure
This parameter is available if the measure type parameter is set to 'numerical measures'. This option cannot be applied if the input ExampleSet has nominal attributes. In this case the 'nominal measure' option should be selected.
Divergence
This parameter is available if the measure type parameter is set to 'bregman divergences'.
Kernel type
This parameter is only available if the numerical measure parameter is set to 'Kernel Euclidean Distance'. The type of the kernel function is selected through this parameter. The following kernel types are supported:
- dot: The dot kernel is defined by k(x,y)=x*y, i.e. it is the inner product of x and y.
- radial: The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma that is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.
- polynomial: The polynomial kernel is defined by k(x,y)=(x*y+1)^d where d is the degree of the polynomial, specified by the kernel degree parameter. Polynomial kernels are well suited for problems where all the training data is normalized.
- neural: The neural kernel is defined by a two-layered neural net tanh(a x*y+b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.
- sigmoid: This is the sigmoid kernel. Please note that the sigmoid kernel is not valid under some parameters.
- anova: This is the anova kernel. It has the adjustable parameters gamma and degree.
- epachnenikov: The Epanechnikov kernel is the function (3/4)(1-u^2) for u between -1 and 1 and zero for u outside that range. It has the two adjustable parameters kernel sigma1 and kernel degree.
- gaussian_combination: This is the gaussian combination kernel. It has the adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.
- multiquadric: The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has the adjustable parameters kernel sigma1 and kernel sigma shift.
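The kernels whose formulas are given above can be written out directly. The following Python sketch is a plain transcription of those formulas, not RapidMiner code, and all function and argument names are assumptions. The text does not state how kernel sigma1, kernel degree and kernel sigma shift enter the Epanechnikov and multiquadric formulas, so the sketch keeps the generic symbols u and c instead of guessing that mapping.

```python
import math


def dot(x, y):
    # k(x, y) = x * y, the inner product
    return sum(a * b for a, b in zip(x, y))


def squared_distance(x, y):
    # ||x - y||^2
    return sum((a - b) ** 2 for a, b in zip(x, y))


def radial(x, y, gamma):
    # exp(-g ||x - y||^2)
    return math.exp(-gamma * squared_distance(x, y))


def polynomial(x, y, degree):
    # (x * y + 1)^d
    return (dot(x, y) + 1) ** degree


def neural(x, y, a, b):
    # tanh(a x * y + b)
    return math.tanh(a * dot(x, y) + b)


def epanechnikov_profile(u):
    # (3/4)(1 - u^2) for u in [-1, 1], zero outside that range
    return 0.75 * (1 - u * u) if -1 <= u <= 1 else 0.0


def multiquadric(x, y, c):
    # sqrt(||x - y||^2 + c^2)
    return math.sqrt(squared_distance(x, y) + c * c)


x, y = (1.0, 2.0), (3.0, 4.0)
print(dot(x, y), polynomial(x, y, 2), radial(x, y, 0.5))
```

Note that the radial kernel returns 1 for identical examples and decays toward 0 as they move apart, while the multiquadric value grows with the distance between the examples.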
Kernel gamma
This is the SVM kernel parameter gamma. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to radial or anova.
Kernel sigma1
This is the SVM kernel parameter sigma1. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.
Kernel sigma2
This is the SVM kernel parameter sigma2. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.
Kernel sigma3
This is the SVM kernel parameter sigma3. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.
Kernel shift
This is the SVM kernel parameter shift. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to multiquadric.
Kernel degree
This is the SVM kernel parameter degree. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to polynomial, anova or epachnenikov.
Kernel a
This is the SVM kernel parameter a. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.
Kernel b
This is the SVM kernel parameter b. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.