Cross Distances
Synopsis
This operator calculates the distance between each example of a 'request set' ExampleSet and each example of a 'reference set' ExampleSet. It can also calculate similarities instead of distances.
Description
The Cross Distances operator takes two ExampleSets as input: the 'reference set' and the 'request set'. It creates an ExampleSet that contains the distance between each example of the 'request set' ExampleSet and each example of the 'reference set' ExampleSet. Please note that both input ExampleSets should have the same attributes in the same order. This operator will not work properly if the order of the attributes is different. This operator can also calculate similarity instead of distance: if the compute similarities parameter is set to true, similarities are calculated instead of distances. Please note that both input ExampleSets should have id attributes. If id attributes are not present, this operator automatically creates id attributes for such ExampleSets. The measure to use for calculating the distances can be specified through the parameters. Four types of measures are provided: mixed measures, nominal measures, numerical measures and Bregman divergences.
If data is imported from two different sources that are supposed to represent the same data but have their columns in different orders, the Cross Distances operator will not behave as expected. It is possible to work around this by using the Generate Attributes operator to recreate the attributes in both ExampleSets in the same order.
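The idea of a cross distance table, and the reason the attribute order matters, can be illustrated outside the operator. The following is a minimal Python sketch, not the operator's implementation: the function names `align` and `cross_distances`, the dictionary layout and the use of plain Euclidean distance are all assumptions made for this illustration. Each example is keyed by its id, and the attributes are read in one fixed order before distances are computed.

```python
import math

def align(example, attribute_order):
    """Return the example's values in one fixed attribute order (hypothetical helper)."""
    return [example[name] for name in attribute_order]

def cross_distances(request_set, reference_set, attribute_order):
    """Euclidean distance from every request example to every reference example.

    Each set maps an id to a dict of attribute name -> numerical value.
    Returns a list of (request_id, reference_id, distance) rows.
    """
    rows = []
    for req_id, req in request_set.items():
        x = align(req, attribute_order)
        for ref_id, ref in reference_set.items():
            y = align(ref, attribute_order)
            rows.append((req_id, ref_id, math.dist(x, y)))
    return rows

request_set = {"q1": {"a": 0.0, "b": 0.0}}
reference_set = {
    "r1": {"b": 4.0, "a": 3.0},  # same attributes, stored in a different order
    "r2": {"a": 1.0, "b": 0.0},
}
result = cross_distances(request_set, reference_set, ["a", "b"])
for row in result:
    print(row)
```

Because the sketch looks attributes up by name, the different column order in r1 does no harm. A comparison that relied on column position alone would pair a with b and produce misleading distances, which is the situation described above.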
Input
request set
This input port expects an ExampleSet. This ExampleSet will be used as the 'request set'. Please note that both input ExampleSets ('request set' and 'reference set') should have the same attributes in the same order. This operator will not work properly if the order of the attributes is different. Also note that both input ExampleSets should have id attributes. If id attributes are not present, this operator automatically creates id attributes for such ExampleSets.
reference set
This input port expects an ExampleSet. This ExampleSet will be used as the 'reference set'. Please note that both input ExampleSets ('request set' and 'reference set') should have the same attributes in the same order. This operator will not work properly if the order of the attributes is different. Also note that both input ExampleSets should have id attributes. If id attributes are not present, this operator automatically creates id attributes for such ExampleSets.
Output
result set
This port delivers an ExampleSet that contains the distance (or the similarity, if the compute similarities parameter is set to true) between each example of the 'request set' ExampleSet and each example of the 'reference set' ExampleSet.
request set
The 'request set' ExampleSet that was provided at the request set input port is delivered through this port. If the input ExampleSet had an id attribute, the ExampleSet is delivered without any modification. Otherwise an id attribute is automatically added to the input ExampleSet.
reference set
The 'reference set' ExampleSet that was provided at the reference set input port is delivered through this port. If the input ExampleSet had an id attribute, the ExampleSet is delivered without any modification. Otherwise an id attribute is automatically added to the input ExampleSet.
Parameters
Measure types
This parameter is used for selecting the type of measure to be used for calculating distances (or similarities). The following options are available: mixed measures, nominal measures, numerical measures and Bregman divergences.
Mixed measure
This parameter is available when the measure types parameter is set to 'mixed measures'. The only available option is the 'Mixed Euclidean Distance'.
Nominal measure
This parameter is available when the measure types parameter is set to 'nominal measures'. This option cannot be applied if the input ExampleSet has numerical attributes. If the input ExampleSet has numerical attributes, the 'numerical measure' option should be selected.
Numerical measure
This parameter is available when the measure types parameter is set to 'numerical measures'. This option cannot be applied if the input ExampleSet has nominal attributes. If the input ExampleSet has nominal attributes, the 'nominal measure' option should be selected.
Divergence
This parameter is available when the measure types parameter is set to 'bregman divergences'.
Kernel type
This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance'. The type of the kernel function is selected through this parameter. The following kernel types are supported:
- dot: The dot kernel is defined by k(x,y) = x*y, i.e. it is the inner product of x and y.
- radial: The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma that is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.
- polynomial: The polynomial kernel is defined by k(x,y) = (x*y+1)^d where d is the degree of the polynomial, specified by the kernel degree parameter. Polynomial kernels are well suited for problems where all the training data is normalized.
- neural: The neural kernel is defined by a two-layered neural net tanh(a x*y+b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.
- sigmoid: This is the sigmoid kernel. Please note that the sigmoid kernel is not valid under some parameters.
- anova: This is the anova kernel. It has adjustable parameters gamma and degree.
- epachnenikov: The Epanechnikov kernel is the function (3/4)(1-u^2) for u between -1 and 1 and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.
- gaussian_combination: This is the gaussian combination kernel. It has adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.
- multiquadric: The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has adjustable parameters kernel sigma1 and kernel shift.
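The kernel formulas listed above can be written out directly. The following Python sketch covers the dot, radial, polynomial and neural kernels exactly as they are defined in the list. The `kernel_distance` function is an assumption: it uses the standard feature-space identity d(x,y) = sqrt(k(x,x) - 2 k(x,y) + k(y,y)), which is one common way to turn a kernel into a distance and is not confirmed here to be what 'Kernel Euclidean Distance' computes internally. All function names and default values are hypothetical.

```python
import math

def dot(x, y):
    """Dot kernel: inner product of x and y."""
    return sum(a * b for a, b in zip(x, y))

def radial(x, y, gamma=1.0):
    """Radial kernel: exp(-gamma * ||x - y||^2)."""
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_norm)

def polynomial(x, y, degree=2):
    """Polynomial kernel: (x*y + 1)^degree."""
    return (dot(x, y) + 1) ** degree

def neural(x, y, a=1.0, b=0.0):
    """Neural kernel: tanh(a * x*y + b); not a valid kernel for every a and b."""
    return math.tanh(a * dot(x, y) + b)

def kernel_distance(kernel, x, y):
    """Distance induced by a kernel in its feature space (assumed formula)."""
    squared = kernel(x, x) - 2 * kernel(x, y) + kernel(y, y)
    # Guard against tiny negative values caused by floating point rounding.
    return math.sqrt(max(squared, 0.0))

x, y = [0.0, 0.0], [3.0, 4.0]
print(kernel_distance(dot, x, y))     # with the dot kernel this is the ordinary Euclidean distance
print(kernel_distance(radial, x, y))  # bounded above by sqrt(2) for the radial kernel
```

With the dot kernel the induced distance reduces to the plain Euclidean distance, which makes it a convenient sanity check before trying kernels whose parameters need tuning.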
Kernel gamma
This is the SVM kernel parameter gamma. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to radial or anova.
Kernel sigma1
This is the SVM kernel parameter sigma1. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to epachnenikov, gaussian_combination or multiquadric.
Kernel sigma2
This is the SVM kernel parameter sigma2. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian_combination.
Kernel sigma3
This is the SVM kernel parameter sigma3. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian_combination.
Kernel shift
This is the SVM kernel parameter shift. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to multiquadric.
Kernel degree
This is the SVM kernel parameter degree. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to polynomial, anova or epachnenikov.
Kernel a
This is the SVM kernel parameter a. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.
Kernel b
This is the SVM kernel parameter b. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.
Only top k
This parameter indicates whether only the k nearest reference examples should be calculated for each request example.
K
This parameter is only available when the only top k parameter is set to true. It determines how many of the nearest examples should be shown in the result.
Search for
This parameter is only available when the only top k parameter is set to true. It determines whether the nearest or the farthest examples should be selected.
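The effect of the only top k, k and search for parameters can be sketched as a per-request selection over the full distance table. The Python sketch below is an illustration under assumptions: the `top_k` function, the row layout (request id, reference id, distance) and the option strings "nearest" and "farthest" are hypothetical and chosen only to mirror the parameter descriptions above.

```python
import heapq

def top_k(rows, k, search_for="nearest"):
    """Keep only the k nearest (or farthest) reference examples per request example.

    rows is a list of (request_id, reference_id, distance) tuples.
    Returns a dict mapping each request id to a list of (distance, reference_id) pairs.
    """
    by_request = {}
    for request_id, reference_id, distance in rows:
        by_request.setdefault(request_id, []).append((distance, reference_id))
    # nsmallest returns ascending distances, nlargest returns descending distances.
    pick = heapq.nsmallest if search_for == "nearest" else heapq.nlargest
    return {request_id: pick(k, pairs) for request_id, pairs in by_request.items()}

rows = [
    ("q1", "r1", 5.0),
    ("q1", "r2", 1.0),
    ("q1", "r3", 2.0),
]
nearest_two = top_k(rows, k=2)
farthest_one = top_k(rows, k=1, search_for="farthest")
print(nearest_two)
print(farthest_one)
```

Restricting the result to k entries per request example keeps the output at k rows per request instead of one row for every pair of request and reference examples.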
Compute similarities
If this parameter is set to true, similarities are computed instead of distances. All measures remain usable, but a measure that is not natively of the requested kind (a distance measure when similarities are requested, or a similarity measure when distances are requested) is transformed so that it matches the optimization direction.