K-Means (Kernel)

Synopsis

This operator performs clustering using the kernel k-means algorithm. Clustering is concerned with grouping objects together that are similar to each other and dissimilar to the objects belonging to other clusters. Kernel k-means uses kernels to estimate the distance between objects and clusters. K-means is an exclusive clustering algorithm.

Description

This operator performs clustering using the kernel k-means algorithm. K-means is an exclusive clustering algorithm, i.e. each object is assigned to precisely one of a set of clusters. Objects in one cluster are similar to each other. The similarity between objects is based on a measure of the distance between them. Kernel k-means uses kernels to estimate the distance between objects and clusters. Because of the nature of kernels, it is necessary to sum over all elements of a cluster to calculate one distance. As a result, this algorithm is quadratic in the number of examples and, unlike the K-Means operator, does not return a Centroid Cluster Model. This operator creates a cluster attribute in the resultant ExampleSet if the add cluster attribute parameter is set to true.
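The need to sum over all elements of a cluster can be illustrated with a minimal NumPy sketch. It is not the operator's implementation; the function name kernel_distances and the toy data are assumptions made for illustration. It uses the standard identity for the squared distance between an example x and the mean of a cluster C in kernel feature space: k(x,x) - (2/|C|) * sum over y in C of k(x,y) + (1/|C|^2) * sum over y,z in C of k(y,z).

```python
import numpy as np

def kernel_distances(K, labels, n_clusters):
    """Squared feature-space distance from every example to every cluster mean.

    K is the (n, n) kernel matrix; labels[i] is the current cluster of example i.
    (Hypothetical helper, not part of the operator.)
    """
    n = K.shape[0]
    dist = np.full((n, n_clusters), np.inf)
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue  # empty cluster: leave its distances at infinity
        self_term = np.diag(K)                                   # k(x, x)
        cross_term = K[:, members].sum(axis=1) / members.size    # sum over the cluster
        within_term = K[np.ix_(members, members)].sum() / members.size ** 2
        dist[:, c] = self_term - 2.0 * cross_term + within_term
    return dist

# Toy data: two well-separated one-dimensional groups, dot kernel.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
K = X @ X.T
labels = np.array([0, 0, 1, 1])
D = kernel_distances(K, labels, 2)
```

With the dot kernel these values coincide with the squared Euclidean distances to the ordinary centroids (0.5 and 10.5 here). For other kernels no explicit centroid exists, so every distance requires the sums over cluster members, and the kernel matrix alone already has one entry per pair of examples.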

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. Clustering is a technique for extracting information from unlabeled data. It can be very useful in many different scenarios; for example, in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Differentiation

k-Means

Kernel k-means uses kernels to estimate the distance between objects and clusters. Because of the nature of kernels, it is necessary to sum over all elements of a cluster to calculate one distance. As a result, this algorithm is quadratic in the number of examples and does not return a Centroid Cluster Model, which the K-Means operator does.

Input

example set

This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output

cluster model

This port delivers the cluster model which has information regarding the clustering performed. It tells which examples are part of which cluster.

clustered set

The ExampleSet that was given as input is passed with minor changes to the output through this port. An attribute with the id role is added to the input ExampleSet to distinguish examples. An attribute with the cluster role may also be added, depending on the state of the add cluster attribute parameter.

Parameters

Add cluster attribute

If enabled, a new attribute with the cluster role is generated directly in this operator; otherwise this operator does not add the cluster attribute. In the latter case you have to use the Apply Model operator to generate the cluster attribute.

Add as label

If true, the cluster id is stored in an attribute with the label role instead of the cluster role (see the add cluster attribute parameter).

Remove unlabeled

If set to true, unlabeled examples are deleted.

Use weights

This parameter indicates if the weight attribute should be used.

K

This parameter specifies the number of clusters to form. There is no hard and fast rule for choosing the number of clusters. Generally, it is preferable to have a small number of clusters with examples scattered (but not too scattered) around them in a balanced way.

Max optimization steps

This parameter specifies the maximal number of iterations performed for one run of k-Means.

Use local random seed

This parameter indicates if a local random seed should be used for randomization.

Local random seed

This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.
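How the number of clusters, the iteration cap and the random seed interact in a single run can be sketched as follows. This is a simplified illustration, not the operator's code: the function kernel_kmeans, the random initial assignment and the stopping rule (stop when no assignment changes) are assumptions, and the seed value in the usage lines is arbitrary.

```python
import numpy as np

def kernel_kmeans(K, k, max_optimization_steps=100, seed=None):
    """One run of a basic kernel k-means on a precomputed kernel matrix K.

    Hypothetical sketch: initialization and stopping rule are assumptions.
    """
    rng = np.random.default_rng(seed)      # a fixed seed makes the run repeatable
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)    # random initial assignment to k clusters
    for _ in range(max_optimization_steps):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue                   # an empty cluster attracts no examples
            # Squared feature-space distance to the cluster mean.
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].mean(axis=1)
                          + K[np.ix_(members, members)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # converged before reaching the cap
        labels = new_labels
    return labels

# Usage: the same seed yields the same clustering on the same data.
X = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0]])
K = X @ X.T                                # dot kernel
run_a = kernel_kmeans(K, k=2, max_optimization_steps=50, seed=1992)
run_b = kernel_kmeans(K, k=2, max_optimization_steps=50, seed=1992)
```

Because the starting assignment is random, runs without a fixed seed may end in different clusterings, which is why a local seed is useful when results need to be reproduced.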

Kernel type

The type of the kernel function is selected through this parameter. The following kernel types are supported: dot, radial, polynomial, neural, anova, epachnenikov, gaussian combination, multiquadric.

  • dot: The dot kernel is defined by k(x,y) = x*y, i.e. it is the inner product of x and y.
  • radial: The radial kernel is defined by exp(-g ||x-y||^2), where g is gamma, specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel and should be carefully tuned to the problem at hand.
  • polynomial: The polynomial kernel is defined by k(x,y) = (x*y+1)^d, where d is the degree of the polynomial, specified by the kernel degree parameter. Polynomial kernels are well suited for problems where all the training data is normalized.
  • neural: The neural kernel is defined by a two-layered neural net tanh(a x*y + b), where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.
  • anova: The anova kernel is defined by the summation of exp(-g (x-y)), raised to the power d, where g is gamma and d is degree. gamma and degree are adjusted by the kernel gamma and kernel degree parameters respectively.
  • epachnenikov: The epachnenikov kernel is the function (3/4)(1-u^2) for u between -1 and 1, and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.
  • gaussian combination: This is the gaussian combination kernel. It has the adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.
  • multiquadric: The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has the adjustable parameters kernel sigma1 and kernel shift.
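The kernels whose formulas are fully spelled out in the list above can be written as plain functions. This is a minimal sketch that follows the formulas as stated; the function names, argument names and default values are assumptions, and the anova, epachnenikov and gaussian combination kernels are left out because the text does not define them completely (for instance, how u is computed from x and y).

```python
import numpy as np

def dot_kernel(x, y):
    # k(x, y) = x*y, the inner product
    return float(np.dot(x, y))

def radial_kernel(x, y, gamma=1.0):
    # exp(-g ||x-y||^2)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def polynomial_kernel(x, y, degree=2):
    # (x*y + 1)^d
    return float((np.dot(x, y) + 1.0) ** degree)

def neural_kernel(x, y, a=1.0, b=0.0):
    # tanh(a x*y + b); not a valid kernel for every choice of a and b
    return float(np.tanh(a * np.dot(x, y) + b))

def multiquadric_kernel(x, y, c=1.0):
    # sqrt(||x-y||^2 + c^2)
    return float(np.sqrt(np.sum((x - y) ** 2) + c ** 2))

# Usage on two small vectors.
x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
values = {
    "dot": dot_kernel(x, y),
    "radial": radial_kernel(x, y, gamma=0.5),
    "polynomial": polynomial_kernel(x, y, degree=2),
    "neural": neural_kernel(x, y, a=0.5, b=0.0),   # a = 1/N with N = 2
    "multiquadric": multiquadric_kernel(x, y, c=1.0),
}
```

Evaluating any of these functions for every pair of examples gives the kernel matrix on which kernel k-means operates.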

Kernel gamma

This is the kernel parameter gamma. This is only available when the kernel type parameter is set to radial or anova.

Kernel sigma1

This is the kernel parameter sigma1. This is only available when the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

Kernel sigma2

This is the kernel parameter sigma2. This is only available when the kernel type parameter is set to gaussian combination.

Kernel sigma3

This is the kernel parameter sigma3. This is only available when the kernel type parameter is set to gaussian combination.

Kernel shift

This is the kernel parameter shift. This is only available when the kernel type parameter is set to multiquadric.

Kernel degree

This is the kernel parameter degree. This is only available when the kernel type parameter is set to polynomial, anova or epachnenikov.

Kernel a

This is the kernel parameter a. This is only available when the kernel type parameter is set to neural.

Kernel b

This is the kernel parameter b. This is only available when the kernel type parameter is set to neural.