Support Vector Clustering
Synopsis
This operator performs clustering with support vectors. Clustering is concerned with grouping objects together that are similar to each other and dissimilar to the objects belonging to other clusters. Clustering is a technique for extracting information from unlabeled data.
Description
This operator is an implementation of Support Vector Clustering based on Ben-Hur et al. (2001). In this Support Vector Clustering (SVC) algorithm, data points are mapped from data space to a high-dimensional feature space using a Gaussian kernel. In feature space, the algorithm searches for the smallest sphere that encloses the image of the data. This sphere is mapped back to data space, where it forms a set of contours that enclose the data points. These contours are interpreted as cluster boundaries. Points enclosed by the same contour are assigned to the same cluster. As the width parameter of the Gaussian kernel is decreased, the number of disconnected contours in data space increases, leading to an increasing number of clusters. Since the contours can be interpreted as delineating the support of the underlying probability distribution, this algorithm can be viewed as one that identifies valleys in this probability distribution.
Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. It is a technique for extracting information from unlabeled data and can be useful in many different scenarios. For example, in a marketing application we may be interested in finding clusters of customers with similar buying behavior.
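The sphere search described above can be written compactly. The following is a sketch of the formulation given by Ben-Hur et al., not a statement about this operator's internals. The symbols are notation from that paper rather than from this page: \Phi is the feature map, a the sphere center, \xi_j slack variables, C a soft-margin constant, \beta_j the Lagrange multipliers and K the kernel.

```latex
% Smallest enclosing sphere in feature space (soft-margin form)
\min_{R,\,a,\,\xi}\; R^2 + C \sum_j \xi_j
\quad \text{s.t.} \quad \|\Phi(x_j) - a\|^2 \le R^2 + \xi_j,\; \xi_j \ge 0

% Distance of the image of x from the sphere center a = \sum_j \beta_j \Phi(x_j)
R^2(x) = K(x,x) - 2\sum_j \beta_j K(x_j, x) + \sum_{i,j} \beta_i \beta_j K(x_i, x_j)

% Cluster boundaries in data space: points whose image lies on the sphere
\{\, x \mid R(x) = R \,\}
```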
Input
example set
This input port expects an ExampleSet. It is the output of the Generate Data operator in the attached Example Process.
Output
cluster model
This port delivers the cluster model. It contains information about the clustering performed, including which examples belong to which cluster.
clustered set
The ExampleSet that was given as input is passed with minor changes to the output through this port. An attribute with id role is added to the input ExampleSet to distinguish examples. An attribute with cluster role may also be added, depending on the state of the add cluster attribute parameter.
Parameters
Add cluster attribute
If this parameter is set to true, a new attribute with cluster role is generated in the resultant ExampleSet; otherwise this operator does not add the cluster attribute. In the latter case you have to use the Apply Model operator to generate the cluster attribute.
Add as label
If this parameter is set to true, the cluster id is stored in an attribute with the label role instead of the cluster role (see the add cluster attribute parameter).
Remove unlabeled
If this parameter is set to true, unlabeled examples are deleted from the ExampleSet.
Min pts
This parameter specifies the minimum number of points in each cluster.
Kernel type
The type of the kernel function is selected through this parameter. The following kernel types are supported: dot, radial, polynomial, neural
- dot: The dot kernel is defined by k(x,y) = x*y, i.e. it is the inner product of x and y.
- radial: The radial kernel is defined by exp(-g ||x-y||^2), where g is the gamma specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel and should be carefully tuned to the problem at hand.
- polynomial: The polynomial kernel is defined by k(x,y) = (x*y+1)^d, where d is the degree of the polynomial, specified by the kernel degree parameter. Polynomial kernels are well suited for problems where all the training data is normalized.
- neural: The neural kernel is defined by a two-layered neural net tanh(a x*y+b), where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.
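The four kernel definitions above can be sketched in a few lines of plain Python. This is only an illustration of the formulas as written on this page. The function names and default values are hypothetical and are not part of the operator.

```python
import math


def dot_kernel(x, y):
    """k(x, y) = x*y, the inner product of x and y."""
    return sum(a * b for a, b in zip(x, y))


def radial_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(x, y, degree=2):
    """k(x, y) = (x*y + 1)^degree."""
    return (dot_kernel(x, y) + 1) ** degree


def neural_kernel(x, y, a=1.0, b=0.0):
    """k(x, y) = tanh(a * x*y + b); not a valid kernel for every a and b."""
    return math.tanh(a * dot_kernel(x, y) + b)
```

For instance, `radial_kernel(x, x, gamma)` is 1 for any `x` and any `gamma`, since the distance term vanishes.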
Kernel gamma
This is the SVM kernel parameter gamma. This is available only when the kernel type parameter is set to radial.
Kernel degree
This is the SVM kernel parameter degree. This is available only when the kernel type parameter is set to polynomial.
Kernel a
This is the SVM kernel parameter a. This is available only when the kernel type parameter is set to neural.
Kernel b
This is the SVM kernel parameter b. This is available only when the kernel type parameter is set to neural.
Kernel cache
This is an expert parameter. It specifies the size of the cache for kernel evaluations in megabytes.
Convergence epsilon
This is an optimizer parameter. It specifies the precision on the KKT conditions.
Max iterations
This is an optimizer parameter. The optimization stops after the specified number of iterations.
P
This parameter specifies the fraction of allowed outliers.
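In Ben-Hur et al., the outlier fraction p is tied to the soft-margin constant C of the sphere problem by p = 1/(N*C), where N is the number of examples. The sketch below shows that conversion. Whether this operator applies exactly this relation internally is an assumption, and the function name is hypothetical.

```python
import math


def soft_margin_constant(p, num_examples):
    """Return C for an allowed outlier fraction p, using p = 1 / (N * C).

    Assumption: this follows the relation in Ben-Hur et al.; the operator's
    internal handling of P is not documented on this page.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0:
        # No outliers allowed: the upper bound on the multipliers is inactive.
        return math.inf
    return 1.0 / (p * num_examples)
```

A larger p gives a smaller C, which lets more points fall outside the sphere.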
R
If this parameter is set to -1, the calculated radius is used as the radius. Otherwise the value specified in this parameter is used as the radius.
Number sample points
This parameter specifies the number of virtual sample points to check for neighborhood.
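A minimal sketch of how the R and number sample points parameters could interact, assuming the segment test from Ben-Hur et al.: two points are treated as neighbors when every virtual sample point on the line segment between them maps inside the sphere. It assumes the radial kernel. All function names, the `beta` coefficients and the even spacing of sample points are illustrative assumptions, not the operator's documented behavior.

```python
import math


def radial_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def effective_radius(r_param, calculated_radius):
    """R = -1 selects the calculated radius; any other value overrides it."""
    return calculated_radius if r_param == -1 else r_param


def sphere_distance_sq(x, support_points, beta, gamma):
    """Squared feature-space distance from the image of x to the sphere center."""
    cross = sum(b * radial_kernel(s, x, gamma)
                for s, b in zip(support_points, beta))
    const = sum(bi * bj * radial_kernel(si, sj, gamma)
                for si, bi in zip(support_points, beta)
                for sj, bj in zip(support_points, beta))
    return radial_kernel(x, x, gamma) - 2.0 * cross + const


def are_neighbors(p, q, support_points, beta, gamma, radius,
                  num_sample_points=20):
    """True if all virtual sample points on the segment p-q lie inside the sphere."""
    for k in range(1, num_sample_points + 1):
        t = k / (num_sample_points + 1)  # evenly spaced, endpoints excluded
        sample = [a + t * (b - a) for a, b in zip(p, q)]
        if sphere_distance_sq(sample, support_points, beta, gamma) > radius ** 2:
            return False
    return True
```

More sample points make the check more reliable for narrow gaps between contours, at the cost of more kernel evaluations per pair of examples.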