User:Ke CHEN/Proposed/Support vector domain description

Dr. Robert. P.W. Duin accepted the invitation on 24 September 2007 (self-imposed deadline: 24 November 2007).

This article will briefly cover: a support vector approach for novelty (or outlier) detection.

Simple description

The support vector domain description (SVDD) is a classifier that tries to solve the [outlier detection] problem. It distinguishes a set of target objects (called the target class) from the outlier class. It does it by computing a spherically shaped decision boundary around a training set of target objects. The sphere is put such that the volume of the sphere is minimized, while still keeping all (or most of the) target objects inside. The sphere is actually determined by just a few objects from the target set, located on the sphere boundary. The sphere center and radius can therefore be expressed in terms of these objects. Analogous to the Support Vector Machine, these objects are called the support objects. A new object is classified by computing the distance to the sphere center. When the distance is larger than the sphere radius, the object will be classified as outlier, otherwise it is being accepted as a genuine target object.

The mathematical formulation

Hard-ball solution

The most simple formulation of the SVDD is the hard hypersphere, where all target objects are inside the sphere. Assume we have a set of target objects $$\{ x_i \in R^p,\, i=1,..., n\}\ .$$ To find the center and radius $$a, R$$ of the sphere with minimal volume, we have to solve$\min_{a,R}R^2$ such that $$(x_i-a)^2\leq R^2 \,\forall i$$

The constraints that the objects are inside the sphere can be incorporated in the optimization by using Lagrange Multipliers. The so-called dual representation becomes$\min_{\alpha_i} \sum_i \alpha_i x_i *x_i - \sum_{ij} \alpha_i \alpha_j x_i *x_j$ such that $$\sum_i \alpha_i = 1, \alpha_i\geq 0$$ where $$\alpha_i$$ are the Lagrange multipliers corresponding to each of the constraints. It appears that the sphere center can be written as a linear combination of the original objects $$a = \sum_i \alpha_i x_i \ .$$

The sphere radius $$R$$ does not appear directly from the optimization. For this, the distance of one of the support vectors to the sphere centers has to be computed. For the hard-ball solution, this distance is the same for all support vectors.

To evaluate a new object $$z\ ,$$ the (squared) distance to the sphere center has to be computed$(z-a)^2 = (z-\sum_i \alpha_i x_i)^2 = z*z -2\sum_i \alpha_i z*x_i + \sum_{i,j} \alpha_i \alpha_j x_i*x_j \ .$ The object is rejected when this squared distance is larger than the sphere radius $$R\ .$$

Soft-ball solution

The formulation can be made more robust by allowing some target objects to fall outside the description. Objects that are far from the bulk of the data have a large influence on the sphere solution. Therefore a tradeoff between the distance of an object to the sphere center and the sphere volume has to be made. This is done by introducing so-called slack variables $$\xi_i$$for each of the training objects. The constraints of some objects may be slackened when the sphere volume will decrease sufficiently by it. This can be formulated as follows$\min_{a,R}R^2+C\sum_i \xi_i$ such that $$(x_i-a)^2\leq R^2+\xi_i, \,\xi_i>0\, \forall i$$ where $$C$$ controls the tradeoff between the sphere volume and the object distance.

The dual representation is remarkably similar to the original one$\sum_i \alpha_i x_i *x_i - \sum_{ij} \alpha_i \alpha_j x_i *x_j$ such that $$\sum_i \alpha_i = 1, \,\, 0 \leq \alpha_i \leq C \ .$$ The only difference is the upper bound on $$\alpha_i\ ,$$ that indicates that the influence of a single training object is bounded. Objects for which $$\alpha_i=C$$ are called bounded support vectors, and they are outside the sphere. Objects with $$0 < \alpha_i < C$$ are the unbounded support vectors and are located on the decision boundary. To compute the final sphere radius $$R$$ from the solution $$\{\alpha_i\}$$ one now has to take care that one has to compute the distance from the center to an unbounded support vector.

Kernelized solution

The spherical description can be made more flexible by introducing kernel functions, analogous to the Support Vector Machines. The inner products $$x_i*x_j$$ are replaced by a general kernel function $$K(x_i,x_j)\ .$$ When a Gaussian kernel is used (with an extra free parameter $$s$$) $$K(x_i,x_j) = \exp((x_i-x_j)^2/s^2)$$ solutions ranging form a Parzen density estimation to the original spherical description are obtained. Also a procedure for choosing the appropriate value for s is given such that for all types of data a tight description can be obtained.

Remarks

This tool can now answer other questions as well. It can solve the classification problems in which the different classes are very poorly balanced (or in which one of the classes is completely absent). This happens in medial applications where the population of normal, healthy people is far bigger than the abnormal population. It also opens the possibility to give an indication that a test set is sufficiently similar to the training set.

The support vector data description does not depend on a density estimation of the target data, it only covers an area where target objects appear. In most practical applications it is not only very hard to obtain a genuine sampling of the outlier objects, also the true object distribution of the target object is up to the practitioners ability and skill.