Support Vector Machines
From RapidWiki
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. They belong to a family of generalized linear classifiers. A special property of SVMs is that they simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers.
Introduction
Support vector machines map input vectors to a higher-dimensional space in which a maximum-margin separating hyperplane is constructed. Two parallel hyperplanes are constructed, one on each side of the hyperplane that separates the data. The separating hyperplane is the one that maximises the distance between these two parallel hyperplanes. The larger this margin (the distance between the parallel hyperplanes), the lower the generalisation error of the classifier is expected to be.
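The geometry described above can be written compactly. The notation is an assumption, since the article itself introduces no symbols: w is the normal vector of the hyperplane, b its offset, x_i a training point and y_i its class label, taken to be +1 or -1. This is a sketch of the usual hard-margin formulation for separable data, not a full derivation.

```latex
\begin{aligned}
\text{separating hyperplane:}\quad & \mathbf{w}\cdot\mathbf{x} + b = 0 \\
\text{parallel hyperplanes:}\quad & \mathbf{w}\cdot\mathbf{x} + b = +1,
  \qquad \mathbf{w}\cdot\mathbf{x} + b = -1 \\
\text{distance between them (margin):}\quad & \frac{2}{\lVert \mathbf{w} \rVert} \\
\text{maximum-margin problem:}\quad & \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
  \quad \text{subject to}\quad y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \ \text{ for all } i
\end{aligned}
```

Maximising the margin 2/‖w‖ is the same as minimising ‖w‖, which is why the problem is usually stated as a minimisation.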
Motivation
Classifying data is a common task in machine learning. Each data point is represented by a p-dimensional vector (a list of p numbers) and belongs to exactly one of two classes. We are interested in whether the two classes can be separated by a (p − 1)-dimensional hyperplane. This is a typical form of linear classifier. There may be many hyperplanes that separate the data. However, we are additionally interested in finding out whether we can achieve maximum separation (margin) between the two classes. By this we mean that we pick the hyperplane so that the distance from the hyperplane to the nearest data point of either class is maximized. Equivalently, the gap between the nearest point of one class and the nearest point of the other, measured perpendicular to the hyperplane, is maximized. This gives the largest safety margin when new points drawn from the same distribution are presented for classification.
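To make the idea of choosing between separating hyperplanes concrete, the following minimal sketch computes the distance from a hyperplane to its nearest data point and compares two candidates. The function name, the toy data and the two candidate hyperplanes are hypothetical, chosen only for illustration; labels are assumed to be +1 and -1. It evaluates given hyperplanes and does not search for the optimal one.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labelled point to the hyperplane w.x + b = 0.

    A positive result means every point lies on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical toy data: two classes in the plane (p = 2).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate separating lines (hyperplanes of dimension p - 1 = 1).
margin_a = geometric_margin((1.0, 1.0), -2.0, points, labels)  # line x1 + x2 = 2
margin_b = geometric_margin((1.0, 0.0), -1.0, points, labels)  # line x1 = 1

print(margin_a, margin_b)
```

Both lines separate the two classes, but the first leaves a wider gap to the nearest point, so a maximum-margin classifier would prefer it to the second.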
Non-linear classification
The original optimal hyperplane algorithm proposed by Vladimir Vapnik in 1963 was a linear classifier. However, in 1992, Bernhard Boser, Isabelle Guyon and Vapnik suggested a way to create non-linear classifiers by applying the kernel trick (originally proposed by Aizerman et al.) to maximum-margin hyperplanes. The resulting algorithm is formally similar, except that every dot product is replaced by a non-linear kernel function. This allows the algorithm to fit the maximum-margin hyperplane in a transformed feature space. The transformation may be non-linear and the transformed space high-dimensional; thus, although the classifier is a hyperplane in the high-dimensional feature space, it may be non-linear in the original input space.
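As a small illustration of replacing a dot product with a kernel function, the sketch below uses the degree-2 homogeneous polynomial kernel on two-dimensional inputs. This kernel, the feature map phi and the sample vectors are assumptions picked for the example, not taken from the article. It checks that the kernel value computed from the original inputs equals the dot product in the transformed three-dimensional feature space.

```python
import math

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map matching the kernel (x . z)^2 for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """k(x, z) = (x . z)^2, evaluated without ever constructing phi(x)."""
    return dot(x, z) ** 2

# Hypothetical sample inputs.
x, z = (1.0, 2.0), (3.0, -1.0)

explicit = dot(phi(x), phi(z))  # dot product in the 3-D feature space
implicit = poly2_kernel(x, z)   # same quantity from the 2-D inputs

print(explicit, implicit)
```

The two values agree up to floating-point rounding. For kernels whose feature space is very high-dimensional, only the second route is practical, which is what makes the substitution useful.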
Regression
A version of the SVM for regression was proposed in 1996 by Vladimir Vapnik, Harris Drucker, Chris Burges, Linda Kaufman and Alex Smola. This method is called support vector regression (SVR). The model produced by support vector classification (as described above) depends only on a subset of the training data, because the cost function for building the model does not care about training points that lie beyond the margin. Analogously, the model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data that are close (within a threshold ε) to the model prediction.
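The two cost functions can be sketched side by side. The article does not name them; the code assumes the hinge loss commonly used for soft-margin classification and the ε-insensitive loss commonly used for SVR. The function names and sample values are hypothetical.

```python
def hinge_loss(y, score):
    """Classification cost: zero once a point is on the correct side, beyond the margin.

    y is the class label (+1 or -1); score is the classifier output w.x + b.
    """
    return max(0.0, 1.0 - y * score)

def epsilon_insensitive_loss(y, prediction, epsilon):
    """Regression cost: zero whenever the prediction is within epsilon of the target y."""
    return max(0.0, abs(y - prediction) - epsilon)

# Points beyond the margin contribute nothing; points inside it do.
beyond_margin = hinge_loss(1, 2.5)
inside_margin = hinge_loss(1, 0.25)

# Predictions inside the epsilon tube contribute nothing; those outside do.
inside_tube = epsilon_insensitive_loss(3.0, 3.125, 0.25)
outside_tube = epsilon_insensitive_loss(3.0, 3.5, 0.25)

print(beyond_margin, inside_margin, inside_tube, outside_tube)
```

In both cases the training points with zero cost have no influence on the fitted model, which is why the result depends only on a subset of the data.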
