
SUPPORT VECTOR MACHINES (SVM)

OVERVIEW

 

Support Vector Machines (SVMs) are a powerful class of supervised machine learning algorithms used for classification and regression tasks. The fundamental idea of SVM is to find the hyperplane that best divides a dataset into two classes. 

SVMs are commonly used for classification problems, where they distinguish between two classes by finding the optimal hyperplane that maximizes the margin between the closest data points of opposite classes. The number of features in the input data determines whether the separator is a line in 2-D space or a hyperplane in n-dimensional space. Since many hyperplanes can separate the classes, maximizing the margin between points enables the algorithm to find the best decision boundary. This, in turn, enables it to generalize well to new data and make accurate classification predictions. The data points closest to the optimal hyperplane are known as support vectors, because the margin boundaries run through them and they alone determine the maximal margin.
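As a minimal sketch of this idea, the snippet below fits a linear SVM on a toy two-class dataset with scikit-learn and inspects the support vectors; the dataset and parameter values here are illustrative assumptions, not part of any particular application.

```python
# Minimal sketch: a maximal-margin linear SVM on a toy 2-D dataset.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters, one per class (illustrative toy data)
X, y = make_blobs(n_samples=40, centers=2, random_state=0, cluster_std=0.6)

clf = SVC(kernel="linear", C=1.0)  # C controls how soft the margin is
clf.fit(X, y)

# Only the points that determine the maximal margin become support vectors
print("support vectors:", clf.support_vectors_.shape[0], "of", len(X), "points")
print("predictions:", clf.predict(X[:5]))
```

Note that only a small subset of the training points end up as support vectors; the rest could be removed without changing the decision boundary.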

The SVM algorithm is widely used in machine learning as it can handle both linear and nonlinear classification tasks. When the data is not linearly separable, kernel functions are used to transform the data into a higher-dimensional space where linear separation becomes possible. This application of kernel functions is known as the "kernel trick," and the choice of kernel function, such as a linear, polynomial, radial basis function (RBF), or sigmoid kernel, depends on the characteristics of the data and the specific use case.


LINEAR SEPARATORS


SVMs are often considered linear classifiers because, in their basic form, they attempt to separate classes using a straight line (in two dimensions), a plane (in three dimensions), or a hyperplane (in higher dimensions). The hyperplane is chosen to maximize the margin between the two classes, which is the distance from the nearest points of each class to the hyperplane. These nearest points are called support vectors, as they support or define the hyperplane.


THE KERNEL TRICK


In many real-world scenarios, data isn't linearly separable. The kernel trick allows SVMs to operate in a transformed feature space without explicitly computing the coordinates of the data in that higher-dimensional space. This is done through the use of kernel functions, which compute the inner product of data points in the transformed feature space.
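The following sketch illustrates the point on a classic non-linearly-separable dataset, concentric circles; the dataset choice and default parameters are assumptions for illustration. An RBF-kernel SVM separates the rings without ever constructing the higher-dimensional features explicitly, while a linear SVM cannot.

```python
# Sketch of the kernel trick: concentric rings are not linearly separable,
# but the RBF kernel handles them via implicit feature-space inner products.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)  # no line can separate rings
rbf_clf = SVC(kernel="rbf").fit(X, y)        # implicit high-dim separation

print("linear accuracy:", linear_clf.score(X, y))
print("rbf accuracy:   ", rbf_clf.score(X, y))
```

The RBF model reaches near-perfect training accuracy on this data, while the linear model hovers near chance, which is exactly the gap the kernel trick closes.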


WHY IS THE DOT PRODUCT SO CRITICAL?


The dot product is a fundamental operation in the computation of kernel functions. It helps in measuring the similarity or the angle between two vectors in the feature space. For linear SVMs, the decision function is a linear combination of the dot products of support vectors and the data points being classified. By using a kernel function, this dot product is computed in a higher-dimensional space, enabling non-linear separation in the original input space.
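This structure can be verified directly: for a fitted scikit-learn `SVC`, the attribute `dual_coef_` holds the weights (alpha_i * y_i) of the support vectors, so the decision function can be reproduced by hand as a weighted sum of dot products plus the intercept. The toy dataset below is an illustrative assumption.

```python
# Sketch: the SVM decision function is a linear combination of dot products
# between the support vectors and the point being classified.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=0, cluster_std=0.6)
clf = SVC(kernel="linear").fit(X, y)

# f(x) = sum_i (alpha_i * y_i) <x_i, x> + b, summed over support vectors x_i
K = clf.support_vectors_ @ X.T               # dot products with each support vector
manual = (clf.dual_coef_ @ K).ravel() + clf.intercept_

# Difference from sklearn's own decision_function is essentially zero
print(np.max(np.abs(manual - clf.decision_function(X))))
```

Replacing the plain dot product `clf.support_vectors_ @ X.T` with a kernel evaluation is all it takes to move this computation into a higher-dimensional space.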


COMMON KERNEL FUNCTIONS


1. Polynomial Kernel:

K(x, y) = (x · y + r)^d

Where x and y are data points, and r (the coefficient) and d (the degree) are parameters of the kernel. This kernel maps the input features into a polynomial feature space.
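A hand-rolled version of this kernel makes the definition concrete; the sample vectors and parameter values below are arbitrary illustrations.

```python
# Sketch of the polynomial kernel K(x, y) = (x . y + r) ** d
import numpy as np

def poly_kernel(x, y, r=1.0, d=2):
    """Polynomial kernel with coefficient r and degree d."""
    return (np.dot(x, y) + r) ** d

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# x . y = 1*3 + 2*0.5 = 4, so (4 + 1)^2 = 25
print(poly_kernel(x, y))
```

With d = 1 and r = 0 this reduces to the plain dot product, i.e. the linear kernel.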


2. Radial Basis Function (RBF) Kernel:

K(x, y) = exp(−γ ‖x − y‖²)

Where γ is a parameter that defines how much influence a single training example has. The larger γ is, the closer other examples must be to affect the model.
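The effect of γ can be seen directly by evaluating the kernel at two gamma values; the vectors and gamma settings below are illustrative assumptions.

```python
# Sketch of the RBF kernel K(x, y) = exp(-gamma * ||x - y||^2),
# showing how gamma controls how quickly similarity decays with distance.
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel with width parameter gamma."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])   # squared distance ||x - y||^2 = 2

print(rbf_kernel(x, y, gamma=0.1))  # small gamma: distant points still matter
print(rbf_kernel(x, y, gamma=10))   # large gamma: influence dies off fast
```

With small γ the two points remain quite similar; with large γ the kernel value collapses toward zero, so each training example influences only its immediate neighborhood.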


EXAMPLE OF A POLYNOMIAL KERNEL WITH r=1 AND d=2


K(x, y) = (x · y + 1)²

With r = 1 and d = 2, the kernel squares the shifted dot product; for 2-D inputs this is equivalent to an ordinary dot product in a six-dimensional feature space, so a linear separator there corresponds to a quadratic boundary in the original space.
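The r = 1, d = 2 case from the heading can be verified numerically: for 2-D inputs, the kernel value computed in the original space matches an explicit dot product in a six-dimensional feature space. The feature map below is the standard expansion for this kernel, and the sample vectors are illustrative assumptions.

```python
# Sketch: the polynomial kernel with r=1, d=2 equals a dot product in an
# explicit 6-D feature space, computed here for 2-D inputs.
import numpy as np

def phi(v):
    """Explicit feature map for K(x, y) = (x . y + 1)^2 on 2-D inputs."""
    x1, x2 = v
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

kernel_value = (np.dot(x, y) + 1) ** 2   # computed entirely in 2-D
feature_dot = np.dot(phi(x), phi(y))     # computed in 6-D

print(kernel_value, feature_dot)  # the two agree
```

The kernel trick is exactly this agreement: the left-hand computation never builds the 6-D vectors, yet produces the same inner product.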