A Basic Understanding of Support Vector Machines
Machine Learning broadly consists of three types of learning: supervised learning, unsupervised learning, and reinforcement learning.
The Support Vector Machine, or SVM, belongs to the category of supervised learning.
SVM is a supervised learning algorithm used for both classification and regression analysis.
To understand how it helps with classification, let's consider a simple data set containing cats and fish.
First, we segregate the data set into training and test sets. The training data we have is labelled data.
(Labelled data consists of pieces of data that have been tagged with one or more labels identifying certain properties, characteristics, or classifications.)
Our data must be labelled because SVM follows supervised learning.
The labelled data is used for training in order to build a model. Once the model is ready, it is given the test data, and in the testing phase it has to predict.
That is, when a test data point or a new data point is given to the model, the model needs to predict, or in other words classify, whether the point belongs to the class of cats or the class of fish.
If the new data point is a cat, the model's output should be "cat", provided the model is accurate.
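As a rough sketch of the workflow described above, here is how the split-train-predict cycle might look with scikit-learn's SVC. The two-dimensional features and the specific numbers are invented purely for illustration; they are not from the article:

```python
# A minimal sketch of the train/test/predict workflow, assuming
# scikit-learn is installed. The 2-D features (e.g. size, length)
# and their values are invented for illustration only.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hand-made labelled data: each point is tagged "cat" or "fish".
X = [[4.0, 30.0], [4.5, 28.0], [5.0, 32.0], [3.8, 29.0],
     [1.0, 5.0],  [1.2, 6.0],  [0.8, 4.5],  [1.1, 5.5]]
y = ["cat", "cat", "cat", "cat", "fish", "fish", "fish", "fish"]

# Segregate the data set into training and test portions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Build the model from the labelled training data.
model = SVC(kernel="linear")
model.fit(X_train, y_train)

# Testing phase: the model predicts the class of unseen points.
print(model.predict(X_test))
print(model.predict([[4.2, 31.0]]))  # a brand-new data point
```

If the model is accurate, the new point near the "cat" cluster is classified as a cat.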
Plotting the data points from the training set of cats and fish will look something like this:
Now, our model needs to separate the data points, i.e. classify them into two classes, namely cats and fish.
What's an ideal way to separate the data points here? A simple straight line.
This line is known as the Decision Boundary or, in a more generalized form, a Hyperplane (in a two-dimensional plot, the hyperplane is simply a line).
It is called the Decision Boundary because it acts as a boundary between the two classes, deciding whether a new data point belongs to the class of cats or the class of fish.
The next thing to do is to find the data point in the considered class that is nearest to the opponent class.
For example, if our considered class is cats, we have to find the point in that class which is nearest to the class of fish (the opponent class).
As we can see in the picture, this particular cat is nearest to the class of fish.
So, we draw a line through that point, parallel to the hyperplane.
Now our considered class is fish, and we need to find the data point in this class that is nearest to the opponent class (cats).
We then draw another line through this point, again parallel to the hyperplane.
These two parallel lines give us two distances from the hyperplane, D- and D+.
The sum of D- and D+ gives us a new distance altogether, called the Margin.
Margin = (D-) + (D+)
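For a linear SVM in its standard formulation, both D- and D+ equal 1/||w||, where w is the weight vector normal to the hyperplane, so the margin works out to 2/||w||. A sketch of computing this from a fitted model, assuming scikit-learn and hand-made points:

```python
# Sketch: computing the margin of a fitted linear SVM, assuming
# scikit-learn. In the canonical hard-margin formulation,
# D- = D+ = 1 / ||w||, so Margin = (D-) + (D+) = 2 / ||w||.
import numpy as np
from sklearn.svm import SVC

# Two well-separated, hand-made classes (values invented for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [3.0, 1.0],
              [-2.0, -2.0], [-3.0, -1.0], [-3.0, -3.0]])
y = [1, 1, 1, -1, -1, -1]

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]                    # normal vector of the hyperplane
margin = 2.0 / np.linalg.norm(w)    # (D-) + (D+)
print(f"margin width: {margin:.3f}")
```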
The Margin plays a crucial role in deciding the Hyperplane.
After all this explanation, it is natural to wonder what the terminology "Support Vector" means in this concept.
The points chosen from the considered and opponent classes to draw the lines parallel to the decision boundary are our Support Vectors.
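In scikit-learn, a fitted SVC exposes exactly these points through its `support_vectors_` attribute. A sketch with invented points, assuming a near-hard margin so the support vectors sit right on the two parallel lines:

```python
# Sketch: inspecting the support vectors of a fitted linear SVM,
# assuming scikit-learn. The toy points are invented for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [3.0, 1.0],      # class +1
              [-2.0, -2.0], [-3.0, -1.0], [-3.0, -3.0]])  # class -1
y = [1, 1, 1, -1, -1, -1]

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

# These are the points that define the two lines parallel to the
# decision boundary; they lie exactly on the margin.
print(clf.support_vectors_)
```

A quick check: for a hard margin, each support vector's decision-function value has magnitude 1, i.e. it lies on one of the two margin lines.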
Another question which can be raised is:
Why was the hyperplane drawn in this tilted manner?
Our hyperplane was drawn with the intent of separating the two classes, and that could have been done in many ways: a horizontal line, a vertical line, or any other line for that matter.
The question here is: why have we not chosen one of those other lines as our hyperplane?
Here, another concept comes into play, called
Linearly Separable Data.
When we plot our data points on an x-y plot and a single straight line is enough to classify them into two classes, that sort of data is regarded as linearly separable.
So, when we have linearly separable data, we can apply the Linear Support Vector Machine algorithm.
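On linearly separable data, a linear-kernel SVM can classify every point correctly. A small sketch assuming scikit-learn, with two invented, well-separated clusters:

```python
# Sketch: a linear SVM on linearly separable toy data, assuming
# scikit-learn. The points and labels are invented for illustration.
from sklearn.svm import SVC

# Two clusters that a single straight line can separate.
X = [[0, 0], [0, 1], [1, 0],       # class "a"
     [4, 4], [4, 5], [5, 4]]       # class "b"
y = ["a", "a", "a", "b", "b", "b"]

clf = SVC(kernel="linear")
clf.fit(X, y)

# The linear model classifies every training point correctly.
print(clf.score(X, y))  # 1.0 on this separable data
```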
Non-linearly separable data
In this graph, the data points are plotted in a tangled manner, where a straight line is not enough to separate them into different classes; in other words, the data can't be separated linearly.
Forcing a straight line here leads to misclassification, since the data is non-linear, and hence to a less accurate model.
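This misclassification can be demonstrated with scikit-learn's `make_circles`, which generates two concentric rings that no straight line can split. For contrast, the snippet also fits an RBF-kernel SVM, a common non-linear variant not covered above; both the data generator and the comparison are illustrative choices, not from the article:

```python
# Sketch: a straight line misclassifies non-linearly separable data,
# assuming scikit-learn. make_circles produces two concentric rings.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear kernel accuracy: {linear_acc:.2f}")  # roughly 0.5
print(f"rbf kernel accuracy:    {rbf_acc:.2f}")     # close to 1.0
```

The straight line does little better than guessing here, which is exactly the misclassification described above.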
Significance of Margin
We see two graphs here.
The only difference is the alignment of the hyperplane, i.e. the angle the hyperplane, or decision boundary, makes with the x-axis.
The lengths of the margins are labelled m1 and m2.
It can also be observed that
m2 >> m1
For better classification, which margin should we choose?
The rule is: among the candidate hyperplanes, choose the one whose margin, generated from the points nearest to the opponent class, has the maximum width.
As m2 has the maximum width, we use its corresponding hyperplane.
The wider the margin, the better the model performs on future predictions.
In general, this is called the "Maximal Margin Hyperplane".
The maximal margin hyperplane should be selected, as it increases accuracy and decreases the error rate.
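The comparison of the two graphs can be sketched numerically: for each candidate hyperplane w·x + b = 0, compute its margin as (D-) + (D+), the nearest-point distance on each side, and prefer the wider one. The points and the two candidate lines below are invented for illustration:

```python
# Sketch: comparing the margins of two candidate hyperplanes
# w·x + b = 0 on hand-made separable data. All values are invented.
import numpy as np

pos = np.array([[2.0, 2.0], [3.0, 3.0], [3.0, 1.0]])        # class +1
neg = np.array([[-2.0, -2.0], [-3.0, -1.0], [-3.0, -3.0]])  # class -1

def margin(w, b):
    """(D-) + (D+): distance from the line to the nearest point of each class."""
    w = np.asarray(w, dtype=float)
    d_plus = np.min((pos @ w + b) / np.linalg.norm(w))
    d_minus = np.min(-(neg @ w + b) / np.linalg.norm(w))
    return d_plus + d_minus

tilted = margin([1.0, 1.0], 0.0)    # the tilted line x + y = 0
vertical = margin([1.0, 0.0], 0.0)  # the vertical line x = 0

print(f"tilted line margin:   {tilted:.3f}")
print(f"vertical line margin: {vertical:.3f}")
```

Both lines separate the classes, but the tilted one has the wider margin, so it is the maximal margin hyperplane for this data.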