Answer by Prasoon Goyal

Given a binary classification problem, the goal is to find the “best” line that has the maximum probability of classifying unseen points correctly. How you define this notion of “best” gives you different models like SVM and logistic regression (LR).

In SVM, line [math]\ell_1[/math] is better than line [math]\ell_2[/math] if the “margin” of [math]\ell_1[/math] is larger, that is, it is farther from both classes. In LR, a line [math]\ell[/math] defines a probability distribution over the input space. Line [math]\ell_1[/math] is better than line [math]\ell_2[/math] if the distribution defined by [math]\ell_1[/math] assigns, on average, lower probability to class [math]-1[/math] points and higher probability to class [math]+1[/math] points than the distribution defined by [math]\ell_2[/math].
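Concretely, the distribution that LR associates with a line [math]w[/math] is the sigmoid of the signed distance from the line:

[math]P(y = +1 \mid x) = \frac{1}{1 + \exp(-w^Tx)}[/math]

so points far on the positive side get probability near [math]1[/math] and points far on the negative side get probability near [math]0[/math].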

This definition of “best” results in different loss functions. If you look at the optimization problems of linear SVM and (regularized) LR, they are very similar:

[math]\min_{w} \lambda \| w\|^2 + \sum_{i} \max\{0, 1 - y_{i} w^Tx_{i}\}[/math]

[math]\min_{w} \lambda \| w\|^2 + \sum_{i} \log(1 + \exp(-y_{i} w^Tx_{i}))[/math]

That is, they only differ in the loss function — SVM minimizes hinge loss while logistic regression minimizes logistic loss.
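The two objectives above can be written out directly. A minimal sketch in NumPy (function names `hinge_objective` and `logistic_objective` are mine, chosen for illustration):

```python
import numpy as np

def hinge_objective(w, X, y, lam=1.0):
    # SVM objective: lambda * ||w||^2 + sum_i max(0, 1 - y_i w^T x_i)
    margins = y * (X @ w)
    return lam * np.dot(w, w) + np.sum(np.maximum(0.0, 1.0 - margins))

def logistic_objective(w, X, y, lam=1.0):
    # LR objective: lambda * ||w||^2 + sum_i log(1 + exp(-y_i w^T x_i))
    margins = y * (X @ w)
    return lam * np.dot(w, w) + np.sum(np.log1p(np.exp(-margins)))

# Tiny example: two points, both classified with margin 2
w = np.array([1.0, 0.0])
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
print(hinge_objective(w, X, y))     # hinge terms vanish; only the regularizer remains
print(logistic_objective(w, X, y))  # logistic terms are small but nonzero
```

Note how, for the same well-classified data, the hinge terms are exactly zero while the logistic terms are not — this is the difference discussed below.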

Let’s take a look at the loss functions:

There are 2 differences to note:

  • Logistic loss diverges faster than hinge loss. So, in general, it will be more sensitive to outliers.
  • Logistic loss does not go to zero even if the point is classified sufficiently confidently. This might lead to minor degradation in accuracy.
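Both differences are easy to check numerically by treating each loss as a function of the margin [math]y \, w^Tx[/math] (a small sketch; the helper names are mine):

```python
import math

def hinge(m):
    # hinge loss as a function of the margin m = y * w^T x
    return max(0.0, 1.0 - m)

def logistic(m):
    # logistic loss as a function of the margin
    return math.log1p(math.exp(-m))

for m in [-2, 0, 1, 3]:
    print(f"margin={m:2d}  hinge={hinge(m):.4f}  logistic={logistic(m):.4f}")
```

At margin 3 the hinge loss is exactly 0, while the logistic loss is still positive — confidently classified points keep contributing to the LR objective.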

So, you can typically expect SVM to perform marginally better than logistic regression.

Some other points of comparison:

  • Logistic regression has a probabilistic interpretation. So LR can be integrated into other probabilistic frameworks much more seamlessly than SVMs.
  • While both models can be “kernelized”, SVM leads to sparser solutions due to complementary slackness.
  • SVM has a very efficient SMO algorithm for optimizing the kernelized model. Further, there is LibSVM, an implementation of SMO, that allows training non-linear SVMs very easily.
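In practice, both models are a few lines with scikit-learn (assumed available here; `LinearSVC` minimizes hinge loss, `LogisticRegression` minimizes logistic loss, and the synthetic dataset is only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary classification data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = LinearSVC(C=1.0).fit(X_tr, y_tr)          # hinge loss
lr = LogisticRegression(C=1.0).fit(X_tr, y_tr)  # logistic loss

print("SVM accuracy:", svm.score(X_te, y_te))
print("LR accuracy: ", lr.score(X_te, y_te))

# Only LR gives a probabilistic output per class:
print(lr.predict_proba(X_te[:1]))
```

On most datasets the two give very similar accuracy; the choice usually comes down to whether you need probabilities (LR) or kernels and sparse support-vector solutions (SVM).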

(Image source: Loss Functions for Ordinal regression)
