In common parlance, the two terms seem to be used interchangeably. But more precisely, least squares is a method for performing linear regression.

For now*, we can think of linear regression as the task of fitting a straight line (or, in the case of multiple linear regression, a "hyperplane") through a set of points. But there are many possible strategies to fit a line through a set of points:

  • you could take the leftmost point and the rightmost point and draw a line between them
  • you could compute the slopes of the lines connecting each pair of points, take the average of those slopes, and draw a line with that average slope through the point whose coordinates are the average of the "x values" and the average of the "y values"
  • you could find the line for which there are an equal number of points above the line and below the line
  • you could draw a line, and then for each of the data points, measure the vertical distance between the point and the line, and add these up; the fitted line would be the one where this sum of distances is as small as possible
  • you could draw a line, and then for each of the data points, measure the vertical distance between the point and the line, square it, and add these squares up; the fitted line would be the one where this sum of squared distances is as small as possible


The last strategy is called "ordinary least squares" (because you are trying to minimize the sum of squared prediction errors), and it is the most commonly used (wondering why? see * below). But as far as fitting a line through a set of points goes, the other strategies are equally valid. The first three strategies I made up as examples, and they probably do not perform well, but the fourth is a real strategy called "least absolute deviations" that some people prefer over least squares.
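
For concreteness, here is a small Python sketch comparing the last two strategies on made-up data: ordinary least squares via numpy's polyfit (which solves the least-squares problem exactly), and least absolute deviations via a generic numerical optimizer. The data and variable names are purely illustrative, not from anything above.

    import numpy as np
    from scipy.optimize import minimize

    # Made-up points, roughly following a line with some noise
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.7])

    # Ordinary least squares: minimize the sum of squared vertical distances.
    # np.polyfit with degree 1 solves this least-squares problem exactly.
    slope_ls, intercept_ls = np.polyfit(x, y, 1)

    # Least absolute deviations: minimize the sum of absolute vertical distances.
    # There is no simple closed form, so use a numerical optimizer.
    def sum_abs_dev(params):
        slope, intercept = params
        return np.sum(np.abs(y - (slope * x + intercept)))

    lad = minimize(sum_abs_dev, x0=[slope_ls, intercept_ls], method="Nelder-Mead")
    slope_lad, intercept_lad = lad.x

    print("least squares:             y = %.3f x + %.3f" % (slope_ls, intercept_ls))
    print("least absolute deviations: y = %.3f x + %.3f" % (slope_lad, intercept_lad))

On well-behaved data the two lines are close; they differ most when there are outliers, which pull the least-squares line harder because errors are squared.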

* I want to clear up an important distinction that I glossed over in the above:

First, strictly speaking, there is a difference between linear regression and curve fitting. Everything I said above is actually about curve fitting, not linear regression: you have a set of points, and you want to draw a curve (here, a line) through them that fits as well as possible. This is a purely geometric problem: the x and y axes have no interpretation, and the "data" are just points in Cartesian space.

On the other hand, linear regression is a statistical inference problem. The "y values" take on the interpretation of data you wish to model, and the "x values" take on the interpretation of extra information you have about each data point that might help predict its "y value". You are trying to build a probabilistic model that describes "y" while taking into account "x", and a linear model is one of many ways to do this. A linear model assumes that "y" has a different mean for each possible value of "x", and that these means happen to follow a straight line with a certain intercept and a certain slope. As in many statistical inference problems, you estimate the unknown parameters by maximum likelihood. But since in this case the unknown parameters are an intercept and a slope, the end result of maximum likelihood estimation is essentially that you choose the straight line that best fits the observed data, so the problem reduces to the curve fitting problem discussed above.
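
In symbols (a short sketch of the standard setup; a and b are my labels for the intercept and slope, which the prose above does not name), the linear model says that the mean of "y" at a given "x" is

    \mathbb{E}[\, y \mid x \,] = a + b x,

and the inference task is to estimate a and b from the observed pairs (x_i, y_i).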

Now we arrive at the question of why least squares, of all possible curve fitting methods, is so commonly used. The reason is that when solving the statistical linear regression problem, a very common modeling assumption is that for every possible value of "x", the quantity "y" is normally distributed with a mean that is linear in "x". Therefore, the likelihood function is essentially a product of PDFs of the normal distribution. As stated above, you estimate the unknown parameters (and therefore find the best-fitting line) by maximizing the likelihood function. If you look at what the product of normal PDFs looks like, you will notice that maximizing this expression happens to be equivalent to... you guessed it... minimizing the sum of squared errors!
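
To spell that out (keeping the a, b notation from the sketch above, and writing σ for the standard deviation of the normal distribution, again my label), the likelihood of the observed pairs under this model is

    L(a, b, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - a - b x_i)^2}{2\sigma^2}\right),

and taking the log gives

    \log L(a, b, \sigma) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - a - b x_i)^2.

For any fixed σ, the first term is a constant and the second is a negative multiple of the sum of squared errors, so maximizing the (log-)likelihood over a and b is exactly the same as minimizing \sum_i (y_i - a - b x_i)^2.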

That is, the line you get from curve fitting via least squares is the same line you get from linear regression under a normal model.
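
If you would rather see that equivalence numerically than algebraically, here is a small Python sketch (same made-up data and naming conventions as the earlier sketch) that maximizes the normal log-likelihood directly and compares the fitted line to the least-squares fit:

    import numpy as np
    from scipy.optimize import minimize

    # Same made-up points as in the earlier sketch
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.7])

    def neg_log_likelihood(params):
        intercept, slope, log_sigma = params
        sigma = np.exp(log_sigma)  # parameterize by log(sigma) so sigma stays positive
        resid = y - (intercept + slope * x)
        n = len(y)
        return n * np.log(sigma) + 0.5 * n * np.log(2 * np.pi) + np.sum(resid**2) / (2 * sigma**2)

    mle = minimize(neg_log_likelihood, x0=[0.0, 1.0, 0.0], method="Nelder-Mead")
    intercept_mle, slope_mle = mle.x[0], mle.x[1]

    slope_ls, intercept_ls = np.polyfit(x, y, 1)  # ordinary least squares via polyfit

    print("MLE under normal model: y = %.3f x + %.3f" % (slope_mle, intercept_mle))
    print("least squares:          y = %.3f x + %.3f" % (slope_ls, intercept_ls))

The two printed lines agree up to the optimizer's tolerance, which is exactly the equivalence described above: the intercept and slope that maximize the normal likelihood are the ones that minimize the sum of squared errors.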
