Regression vs. Machine Learning

What is the difference between linear regression methods (like ordinary least squares) and machine learning? At what point is a statistical method called ‘machine learning’ instead of being called a statistical method?

If given the same data, would an OLS linear regression and a machine learning linear regression return different parameters?

Hey there! This is a great question. I think at some point in our data science journeys we have all pondered the question of how and where statistics and machine learning differ. But before getting at that conundrum, I would love to answer the question on linear regression:

The OLS method gives us the analytical (closed-form) solution to the linear regression problem. But as the problem scales (i.e. more predictor variables or larger datasets), computing the coefficients this way becomes more expensive. One way to see this is to consider how large the design matrix gets with tens of thousands of rows of data, and the time it takes a computer to calculate the matrix products and inverses involved.
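
For reference, here is the standard textbook form of that analytical solution, written with symbols that are my own notation rather than anything from the linked article: $X$ is the $n \times p$ design matrix, $y$ the vector of targets, and $\hat{\beta}$ the fitted coefficients.

$$
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2 = (X^\top X)^{-1} X^\top y
$$

Forming $X^\top X$ costs on the order of $n p^2$ operations and solving the resulting $p \times p$ system costs on the order of $p^3$, which is why the closed form gets expensive as the number of rows and (especially) predictors grows.
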
Here’s where gradient descent (and other optimization algorithms) comes in. These are iterative methods that approximate the coefficients of the linear regression problem step by step. With a suitable learning rate and enough iterations, gradient descent applied to linear regression converges to the same solution as OLS, and on very large problems it can get there with less computation.
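
To make the "same data, same parameters?" question concrete, here is a minimal NumPy sketch that fits one dataset both ways. The data is synthetic and made up for illustration, and the learning rate and iteration count are values I picked by hand for this toy problem, not general recommendations.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 2 + 3*x1 - 1.5*x2 + noise
rng = np.random.default_rng(0)
n = 500
features = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), features])  # design matrix with an intercept column
true_beta = np.array([2.0, 3.0, -1.5])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# Closed-form OLS: lstsq solves the least-squares problem directly
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)


def gradient_descent(X, y, learning_rate=0.1, n_iters=2000):
    """Batch gradient descent on mean squared error."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residuals = X @ beta - y
        gradient = (2.0 / len(y)) * (X.T @ residuals)  # d(MSE)/d(beta)
        beta -= learning_rate * gradient
    return beta


beta_gd = gradient_descent(X, y)

print("OLS coefficients:", beta_ols)
print("GD coefficients: ", beta_gd)
print("max abs difference:", np.max(np.abs(beta_ols - beta_gd)))
```

On a small, well-conditioned problem like this one the two sets of coefficients should agree to many decimal places, and the closed-form route is actually the quicker of the two. The iterative route only starts to pay off when the data is too large for the direct solve to be practical.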

Algorithms like gradient descent can handle loss functions other than mean squared error as well. Machine learning models like logistic regression or support vector machines do not have a closed-form solution the way linear regression does, so iterative methods are all we have for finding the coefficients that best model a problem.
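
As a rough illustration of the same loop working with a different loss, here is a sketch of logistic regression fitted by gradient descent on the log loss. The data is again synthetic, and `fit_logistic`, the learning rate, and the iteration count are my own choices for this example rather than anything prescribed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, learning_rate=0.5, n_iters=5000):
    """Batch gradient descent on the mean log loss (no closed form exists)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X @ beta)              # predicted probabilities
        gradient = X.T @ (p - y) / len(y)  # gradient of the mean log loss
        beta -= learning_rate * gradient
    return beta


# Synthetic binary data (assumed for illustration)
rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])       # intercept + one feature
true_beta = np.array([-0.5, 2.0])
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

beta_hat = fit_logistic(X, y)
accuracy = np.mean((sigmoid(X @ beta_hat) >= 0.5) == (y == 1))

print("fitted coefficients:", beta_hat)
print("training accuracy:", accuracy)
```

The only things that changed compared with the linear case are the prediction function and the gradient; the update rule itself is identical.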

Getting back to the ML vs. stats question: linear regression is a statistical model. But it is also a supervised machine learning model, since it is a predictive model whose parameters are learned from training data (by minimizing a loss function) and then applied to predict the target variable on test, validation, or unseen data. The OLS method provides an analytical solution for this purpose, while gradient descent arrives at (approximately) the same parameters iteratively, which can be much faster on large problems. (contd. below!)

I consulted a colleague who is well-versed in both statistics and ML to get a second opinion on the broader question, and they had the following nuanced summary:

Statistics and machine learning are not mutually exclusive (in other words, most “machine learning models” are also “statistical models”). However, as modern companies (and other entities) collect more and more data and want to analyze/leverage it, data scientists/statisticians/people-who-work-with-data have reached some of the limitations of traditional statistical methods and models. Luckily, advances in computer science and technology have enabled researchers to discover new algorithms and methods for fitting these models. When people use terms like “machine learning” and “data science” they are often referring to these newer algorithms, but the truth is that it’s all a natural extension of statistics to leverage advances in computer science and deal with big data (and a more diverse set of questions/goals).

That said, some people like to make a distinction between data analysis and predictive modeling/machine learning. Many statistical/ML models can be used for both purposes. For example, a linear regression model could be used to predict the amount of money that a particular customer will spend at a restaurant. In this context, it is a “machine learning model” because we are using it to train a computer to predict how much money someone will spend without explicitly telling the computer how to make that prediction. However, the same model can also be used to figure out (“analyze”) which customer attributes are most associated with spending. In that context, we might think of the regression model as a “data analysis technique”. In practice, though, this step (inspecting the model and understanding how it works) is an important part of machine learning/prediction anyway (to figure out ways that the model/data might be biased/inaccurate), so it’s kind of silly to try to draw lines in the sand.
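
Here is a small sketch of that restaurant example, showing one fitted model used both ways. Everything in it is hypothetical: the features (`party_size`, `is_weekend`), the numbers used to generate the data, and the train/test split are invented purely for illustration.

```python
import numpy as np

# Hypothetical restaurant data (entirely made up for illustration)
rng = np.random.default_rng(2)
n = 400
party_size = rng.integers(1, 7, size=n).astype(float)
is_weekend = rng.integers(0, 2, size=n).astype(float)
spend = 10.0 + 18.0 * party_size + 12.0 * is_weekend + rng.normal(scale=8.0, size=n)

X = np.column_stack([np.ones(n), party_size, is_weekend])
train, test = slice(0, 300), slice(300, n)

# Fit once, on the training rows only
beta, *_ = np.linalg.lstsq(X[train], spend[train], rcond=None)

# Use 1, "machine learning": predict spend for customers the model has not seen
predictions = X[test] @ beta
rmse = np.sqrt(np.mean((predictions - spend[test]) ** 2))
print(f"held-out RMSE: {rmse:.2f}")

# Use 2, "data analysis": inspect which attributes are associated with spending
for name, coef in zip(["intercept", "party_size", "is_weekend"], beta):
    print(f"{name}: {coef:.2f}")
```

The fitting step is the same in both cases; only what we do with the coefficients afterwards differs.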

Thank you for writing such a thorough and thoughtful response; I really appreciate it. This answers a few big questions that I was having difficulty answering from the other resources I’ve looked through.

Is the design matrices article you linked to part of a larger section or course? Getting to know the math better does a lot for my understanding of the material, and I don’t think it was in the Data Science career path (which I’m currently working through).

The stats/ML answer is really helpful. I’ve seen many articles where the terms are used synonymously and then non-synonymously, so knowing the overlap and the slight differences clears a lot of things up. I also found this video lecture from Andreas Müller helpful in explaining the slight distinction between the two, with machine learning focusing more on prediction and statistics focusing more on inference. (He’s a co-author of Introduction to Machine Learning with Python from O’Reilly.)

The answer on OLS makes things much clearer, too. Having only done a few OLS regressions in college, with ~250 observations and never more than 12 features, I found it hard to wrap my head around loss functions and learning rates, which I’d never encountered before.

Thank you again! This answer really helped. :slightly_smiling_face:

Sure thing, I’m glad it helped! :slight_smile:

The matrix representation article is not part of the Data Scientist Career Path at the moment. It lives in the linear regression course within the Master Statistics with Python Skill Path. I’d highly recommend that path for learning summary statistics, hypothesis testing, inference and linear regression models. If you’d just like a deep dive into linear regression, the Linear Regression in Python course is best suited for that.

Thanks! I didn’t realize Master Stats with Python was an available skill path. I’m going to look through some of the lessons there as well to supplement the Data Science path. Much appreciated!