Yelp Regression

Hi all,

I’m taking the Basics of Machine Learning in Python course, but got stuck on the project for Yelp Regression, specifically on creating the training and testing datasets. I know there’s a solution for it as well, but although my code is the same as in the solution, I can’t get it to work.
This is how I selected the necessary columns and created the training and test sets:

ratings = df['stars']
features = df[['average_review_length', 'average_review_age']]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(ratings, features, test_size=0.2, random_state=1)

When I try to fit the model:

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)

I get a ValueError:

ValueError: Expected 2D array, got 1D array instead:
array=[3.  4.5 2.5 ... 2.  2.5 4. ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I’ve tried reshaping with the help of StackOverflow, but it doesn’t seem to work. The first try failed because .reshape() gives an AttributeError when called on a dataframe. The second try, with a workaround for reshaping dataframes, runs without errors but doesn’t give the correct results in the rest of the project.

My questions are

  1. Why do I have to reshape, while that’s not the case in the solution?
  2. What exactly does reshape do and why is it necessary?

Thanks a lot!

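The error is that ratings is a one-dimensional array, and that is what ends up being passed to the model as the features (the values in the error message, 3., 4.5, 2.5, ..., are star ratings). In pandas, single brackets such as df['feature'] return a 1D Series, while double brackets such as df[['feature']] return a 2D DataFrame, which is the shape scikit-learn expects for the features.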
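
To make this concrete, here is a minimal sketch of one likely fix. Note that train_test_split returns its splits in the same order as its arguments, so calling it with (ratings, features) puts the 1D ratings into X_train. Passing the features first avoids the need to reshape at all. The dataframe below is made-up stand-in data (an assumption, since the real project loads df from the Yelp files); only the column names come from the post. The last lines show what reshape(-1, 1) does: it turns a 1D array of n values into a 2D array with n rows and one column.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the course dataframe (random values, 100 rows).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "average_review_length": rng.uniform(50, 1000, size=100),
    "average_review_age": rng.uniform(100, 3000, size=100),
    "stars": rng.choice(np.arange(1.0, 5.5, 0.5), size=100),
})

ratings = df["stars"]  # single brackets: 1D Series, shape (100,)
features = df[["average_review_length", "average_review_age"]]  # double brackets: 2D, shape (100, 2)

# Features go first, target second; the outputs come back in that same order.
X_train, X_test, y_train, y_test = train_test_split(
    features, ratings, test_size=0.2, random_state=1
)

model = LinearRegression()
model.fit(X_train, y_train)  # X_train is 2D, y_train is 1D: no reshape needed

# What reshape(-1, 1) does: same values, but arranged as one column.
one_d = ratings.to_numpy()      # shape (100,)
two_d = one_d.reshape(-1, 1)    # shape (100, 1); -1 means "infer this dimension"
print(X_train.shape, y_train.shape, one_d.shape, two_d.shape)
```

With the arguments in the original order, reshaping the ratings would silence the ValueError but leave the model predicting the two features from the star rating, which is probably why the later results in the project looked wrong.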