Hi all,
I’m taking the Basics of Machine Learning in Python course but got stuck on the Yelp Regression project, specifically on creating the training and test datasets. I know there’s a solution available, but even though my code looks the same as the solution’s, I can’t get it to work.
This is how I selected the necessary columns and created the training and test sets:
```python
ratings = df['stars']
features = df[['average_review_length', 'average_review_age']]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(ratings, features, test_size=0.2, random_state=1)
```
When I try to fit the model:
```python
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
```
I get a ValueError:
```
ValueError: Expected 2D array, got 1D array instead:
array=[3. 4.5 2.5 ... 2. 2.5 4. ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
```
I’ve tried reshaping with help from Stack Overflow, but without success. My first attempt failed because calling .reshape() on a DataFrame raises an AttributeError. My second attempt, using a workaround for reshaping DataFrames, gets past the error but doesn’t give the correct results in the rest of the project.
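One thing that may explain both symptoms: `train_test_split` returns its splits in the same order as the arrays passed in (train and test of the first array, then train and test of the second). Called as `train_test_split(ratings, features, ...)`, `X_train` receives the 1D star ratings, which matches the `array=[3. 4.5 2.5 ...]` in the error message. Reshaping those ratings lets `fit` run, but with the ratings as input and the two feature columns as targets, which could explain the wrong results later on. The sketch below is a minimal, hypothetical reproduction: `df` is a small random DataFrame that only borrows the column names from the post, and the variable names `a_train`, `b_train` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Yelp data; only the column names come from the post
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "stars": rng.choice([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0], size=n),
    "average_review_length": rng.uniform(300, 900, size=n),
    "average_review_age": rng.uniform(100, 1500, size=n),
})

ratings = df["stars"]                                            # 1D Series
features = df[["average_review_length", "average_review_age"]]  # 2D DataFrame

# Argument order as in the post: the first two outputs are the splits of `ratings`
a_train, a_test, b_train, b_test = train_test_split(
    ratings, features, test_size=0.2, random_state=1
)
print(a_train.shape, b_train.shape)  # (80,) (80, 2)

# Fitting with the 1D split as X reproduces the ValueError
swapped_error = None
try:
    LinearRegression().fit(a_train, b_train)
except ValueError as err:
    swapped_error = err
    print("fit raised ValueError")

# Features first, target second: X_train is 2D and y_train is 1D
X_train, X_test, y_train, y_test = train_test_split(
    features, ratings, test_size=0.2, random_state=1
)
print(X_train.shape, y_train.shape)  # (80, 2) (80,)

model = LinearRegression()
model.fit(X_train, y_train)  # no reshape needed
print(model.coef_.shape)     # (2,): one coefficient per feature
```

With the features passed first, both columns are already a 2D DataFrame, so no reshape should be needed.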
My questions are:
- Why do I have to reshape when the solution doesn’t?
- What exactly does reshape do and why is it necessary?
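On what `reshape` does: it returns the same elements laid out in a different shape, and `-1` tells NumPy to infer that dimension from the total number of elements. scikit-learn estimators expect `X` as a 2D array of shape `(n_samples, n_features)`, so a single feature stored as a 1D array has to become one column with `reshape(-1, 1)`, while `reshape(1, -1)` turns it into one row (a single sample). In current pandas versions neither `DataFrame` nor `Series` has a `.reshape` method, which is consistent with the AttributeError mentioned above; the sketch goes through `.to_numpy()` first, or selects with a list of column names to keep a DataFrame. The ratings values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up star ratings stored as a 1D array
stars = np.array([3.0, 4.5, 2.5, 2.0, 2.5, 4.0])
print(stars.shape)          # (6,)

# Single feature, many samples: one column; -1 lets NumPy infer the 6 rows
as_column = stars.reshape(-1, 1)
print(as_column.shape)      # (6, 1)

# Single sample, many features: one row
as_row = stars.reshape(1, -1)
print(as_row.shape)         # (1, 6)

# reshape only changes the layout, not the values
assert np.array_equal(as_column.ravel(), stars)

# pandas: convert a Series to a NumPy array before reshaping...
s = pd.Series(stars, name="stars")
col_from_series = s.to_numpy().reshape(-1, 1)
print(col_from_series.shape)   # (6, 1)

# ...or stay 2D by selecting with a list of column names
frame = pd.DataFrame({"stars": stars})
print(frame["stars"].shape)    # (6,): 1D Series
print(frame[["stars"]].shape)  # (6, 1): 2D DataFrame
```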
Thanks a lot!