train_test_split error: Found input variables with inconsistent numbers of samples

I am trying to fit a linear regression, but it fails when I use train_test_split to create the training set. When I skip the split and fit on the original data, it works. I don't understand why the error occurs or how to fix it. The code and the shape of the original data set look like this:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
x = visual_data.loc[:,('Relative Humidity AVG', 'Solar Radiation AVG', 'Temperature AVG',  'Wind Speed Daily AVG')]#loc vs iloc:We must convert the boolean Series into a numpy array. loc gets rows (or columns) with particular labels from the index. iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
y = visual_data.loc[:,('pecentage_of_success')]
print(x.shape)
print(y.shape)
x_train, y_train, x_test, y_test = train_test_split(x,y,test_size=0.2,random_state = 0)
print(x_train.shape)
print(y_train.shape)
linreg = LinearRegression()
model = linreg.fit(x_train,y_train)


Result:
(464, 4)
(464,)
(371, 4)
(93, 4)
---> 13 model = linreg.fit(x_train,y_train)
ValueError: Found input variables with inconsistent numbers of samples: [371, 93]

But when I fit on the full x and y instead (only the last line changes), there is no error:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
x = visual_data.loc[:,('Relative Humidity AVG', 'Solar Radiation AVG', 'Temperature AVG',  'Wind Speed Daily AVG')]#loc vs iloc:We must convert the boolean Series into a numpy array. loc gets rows (or columns) with particular labels from the index. iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
y = visual_data.loc[:,('pecentage_of_success')]
print(x.shape)
print(y.shape)
x_train, y_train, x_test, y_test = train_test_split(x,y,test_size=0.2,random_state = 0)
print(x_train.shape)
print(y_train.shape)
linreg = LinearRegression()
model = linreg.fit(x,y)

I have looked at other people's code: they use train_test_split on inputs of equal length and it works fine. I am really confused now.

Can someone help me, please? Thank you.

The lengths of x and y are exactly the same. However, after I split them with train_test_split, they somehow become different. Why?
I am so desperate now.

If you need the original data as a CSV, I can send it by email.

That was such a silly mistake… I have figured it out now. Thank you, guys.
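
For anyone who lands here with the same error: the thread does not say what the mistake was, but the printed shapes point to the unpacking order. train_test_split returns the splits of each input in turn, so for two inputs the order is x_train, x_test, y_train, y_test. With the order used in the question, the name y_train ends up holding the x test split, which is why its shape prints as (93, 4). Below is a minimal sketch of this; the random data is only a stand-in for visual_data (an assumption, since the real CSV is not posted), sized to match the 464 rows and 4 feature columns shown above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for visual_data (assumption): 464 rows, 4 feature columns.
rng = np.random.default_rng(0)
x = rng.normal(size=(464, 4))
y = x @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=464)

# Unpacking order from the question: the second name receives the x TEST split.
x_train, y_train, x_test, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
wrong_shapes = (x_train.shape, y_train.shape)
print(wrong_shapes)  # y_train here is really the x test split

# Documented return order: both splits of x first, then both splits of y.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
print(x_train.shape, y_train.shape)

# The training features and targets now have the same number of rows.
model = LinearRegression().fit(x_train, y_train)
```

Only the order of the names on the left-hand side changes; the call to train_test_split itself is the same as in the question.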