# Project Tennis Ace - Linear Regression - Why are my predicted values so far off the actual values?

I am using linear regression to predict a tennis player's winnings from their number of wins. The predicted values are stored in the variable `winnings_predict`, and the actual values are stored in the variable `Winnings`. When I call `regr.score(winnings_predict, Winnings)` I get a value of -872039706.5177922.

This large negative value indicates that my predicted winnings are very far off the actual winnings, right? If so, have I done something wrong? Can I change anything?

This is quite an open-ended question, but any pointers in the right direction would be much appreciated.
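On the `.score(...)` call described above: scikit-learn's `LinearRegression.score(X, y)` expects the input features as `X`, calls `predict(X)` internally, and returns R² against `y`. Passing already-predicted winnings as `X` makes the model predict a second time, treating dollar amounts as if they were win counts, and that can give a hugely negative R². The following is a minimal sketch on synthetic data; the names `wins`, `winnings`, `model` and all the numbers are made up for illustration and are not the project's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in data (assumption): roughly 10,000 per win plus noise.
rng = np.random.default_rng(0)
wins = rng.integers(0, 50, size=200).astype(float).reshape(-1, 1)
winnings = 10_000.0 * wins[:, 0] + rng.normal(0.0, 20_000.0, size=200)

model = LinearRegression().fit(wins, winnings)
predicted = model.predict(wins)

# Intended use: features in, R^2 out (score() predicts internally).
r2_correct = model.score(wins, winnings)

# Equivalent check when predictions are already in hand.
r2_from_predictions = r2_score(winnings, predicted)

# Misuse: predictions passed as features, so the model predicts a second time.
r2_misused = model.score(predicted.reshape(-1, 1), winnings)

print("score(wins, winnings):        ", r2_correct)
print("r2_score(winnings, predicted):", r2_from_predictions)
print("score(predicted, winnings):   ", r2_misused)
```

The first two numbers should agree, since both compare the same predictions with the same targets. Only the third call feeds predictions back in as features.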

``````python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# load and investigate the data here:

df=pd.DataFrame(csv)

Wins=df['Wins']
Wins=Wins.values.reshape(-1,1)

Winnings=df['Winnings']
Winnings=Winnings.values.reshape(-1,1)

# Column arrays of shape (n_players, 1): the number of wins each player has, and their earnings (winnings).

regr = LinearRegression()
regr.fit(Wins,Winnings)
winnings_predict=regr.predict(Wins)

# Here I performed Linear Regression, to predict the winnings from the number of wins each player has.

plt.scatter(Wins,Winnings)
plt.plot(Wins,winnings_predict)
plt.xlabel('Wins')
plt.ylabel('Winnings')
plt.title('Wins vs Winnings (with line of best fit)')
plt.show()

# Here I plotted the graph of wins vs winnings, with the line of best fit.

R_value_predicted=regr.score(winnings_predict,Winnings)

# >>> This is the relationship between the predicted winnings and the actual winnings. It's not very accurate. But why? <<<
R_value_true=regr.score(Wins,Winnings)
print(R_value_predicted)
print(R_value_true)

``````
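The script imports `train_test_split` but scores the model on the same rows it was fitted on. One possible way to get a less optimistic estimate is to hold out part of the data and score on that. This is only a sketch on synthetic stand-in data; the variable names, the 80/20 split and the `random_state` value are assumptions, not part of the original project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): not the real tennis dataset.
rng = np.random.default_rng(42)
wins = rng.integers(0, 50, size=200).astype(float).reshape(-1, 1)
winnings = 10_000.0 * wins[:, 0] + rng.normal(0.0, 20_000.0, size=200)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    wins, winnings, test_size=0.2, random_state=1
)

# Fit on the training rows only.
model = LinearRegression().fit(X_train, y_train)

# score() always takes features first, then the true targets.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print("R^2 on training rows:", train_r2)
print("R^2 on held-out rows:", test_r2)
```

With the real data, `wins` and `winnings` would be replaced by the `Wins` and `Winnings` arrays from the script above.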