https://www.codecademy.com/practice/projects/tennis-ace
I am using linear regression to predict a tennis player's winnings from their number of wins. The predicted values are stored in the variable winnings_predict, and the actual values are stored in the variable Winnings.
When I call regr.score(winnings_predict,Winnings), I get a value of -872039706.5177922.
This large negative value indicates that my predicted winnings are very far off the actual winnings, right? If so, have I done something wrong? Can I change anything?
This is quite an open-ended question, but any help in the right direction would be much appreciated.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# load and investigate the data here:
df=pd.read_csv('tennis_stats.csv')  # read_csv already returns a DataFrame
Wins=df['Wins']
Wins=Wins.values.reshape(-1,1)
Winnings=df['Winnings']
Winnings=Winnings.values.reshape(-1,1)
# 2-D column arrays (shape (n, 1)) of the number of wins each player has, and their earnings (winnings).
regr = LinearRegression()
regr.fit(Wins,Winnings)
winnings_predict=regr.predict(Wins)
# Here I performed Linear Regression, to predict the winnings from the number of wins each player has.
plt.scatter(Wins,Winnings)
plt.plot(Wins,winnings_predict)
plt.xlabel('Wins')
plt.ylabel('Winnings')
plt.title('Wins vs Winnings (with line of best fit)')
plt.show()
# Here I plotted the graph of wins vs winnings, with the line of best fit.
R_value_predicted=regr.score(winnings_predict,Winnings)
# >>> This is the relationship between the predicted winnings and the actual winnings. It's not very accurate. But why? <<<
R_value_true=regr.score(Wins,Winnings)
print(R_value_predicted)
print(R_value_true)
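
A note on the two score calls above: LinearRegression.score(X, y) takes the input features X, runs predict(X) internally, and returns the R² of those predictions against y. Passing winnings_predict as the first argument therefore treats the predicted dollar amounts as if they were win counts and predicts from them again, which would explain a hugely negative result. The sketch below reproduces the effect on synthetic data (the numbers are made up for illustration and are not taken from tennis_stats.csv) and shows that sklearn.metrics.r2_score is the function to use when comparing predictions with actual values directly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in for the tennis data (assumed values, for illustration only).
rng = np.random.default_rng(0)
wins = rng.integers(0, 50, size=200).reshape(-1, 1).astype(float)
winnings = 10_000 * wins + rng.normal(0, 20_000, size=wins.shape)

model = LinearRegression().fit(wins, winnings)
predicted = model.predict(wins)

# Correct: score() receives the features and predicts from them internally.
r2_correct = model.score(wins, winnings)

# Equivalent: compare actual values with predictions directly.
r2_same = r2_score(winnings, predicted)

# Mistake: the predictions are fed back in as if they were win counts,
# so the model predicts from dollar amounts and the result is far off.
r2_wrong = model.score(predicted, winnings)

print(r2_correct, r2_same, r2_wrong)
```

In the original script, regr.score(Wins,Winnings) is the meaningful value; the other call can be dropped or replaced with r2_score(Winnings, winnings_predict), which should give the same number as R_value_true.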