I am using linear regression to predict a tennis player's winnings from their number of wins. The predicted values are stored in the variable winnings_predict, and the actual values are stored in the variable Winnings. When I call regr.score(winnings_predict, Winnings), I get a value of -872039706.5177922.
This large negative value indicates that my predicted winnings are very far off the actual winnings, right? If so, have I done something wrong? Can I change anything?
This is quite an open-ended question, but any help in the right direction would be so appreciated.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# load and investigate the data here:
csv = pd.read_csv('tennis_stats.csv')
df = pd.DataFrame(csv)

Wins = df['Wins']
Wins = Wins.values.reshape(-1, 1)
Winnings = df['Winnings']
Winnings = Winnings.values.reshape(-1, 1)
# Column arrays of the number of wins they have, and their earnings (winnings).

regr = LinearRegression()
regr.fit(Wins, Winnings)
winnings_predict = regr.predict(Wins)
# Here I performed Linear Regression, to predict the winnings from the number of wins each player has.

plt.scatter(Wins, Winnings)
plt.plot(Wins, winnings_predict)
plt.xlabel('Wins')
plt.ylabel('Winnings')
plt.title('Wins vs Winnings (with line of best fit)')
plt.show()
# Here I plotted the graph of wins vs winnings, with the line of best fit.

R_value_predicted = regr.score(winnings_predict, Winnings)
# >>> This is the relationship between the predicted winnings and the actual winnings. It's not very accurate. But why? <<<

R_value_true = regr.score(Wins, Winnings)

print(R_value_predicted)
print(R_value_true)
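
For reference, here is a minimal sketch of what I understand score to be doing. In scikit-learn, LinearRegression.score(X, y) expects the input features as X, calls predict(X) internally, and returns the R^2 of those predictions against y. The data below is synthetic and made up purely for illustration (it is not tennis_stats.csv), and the names r2_from_features, r2_from_metric and r2_from_predictions are my own. If that reading is right, passing predictions as X would explain a huge negative value, because the model is applied a second time to dollar amounts as if they were win counts.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in for the real data (made-up numbers, for illustration only).
wins = np.arange(1, 21, dtype=float).reshape(-1, 1)
noise = np.where(np.arange(20) % 2 == 0, 8000.0, -8000.0)
winnings = 5000.0 * wins.ravel() + noise

regr = LinearRegression().fit(wins, winnings)
winnings_predict = regr.predict(wins)

# score(X, y) runs predict(X) internally and returns R^2 against y,
# so X should be the input features (the wins).
r2_from_features = regr.score(wins, winnings)

# To compare predictions with actual values directly, r2_score(y_true, y_pred)
# takes the two arrays and gives the same number.
r2_from_metric = r2_score(winnings, winnings_predict)

# Passing the predictions as X makes the model predict from dollar amounts
# as though they were win counts, which yields a very large negative R^2.
r2_from_predictions = regr.score(winnings_predict.reshape(-1, 1), winnings)

print(r2_from_features)
print(r2_from_metric)
print(r2_from_predictions)
```

With this toy data, the first two printed values agree and lie between 0 and 1, while the third is a large negative number, similar in character to the value in my own output.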