I’m having trouble interpreting the results from the Yelp Regression Project:
We are asked to analyse the correlation values of the data set, then examine the independent variables that affect the ‘stars’ variable. We then fit the model on the two selected independent variables with the highest correlation values, ‘average_review_length’ and ‘average_review_age’.
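To make the feature-selection step concrete, here is a minimal sketch of how I understand it: compute the Pearson correlation of each candidate column with ‘stars’ and rank the columns by absolute value. The rows in `toy` are made up for illustration and are not the Yelp data, and `pearson` is my own helper (in the project itself I believe this comes from the pandas `DataFrame.corr()` call).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up toy rows for illustration only (NOT the Yelp data).
toy = {
    "average_review_length": [120, 300, 450, 200, 800, 650],
    "average_review_age": [400, 900, 1500, 700, 2000, 1200],
    "stars": [4.5, 4.0, 3.0, 4.0, 2.5, 3.5],
}

# Correlation of every candidate feature with the target column.
correlations = {
    name: pearson(values, toy["stars"])
    for name, values in toy.items()
    if name != "stars"
}

# Rank by absolute value, since a strong negative correlation is as
# informative as a strong positive one.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)

for name in ranked:
    print(f"{name}: {correlations[name]:+.3f}")
```

Ranking by absolute value is my assumption about what "highest correlation" should mean here, since the sign only tells you the direction of the relationship.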
We are then asked to score the model on X_train and y_train, then on X_test and y_test. However, both resulting values (0.0825 and 0.0809 respectively) suggest that the model fits our data poorly, as they are much closer to 0 than to 1. We then carry on and make predictions with this model, but I don’t understand why, when we’ve just demonstrated that the model predicts poorly.
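To pin down what I think these scores mean: as far as I know, `.score()` on a scikit-learn linear regression returns R², the fraction of the variance in the target that the model explains. Below is a minimal sketch of that calculation in plain Python on synthetic data. The data, the single feature, and the helpers `fit_line` and `r_squared` are all my own assumptions for illustration, not the project's code. The noise is deliberately large compared with the true slope so that R² comes out small.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """R^2 = 1 - SS_res / SS_tot for the fitted line on the given data."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Synthetic data (NOT the Yelp data): a weak negative trend buried in noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(2000)]
ys = [3.5 - 0.05 * x + random.gauss(0, 0.5) for x in xs]

# Simple hold-out split: first 1500 rows for training, last 500 for testing.
x_train, y_train = xs[:1500], ys[:1500]
x_test, y_test = xs[1500:], ys[1500:]

slope, intercept = fit_line(x_train, y_train)
train_r2 = r_squared(x_train, y_train, slope, intercept)
test_r2 = r_squared(x_test, y_test, slope, intercept)

print(f"slope={slope:.4f}, intercept={intercept:.4f}")
print(f"train R^2={train_r2:.4f}, test R^2={test_r2:.4f}")
```

If my reading is right, similar train and test scores would only say that the model is not overfitting, while the low value itself would say that these features explain little of the variation in ‘stars’.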
Could someone explain this to me? Am I completely missing something?