# Yelp Linear Regression

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Features (independent variables). 'stars' is the target, so it is left out here.
x = df[['alcohol?', 'good_for_kids', 'has_bike_parking', 'has_wifi',
        'price_range', 'review_count', 'take_reservations',
        'takes_credit_cards', 'average_review_age_x', 'average_review_length_x',
        'average_review_length_y', 'average_review_sentiment_y',
        'average_number_friends', 'average_days_on_yelp', 'average_number_fans',
        'average_review_count', 'average_number_years_elite',
        'weekday_checkins', 'weekend_checkins', 'average_tip_length',
        'number_tips', 'average_caption_length', 'number_pics']]
# Target (dependent variable)
y = df['stars']

ols = LinearRegression()
model = ols.fit(x, y)
print(model.coef_)  # one regression coefficient per feature column
print(df.corr())    # pairwise correlations between all columns
```

I have created a DataFrame called `df` from the Yelp data.

My question is: what is the difference between printing `model.coef_` and `df.corr()`?

I don't really understand how these two differ.
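A small experiment on made-up data may help separate the two. This is only a sketch: the `toy` DataFrame and its `review_length` column are hypothetical stand-ins, not the real Yelp data. It fits the same feature twice, once in characters and once in hundreds of characters, and compares the regression coefficient with the correlation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical stand-in for one feature and the target (not the real Yelp data).
toy = pd.DataFrame({"review_length": rng.normal(600, 150, n)})
toy["stars"] = 3.5 + 0.002 * (toy["review_length"] - 600) + rng.normal(0, 0.5, n)

# The same feature expressed in different units (hundreds of characters).
toy["review_length_hundreds"] = toy["review_length"] / 100

# Regression coefficient: change in stars per one unit of the feature.
coef_chars = LinearRegression().fit(toy[["review_length"]], toy["stars"]).coef_[0]
coef_hundreds = LinearRegression().fit(
    toy[["review_length_hundreds"]], toy["stars"]
).coef_[0]

# Correlation: unit-free strength of the linear relationship, between -1 and 1.
corr_chars = toy["review_length"].corr(toy["stars"])
corr_hundreds = toy["review_length_hundreds"].corr(toy["stars"])

print("coefficients:", coef_chars, coef_hundreds)
print("correlations:", corr_chars, corr_hundreds)
```

In this sketch the coefficient changes by a factor of 100 when the units change, while the correlation stays the same. That suggests `model.coef_` depends on the scale of each column, whereas `corr()` does not.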

Also, if I use `model.coef_` or `corr()` to pick the features with the highest values, why is the R-squared sometimes still low?

The R-squared is higher for some independent variables with a lower coefficient than for others with a higher coefficient.
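This pattern can be reproduced on synthetic data. The sketch below assumes two made-up features, a 0/1 flag and a large-scale count (the names `flag` and `count` are hypothetical), and fits a separate one-feature regression for each.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 1000

# Hypothetical features: a 0/1 flag and a count with a much larger scale.
flag = rng.integers(0, 2, n).astype(float)
count = rng.normal(1000, 300, n)
target = 0.3 * flag + 0.002 * count + rng.normal(0, 0.3, n)


def fit_one(feature):
    """Fit a one-feature regression and return (coefficient, R-squared)."""
    X = feature.reshape(-1, 1)
    model = LinearRegression().fit(X, target)
    return model.coef_[0], model.score(X, target)


coef_flag, r2_flag = fit_one(flag)
coef_count, r2_count = fit_one(count)

# For a single feature, in-sample R-squared equals the squared correlation.
corr_count = np.corrcoef(count, target)[0, 1]

print("flag:  coef", coef_flag, "R2", r2_flag)
print("count: coef", coef_count, "R2", r2_count, "corr^2", corr_count ** 2)
```

Here the flag gets the larger coefficient but the lower R-squared, because the coefficient is measured per unit of the feature and the flag only ranges from 0 to 1. With one feature, R-squared follows the squared correlation, not the size of the coefficient.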

Isn't that counterintuitive? Someone told me it could be because the data is not normalized, but I don't think the linear regression lesson covered normalizing data, so I don't know when to do it or even how.
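On the normalization point, here is a minimal sketch of one common approach, assuming scikit-learn's `StandardScaler` is acceptable (the lesson may intend a different method). The data is synthetic and the two columns are hypothetical, chosen only to have very different scales.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 1000

# Hypothetical feature matrix: column 0 is a 0/1 flag, column 1 is a large count.
X = np.column_stack([
    rng.integers(0, 2, n).astype(float),
    rng.normal(1000, 300, n),
])
y = 0.3 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.3, n)

# Fit on the raw columns.
raw = LinearRegression().fit(X, y)

# Standardize each column to mean 0 and standard deviation 1, then refit.
X_scaled = StandardScaler().fit_transform(X)
scaled = LinearRegression().fit(X_scaled, y)

print("raw coefficients:   ", raw.coef_)
print("scaled coefficients:", scaled.coef_)
print("R2 raw vs scaled:   ", raw.score(X, y), scaled.score(X_scaled, y))
```

In this sketch, scaling leaves the R-squared of a plain linear regression unchanged. It only puts the coefficients on a common "per standard deviation" footing, so their sizes become comparable across features. When using `train_test_split`, the usual practice is to fit the scaler on the training portion only and reuse it to transform the test portion.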