Yelp Linear Regression

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
y = df[['stars']]
x = df[['alcohol?', 'good_for_kids', 'has_bike_parking', 'has_wifi',
       'price_range', 'review_count', 'take_reservations',
       'takes_credit_cards', 'average_review_age_x', 'average_review_length_x',
       'average_review_sentiment_x', 'number_funny_votes_x',
       'number_cool_votes_x', 'number_useful_votes_x', 'average_review_age_y',
       'average_review_length_y', 'average_review_sentiment_y',
       'number_funny_votes_y', 'number_cool_votes_y', 'number_useful_votes_y',
       'average_number_friends', 'average_days_on_yelp', 'average_number_fans',
       'average_review_count', 'average_number_years_elite',
       'weekday_checkins', 'weekend_checkins', 'average_tip_length',
       'number_tips', 'average_caption_length', 'number_pics']]
ols = LinearRegression()
model = ols.fit(x, y)

I have created a dataframe called df from Yelp data.

My question is: what is the difference between printing model.coef_ and df.corr()?

I don’t really understand the difference between the two.
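
For reference, this is roughly how I am looking at the two side by side. It is only a sketch on a tiny made-up dataframe so that it runs on its own: `toy`, its values, and the numbers used to generate `stars` are placeholders I invented, not my Yelp data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data standing in for the real Yelp dataframe (assumption).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "review_count": rng.integers(1, 500, size=200),
    "price_range": rng.integers(1, 5, size=200),
})
toy["stars"] = (
    3
    + 0.002 * toy["review_count"]
    + 0.3 * toy["price_range"]
    + rng.normal(0, 0.5, size=200)
)

features = toy[["review_count", "price_range"]]
target = toy["stars"]
model = LinearRegression().fit(features, target)

# One fitted coefficient per feature, in the units of that feature.
coefs = pd.Series(model.coef_, index=features.columns)

# Pairwise Pearson correlation of each feature with the target, always in [-1, 1].
corrs = toy.corr()["stars"].drop("stars")

print(coefs)
print(corrs)
```

Both give me one number per feature, which is why I am confused about how they differ.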

Also, if I pick the features with a high model.coef_ or corr() value, why is the R-squared sometimes still not high?

The R-squared is sometimes higher for independent variables with a lower coefficient than for independent variables with a higher coefficient.
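
Here is a small made-up example of the kind of thing I am seeing. It is a sketch with synthetic data, not my Yelp data: the column names `big_scale` and `small_scale`, their ranges, and the numbers used to build `target` are all assumptions I chose for illustration. Each feature is fitted on its own and I compare its coefficient with its R-squared.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): two features on very different scales.
rng = np.random.default_rng(42)
n = 500
toy = pd.DataFrame({
    "big_scale": rng.uniform(0, 1000, n),   # values in the hundreds
    "small_scale": rng.uniform(0, 1, n),    # values between 0 and 1
})
toy["target"] = (
    0.004 * toy["big_scale"]
    + 1.0 * toy["small_scale"]
    + rng.normal(0, 0.3, n)
)

# Fit one single-feature regression per column and record coef and R-squared.
results = {}
for col in ["big_scale", "small_scale"]:
    X = toy[[col]]
    fit = LinearRegression().fit(X, toy["target"])
    results[col] = {"coef": fit.coef_[0], "r2": fit.score(X, toy["target"])}

summary = pd.DataFrame(results).T
print(summary)
```

In this toy case `small_scale` gets the larger coefficient but the lower R-squared, which is the same pattern I get with some of the Yelp columns.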

Isn’t that counterintuitive? Someone told me it could be because the data is not normalized, but I don’t think the linear regression lesson covered normalizing data, so I don’t know when or how to do it.