FAQ: Multiple Linear Regression - Rebuild the Model

This community-built FAQ covers the “Rebuild the Model” exercise from the lesson “Multiple Linear Regression”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Machine Learning

FAQs on the exercise Rebuild the Model

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply below!

Agree with a comment or answer? Like it to up-vote the contribution!

Need broader help or resources? Head here.

Looking for motivation to keep learning? Join our wider discussions.

Learn more about how to use this guide.

Found a bug? Report it!

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

Working on the Manhattan dataset: I can’t seem to get an R^2 value better than 0.8, no matter which variables I eliminate. Does anyone have a best solution for this?

No. I’m having the same issue. Either something isn’t right with this one, or I really, really don’t understand what they’re trying to teach me.

Same issue here. When I delete any columns from df, the scores stay the same.

Which command is used for removing features?

You can drop features like this (note that drop() returns a new DataFrame rather than modifying x in place, so reassign the result or pass inplace=True):

x = x.drop(['size_sqft', 'building_age_yrs'], axis=1)

But I also cannot get lower than 0.805.
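
For context, here is a fuller end-to-end sketch of dropping features and refitting. This is not the official solution: the column names follow the StreetEasy Manhattan dataset the lesson uses (adjust them if your copy differs), and the file path and split parameters are just illustrative choices.

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the exercise's CSV; 'manhattan.csv' is a placeholder path
df = pd.read_csv('manhattan.csv')

x = df[['bedrooms', 'bathrooms', 'size_sqft', 'min_to_subway', 'floor',
        'building_age_yrs', 'no_fee', 'has_roofdeck', 'has_washer_dryer',
        'has_doorman', 'has_elevator', 'has_dishwasher', 'has_patio',
        'has_gym']]
y = df[['rent']]

# drop() returns a new DataFrame; the original x is left untouched
x_smaller = x.drop(['size_sqft', 'building_age_yrs'], axis=1)

x_train, x_test, y_train, y_test = train_test_split(
    x_smaller, y, train_size=0.8, test_size=0.2, random_state=6)

mlr = LinearRegression()
mlr.fit(x_train, y_train)
print(mlr.score(x_test, y_test))  # R^2 on the held-out 20%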

I find the whole concept of removing explanatory variables to increase overall accuracy baffling. If there is even the slightest correlation between a variable and an outcome, shouldn’t the inclusion of this variable by definition improve the accuracy of the overall fit?

Is this maybe the real lesson they are trying to teach here?
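
There is a real point here. On the training data, adding a feature can never lower R^2: ordinary least squares could always set the new coefficient to zero and do no worse, so even a spurious correlation looks like an improvement. On held-out data, though, a feature that is mostly noise can make the fit worse, which is why the exercise scores the model on a test split (and why a higher test R^2 is the goal). Here is a minimal sketch, using purely synthetic and illustrative data, that compares train and test R^2 with and without a pure-noise feature:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=(n, 1))    # genuinely predictive feature
noise = rng.normal(size=(n, 1))     # feature unrelated to the target
y = 3 * signal[:, 0] + rng.normal(scale=0.5, size=n)

for X in (signal, np.hstack([signal, noise])):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=1)
    m = LinearRegression().fit(X_tr, y_tr)
    print(f'{X.shape[1]} feature(s): '
          f'train R^2 = {m.score(X_tr, y_tr):.4f}, '
          f'test R^2 = {m.score(X_te, y_te):.4f}')

The training score can only rise (or stay equal) as features are added; the test score is the one that can drop when a feature adds more noise than signal.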

I checked the step where we ran the regression on the different columns. One thing I was unsure about was the “binary” choices, i.e. dishwasher, yes or no; these were not as clear.

Now there were two that seemed to have a negative correlation to me: “min_to_subway” and “building_age_yrs”.
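
If you want to see each feature's sign and rough magnitude directly rather than eyeballing plots, you can print the fitted coefficients. A minimal sketch, assuming mlr is the fitted LinearRegression and x is the feature DataFrame from the lesson (coef_ has shape (1, n_features) here because y was a one-column DataFrame):

# Pair each feature name with its fitted coefficient
for name, coef in zip(x.columns, mlr.coef_[0]):
    print(f'{name:>20}: {coef:.2f}')

A negative coefficient on min_to_subway or building_age_yrs just means rent tends to fall as those values rise, holding the other features fixed; that is expected, not a problem with the model.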

I just ran them all using the Manhattan dataset and found that only min_to_subway (0.8085711291628321) really made a difference.
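
For anyone who wants to reproduce that kind of sweep, a small loop does it. A minimal sketch, assuming x and y are the feature DataFrame and target from the lesson; the split parameters are illustrative:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.8, test_size=0.2, random_state=6)

baseline = LinearRegression().fit(x_train, y_train).score(x_test, y_test)
print(f'all features: {baseline:.4f}')

# Refit with each feature left out in turn and compare held-out R^2
for col in x.columns:
    m = LinearRegression().fit(x_train.drop(col, axis=1), y_train)
    print(f'without {col}: {m.score(x_test.drop(col, axis=1), y_test):.4f}')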

OK, seeing the question above: lower was the goal? I got the impression that higher meant a better correlation. No worries, continuing!