FAQ: The Data Science Process - Modeling and Analysis

This community-built FAQ covers the “Modeling and Analysis” exercise from the lesson “The Data Science Process”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Code Foundations

For some reason, on "Modeling and Analysis", when I click Run the green check mark comes up, so the code is right, but nothing shows up in the right-hand window where the code is supposed to run. Anyone else have this issue?

I’m having the same issue right now. Almost a whole year later.

Hi, I get the meaning of the model that was made, but I find the x-axis labeling confusing:
it does not refer to the locations or the number of inhabitants; it just runs from 0.0 to 0.8. I cannot see the reason for labeling it like that at all. Also, it seems to be flipped, in that the users’ mean age “increases” in rural areas according to the data… no?

I have the same confusion. Did you figure it out?

It’s not the clearest graph if you’re unfamiliar with mathematical notation, but the 1e7 indicates that the x-axis values should be multiplied by 1e7 (which is 1 x 10^7), or 10,000,000 (ten million). So the value of 0.8 would actually indicate a population of 8 million.
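
As a quick check of that arithmetic, here is a minimal Python sketch. The helper name `tick_to_population` is made up for illustration, and the intermediate tick values (0.2, 0.4, 0.6) are an assumption; only the 0.0 to 0.8 range and the 1e7 factor come from this thread.

```python
# Hypothetical helper: convert an axis tick label to the real value,
# given the scale factor printed at the end of the axis (here 1e7).
def tick_to_population(tick, scale=1e7):
    # round() removes tiny floating-point error and returns an int
    return round(tick * scale)

# Assumed tick labels; the chart is described as running from 0.0 to 0.8.
ticks = [0.0, 0.2, 0.4, 0.6, 0.8]
populations = [tick_to_population(t) for t in ticks]

for tick, pop in zip(ticks, populations):
    print(f"{tick} on the axis -> {pop:,} people")
```

So a tick labelled 0.8 stands for 8,000,000 people, and 0.2 for 2,000,000.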

Perhaps the following adjusted formatting is clearer:
[Image: chart with adjusted axis formatting]

As for the order, the population increases as you follow the chart along the x-axis from left to right. If you were to draw a line up from the x-axis at 2 million, it meets the solid blue fitted line at approximately 30 on the y-axis. If you did the same at 6 million, it meets this line halfway between 30 and 35 on the y-axis (we’ll call it 32.5).

So as the population of the user’s location increases (more urban, less rural), we seem to see an increase in user age. So it’s not flipped (though it’s a rather sparse dataset for a strong conclusion).
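
To make the direction of the trend concrete, here is a rough sketch that rebuilds the fitted line from the two points read off the chart above. These are approximate eyeballed readings, not the underlying data, and the names (`fitted_age`, `slope_per_million`) are only illustrative.

```python
# Approximate readings from the chart as described in this post
# (NOT the real dataset): (population, fitted mean age)
x1, y1 = 2_000_000, 30.0
x2, y2 = 6_000_000, 32.5

# Slope of the straight line through the two readings
slope = (y2 - y1) / (x2 - x1)        # years of age per additional resident
slope_per_million = slope * 1_000_000
intercept = y1 - slope * x1          # fitted age at a population of zero

def fitted_age(population):
    """Age predicted by the straight line through the two readings."""
    return intercept + slope * population

print(f"slope: about {slope_per_million:.3f} years per million residents")
print(f"fitted age at 4 million: {fitted_age(4_000_000):.2f}")
```

A positive slope means the fitted age goes up as the population goes up, which is why the chart is not flipped.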

Thanks! I think the sparse dataset and the linear visualization also accounted for my misperception.

Very useful. Thank you.

I am a bit confused by the choice of the x- and y-variables. I find it much more intuitive to suppose that the location of the users is dependent on their age. Doesn’t the problem of reverse causality arise when running the regression the other way around?

As you rightly point out, there’s always that fine line of correlation vs. causation (or inverting the cause) but, in my opinion at least, in this case you could arguably plot it either way round since it’s just correlation.

As we’re specifically looking at the question of whether or not age is dependent on the population of the user’s location, the standard way to plot it would be the proposed “independent” variable on the x-axis and the proposed “dependent” variable on the y-axis.

I do understand where you’re coming from though. An option which can help is to write it out as a sentence. From this graph I might say “the average age of a user in an urban area of 6 million people is 32.5 years old”. Flipping that on its head, “users with an age of 32.5 most probably reside in an urban area of 6 million people” comes off as a little odd, but you could probably rephrase it to suit your needs.

It’s just a convention though. Since there’s no obvious “a” causes “b” for urbanisation and age (without further input), I’d consider this graph on its own just a correlation, so personally I’d plot it whichever way round supports the point you’re making.

Thank you very much. Yes, I get your point!

Do we know from this example if the correlation is statistically significant? It doesn’t seem to tell us what the p-value is, etc.

I seem to be having the same issue some other folks reported over a year ago. When I run the code it evaluates my input and gives a green check mark, but the window on the right stays empty. It does not show a scatterplot or anything. I’ve tried in Chrome and Edge, both updated.