FAQ: The Data Science Process - Communicating Findings

This community-built FAQ covers the “Communicating Findings” exercise from the lesson “The Data Science Process”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Code Foundations

FAQs on the exercise Communicating Findings


What happened to the population labels 100K, 1M, 2M, 4M and 8M in the second and third graphs? Why did they go back to 0.0, 0.2, 0.4, 0.6 and 0.8?


The same thing happened here.
Also, the axis labels in the last plot don't change for population (ax.set_xlabel("City Population")) and age (ax.set_ylabel("User Age")).

Does anyone know why?
Thanks in advance

Hello! This seemed to work better for me.

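# Paste code to change the figure style and palette:
# (assumes plt, sns, and new_df are already defined by the exercise script)

plt.close()

sns.set_style("darkgrid")
sns.set_palette("bright")
sns.despine()

sns.regplot(x="population_proper", y="age", data=new_df)
# plt.subplot(1, 1, 1) returns the Axes that regplot just drew on
ax = plt.subplot(1, 1, 1)
# plt.close() discarded the earlier ticks, so set them again
ax.set_xticks([100000, 1000000, 2000000, 4000000, 8000000])
ax.set_xticklabels(["100k", "1m", "2m", "4m", "8m"])
plt.show()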

# Paste code to title the axes and the plot:

plt.close()

sns.regplot(x="population_proper", y="age", data=new_df)
ax = plt.subplot(1, 1, 1)
# Reapply the custom ticks on the new figure
ax.set_xticks([100000, 1000000, 2000000, 4000000, 8000000])
ax.set_xticklabels(["100k", "1m", "2m", "4m", "8m"])

# Label the axes and title the plot before showing it
ax.set_xlabel("City Population")
ax.set_ylabel("User Age")
plt.title("Age vs Population")

plt.show()
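
For anyone wondering why the labels revert: plt.close() discards the current figure, so the next plot starts from Matplotlib's defaults. The 0.0 to 0.8 labels are most likely the populations shown in units of 1e7 (look for a small "1e7" near the corner of the axis). Here is a minimal sketch of reapplying everything on the new Axes. It uses made-up numbers and a plain ax.scatter instead of the exercise's new_df and sns.regplot, which are assumptions made only to keep it self-contained; the same Axes methods should work on the Axes that sns.regplot returns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up stand-ins for the exercise's population_proper and age columns
population = [100_000, 1_000_000, 2_000_000, 4_000_000, 8_000_000]
age = [31, 28, 29, 32, 33]

plt.close()  # discards the previous figure along with its ticks and labels

fig, ax = plt.subplots()
ax.scatter(population, age)

# Every new figure needs its ticks, labels, and title set again
ax.set_xticks(population)
ax.set_xticklabels(["100k", "1m", "2m", "4m", "8m"])
ax.set_xlabel("City Population")
ax.set_ylabel("User Age")
ax.set_title("Age vs Population")

# Call plt.show() in an interactive session to view the result
```

The point is that the tick and label calls have to come after the plot is drawn and before it is shown, every time the figure is recreated.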


Did anyone else think the stories told by the histogram and the scatter plot conflicted?

In the scatter plot, the trend line showed age increasing with population size, but the histograms definitively showed that the mean age of people living in lower-population (rural) areas was higher than in higher-population (urban) areas.


I was wondering where I might be able to download the CSV files used in the exercises, so I could play with the data in Jupyter notebooks and try to replicate the results. Thanks!


I realize this thread is two years old and I may not get a response, but I agree. Isn't the data initially implying that the older the user, the more rural the location? Shouldn't the population of a city decrease with a user's age? The scatter plot's trend line shows slightly older people living in larger cities, whereas the histogram showed the opposite.

You have to take into consideration the purpose of the two different charts and the outliers within the data.

For the histogram and box plots, a rural area is defined as anything with fewer than 100,000 people, and urban is anything larger. Therefore, if you look at the plot from this exercise, you'll note that on the line representing 100k there are a few outlier areas where the age is rather high, and they drag the average for that area up to just over 30. This is shown in the violin plots and histograms from earlier exercises.

Now, if you factor in all of the urban areas (every place except that one line on the plot where the population is 100k), the average across all of them is 29ish.

But this plot does not show one average across all of those places. It breaks them down by population and, in effect, averages the ages at each population value.

Take into account just the points for the cities where the population is over 8m. It’s a rather high average age there, though it’s only a few points.

The regression line is an attempt to fit a straight line from left to right with the least amount of error between the line and the means at each of those points. Clearly, at each point there is a gap between the average and where the line runs, but because the averages at 2m, 4m, and 8m are a fair bit higher than elsewhere, the line trends up as the city population increases.

Going back to the original point: if we had rural, suburban, and urban groups, where suburban was, say, 100k to 2m people, you'd find that the mean age in rural areas would be 31ish, the mean age in suburban areas might be 28ish, and the mean for urban areas, under the new definition of cities with more than 2m people, might be 33.
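
To make that concrete, here is a small sketch with invented numbers (not the exercise's data) that roughly follow the figures above: many rural points around age 31, many mid-sized cities around 28, and a few very large cities around 33. It compares the two views: the rural/urban means split at 100k, and the slope of a least-squares line through all the points.

```python
from statistics import mean

# Made-up (population, age) points chosen to mirror the rough figures above;
# they are NOT the exercise's data.
points = (
    [(50_000, 31)] * 10        # rural towns, under 100k people
    + [(500_000, 28)] * 10     # mid-sized cities
    + [(8_000_000, 33)] * 3    # a few very large cities
)

# Histogram-style view: split at 100k and compare the group means
rural_mean = mean(a for p, a in points if p < 100_000)
urban_mean = mean(a for p, a in points if p >= 100_000)

# Scatter-plot view: slope of the least-squares line of age on population
x_bar = mean(p for p, _ in points)
y_bar = mean(a for _, a in points)
slope = (
    sum((p - x_bar) * (a - y_bar) for p, a in points)
    / sum((p - x_bar) ** 2 for p, _ in points)
)

print(f"rural mean age: {rural_mean}")
print(f"urban mean age: {urban_mean:.2f}")
print(f"slope is positive: {slope > 0}")
```

With these invented numbers the rural mean (31) is higher than the urban mean (about 29.15), yet the fitted slope is still positive, because the few very large cities sit far to the right and pull the line up. So the two charts can both be right at the same time.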

After writing the code, there was no change at all; the plot stayed the same. Did anyone else have the same problem?