Trouble understanding the function in the Roller Coaster project

Hi there!

I'm trying to wrap my head around functions.
In question 3 of the Roller Coaster project (https://www.codecademy.com/practice/projects/roller-coaster), we are asked to ‘Write a function that will plot the ranking of a given roller coaster over time as a line. Your function should take a roller coaster’s name and a ranking DataFrame as arguments.’

The solution code selects the rows that match the requirements, which is fine, and stores that selection in a variable, ‘coaster_rankings’, which is fine too.

However, I don’t understand why we use “coaster_rankings['Year of Rank']” as the x-values to plot the results (and likewise “coaster_rankings['Rank']” as the y-values) instead of rankings_df['Year of Rank'].
How does that work?
Does a variable that stores a selection of rows still let us use the columns of the DataFrame it came from?
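In case it helps, here is a minimal sketch of what that row selection actually returns. It uses a small made-up table with the project’s column names; the coasters, parks and ranks are hypothetical, not the real data.

```python
import pandas as pd

# Made-up rankings table with the project's column names.
# Coasters, parks and ranks are hypothetical.
rankings_df = pd.DataFrame({
    "Rank": [1, 2, 2, 1, 1, 2],
    "Name": ["Coaster A", "Coaster B", "Coaster A", "Coaster B", "Coaster A", "Coaster B"],
    "Park": ["Park A", "Park B", "Park A", "Park B", "Park A", "Park B"],
    "Year of Rank": [2013, 2013, 2014, 2014, 2015, 2015],
})

# The boolean mask is True only for rows matching BOTH conditions.
mask = (rankings_df["Name"] == "Coaster A") & (rankings_df["Park"] == "Park A")

# Indexing with the mask returns a new DataFrame: same columns, fewer rows.
coaster_rankings = rankings_df[mask]

print(type(coaster_rankings).__name__)            # DataFrame
print(list(coaster_rankings.columns))             # same four columns as rankings_df
print(coaster_rankings["Year of Rank"].tolist())  # [2013, 2014, 2015]
print(coaster_rankings["Rank"].tolist())          # [1, 2, 1]
```

So coaster_rankings is a DataFrame in its own right (same columns, only the matching rows), which is why its columns can be indexed directly.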

```python
import matplotlib.pyplot as plt

def plot_coaster_ranking(coaster_name, park_name, rankings_df):
    # Keep only the rows for this coaster at this park (a new, smaller DataFrame)
    coaster_rankings = rankings_df[(rankings_df['Name'] == coaster_name) & (rankings_df['Park'] == park_name)]
    fig, ax = plt.subplots()
    ax.plot(coaster_rankings['Year of Rank'], coaster_rankings['Rank'])
    ax.set_xticks(coaster_rankings['Year of Rank'].values)
    ax.set_yticks(coaster_rankings['Rank'].values)
    ax.invert_yaxis()  # rank 1 is best, so show it at the top
    plt.xlabel('Year')
    plt.ylabel('Ranking')
    plt.title("{} Rankings".format(coaster_name))
    plt.show()

# df_wood: the wooden-coaster rankings DataFrame, assumed loaded earlier in the project
plot_coaster_ranking('El Toro', 'Six Flags Great Adventure', df_wood)
```

Thanks a lot for your help!


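I think it’s because rankings_df contains the whole dataset, and the coaster we selected may not appear in every year. coaster_rankings is itself a DataFrame with the same columns as rankings_df but only the selected coaster’s rows, so coaster_rankings['Year of Rank'] gives just that coaster’s years. For example, suppose rankings_df contains rankings from 2013 to 2018, and a roller coaster was built in 2015. It won’t appear in 2013 or 2014, so it only has 4 rows (2015 to 2018). rankings_df['Year of Rank'], on the other hand, has one entry for every row of the table (one per coaster per year), which is far more than 4. If you use it as the x-values with coaster_rankings['Rank'] as the y-values, matplotlib raises a ValueError because x and y have different lengths. It would also be meaningless to show the years with no data.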
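Here is a small sketch of that length mismatch, assuming a made-up table in which the hypothetical ‘Coaster B’ is only ranked from 2015 on. It uses matplotlib’s non-interactive Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up table (hypothetical coasters and ranks) with the project's column names.
# "Coaster B" is only ranked from 2015 on.
rankings_df = pd.DataFrame({
    "Rank": [1, 1, 2, 1, 2, 1],
    "Name": ["Coaster A", "Coaster A", "Coaster A", "Coaster B", "Coaster A", "Coaster B"],
    "Park": ["Park A", "Park A", "Park A", "Park B", "Park A", "Park B"],
    "Year of Rank": [2013, 2014, 2015, 2015, 2016, 2016],
})
coaster_rankings = rankings_df[
    (rankings_df["Name"] == "Coaster B") & (rankings_df["Park"] == "Park B")
]

# The full column has one entry per row of the whole table, not one per year.
print(len(rankings_df["Year of Rank"]))  # 6
print(len(coaster_rankings["Rank"]))     # 2

fig, ax = plt.subplots()
mismatch_error = None
try:
    # x from the whole table, y from the selection: the lengths differ.
    ax.plot(rankings_df["Year of Rank"], coaster_rankings["Rank"])
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)

# x and y both from the selection: same length, so this works.
ax.plot(coaster_rankings["Year of Rank"], coaster_rankings["Rank"])
plt.close(fig)
```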

However, if you really want to see every year on the x-axis, you can still plot the coaster’s own years and ranks, then set the x ticks to every distinct year in the full dataset, e.g. ax.set_xticks(sorted(rankings_df['Year of Rank'].unique())). The years with no data then appear on the axis with no line above them.
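A minimal sketch of that idea, again on a made-up table (hypothetical coasters, not the project data): the line is drawn from the selection, while the tick positions come from the full dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up table (hypothetical coasters and ranks); "Coaster B" appears only from 2015.
rankings_df = pd.DataFrame({
    "Rank": [1, 1, 2, 1, 2, 1],
    "Name": ["Coaster A", "Coaster A", "Coaster A", "Coaster B", "Coaster A", "Coaster B"],
    "Park": ["Park A", "Park A", "Park A", "Park B", "Park A", "Park B"],
    "Year of Rank": [2013, 2014, 2015, 2015, 2016, 2016],
})
coaster_rankings = rankings_df[
    (rankings_df["Name"] == "Coaster B") & (rankings_df["Park"] == "Park B")
]

# Every distinct year in the full table, in order (one tick per year, not per row).
all_years = sorted(rankings_df["Year of Rank"].unique())

fig, ax = plt.subplots()
# x and y both come from the selection, so their lengths match.
ax.plot(coaster_rankings["Year of Rank"], coaster_rankings["Rank"])
# Ticks come from the full table, so 2013 and 2014 still show on the axis.
ax.set_xticks(all_years)
ax.set_xlim(min(all_years) - 0.5, max(all_years) + 0.5)
ax.invert_yaxis()  # rank 1 at the top
plt.close(fig)
```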