Roller Coaster Challenge Project (Python, Pandas)

Congratulations on completing your project!

Compare your project to our solution code and share your project below! Your solution might not look exactly like ours, and that’s okay! The most important thing right now is to get your code working as it should (you can always refactor more later). There are multiple ways to complete these projects and you should exercise your creative abilities in doing so.

This is a safe space for you to ask questions about any sample solution code and share your work with others! Simply reply to this thread to get the conversation started. Feedback is a vital component in getting better with coding and all ability levels are welcome here, so don’t be shy!

About community guidelines: This is a supportive and kind community of people learning and developing their skills. All comments here are expected to keep to our community guidelines.


How do I share my own solutions?

  • If you completed the project off-platform, you can upload your project to your own GitHub and share the public link on the relevant project topic.
  • If you completed the project in the Codecademy learning environment, use the share code link at the bottom of your code editor to create a gist, and then share that link here.

Do I really need to get set up on GitHub?
Yes! Both of these sharing methods require you to get set up on GitHub, and trust us, it’s worth your time. Here’s why:

  1. Once you have your project in GitHub, you’ll be able to share proof of your work with potential employers, and link out to it on your CV.
  2. It’s a great opportunity to get your feet wet using a development tool that tech workers use on the job, every day.

Not sure how to get started? We’ve got you covered - read this article for the easiest way to get set up on GitHub.

Best practices for asking questions about the sample solution

  • Be specific! Reference exact line numbers and syntax so others are able to identify the area of the code you have questions about.

Can you provide solutions to the last part of the challenge:
What roller coaster seating type is most popular?
And do different seating types result in higher/faster/longer roller coasters?

Do roller coaster manufacturers have any specialties
(do they focus on speed, height, seating type, or inversions)?

Do amusement parks have any specialties?
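
One possible starting point for the seating-type questions is a count per category plus a grouped average. This is only a sketch: the rows below are made up for illustration, and the column name seating_type is an assumption about roller_coasters.csv (speed and height appear elsewhere in this thread).

```python
import pandas as pd

# Illustrative rows only; the real data would come from roller_coasters.csv.
# The 'seating_type' column name is an assumption.
coasters = pd.DataFrame({
    'seating_type': ['Sit Down', 'Sit Down', 'Inverted', 'Flying', 'Sit Down', 'Inverted'],
    'speed': [80.0, 95.0, 100.0, 70.0, None, 90.0],
    'height': [30.0, 45.0, 40.0, 25.0, 20.0, None],
})

# Most common seating type: count rows per category and take the largest.
seating_counts = coasters['seating_type'].value_counts()
most_popular = seating_counts.idxmax()

# Average speed and height per seating type; mean() skips missing values.
averages = coasters.groupby('seating_type')[['speed', 'height']].mean()

print(seating_counts)
print(averages)
```

The same groupby pattern, keyed on a manufacturer or park column instead, would be one way to look at the other two questions.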


import pandas as pd
import matplotlib.pyplot as plt

steel = pd.read_csv('Golden_Ticket_Award_Winners_Steel.csv')
wood = pd.read_csv('Golden_Ticket_Award_Winners_Wood.csv')

# print(wood[wood['Name'] == 'Boulder Dash'])

# Write a function to plot rankings over time for 1 roller coaster here:

def rank_year(name, park):
    # Filter on both name and park, since a name alone may not be unique.
    dfwood = wood[(wood['Name'] == name) & (wood['Park'] == park)]
    plt.plot(dfwood['Year of Rank'], dfwood['Rank'])
    plt.ylabel('Rank')
    plt.xlabel('Year')
    plt.legend([name], loc=1)
    plt.show()

# rank_year('El Toro', 'Six Flags Great Adventure')

# Write a function to plot rankings over time for 2 roller coasters here:

def rank_year2(name1, name2, park1, park2):
    dfwood1 = wood[(wood['Name'] == name1) & (wood['Park'] == park1)]
    dfwood2 = wood[(wood['Name'] == name2) & (wood['Park'] == park2)]
    ax = plt.subplot()
    plt.plot(dfwood1['Year of Rank'], dfwood1['Rank'])
    plt.plot(dfwood2['Year of Rank'], dfwood2['Rank'])
    plt.ylabel('Rank')
    plt.xlabel('Year')
    plt.legend([name1, name2], loc=1)
    ax.set_yticks([1, 2, 3, 4])
    plt.show()

# rank_year2('El Toro', 'Boulder Dash', 'Six Flags Great Adventure', 'Lake Compounce')

# Write a function to plot top n rankings over time here:

def top_ranking(df, n):
    top = df[df['Rank'] <= n]
    fig, ax = plt.subplots(figsize=(10, 10))
    # One line per distinct coaster name in the top n.
    for coaster in set(top['Name']):
        coaster_rankings = top[top['Name'] == coaster]
        ax.plot(coaster_rankings['Year of Rank'], coaster_rankings['Rank'], label=coaster)
    # Ticks and title follow n instead of being hard-coded.
    ax.set_yticks(range(1, n + 1))
    plt.title('Top {} Rankings'.format(n))
    plt.xlabel('Year')
    plt.ylabel('Ranking')
    plt.legend(loc=4)
    plt.show()

# top_ranking(wood, 5)

# Load roller coaster data here:

coasters = pd.read_csv('roller_coasters.csv')
# coasters.info()

# Write a function to plot histogram of column values here:

def hist_roller(df, column):
    # Drop missing values so the histogram range can be computed.
    plt.hist(df[column].dropna())
    plt.legend([column])
    plt.xlabel(column)
    plt.ylabel('Number of Roller Coasters')
    plt.show()

# hist_roller(coasters, 'speed')

# Write a function to plot inversions by coaster at a park here:

def bar_park(df, park):
    park_df = df[df['park'] == park]
    roller_coaster = park_df['name']
    inversions = park_df['num_inversions']
    plt.figure(figsize=(20, 15))
    # Create the axes once; a second identical plt.subplot() call is redundant
    # and can trigger a MatplotlibDeprecationWarning in some versions.
    ax = plt.subplot()
    plt.bar(range(len(roller_coaster)), inversions)
    ax.set_xticks(range(len(roller_coaster)))
    ax.set_xticklabels(roller_coaster)
    plt.xticks(rotation=45)
    plt.legend([park])
    plt.show()

# bar_park(coasters, 'Walibi Belgium')

# Write a function to plot pie chart of operating status here:

def pie(coasters):
    df_operating = coasters[coasters['status'] == 'status.operating']
    df_closed = coasters[coasters['status'] == 'status.closed.definitely']
    count = [len(df_operating), len(df_closed)]
    labelsdata = ['Operating', 'Closed']
    plt.pie(count, autopct='%0.1f%%', labels=labelsdata)
    plt.axis('equal')
    plt.show()

# pie(coasters)

# Write a function to create scatter plot of any two numeric columns here:

def scatter(df, column1, column2):
    plt.figure(figsize=(20, 20))
    ax = plt.subplot()
    # Plot one column against the other so each point is one roller coaster.
    plt.scatter(df[column1], df[column2], color='blue', alpha=0.5)
    ax.set_xlabel(column1)
    ax.set_ylabel(column2)
    plt.show()

# scatter(coasters, 'speed', 'height')


Finally finished my first project in Codecademy!

Here is my solution:


Hello, this particular project has given me a headache. I just used your code to understand what’s happening.

This is my solution for the Roller Coaster project. I did try to answer the extra questions.
(This is my first time trying GitHub, so bear with me.)
Roller Coaster Project


Here is my code. I used Jupyter notebooks (open the link).


I like that you put it in a Jupyter Notebook so that we could see the graphs. I cannot figure out a way to show the graphs in GitHub.

Anyhow, what is this maddening warning?
“C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.”
I kept getting it in my output too! I finally found that moving the calls to subplot above the calls to plot helped, as did calling plt.clf() at the end of every function. I do not understand why this keeps the warning at bay. It was driving me crazy!
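
A likely explanation, offered as a sketch and not a definitive diagnosis: plt.plot() creates an axes implicitly when none exists, so a later plt.subplot() call with the same arguments asks for an axes that is already there, which is what the warning text describes. Creating the axes once with plt.subplots() and drawing through that object avoids the repeated request. The helper name and the sample numbers below are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # optional: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt


def plot_two_lines(years, ranks_a, ranks_b, label_a, label_b):
    """Hypothetical helper: draw two rank series on one explicitly created axes."""
    # plt.subplots() creates the figure and its axes exactly once.
    fig, ax = plt.subplots()
    # Drawing through ax means no further implicit plt.subplot() calls are needed.
    ax.plot(years, ranks_a, label=label_a)
    ax.plot(years, ranks_b, label=label_b)
    ax.set_xlabel('Year')
    ax.set_ylabel('Rank')
    ax.legend()
    return fig, ax


fig, ax = plot_two_lines([2013, 2014, 2015], [1, 2, 1], [2, 1, 3], 'Coaster A', 'Coaster B')
plt.close(fig)  # release the figure so the next plot starts from a clean state
```

Closing (or clearing) the figure at the end plays a similar role to the plt.clf() call mentioned above: the next function does not inherit an axes from the previous one.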

Hello,
You can see the GitHub repository where it says “show original”. Anyway, here is the link.
Regarding the warning: you can see the explanation here.
Hope it helps!


Here’s my solution:

Hello, I ran your code and it gives me the following error:
max must be larger than min in range parameter.
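
One common cause of this message, at least in older NumPy versions, is missing values (NaN) in the column passed to the histogram, because the bin range cannot be computed from NaN. Whether that is the cause here is an assumption. The sketch below uses made-up numbers and np.histogram, which plt.hist relies on, to show that dropping the missing values first lets the binning succeed.

```python
import numpy as np
import pandas as pd

# Illustrative data; the NaN stands in for a missing value in the CSV.
speeds = pd.Series([80.0, 95.0, np.nan, 70.0, 100.0])

# Remove missing values before binning.
clean = speeds.dropna()
counts, bin_edges = np.histogram(clean, bins=3)

print(counts, bin_edges)
```

In a plotting function, the equivalent change would be passing df[column].dropna() to plt.hist instead of df[column].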

Hi Everyone,

I don’t quite understand the syntax for the “function to plot top n rankings over time” below:

for coaster in set(top_n_rankings['Name']):
    coaster_rankings = top_n_rankings[top_n_rankings['Name'] == coaster]
    ax.plot(coaster_rankings['Year of Rank'], coaster_rankings['Rank'], label=coaster)

Why do we use set() here, and how does the syntax work whenever I use set()?

Regards


I am receiving a NameError: coaster_rankings not defined. Why?

This is my code:
def plot_ranking_two(coaster_name1, coaster_name2, park_name1, park_name2, df_ranking):
    coaster_rankings1 = df_ranking[(df_ranking['Name'] == coaster_name1) & (df_ranking['Park'] == park_name1)]
    coaster_rankings2 = df_ranking[(df_ranking['Name'] == coaster_name2) & (df_ranking['Park'] == park_name2)]
    fig, ax = plt.subplots()
    ax.plot(coaster_rankings1['Year of Rank'], coaster_rankings1['Rank'], color='green', label=coaster_name1)
    ax.plot(coaster_rankings2['Year of Rank'], coaster_rankings2['Rank'], color='red', label=coaster_name2)
    ax.invert_yaxis()
    plt.title("{} vs {} Rankings".format(coaster_name1, coaster_name2))
    plt.xlabel("Year")
    plt.ylabel("Ranking")
    plt.show()

plot_ranking_two('El Toro', 'Six Flags Great Adventure', 'Boulder Dash', 'Lake Compounce', wood)

Hello! Here is my solution. Cheers!


Here is my solution!
Cheers!

Note: I didn’t do the answers part.

My understanding is that set() turns the DataFrame column into a collection of its unique values, so the loop runs once per coaster name instead of once per row. list() would also give you something to loop over, but it keeps the duplicates.
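
To see what set() and list() each do with a column, here is a small sketch on made-up names. It also shows Series.unique(), which is a pandas alternative that keeps the order of first appearance.

```python
import pandas as pd

# Illustrative column with repeated coaster names.
names = pd.Series(['El Toro', 'Boulder Dash', 'El Toro', 'Phoenix', 'Boulder Dash'])

as_list = list(names)             # every row, duplicates included
as_set = set(names)               # each distinct name once, in no guaranteed order
as_unique = list(names.unique())  # distinct names, in order of first appearance

print(len(as_list), len(as_set), as_unique)
```

Looping over the set (or over unique()) gives one iteration per coaster, which is what the top n plot needs to draw one line per coaster.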


Hi all, code below. I also answered 1 out of the 3 bonus questions at step 11. I have not compared my code to the sample code yet, but I think the functions are working properly.

Thanks.

I made it in Jupyter Notebook - for me it’s most comfy for data science projects.


https://github.com/Goananda/Codecademy-project-Roller-Coaster/blob/master/Roller%20Coaster.ipynb


There is a defect in the solution for step 5, where we should write a function that plots the rankings of the top n ranked roller coasters over time as lines.
If two roller coasters in the top n rankings have the same name but belong to different parks, the function treats them as the same roller coaster. In general, we should distinguish them before plotting, but that is much harder to do than in steps 3 and 4.
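
One way to keep same-named coasters apart, sketched under the assumption that a (Name, Park) pair identifies a coaster, is to group on both columns instead of looping over names alone. The rows and park names below are made up, and plot_top_n is a hypothetical function, not the sample solution.

```python
import matplotlib
matplotlib.use('Agg')  # optional: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative rows; column names match those used in this thread.
rankings = pd.DataFrame({
    'Name': ['Thunderbolt', 'Thunderbolt', 'Thunderbolt', 'Thunderbolt', 'Phoenix'],
    'Park': ['Park A', 'Park A', 'Park B', 'Park B', 'Park C'],
    'Year of Rank': [2013, 2014, 2013, 2014, 2013],
    'Rank': [1, 2, 3, 1, 2],
})


def plot_top_n(df, n):
    """Hypothetical variant of the top-n plot: one line per (name, park) pair."""
    top = df[df['Rank'] <= n]
    fig, ax = plt.subplots()
    # Grouping on both columns keeps same-named coasters at different parks apart.
    for (name, park), group in top.groupby(['Name', 'Park']):
        ax.plot(group['Year of Rank'], group['Rank'], label='{} ({})'.format(name, park))
    ax.invert_yaxis()  # put rank 1 at the top
    ax.legend()
    return ax


ax = plot_top_n(rankings, 3)
labels = [line.get_label() for line in ax.lines]
print(labels)
```

Including the park in the legend label also makes it clear which line belongs to which coaster.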


I agree.

In addition, I found that the dataset needs cleaning before analysis: there are some differences in the spelling of the names of coasters and parks, for example:
‘Intimidator-305’ / ‘Intimidator 305’, ‘Beast’ / ‘The Beast’.
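
A minimal cleaning sketch for the two kinds of mismatch listed above, assuming that hyphens and a leading "The" are the only differences. normalize_name is a hypothetical helper, and rules like these can merge names that are genuinely different, so the result is worth checking by hand.

```python
import pandas as pd


def normalize_name(name):
    """Hypothetical cleaner: unify hyphens, spacing and a leading 'The'."""
    cleaned = name.replace('-', ' ')
    cleaned = ' '.join(cleaned.split())  # trim and collapse repeated spaces
    if cleaned.lower().startswith('the '):
        cleaned = cleaned[4:]
    return cleaned


names = pd.Series(['Intimidator-305', 'Intimidator 305', 'Beast', 'The Beast'])
normalized = names.map(normalize_name)

print(normalized.tolist())
```

Applied to the Name and Park columns before filtering, this would let both spellings of a coaster end up on the same line in the plots.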
