Roller Coaster Challenge Project (Python, Pandas)

annajiali · January 18, 2020, 7:24pm

Congratulations on completing your project!

Compare your project to our solution code and share your project below! Your solution might not look exactly like ours, and that’s okay! The most important thing right now is to get your code working as it should (you can always refactor more later). There are multiple ways to complete these projects and you should exercise your creative abilities in doing so.

This is a safe space for you to ask questions about any sample solution code and share your work with others! Simply reply to this thread to get the conversation started. Feedback is a vital component in getting better with coding and all ability levels are welcome here, so don’t be shy!

About community guidelines: This is a supportive and kind community of people learning and developing their skills. All comments here are expected to keep to our community guidelines.

How do I share my own solutions?

If you completed the project off-platform, you can upload your project to your own GitHub and share the public link on the relevant project topic.
If you completed the project in the Codecademy learning environment, use the share code link at the bottom of your code editor to create a gist, and then share that link here.

Do I really need to get set up on GitHub?
Yes! Both of these sharing methods require you to get set up on GitHub, and trust us, it’s worth your time. Here’s why:

Once you have your project in GitHub, you’ll be able to share proof of your work with potential employers, and link out to it on your CV.
It’s a great opportunity to get your feet wet using a development tool that tech workers use on the job, every day.

Not sure how to get started? We’ve got you covered - read this article for the easiest way to get set up on GitHub.

Best practices for asking questions about the sample solution

Be specific! Reference exact line numbers and syntax so others are able to identify the area of the code you have questions about.

mahak_gupta · February 6, 2020, 11:55am

Can you provide solutions to the last part of the challenge:
What roller coaster seating type is most popular?
And do different seating types result in higher/faster/longer roller coasters?

Do roller coaster manufacturers have any specialties
(do they focus on speed, height, seating type, or inversions)?

Do amusement parks have any specialties?

design4171870240 · February 16, 2020, 3:00pm

import pandas as pd
import matplotlib.pyplot as plt

steel = pd.read_csv('Golden_Ticket_Award_Winners_Steel.csv')
wood = pd.read_csv('Golden_Ticket_Award_Winners_Wood.csv')

#print(wood[wood['Name'] == 'Boulder Dash'])

# Write a function to plot rankings over time for 1 roller coaster here:

def rank_year(name, park):
    dfwood = wood[(wood['Name'] == name) & (wood['Park'] == park)]
    plt.plot(dfwood['Year of Rank'], dfwood['Rank'])
    plt.ylabel('Rank')
    plt.xlabel('Year')
    plt.legend([name], loc=1)
    plt.show()

#print(rank_year('El Toro', 'Six Flags Great Adventure'))

# Write a function to plot rankings over time for 2 roller coasters here:

def rank_year2(name1, name2, park1, park2):
    dfwood1 = wood[(wood['Name'] == name1) & (wood['Park'] == park1)]
    dfwood2 = wood[(wood['Name'] == name2) & (wood['Park'] == park2)]
    ay = plt.subplot()
    plt.plot(dfwood1['Year of Rank'], dfwood1['Rank'])
    plt.plot(dfwood2['Year of Rank'], dfwood2['Rank'])
    plt.ylabel('Rank')
    plt.xlabel('Year')
    plt.legend([name1, name2], loc=1)
    ay.set_yticks([1, 2, 3, 4])
    plt.show()

#print(rank_year2('El Toro', 'Boulder Dash', 'Six Flags Great Adventure', 'Lake Compounce'))

# Write a function to plot top n rankings over time here:

def top_ranking(df, n):
    top = df[df['Rank'] <= n]
    fig, ax = plt.subplots(figsize=(10, 10))
    # One line per coaster name that appears in the top n
    for coaster in set(top['Name']):
        coaster_rankings = top[top['Name'] == coaster]
        ax.plot(coaster_rankings['Year of Rank'], coaster_rankings['Rank'], label=coaster)
    # Ticks and title follow n instead of a hard-coded value
    ax.set_yticks(range(1, n + 1))

    plt.title('Top {} Rankings'.format(n))
    plt.xlabel('Year')
    plt.ylabel('Ranking')
    plt.legend(loc=4)
    plt.show()

#print(top_ranking(wood, 5))

# Load roller coaster data here:

coasters = pd.read_csv('roller_coasters.csv')
#print(coasters.info())

# Write a function to plot histogram of column values here:

def hist_roller(df, column):
    plt.hist(df[column])
    legend = [column]
    plt.legend(legend)
    plt.xlabel(column)
    plt.ylabel('Number of Roller Coasters')
    plt.show()

#print(hist_roller(coasters, 'speed'))

# Write a function to plot inversions by coaster at a park here:

def bar_park(df, park):
    park_df = df[df['park'] == park]
    roller_coaster = park_df['name']
    inversions = park_df['num_inversions']
    plt.figure(figsize=(20, 15))
    # A single axes is enough; the second, unused plt.subplot() call was dropped
    ax = plt.subplot()
    plt.bar(range(len(roller_coaster)), inversions)
    ax.set_xticks(range(len(roller_coaster)))
    ax.set_xticklabels(roller_coaster)
    plt.xticks(rotation=45)
    plt.legend([park])
    plt.show()

#print(bar_park(coasters, 'Walibi Belgium'))

# Write a function to plot pie chart of operating status here:

def pie(coasters):
    df_operating = coasters[coasters['status'] == 'status.operating']
    df_closed = coasters[coasters['status'] == 'status.closed.definitely']
    count = [len(df_operating), len(df_closed)]
    labelsdata = ['Operating', 'Closed']
    plt.pie(count, autopct='%0.1f%%', labels=labelsdata)
    plt.axis('equal')
    plt.show()

#print(pie(coasters))

# Write a function to create scatter plot of any two numeric columns here:

def scatter(df, column1, column2):
    c1 = df[column1]
    c2 = df[column2]
    x = range(len(df))
    plt.figure(figsize=(20, 20))
    ax = plt.subplot()
    plt.scatter(x, c1, color='blue', alpha=0.5)
    plt.scatter(x, c2, color='green', alpha=0.5)
    ax.set_xlabel('Variables')
    ax.set_ylabel('Roller Coasters')
    plt.ylim(0, 200)
    plt.legend([column1, column2])
    plt.show()

#print(scatter(coasters, 'speed', 'height'))

css0396125402 · February 17, 2020, 5:55am

Finally finished my first project in Codecademy!

Here is my solution:

gist.github.com

https://gist.github.com/codecademydev/c003702d33c2bc04e2eba1b37321beb5

script.py

import codecademylib3_seaborn
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# load rankings data here:
wood = pd.read_csv("Golden_Ticket_Award_Winners_Wood.csv")
steel = pd.read_csv('Golden_Ticket_Award_Winners_Steel.csv')

wood.rename(columns = {

This file has been truncated. show original

kimnjogu · March 2, 2020, 12:32pm

Hello, this particular project has given me a headache. I just used your code to understand what's happening.

lesvrolyk · March 14, 2020, 6:49pm

This is my solution for the Roller Coaster project. I did try to answer the extra questions.
(This is my first time trying GitHub, so bear with me.)
Roller Coaster Project

fideldelahoya · March 15, 2020, 4:04pm

Here is my code. I used Jupyter notebooks (open the link).

github.com

Fdelahoya/Matplotlib/blob/master/Roller Coasters (1).ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3.\n",
    "Write a function that will plot the ranking of a given roller coaster over time as a line. Your function should take a roller coaster’s name and a ranking DataFrame as arguments. Make sure to include informative labels that describe your visualization.\n",
    "\n",
    "Call your function with \"El Toro\" as the roller coaster name and the wood ranking DataFrame. What issue do you notice? Update your function with an additional argument to alleviate the problem, and retest your function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",

This file has been truncated. show original

lesvrolyk · March 19, 2020, 8:30pm

I like that you put it in a Jupyter Notebook so that we could see the graphs. I cannot figure out a way to show the graphs in GitHub.

Anyhow, what is this maddening warning?
“C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.”
I kept getting it in my output too! I finally found that moving the calls to subplot above the calls to plot helped, as did calling plt.clf() at the end of every function. I do not understand why this keeps the warning at bay. It was driving me crazy!

fideldelahoya · March 19, 2020, 10:15pm

Hello,
You can see the GitHub repository where it says “show original”. Anyway, here is the link.
Regarding the warning: you can see the explanation here.
Hope it helps!
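
Going by the wording of the warning itself, it appears when an axes is requested with the same arguments as one that already exists on the current figure, which is what a repeated bare plt.subplot() call does. A minimal sketch of one way to avoid that, assuming each plotting function is free to open its own figure: plt.subplots() creates a fresh figure and axes on every call, so nothing is reused. The function name plot_line and the sample numbers are made up for illustration and are not taken from anyone's solution.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_line(xs, ys, label):
    """Draw one line on a brand-new figure and return its axes."""
    fig, ax = plt.subplots()  # new figure and new axes on every call
    ax.plot(xs, ys, label=label)
    ax.set_xlabel("Year")
    ax.set_ylabel("Rank")
    ax.legend()
    return ax


# Hypothetical data: two separate calls produce two independent figures.
first = plot_line([2013, 2014, 2015], [1, 2, 1], "coaster A")
second = plot_line([2013, 2014, 2015], [3, 2, 2], "coaster B")
plt.close("all")  # release the figures once they are no longer needed
```

Calling plt.clf() at the end of a function probably helps for a related reason: it clears the current figure, so the next plt.subplot() call finds no earlier axes to collide with.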

tag9531813702 · March 25, 2020, 8:05am

Here’s my solution:

github.com

mcjraquel/codecademy_practice/blob/master/codecademy_roller_coaster_matplotlib.py

# Created by: Ma. Celyn Joyce Raquel

import pandas as pd
import matplotlib.pyplot as plt

# load rankings data here:
steel = pd.read_csv('Golden_Ticket_Award_Winners_Steel.csv')
wood = pd.read_csv('Golden_Ticket_Award_Winners_Wood.csv')
steel = steel.rename(columns = {'Rank': 'ranks', 'Name': 'name', 'Park': 'park', 'Location': 'location', 'Supplier': 'supplier', 'Year Built': 'year_built', 'Points': 'points', 'Year of Rank': 'year_of_rank'})
wood = wood.rename(columns = {'Rank': 'ranks', 'Name': 'name', 'Park': 'park', 'Location': 'location', 'Supplier': 'supplier', 'Year Built': 'year_built', 'Points': 'points', 'Year of Rank': 'year_of_rank'})

# write function to plot rankings over time for 1 roller coaster here:
def ranking_plot_1rc(name, park, ranking_df):
    rankings = ranking_df[(ranking_df.name == name) & (ranking_df.park == park)]
    years = [i for i in rankings.year_of_rank]
    ranks = [i for i in rankings.ranks]
    ax = plt.subplot()
    ax.plot(years,ranks, marker = 'o')
    ax.set_xticks(years)
    ax.set_yticks(ranks)

This file has been truncated. show original

tag6950669587 · April 4, 2020, 9:20am

Hello, I ran your code and it gives me the following error:
max must be larger than min in range parameter.
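
The post does not say which call raised this, so the following is only a guess: one common cause of this message is passing a column that contains missing values (NaN) to a histogram, because the bin range cannot be computed from NaN. A minimal sketch, with made-up numbers, of dropping the missing values before plotting:

```python
import pandas as pd

# Hypothetical speed column with two missing entries
speeds = pd.Series([85.0, None, 120.0, float("nan"), 100.0])

# dropna() returns a new Series without the NaN entries;
# the cleaned Series is what would be passed to plt.hist()
clean_speeds = speeds.dropna()

print(len(speeds), len(clean_speeds))
```

In a function such as hist_roller(df, column), the same idea would be plt.hist(df[column].dropna()). If the error persists after that, the cause is something else.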

skybook · April 5, 2020, 12:52pm

Hi Everyone,

I don’t quite understand the syntax of the "function to plot top n rankings over time" below:

for coaster in set(top_n_rankings['Name']):
    coaster_rankings = top_n_rankings[top_n_rankings['Name'] == coaster]
    ax.plot(coaster_rankings['Year of Rank'], coaster_rankings['Rank'], label=coaster)

Why do we use set() here? And what is the general syntax whenever I use set()?

Regards

system2367193808 · April 8, 2020, 12:56pm

I am receiving a NameError: coaster_rankings not defined. Why?

This is my code:
def plot_ranking_two(coaster_name1, coaster_name2, park_name1, park_name2, df_ranking):
    coaster_rankings1 = df_ranking[(df_ranking['Name'] == coaster_name1) & (df_ranking['Park'] == park_name1)]
    coaster_rankings2 = df_ranking[(df_ranking['Name'] == coaster_name2) & (df_ranking['Park'] == park_name2)]
    fig, ax = plt.subplots()
    ax.plot(coaster_rankings1['Year of Rank'], coaster_rankings1['Rank'], color='green', label=coaster_name1)
    ax.plot(coaster_rankings2['Year of Rank'], coaster_rankings2['Rank'], color='red', label=coaster_name2)
    ax.invert_yaxis()
    plt.title("{} vs {} Rankings".format(coaster_name1, coaster_name2))
    plt.xlabel("Year")
    plt.ylabel("Ranking")
    plt.show()

plot_ranking_two('El Toro', 'Six Flags Great Adventure', 'Boulder Dash', 'Lake Compounce', wood)

data5127116848 · April 15, 2020, 7:16am

Hello! Here is my solution. Cheers!

gist.github.com

https://gist.github.com/somodiferenc/f5961849e99a833f03a70314ff946fef

roller_coaster.py

import pandas as pd
import matplotlib.pyplot as plt

steel = pd.read_csv('Golden_Ticket_Award_Winners_Steel.csv')
wood = pd.read_csv('Golden_Ticket_Award_Winners_Wood.csv')
roller = pd.read_csv('roller_coasters.csv')

print(steel.head())
#print(steel.info())
print(wood.head())

This file has been truncated. show original

tera9172949569 · April 23, 2020, 1:58pm

Here is my solution!
Cheers!

Note: I didn’t do the answers part.

gist.github.com

https://gist.github.com/codecademydev/3393d0cce2abafa45765384b40fd5b2a

script.py

import codecademylib3_seaborn
import pandas as pd
import matplotlib.pyplot as plt

# load rankings data here:
df_wood = pd.read_csv('Golden_Ticket_Award_Winners_Wood.csv')
df_steel = pd.read_csv('Golden_Ticket_Award_Winners_Steel.csv')
df_roller = pd.read_csv('roller_coasters.csv')

# wood = df_wood.head(15)

This file has been truncated. show original

rng009 · April 26, 2020, 10:10pm

My understanding is that set() turns the DataFrame column into a set of its unique values, so the loop visits each coaster name once instead of once per row. list() also gives you something you can loop over, but it keeps the duplicates.
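
To make the difference between set() and list() concrete, here is a small sketch using a plain Python list in place of the 'Name' column. The list contents are made up for illustration; in the rankings data a coaster name repeats once for every year it was ranked.

```python
# Stand-in for top_n_rankings['Name']: each name repeats for every ranked year
names = ["El Toro", "Boulder Dash", "El Toro", "Phoenix", "Boulder Dash"]

unique_names = set(names)   # duplicates collapse to a single entry
all_names = list(names)     # duplicates are kept

print(len(unique_names))    # 3
print(len(all_names))       # 5

# A set has no guaranteed order; sorted() gives a stable order for legends
for coaster in sorted(unique_names):
    print(coaster)
```

Looping over the set means one ax.plot() call, and one legend entry, per coaster. Looping over the list would repeat the same line once for every row.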

board7710174351 · April 30, 2020, 12:13am

Hi all, code below. I also answered 1 out of the 3 bonus questions at step 11. I have not compared my code to the sample code yet, but I think the functions work properly.

Thanks.

gist.github.com

https://gist.github.com/HTH24/d2d4010b8825bfdcb342e756f298cf1b

Roller_Coaster.py

# This is a Codecademy practice project.

"""
Spyder Editor

This is a temporary script file.
"""

import pandas as pd
import matplotlib.pyplot as plt

This file has been truncated. show original

goananda · May 2, 2020, 1:34am

I made it in Jupyter Notebook - for me it’s most comfy for data science projects.

Roller Coaster

https://github.com/Goananda/Codecademy-project-Roller-Coaster/blob/master/Roller%20Coaster.ipynb

webpro14138 · May 3, 2020, 4:53am

There is a flaw in the solution for step 5, where we should write a function that plots the rankings of the top n ranked roller coasters over time as lines.
If two roller coasters in the top n rankings have the same name but are in different parks, the function treats them as the same roller coaster. In general, we should tell them apart before plotting, but that is much harder to do than in steps 3 and 4…
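
One possible way to keep same-named coasters apart is to group on the name and the park together instead of on the name alone. The sketch below assumes the 'Rank', 'Name', 'Park' and 'Year of Rank' columns used elsewhere in this thread; the rows, the coaster name and the park names are invented for illustration, and series_per_coaster is a hypothetical helper, not part of the sample solution.

```python
import pandas as pd

# Made-up rankings: one coaster name that exists at two different parks
rankings = pd.DataFrame({
    "Rank": [1, 2, 2, 1],
    "Name": ["Example Coaster"] * 4,
    "Park": ["Park A", "Park B", "Park A", "Park B"],
    "Year of Rank": [2013, 2013, 2014, 2014],
})


def series_per_coaster(df, n):
    """Return {(name, park): [(year, rank), ...]} for coasters ranked in the top n."""
    top = df[df["Rank"] <= n]
    lines = {}
    # Grouping on both columns keeps same-named coasters at different parks separate
    for (name, park), group in top.groupby(["Name", "Park"]):
        group = group.sort_values("Year of Rank")
        lines[(name, park)] = list(
            zip(group["Year of Rank"].tolist(), group["Rank"].tolist())
        )
    return lines


lines = series_per_coaster(rankings, 5)
for (name, park), points in lines.items():
    print(name, "at", park, points)
```

Each dictionary entry could then be drawn with its own ax.plot() call, using a label such as "name (park)" so the legend stays unambiguous.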

goananda · May 3, 2020, 10:57am

I agree.

In addition, I found that the dataset needs cleaning before analysis: there are some differences in the spelling of the names of coasters and parks, for example:
‘Intimidator-305’ / ‘Intimidator 305’, ‘Beast’ / ‘The Beast’.
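
A rough sketch of how such spelling variants could be folded together before grouping, using only the two examples above. The helper normalize_name is hypothetical, and its rules (lowercase, hyphens treated as spaces, a leading "The" dropped) are assumptions that could merge names that really are different, so the result should be checked against the data.

```python
import re


def normalize_name(name):
    """Lowercase, treat hyphens as spaces, and drop a leading 'The'."""
    cleaned = name.strip().lower().replace("-", " ")
    cleaned = re.sub(r"^the\s+", "", cleaned)
    # Collapse any repeated whitespace left over from the replacements
    return re.sub(r"\s+", " ", cleaned)


for raw in ["Intimidator-305", "Intimidator 305", "Beast", "The Beast"]:
    print(raw, "->", normalize_name(raw))
```

With pandas, the same function could be applied to a column with something like wood['Name'].apply(normalize_name), storing the result in a new column so the original spelling is kept for labels.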