Roller Coaster Graphing Project

Here is my code. Any comments are appreciated. I wrote it in Jupyter, but GitHub is still confusing me…

Import necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
winners_steel = pd.read_csv("/users/jeanlawlis/downloads/roller_coaster_starting/Golden_Ticket_Award_Winners_Steel.csv")
winners_wood = pd.read_csv("/users/jeanlawlis/downloads/roller_coaster_starting/Golden_Ticket_Award_Winners_Wood.csv")

#sort by name and location

sorted_wood = winners_wood.sort_values("Name")
sorted_steel = winners_steel.sort_values("Name")

sorted_wood["index"] = range(len(sorted_wood))

For easier management, create one NumPy array of numerical values and a separate NumPy array of strings. First create the two separate DataFrames, then convert them to NumPy arrays.

#print("winners wood", winners_wood["Rank"], winners_wood["Year Built"], winners_wood["Points"], winners_wood["Year of Rank"])
coaster_data_wood = sorted_wood[["Rank", "Year Built", "Points", "Year of Rank"]].copy()
coaster_descr_wood = sorted_wood[["Name", "Park", "Location", "Supplier"]].copy()


Create a function to plot rankings over time for 1 roller coaster

def plot_ranking_over_time(name, parkname):
    print("\n Ranking over time of the ", name, "Roller Coaster at ", parkname)
    temprat = []
    tempyear = []
    for index in range(len(winners_wood)):
        if (coaster_descr_wood.loc[index, "Name"] == name) & (coaster_descr_wood.loc[index, "Park"] == parkname):
            # the temprat append was missing from the post; "Rank" is assumed here
            temprat.append(coaster_data_wood.loc[index, "Rank"])
            tempyear.append(coaster_data_wood.loc[index, "Year of Rank"])

    plt.plot(tempyear, temprat)

plot_ranking_over_time("Boulder Dash", "Lake Compounce")
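The row-by-row loop above can also be written as a boolean-mask filter. This is only a sketch under assumptions: the small DataFrame stands in for the real CSV (its values are made up), and the helper name `ranking_over_time` is hypothetical. The column names are the ones used in this post.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Tiny stand-in for the wood winners CSV; the values are made up for illustration
sample = pd.DataFrame({
    "Name": ["Boulder Dash", "El Toro", "Boulder Dash", "El Toro"],
    "Park": ["Lake Compounce", "Six Flags Great Adventure",
             "Lake Compounce", "Six Flags Great Adventure"],
    "Rank": [1, 2, 2, 1],
    "Year of Rank": [2013, 2013, 2014, 2014],
})

def ranking_over_time(df, name, parkname):
    """Return the rows for one coaster, ordered by year (hypothetical helper)."""
    mask = (df["Name"] == name) & (df["Park"] == parkname)
    return df.loc[mask].sort_values("Year of Rank")

rows = ranking_over_time(sample, "El Toro", "Six Flags Great Adventure")

fig, ax = plt.subplots()
ax.plot(rows["Year of Rank"], rows["Rank"], marker="o")
ax.set_xlabel("Year of Rank")
ax.set_ylabel("Rank")
ax.invert_yaxis()  # rank 1 is the best rank, so draw it at the top
```

Filtering with a mask avoids depending on the index labels, which no longer run in order once the frame has been sorted by name.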


Create a plot of El Toro ranking over time

plot_ranking_over_time("El Toro", "Six Flags Great Adventure")

Create a plot of El Toro and Boulder dash hurricanes
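One possible way to draw two coasters on the same axes is sketched below. It assumes the same column names as the rest of this post; the sample values are made up and `plot_two_coasters` is a hypothetical helper, not part of the project template.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in rows; the real data would come from the wood winners CSV
sample = pd.DataFrame({
    "Name": ["Boulder Dash", "El Toro", "Boulder Dash", "El Toro"],
    "Park": ["Lake Compounce", "Six Flags Great Adventure",
             "Lake Compounce", "Six Flags Great Adventure"],
    "Rank": [1, 2, 2, 1],
    "Year of Rank": [2013, 2013, 2014, 2014],
})

def plot_two_coasters(df, coaster_a, coaster_b):
    """Draw one rank-vs-year line per (name, park) pair on shared axes."""
    fig, ax = plt.subplots()
    for name, park in (coaster_a, coaster_b):
        rows = df[(df["Name"] == name) & (df["Park"] == park)]
        rows = rows.sort_values("Year of Rank")
        ax.plot(rows["Year of Rank"], rows["Rank"], marker="o", label=name)
    ax.set_xlabel("Year of Rank")
    ax.set_ylabel("Rank")
    ax.invert_yaxis()  # rank 1 at the top
    ax.legend()
    return ax

ax = plot_two_coasters(
    sample,
    ("El Toro", "Six Flags Great Adventure"),
    ("Boulder Dash", "Lake Compounce"),
)
```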


Create a function to plot top n rankings over time

start by sorting by ranking, then split; return new df

def top_rankings(coasters, n):
    sorted_coasters = coasters.sort_values("Rank")
    highest_n_rankings = sorted_coasters.iloc[:n]
    return highest_n_rankings

top = top_rankings(sorted_wood, 5)
print("\n", "Top Rankings of Wood Roller Coasters")
# the name of the plotting call was garbled in the post; plt.plot is assumed here
plt.plot(top["Year of Rank"], top["Points"])
top = top_rankings(sorted_steel, 6)
print("\n", "Top Rankings of Steel Roller Coasters")
plt.clf()
# same assumption as above for the plotting call
plt.plot(top["Year of Rank"], top["Points"])
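Note that `iloc[:n]` keeps only the first n rows after sorting, across all years combined. If the goal is one line per coaster for every year in which it ranked in the top n, a filter on `Rank` is one option. The sketch below assumes the column names used in this post; the sample values are made up and `plot_top_n` is a hypothetical helper.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for a winners table with two ranking years
sample = pd.DataFrame({
    "Name": ["Boulder Dash", "El Toro", "Phoenix",
             "Boulder Dash", "El Toro", "Phoenix"],
    "Rank": [1, 2, 3, 2, 1, 3],
    "Year of Rank": [2013, 2013, 2013, 2014, 2014, 2014],
})

def plot_top_n(df, n):
    """Plot rank over time for every coaster that reached rank n or better."""
    top = df[df["Rank"] <= n]
    fig, ax = plt.subplots()
    # groupby iterates over the coaster names in sorted order
    for name, rows in top.groupby("Name"):
        rows = rows.sort_values("Year of Rank")
        ax.plot(rows["Year of Rank"], rows["Rank"], marker="o", label=name)
    ax.set_xlabel("Year of Rank")
    ax.set_ylabel("Rank")
    ax.invert_yaxis()  # rank 1 at the top
    ax.legend()
    return ax

ax = plot_top_n(sample, 2)
```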


load roller coaster data

roller_coaster_data = pd.read_csv("/users/jeanlawlis/downloads/roller_coaster_starting/roller_coasters.csv")


Create a function to plot histogram of column values

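def histogram(coaster_data):
    # the function body was missing from the post; a plain plt.hist call is assumed here
    plt.hist(coaster_data.dropna())
    plt.show()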

print("\n Coaster Speed Distribution ")

Create histogram of roller coaster speed

print("\n Coaster Length Distribution" )

Create histogram of roller coaster length


Create histogram of roller coaster number of inversions

print("\n Number of Inversions ")

Create a function to plot histogram of height values

print("\n Roller Coaster Height ")

Create a histogram of roller coaster height
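A minimal sketch of a reusable histogram helper follows, assuming column names such as `speed` from roller_coasters.csv. The sample values are made up and `plot_histogram` is a hypothetical name. Creating a new figure on each call keeps successive histograms from being drawn on top of each other.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values standing in for one numeric column of roller_coasters.csv
sample = pd.DataFrame({"speed": [40.0, 55.0, None, 80.0, 85.0]})

def plot_histogram(df, column, bins=10):
    """Histogram of one numeric column, ignoring missing values."""
    values = df[column].dropna()
    fig, ax = plt.subplots()  # fresh figure per call, so plots do not overlap
    ax.hist(values, bins=bins)
    ax.set_title(f"{column} of roller coasters")
    ax.set_xlabel(column)
    ax.set_ylabel("count")
    return ax

ax = plot_histogram(sample, "speed", bins=4)
```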


Create a function to plot inversions by coaster at park

def inversions_by_coaster_by_park(park):
    inversions = []
    rollers = []
    for index in range(len(roller_coaster_data)):
        if roller_coaster_data.loc[index, "park"] == park:
            # the appends were missing from the post; the "num_inversions"
            # and "name" columns are assumed here
            inversions.append(roller_coaster_data.loc[index, "num_inversions"])
            rollers.append(roller_coaster_data.loc[index, "name"])
    print("Park Name: ", park)

    fig, ax = plt.subplots()
    plt.hist(inversions, label=rollers)
    ax.set_title('Inversions Per Coaster')
    print("Rollers : ", rollers)
    list_ticks = list(range(len(rollers)))
    ax.set_xticklabels(rollers, rotation=70)
    ax.set_yticks([0, 1, 2, 3, 4, 5, 6])
    ax.set_ylabel("number of inversions")

inversions_by_coaster_by_park("Disneyland Park")

Create barplot of inversions by roller coasters
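For one bar per coaster, `ax.bar` may be a better fit than `plt.hist`, since a histogram bins the inversion counts rather than showing one value per coaster. The sketch below assumes the columns `name`, `park` and `num_inversions`; the coaster and park names in the sample are placeholders, and `plot_inversions_by_park` is a hypothetical helper.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder rows; real data would come from roller_coasters.csv
sample = pd.DataFrame({
    "name": ["Coaster A", "Coaster B", "Coaster C"],
    "park": ["Park X", "Park Y", "Park X"],
    "num_inversions": [0, 2, 3],
})

def plot_inversions_by_park(df, park):
    """Bar chart with one bar per coaster at the given park."""
    rows = df[df["park"] == park]
    positions = list(range(len(rows)))
    fig, ax = plt.subplots()
    ax.bar(positions, list(rows["num_inversions"]))
    # fix the tick positions first so the labels line up with the bars
    ax.set_xticks(positions)
    ax.set_xticklabels(list(rows["name"]), rotation=70)
    ax.set_ylabel("number of inversions")
    ax.set_title("Inversions Per Coaster")
    return ax

ax = plot_inversions_by_park(sample, "Park X")
```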


Create a function to plot a pie chart of status.operating

def pie(operational, trait):
    # To count the number of occurrences in e.g. a column in a
    # dataframe, use Pandas value_counts() method.
    sorted_data = operational.sort_values(trait)
    counts = sorted_data[trait].value_counts()
    # take the labels from counts.index so they stay in the same order as the wedges
    list_traits = counts.index
    # the plt.pie call was missing from the post; it is assumed here
    plt.pie(counts)
    plt.legend(list_traits, loc="upper right", bbox_to_anchor=(2.0, 0.9))

pie(roller_coaster_data, "status")

Create pie chart of roller coasters
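A short sketch of the `value_counts()` approach: both the wedge sizes and the labels come from the same Series, so they cannot get out of order. The sample values are made up (only `status.operating` is named in this post) and `plot_status_pie` is a hypothetical helper.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up status values; "status.closed.definitely" is a placeholder label
sample = pd.DataFrame({
    "status": ["status.operating", "status.operating",
               "status.closed.definitely", "status.operating"],
})

def plot_status_pie(df, column):
    """Pie chart of how often each value occurs in the given column."""
    counts = df[column].value_counts()
    fig, ax = plt.subplots()
    # counts.index holds the labels in the same order as the counts
    ax.pie(counts.to_numpy(), labels=list(counts.index), autopct="%1.0f%%")
    ax.set_title(f"Roller coasters by {column}")
    return counts, ax

counts, ax = plot_status_pie(sample, "status")
```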


Create a function to plot scatter of any two columns

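def scatter(dataframe, column1, column2):
    print("\n Showing ", column1, " on the x axis, and ", column2, " on the y")
    # the plotting call was missing from the post; plt.scatter is assumed here
    plt.scatter(dataframe[column1], dataframe[column2])
    plt.show()

scatter(roller_coaster_data, "speed", "height")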

Create a function to plot scatter of speed vs height

Create a scatter plot of roller coaster height by speed

scatter(roller_coaster_data, "height", "speed")
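A minimal sketch of a scatter helper with labelled axes, assuming numeric columns such as `speed` and `height`. The sample values are made up and `plot_scatter` is a hypothetical name. Rows with a missing value in either column are dropped together so the x and y values stay paired.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values; one row has a missing speed
sample = pd.DataFrame({
    "speed": [40.0, 55.0, None, 80.0],
    "height": [20.0, 35.0, 50.0, 60.0],
})

def plot_scatter(df, x_column, y_column):
    """Scatter plot of two numeric columns, skipping incomplete rows."""
    pair = df[[x_column, y_column]].dropna()
    fig, ax = plt.subplots()
    ax.scatter(pair[x_column], pair[y_column])
    ax.set_xlabel(x_column)
    ax.set_ylabel(y_column)
    return ax

ax = plot_scatter(sample, "speed", "height")
```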

I am also working on this problem now. I am only halfway through, so I will only comment on the first portion: graphing rankings over time for one coaster and for the top n.

I find your solution interesting. I think you’ve approached it in a way that gives you a lot of abstraction, so you can reuse it for different parts of the project. However, selecting the “Points” column to sort by rank seems a little redundant to me, since rank is already provided in the dataframe. Was there another reason you wanted to select ‘Points’? If rank weren’t there, I could see the value of selecting it and sorting!

Great work Jean!

Hey guys, I’m part way through this project and I’m doing the bar graphs for the different coaster stats but my bar graphs look odd. Can anyone explain how I’m getting these weird, skinny outliers on my graphs?

# 5
# load roller coaster data
coaster_data = pd.read_csv('roller_coasters.csv')
# 6
# Create a function to plot histogram of column values
def plot_coaster_column(data, column):
  coaster = data[column].dropna()
  plt.hist(coaster, color='blue')
  plt.title('{} of Roller Coasters'.format(column))
# Create histogram of roller coaster speed
print(plot_coaster_column(coaster_data, 'speed'))
# Create histogram of roller coaster length
print(plot_coaster_column(coaster_data, 'length'))
# Create histogram of roller coaster number of inversions
print(plot_coaster_column(coaster_data, 'num_inversions'))


Jada 🙂