Data Science Independent Project #5 – Analyze Airfare Data

Project: Analyzing Airfare Data

This project will take you off-platform and get you started in your own developer environment! Never done that before? Not to worry - we’ve shared some resources to help you down below. This project can be completed entirely on your own - or, you can join the #data-science-buddies in the Codecademy Pro Learner community on Slack and find someone to work with! Jump to the community support section to hear more about this.

This project is broken down into key questions that your client or company is looking to answer. As a data scientist, you’ll often become a resource to help businesses answer the key questions about the efficacy of existing or potential strategies & projects.

Overview

Objective

As the data analyst of a company, you are presented with the airfare data covering the top 1,000 contiguous state city-pair markets. Help them analyze this data and identify trends.

Pre-requisites

In order to complete this project, we suggest that you have familiarity with the content in the following courses or lessons on the Codecademy platform:

  1. Queries
  2. Aggregate Functions

Suggested Technologies

Depending on where you are on your Path, there may be multiple technology options you can use to complete this project - we suggest the following:

  1. DB Browser for SQLite

Project Tasks

Get started - hosting your project

DB Browser for SQLite is a visual tool for working with SQLite databases. Follow the link to download the application for your computer.

  • SQLite can store an entire database in a single file, which usually has a .sqlite or .db extension. To open a database, select Open Database at the top of the window and browse for the file. Alternatively, you can choose to create a New Database by saving a file with the .sqlite or .db extension.
  • To import data from a CSV file into a table, select “File > Import > Table from CSV file…” and browse for the CSV file. (Note: All fields imported from the CSV file will have a data type of TEXT. Be sure to convert fields to numeric type as needed. See here for how to do that.)

There are several tabs near the top of the window for working with the data:

  • Database Structure: View the tables in your database and the columns they contain.
  • Browse Data: Browse the data for each table.
  • Execute SQL: Write and execute SQL queries.

Basic Requirements

You can download the data you’ll be using for this specific project here.

Let’s break this project down into a couple different parts.

Exploration: Familiarize yourself with the dataset.

  • What range of years are represented in the data?
  • What are the shortest and longest-distanced flights, and between which 2 cities are they?
    • Note: When we imported the data from a CSV file, all fields are treated as a string. Make sure to convert the value field into a numeric type if you will be ordering by that field. See here for a hint.
  • How many distinct cities are represented in the data (regardless of whether it is the source or destination)?
    • Hint: We can use UNION to help fetch data from both the city1 and city2 columns. Note the distinction between UNION and UNION ALL.

Analysis: Further explore and analyze the data.

  • Which airline appear most frequently as the carrier with the lowest fare (ie. carrier_low)? How about the airline with the largest market share (ie. carrier_lg)?
  • How many instances are there where the carrier with the largest market share is not the carrier with the lowest fare? What is the average difference in fare?

Additional Challenges

Intermediate Challenge

  • What is the percent change in average fare from 2007 to 2017 by flight? How about from 1997 to 2017?
    • Hint: We can use the WITH clause to create temporary tables containing the airfares, then join them together to compare the change over time.
  • How would you describe the overall trend in airfares from 1997 to 2017, as compared 2007 to 2017?

Advanced Challenge

  • What is the average fare for each quarter? Which quarter of the year has the highest overall average fare? lowest?
    • Note: Not all flights (ie. each city-pair route) have data from all 4 quarters - which may skew the average. Let’s try considering only flights that have data available for all 4 quarters.
  • Considering only the flights that have data available on all 4 quarters of the year, which quarter has the highest overall average fare? lowest? Try breaking it down by year as well.
    • Hint: To consider only flights that have data available for all 4 quarters, we could join the table with itself - each of those tables should be filtered to have data from one quarter.

Resources & Support

Project-specific resources

  1. SQLite Documentation
  2. SQLite Tutorial
  3. View Data Variable Description

General Resources

  1. How to get set-up for coding on your computer
  2. What is a Relational Database Management System?
  3. What you need to know about Git, GitHub & Coding in Teams
  4. How developer teams work
  5. First steps in tackling a group project
  6. Resource on writing pseudocode to get started with off-platform projects

Community Support

Want to look for additional help or someone to work with (or somewhere to brag about your finished project)? Join our Codecademy Lifelong Learner Pro Community (CLL) to meet other learners like yourself!

Once you’re done…

Share in the CLL Community or the Forums for feedback and to see some other ways of solving this problem!

1 Like

Hi ! Thank you for the project ! But… can access the data base with DB Browser… It needs a encryption key to open the data base downloaded from Google Sheet… Where can I get this key?

Thanks !

Greg

3 Likes

The same here. Did you understand how to solve the problem?

Hello, welcome to the forum :slight_smile:

  1. Download sheet https://docs.google.com/spreadsheets/d/1JOQ5bnInxX9Mpf_3xtsX38cw-G1uctiu5jTrjzkJoh0/edit#gid=809136041 as a .csv or .tsv file;
  2. Create a new database using DB Browser;
  3. Use File -> Import -> Table from CSV file... operation to import content of downloaded .csv or .tsv file as a new table in the created database.

No encryption keys needed :slight_smile:

2 Likes

I’m trying to import the csv file as a table in the DB Browser but I receive the error “Error importing data from record number 90021. Inserting row failed: not an error.” Any way I can fix this? Cheers.

Hi kpsimon,

I opened and saved the file in excel and the error was gone.
Hope this helps.

1 Like