Data Science Independent Project #4 – Home Value Trends

Project: Trends in Estimated Home Values

This project will take you off-platform and get you started in your own developer environment! Never done that before? Not to worry - we’ve shared some resources below to help you. You can complete this project entirely on your own, or you can join the #data-science-buddies channel in the Codecademy Pro Learner community on Slack and find someone to work with! Jump to the community support section to hear more about this.

This project is broken down into key questions that your client or company is looking to answer. As a data scientist, you’ll often become a resource to help businesses answer the key questions about the efficacy of existing or potential strategies & projects.

Overview

Objective

A company has asked you to help it make more informed decisions about real estate investments. Start by analyzing the data on median estimated values of single-family homes by zip code over the past two decades.

Pre-requisites

To complete this project, we suggest that you be familiar with the content in the following courses or lessons on the Codecademy platform:

  1. Queries
  2. Aggregate Functions

Suggested Technologies

Depending on where you are on your Path, there may be multiple technology options you can use to complete this project - we suggest the following:

  1. DB Browser for SQLite

Project Tasks

Get started - hosting your project

DB Browser for SQLite is a visual tool for working with SQLite databases. Follow the link to download the application for your computer.

  • SQLite can store an entire database in a single file, which usually has a .sqlite or .db extension. To open a database, select Open Database at the top of the window and browse for the file. Alternatively, you can choose to create a New Database by saving a file with the .sqlite or .db extension.
  • To import data from a CSV file into a table, select “File > Import > Table from CSV file…” and browse for the CSV file. (Note: All fields imported from the CSV file will have a data type of TEXT. Be sure to convert fields to numeric type as needed. See here for how to do that.)
  • You can download the data you’ll be using for this specific project here.
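The note about TEXT fields matters because text is compared character by character, not by numeric size. The sketch below is one way to see the difference. It uses Python's built-in sqlite3 module only so that it runs on its own; the SQL strings can be pasted into the Execute SQL tab instead. The table name home_value_data and the column zip_code are assumptions, and the rows are made up for illustration.

```python
import sqlite3

# In-memory database standing in for the imported CSV file.
# Table name, zip_code column, and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE home_value_data (zip_code TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO home_value_data VALUES (?, ?)",
    [("00001", "95000"), ("00002", "120000"), ("00003", "1000000")],
)

# Compared as TEXT, '95000' ranks above '1000000' because '9' > '1'.
max_as_text = conn.execute(
    "SELECT MAX(value) FROM home_value_data"
).fetchone()[0]

# CAST converts each value to a number before the comparison.
max_as_number = conn.execute(
    "SELECT MAX(CAST(value AS INTEGER)) FROM home_value_data"
).fetchone()[0]

print(max_as_text, max_as_number)
```

Casting inside the query leaves the stored data untouched, which is convenient while exploring. Changing the column type in the table definition is an alternative if you prefer to convert once.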

There are several tabs near the top of the window for working with the data:

  • Database Structure: View the tables in your database and the columns they contain.
  • Browse Data: Browse the data for each table.
  • Execute SQL: Write and execute SQL queries.

Basic Requirements

Let’s break this project down into a couple of different parts.

Exploration: Familiarize yourself with the dataset.

  • How many distinct zip codes are in this dataset?
  • How many zip codes are from each state?
  • What range of years are represented in the data?
    • Hint: The date column is in the format yyyy-mm. Try taking a look at using the substr() function to help extract just the year.
  • Using the most recent month of data available, what is the range of estimated home values across the nation?
    • Note: When we imported the data from a CSV file, all fields were treated as strings. Make sure to convert the value field to a numeric type if you will be ordering by that field. See here for a hint.
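The exploration questions above could be approached along the following lines. This is a minimal sketch, not a reference solution: the table name home_value_data and the columns zip_code and state are assumptions (only the date and value columns are named in the project text), and the rows are invented so the example runs on its own. Python's sqlite3 module is used only as a runner; each SQL string also works in the Execute SQL tab.

```python
import sqlite3

# Hypothetical table layout with made-up rows; every field is TEXT,
# as it would be after a CSV import.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE home_value_data "
    "(zip_code TEXT, state TEXT, date TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO home_value_data VALUES (?, ?, ?, ?)",
    [
        ("00001", "AA", "1997-01", "100000"),
        ("00001", "AA", "2017-12", "150000"),
        ("00002", "AA", "2017-12", "90000"),
        ("00003", "BB", "1997-01", "200000"),
        ("00003", "BB", "2017-12", "300000"),
    ],
)

# How many distinct zip codes are in the dataset?
distinct_zips = conn.execute(
    "SELECT COUNT(DISTINCT zip_code) FROM home_value_data"
).fetchone()[0]

# How many zip codes are from each state?
zips_per_state = conn.execute(
    "SELECT state, COUNT(DISTINCT zip_code) "
    "FROM home_value_data GROUP BY state ORDER BY state"
).fetchall()

# substr(date, 1, 4) keeps the first four characters: the year.
year_range = conn.execute(
    "SELECT MIN(substr(date, 1, 4)), MAX(substr(date, 1, 4)) "
    "FROM home_value_data"
).fetchone()

# Range of values in the most recent month. MAX(date) works on text
# here because yyyy-mm strings sort in chronological order.
value_range = conn.execute(
    "SELECT MIN(CAST(value AS INTEGER)), MAX(CAST(value AS INTEGER)) "
    "FROM home_value_data "
    "WHERE date = (SELECT MAX(date) FROM home_value_data)"
).fetchone()

print(distinct_zips, zips_per_state, year_range, value_range)
```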

Analysis: Explore how home values differ by region and how they change over time.

  • Using the most recent month of data available, which states have the highest average home values? How about the lowest?
  • Which states had the highest and lowest average home values in 2017? What about in 2007? In 1997?
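One possible shape for the per-state averages is a GROUP BY on the state column with the year filtered through substr(). The sketch below assumes a table named home_value_data with zip_code and state columns (both names are hypothetical) and uses made-up rows. Python's sqlite3 module is just a convenient way to run it.

```python
import sqlite3

# Hypothetical table and invented rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE home_value_data "
    "(zip_code TEXT, state TEXT, date TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO home_value_data VALUES (?, ?, ?, ?)",
    [
        ("00001", "AA", "2007-01", "80000"),
        ("00001", "AA", "2017-01", "100000"),
        ("00001", "AA", "2017-02", "120000"),
        ("00002", "BB", "2017-01", "300000"),
        ("00002", "BB", "2017-02", "300000"),
    ],
)

def average_by_state(connection, year):
    """Return (state, average value) rows for one year, highest first."""
    return connection.execute(
        "SELECT state, AVG(CAST(value AS REAL)) AS avg_value "
        "FROM home_value_data "
        "WHERE substr(date, 1, 4) = ? "
        "GROUP BY state "
        "ORDER BY avg_value DESC",
        (year,),
    ).fetchall()

averages_2017 = average_by_state(conn, "2017")
print(averages_2017)
```

Reversing the sort to ASC lists the lowest averages first. For the most recent month instead of a full year, the WHERE clause could compare date against (SELECT MAX(date) FROM home_value_data).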

Additional Challenges

Intermediate Challenge

  • What is the percent change in average home values from 2007 to 2017 by state? How about from 1997 to 2017?
    • Hint: We can use the WITH clause to create temporary tables containing the average home values for each of those years, then join them together to compare the change over time.
  • How would you describe the trend in home values for each state from 1997 to 2017? How about from 2007 to 2017? Which states would you recommend for making real estate investments?
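The WITH hint for the percent-change question can be sketched as follows: one temporary table per year, joined on state. Percent change is computed as 100 * (new - old) / old. The table name home_value_data and the zip_code and state column names are assumptions, the rows are invented, and Python's sqlite3 module is only there to make the example runnable.

```python
import sqlite3

# Hypothetical table and invented rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE home_value_data "
    "(zip_code TEXT, state TEXT, date TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO home_value_data VALUES (?, ?, ?, ?)",
    [
        ("00001", "AA", "2007-06", "100000"),
        ("00001", "AA", "2017-06", "150000"),
        ("00002", "BB", "2007-06", "200000"),
        ("00002", "BB", "2017-06", "180000"),
    ],
)

PERCENT_CHANGE_SQL = """
WITH avg_2007 AS (
    SELECT state, AVG(CAST(value AS REAL)) AS avg_value
    FROM home_value_data
    WHERE substr(date, 1, 4) = '2007'
    GROUP BY state
),
avg_2017 AS (
    SELECT state, AVG(CAST(value AS REAL)) AS avg_value
    FROM home_value_data
    WHERE substr(date, 1, 4) = '2017'
    GROUP BY state
)
SELECT old.state,
       ROUND(100.0 * (new.avg_value - old.avg_value) / old.avg_value, 1)
           AS pct_change
FROM avg_2007 AS old
JOIN avg_2017 AS new ON old.state = new.state
ORDER BY pct_change DESC
"""

pct_changes = conn.execute(PERCENT_CHANGE_SQL).fetchall()
print(pct_changes)
```

An inner join drops any state that is missing from either year, which may or may not be what you want for the real data.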

Advanced Challenge

  • Join the house value data with the table of zip-code level census data. Do there seem to be any correlations between the estimated house values and characteristics of the area, such as population count or median household income?
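A rough sketch of the join for the advanced challenge follows. The census table name census_data and its column median_household_income are hypothetical, as are home_value_data and zip_code; check the actual column names after importing both files. SQLite has no built-in correlation function, so this example computes a Pearson coefficient in Python on the joined rows. All rows are made up.

```python
import math
import sqlite3

# Hypothetical tables and invented rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE home_value_data (zip_code TEXT, date TEXT, value TEXT)"
)
conn.execute(
    "CREATE TABLE census_data "
    "(zip_code TEXT, median_household_income TEXT)"
)
conn.executemany(
    "INSERT INTO home_value_data VALUES (?, ?, ?)",
    [
        ("00001", "2017-11", "99000"),
        ("00001", "2017-12", "100000"),
        ("00002", "2017-12", "200000"),
        ("00003", "2017-12", "350000"),
    ],
)
conn.executemany(
    "INSERT INTO census_data VALUES (?, ?)",
    [("00001", "40000"), ("00002", "60000"), ("00003", "75000")],
)

# Pair each zip code's latest home value with its census income.
pairs = conn.execute(
    "SELECT CAST(h.value AS REAL), "
    "       CAST(c.median_household_income AS REAL) "
    "FROM home_value_data AS h "
    "JOIN census_data AS c ON h.zip_code = c.zip_code "
    "WHERE h.date = (SELECT MAX(date) FROM home_value_data) "
    "ORDER BY h.zip_code"
).fetchall()

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

values = [row[0] for row in pairs]
incomes = [row[1] for row in pairs]
r = pearson(values, incomes)
print(len(pairs), round(r, 3))
```

A coefficient near 1 or -1 suggests a strong linear relationship and a value near 0 suggests little, though with real data it is worth looking at the joined rows as well as the single number.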

Resources & Support

Project-specific resources

  1. SQLite Documentation
  2. SQLite Tutorial
  3. SQLite substr() Function
  4. Home Value Data

General Resources

  1. How to get set-up for coding on your computer
  2. What is a Relational Database Management System?
  3. What you need to know about Git, GitHub & Coding in Teams
  4. How developer teams work
  5. First steps in tackling a group project
  6. Resource on writing pseudocode to get started with off-platform projects

Community Support

Looking for additional help or someone to work with (or somewhere to brag about your finished project)? Join our Codecademy Pro Learner Community on Slack to meet other learners like yourself!

  • The data downloaded from here is for Single Family Residences. There is also data for Condo/Co-op homes as well as 1-bedroom, 2-bedroom, etc. homes. If you’d like to collaborate with a peer on this project, as many real-world data analysts do, each team member can analyze a different type of housing. The team can then compare methodologies and results before reaching a conclusion together.

Once you’re done…

Share on Slack & the Forums