Data Science Independent Project #4 – Home Value Trends

Project: Trends in Estimated Home Values

some information has changed. please see the rest of thread to apply corrections

This project will take you off-platform and get you started in your own developer environment! Never done that before? Not to worry - we’ve shared some resources to help you down below. This project can be completed entirely on your own - or, you can join the Codecademy Lifelong Learner Pro Community and find someone to work with! Jump to the community support section to hear more about this.

This project is broken down into key questions that your client or company is looking to answer. As a data scientist, you’ll often become a resource to help businesses answer the key questions about the efficacy of existing or potential strategies & projects.



You are asked by a company to help them make more informed decisions on real estate investments. Start by analyzing the data on median estimated values of single family homes by zip codes from the past two decades.


In order to complete this project, we suggest that you have familiarity with the content in the following courses or lessons on the Codecademy platform:

  1. Queries
  2. Aggregate Functions

Suggested Technologies

Depending on where you are on your Path, there may be multiple technology options you can use to complete this project - we suggest the following:

  1. DB Browser for SQLite

Project Tasks

Get started - hosting your project

DB Browser for SQLite is a visual tool for working with SQLite databases. Follow the link to download the application for your computer.

  • SQLite can store an entire database in a single file, which usually has a .sqlite or .db extension. To open a database, select Open Database at the top of the window and browse for the file. Alternatively, you can choose to create a New Database by saving a file with the .sqlite or .db extension.
  • To import data from a CSV file into a table, select “File > Import > Table from CSV file…” and browse for the CSV file. (Note: All fields imported from the CSV file will have a data type of TEXT. Be sure to convert fields to numeric type as needed. See here for how to do that.)
  • You can download the data you’ll be using for this specific project here.

There are several tabs near the top of the window for working with the data:

  • Database Structure: View the tables in your database and the columns they contain.
  • Browse Data: Browse the data for each table.
  • Execute SQL: Write and execute SQL queries.

Basic Requirements

Let’s break this project down into a couple different parts.

Exploration: Familiarize yourself with the dataset.

  • How many distinct zip codes are in this dataset?
  • How many zip codes are from each state?
  • What range of years are represented in the data?
    • Hint: The date column is in the format yyyy-mm. Try taking a look at using the substr() function to help extract just the year.
  • Using the most recent month of data available, what is the range of estimated home values across the nation?
    • Note: When we imported the data from a CSV file, all fields are treated as a string. Make sure to convert the value field into a numeric type if you will be ordering by that field. See here for a hint.

Analysis: Explore how home value differ by region as well as change over time.

  • Using the most recent month of data available, which states have the highest average home values? How about the lowest?
  • Which states have the highest/lowest average home values for the year of 2017? What about for the year of 2007? 1997?

Additional Challenges

Intermediate Challenge

  • What is the percent change in average home values from 2007 to 2017 by state? How about from 1997 to 2017?
    • Hint: We can use the WITH clause to create temporary tables containing the average home values for each of those years, then join them together to compare the change over time.
  • How would you describe the trend in home values for each state from 1997 to 2017? How about from 2007 to 2017? Which states would you recommend for making real estate investments?

Advanced Challenge

  • Join the house value data with the table of zip-code level census data. Do there seem to be any correlations between the estimated house values and characteristics of the area, such as population count or median household income?

Resources & Support

Project-specific resources

  1. SQLite Documentation
  2. SQLite Tutorial
  3. SQLite substr() Function
  4. Home Value Data

General Resources

  1. How to get set-up for coding on your computer
  2. What is a Relational Database Management System?
  3. What you need to know about Git, GitHub & Coding in Teams
  4. How developer teams work
  5. First steps in tackling a group project
  6. Resource on writing pseudocode to get started with off-platform projects

Community Support

Looking for additional help or someone to work with (or somewhere to brag about your finished project)? Join our Codecademy Lifelong Learner Pro Community to meet other learners like yourself!

  • The data downloaded from here is for Single Family Residences - there is also data for Condo/Co-op as well as 1 bedroom, 2 bedroom, etc homes. If you’d like to collaborate with a peer on this project, as many real-world data analysts do, different team members can analyze a different type of housing, then come together and compare methodologies and results, before coming to a conclusion together.

Once you’re done…

Share in the CLL Community or the Forums for feedback and to see some other ways of solving this problem!


8 posts were split to a new topic: Import table from CSV file into DB Browser

How can I calculate the percent change (intermediate challenge) based on the state and years? I have created a WITH table with 3 columns that represent the state, year and average value that is broken down into the 3 years. I can’t seem to figure out how to pull out the individual average value based on the year and state for a calculation while doing that repeatedly for every state. Here is the code I have so far

WITH average_value_per_year AS(
SELECT state, SUBSTR(date, 1, 4) AS years, ROUND(AVG(value),1) AS average_value
FROM home_value_data
WHERE years = ‘2017’
OR years = ‘2007’
OR years = ‘1997’

I am trying to answer the question “Using the most recent month of data available, which states have the highest average home values? How about the lowest?”.

SELECT state, ROUND(AVG(home_value), 0)
FROM home_value_data
WHERE date = 2017-11

state AVG
state 234493

This is the data I took for the table.

What I want to do is take the average (AVG) of home_value, and list it on a DISTINCT state, but when I do it only outputs a single value with an output of average (AVG) of the entire database of home_value? I want to take the average (AVG) out of every individual state that is returned.
I am so confused and I need assistance!

False alarm:

1 SELECT DISTINCT field3, ROUND(AVG(field7), 0)
2 FROM home_value_data
3 WHERE field6 >= 2017-11
4 ORDER BY field3 ASC;

Line 4 actually required a GROUP BY, so the correct syntax goes:

SELECT DISTINCT field3, ROUND(AVG(field7), 0)
FROM home_value_data
WHERE field6 >= 2017-11
GROUP BY field3;

That’s how I made it work.