Codecademy Forums

Data Science Independent Project #4 – Home Value Trends

Project: Trends in Estimated Home Values

This project will take you off-platform and get you started in your own developer environment! Never done that before? Not to worry - we’ve shared some resources to help you down below. This project can be completed entirely on your own - or, you can join the #data-science-buddies in the Codecademy Pro Learner community on Slack and find someone to work with! Jump to the community support section to hear more about this.

This project is broken down into key questions that your client or company is looking to answer. As a data scientist, you’ll often become a resource to help businesses answer the key questions about the efficacy of existing or potential strategies & projects.

Overview

Objective

A company has asked you to help them make more informed decisions on real estate investments. Start by analyzing the data on median estimated values of single-family homes by zip code over the past two decades.

Pre-requisites

In order to complete this project, we suggest that you have familiarity with the content in the following courses or lessons on the Codecademy platform:

  1. Queries
  2. Aggregate Functions

Suggested Technologies

Depending on where you are on your Path, there may be multiple technology options you can use to complete this project - we suggest the following:

  1. DB Browser for SQLite

Project Tasks

Get started - hosting your project

DB Browser for SQLite is a visual tool for working with SQLite databases. Follow the link to download the application for your computer.

  • SQLite can store an entire database in a single file, which usually has a .sqlite or .db extension. To open a database, select Open Database at the top of the window and browse for the file. Alternatively, you can choose to create a New Database by saving a file with the .sqlite or .db extension.
  • To import data from a CSV file into a table, select “File > Import > Table from CSV file…” and browse for the CSV file. (Note: All fields imported from the CSV file will have a data type of TEXT. Be sure to convert fields to numeric type as needed. See here for how to do that.)
  • You can download the data you’ll be using for this specific project here.
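Why the TEXT-to-numeric conversion matters can be seen in a small sketch. It uses Python's built-in sqlite3 module so it runs anywhere, but the SQL inside the strings can be pasted into the Execute SQL tab. The table name `home_value_data`, the `zip_code` column, and the three rows are made up for illustration; only the `date` and `value` columns are mentioned in the project description.

```python
import sqlite3

# In-memory database with a tiny, made-up table (hypothetical names and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE home_value_data (zip_code TEXT, date TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO home_value_data VALUES (?, ?, ?)",
    [
        ("00001", "2018-11", "95000"),
        ("00002", "2018-11", "120000"),
        ("00003", "2018-11", "1000000"),
    ],
)

# Sorting the raw TEXT column compares character by character.
text_order = [
    row[0]
    for row in conn.execute("SELECT value FROM home_value_data ORDER BY value")
]

# CAST converts the text to a number for the purpose of the sort.
numeric_order = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM home_value_data ORDER BY CAST(value AS INTEGER)"
    )
]

print(text_order)     # ['1000000', '120000', '95000']
print(numeric_order)  # ['95000', '120000', '1000000']
```

With the text sort, the million-dollar home lands first because "1" sorts before "9", which is probably not what you want when looking for the cheapest or most expensive zip code.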

There are several tabs near the top of the window for working with the data:

  • Database Structure: View the tables in your database and the columns they contain.
  • Browse Data: Browse the data for each table.
  • Execute SQL: Write and execute SQL queries.

Basic Requirements

Let’s break this project down into a couple of different parts.

Exploration: Familiarize yourself with the dataset.

  • How many distinct zip codes are in this dataset?
  • How many zip codes are from each state?
  • What range of years are represented in the data?
    • Hint: The date column is in the format yyyy-mm. Try using the substr() function to extract just the year.
  • Using the most recent month of data available, what is the range of estimated home values across the nation?
    • Note: When we imported the data from a CSV file, all fields were treated as strings. Make sure to convert the value field into a numeric type if you will be ordering by that field. See here for a hint.
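One possible shape for the exploration queries above is sketched here, not offered as an official solution. It assumes a table named `home_value_data` with `zip_code`, `state`, `date` and `value` columns (the first two names are guesses; check the Database Structure tab for the real ones), and the handful of rows is invented so the block can run on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE home_value_data (zip_code TEXT, state TEXT, date TEXT, value TEXT)"
)
# Made-up sample rows, all stored as TEXT like a CSV import would be.
conn.executemany(
    "INSERT INTO home_value_data VALUES (?, ?, ?, ?)",
    [
        ("10001", "NY", "1996-04", "200000"),
        ("10001", "NY", "2018-11", "900000"),
        ("10002", "NY", "2018-11", "750000"),
        ("73301", "TX", "1996-04", "90000"),
        ("73301", "TX", "2018-11", "310000"),
    ],
)

# How many distinct zip codes are there?
distinct_zips = conn.execute(
    "SELECT COUNT(DISTINCT zip_code) FROM home_value_data"
).fetchone()[0]

# How many zip codes per state?
zips_per_state = conn.execute(
    """
    SELECT state, COUNT(DISTINCT zip_code)
    FROM home_value_data
    GROUP BY state
    ORDER BY state
    """
).fetchall()

# Range of years: substr(date, 1, 4) keeps the 'yyyy' part of 'yyyy-mm'.
year_range = conn.execute(
    "SELECT MIN(substr(date, 1, 4)), MAX(substr(date, 1, 4)) FROM home_value_data"
).fetchone()

# Range of values in the most recent month. 'yyyy-mm' strings sort
# chronologically, so MAX(date) finds the latest month even as TEXT.
value_range = conn.execute(
    """
    SELECT MIN(CAST(value AS INTEGER)), MAX(CAST(value AS INTEGER))
    FROM home_value_data
    WHERE date = (SELECT MAX(date) FROM home_value_data)
    """
).fetchone()

print(distinct_zips, zips_per_state, year_range, value_range)
```

Note that the CAST happens inside MIN() and MAX(); without it, SQLite would compare the values as text.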

Analysis: Explore how home values differ by region and how they change over time.

  • Using the most recent month of data available, which states have the highest average home values? How about the lowest?
  • Which states have the highest/lowest average home values for the year of 2017? What about for the year of 2007? 1997?
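The analysis questions above could be approached roughly as follows. This is a sketch under the same assumptions as before: the table name `home_value_data` and the `state` column are guesses, the rows are invented, and `avg_by_state` is a hypothetical helper written just for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE home_value_data (zip_code TEXT, state TEXT, date TEXT, value TEXT)"
)
# Made-up sample rows for three states.
conn.executemany(
    "INSERT INTO home_value_data VALUES (?, ?, ?, ?)",
    [
        ("10001", "NY", "2017-01", "800000"),
        ("10001", "NY", "2017-02", "820000"),
        ("10001", "NY", "2018-11", "900000"),
        ("10002", "NY", "2018-11", "700000"),
        ("73301", "TX", "2017-01", "300000"),
        ("73301", "TX", "2018-11", "310000"),
        ("43004", "OH", "2017-01", "150000"),
        ("43004", "OH", "2018-11", "160000"),
    ],
)

# Average value per state in the most recent month, highest first.
latest = conn.execute(
    """
    SELECT state, AVG(CAST(value AS REAL)) AS avg_value
    FROM home_value_data
    WHERE date = (SELECT MAX(date) FROM home_value_data)
    GROUP BY state
    ORDER BY avg_value DESC
    """
).fetchall()


def avg_by_state(connection, year):
    """Average value per state for one calendar year, highest first."""
    return connection.execute(
        """
        SELECT state, AVG(CAST(value AS REAL)) AS avg_value
        FROM home_value_data
        WHERE substr(date, 1, 4) = ?
        GROUP BY state
        ORDER BY avg_value DESC
        """,
        (year,),
    ).fetchall()


print(latest)                      # first row is the highest state
print(avg_by_state(conn, "2017"))  # last row is the lowest state
```

In DB Browser you would replace the `?` placeholder with a literal such as '2017', then rerun the query with '2007' and '1997'. Adding LIMIT to the query is one way to keep only the top or bottom few states.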

Additional Challenges

Intermediate Challenge

  • What is the percent change in average home values from 2007 to 2017 by state? How about from 1997 to 2017?
    • Hint: We can use the WITH clause to create temporary tables containing the average home values for each of those years, then join them together to compare the change over time.
  • How would you describe the trend in home values for each state from 1997 to 2017? How about from 2007 to 2017? Which states would you recommend for making real estate investments?
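The WITH-clause hint above might look something like this in practice. It is a minimal sketch, assuming the same hypothetical `home_value_data` table and `state` column, with invented rows; the names `avg_2007` and `avg_2017` are just labels chosen for the temporary tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE home_value_data (zip_code TEXT, state TEXT, date TEXT, value TEXT)"
)
# Made-up sample rows: one value per state for each of the two years.
conn.executemany(
    "INSERT INTO home_value_data VALUES (?, ?, ?, ?)",
    [
        ("10001", "NY", "2007-06", "400000"),
        ("10001", "NY", "2017-06", "500000"),
        ("73301", "TX", "2007-06", "200000"),
        ("73301", "TX", "2017-06", "300000"),
    ],
)

# Each WITH block builds a temporary table of per-state averages for one
# year; the final SELECT joins them and computes the percent change.
pct_change = conn.execute(
    """
    WITH avg_2007 AS (
        SELECT state, AVG(CAST(value AS REAL)) AS avg_value
        FROM home_value_data
        WHERE substr(date, 1, 4) = '2007'
        GROUP BY state
    ),
    avg_2017 AS (
        SELECT state, AVG(CAST(value AS REAL)) AS avg_value
        FROM home_value_data
        WHERE substr(date, 1, 4) = '2017'
        GROUP BY state
    )
    SELECT
        avg_2007.state,
        ROUND(100.0 * (avg_2017.avg_value - avg_2007.avg_value)
              / avg_2007.avg_value, 1) AS pct_change
    FROM avg_2007
    JOIN avg_2017 ON avg_2007.state = avg_2017.state
    ORDER BY pct_change DESC
    """
).fetchall()

print(pct_change)  # [('TX', 50.0), ('NY', 25.0)]
```

Because this is an inner join, a state with no data in one of the two years drops out of the result, which is worth keeping in mind when comparing against 1997.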

Advanced Challenge

  • Join the house value data with the table of zip-code level census data. Do there seem to be any correlations between the estimated house values and characteristics of the area, such as population count or median household income?
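For the advanced challenge, a join followed by a simple correlation check could be sketched like this. Everything about the census table is an assumption: the name `census_data` and the column `median_household_income` are placeholders, so swap in whatever your imported table actually uses. SQLite has no built-in correlation function, so this sketch computes the Pearson coefficient in Python from the joined rows; the sample numbers are invented and deliberately perfectly linear.

```python
import sqlite3
from math import sqrt

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE home_value_data (zip_code TEXT, date TEXT, value TEXT)")
# Hypothetical census table and column names.
conn.execute(
    "CREATE TABLE census_data (zip_code TEXT, median_household_income TEXT)"
)
conn.executemany(
    "INSERT INTO home_value_data VALUES (?, ?, ?)",
    [
        ("00001", "2018-11", "100000"),
        ("00002", "2018-11", "200000"),
        ("00003", "2018-11", "300000"),
        ("00004", "2018-11", "250000"),  # no census row, dropped by the join
    ],
)
conn.executemany(
    "INSERT INTO census_data VALUES (?, ?)",
    [("00001", "40000"), ("00002", "60000"), ("00003", "80000")],
)

# Pair each zip code's latest home value with its census income.
rows = conn.execute(
    """
    SELECT h.zip_code,
           CAST(h.value AS REAL),
           CAST(c.median_household_income AS REAL)
    FROM home_value_data AS h
    JOIN census_data AS c ON h.zip_code = c.zip_code
    WHERE h.date = (SELECT MAX(date) FROM home_value_data)
    ORDER BY h.zip_code
    """
).fetchall()


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson([row[1] for row in rows], [row[2] for row in rows])
print(len(rows), round(r, 3))
```

A coefficient near 1 or -1 suggests a strong linear relationship and one near 0 suggests little; on the real data, a scatter plot is a good sanity check before drawing conclusions.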

Resources & Support

Project-specific resources

  1. SQLite Documentation
  2. SQLite Tutorial
  3. SQLite substr() Function
  4. Home Value Data

General Resources

  1. How to get set-up for coding on your computer
  2. What is a Relational Database Management System?
  3. What you need to know about Git, GitHub & Coding in Teams
  4. How developer teams work
  5. First steps in tackling a group project
  6. Resource on writing pseudocode to get started with off-platform projects

Community Support

Looking for additional help or someone to work with (or somewhere to brag about your finished project)? Join our Codecademy Pro Learner Community on Slack to meet other learners like yourself!

  • The data downloaded from here is for Single Family Residences. There is also data for Condo/Co-op as well as 1 bedroom, 2 bedroom, etc. homes. If you’d like to collaborate with a peer on this project, as many real-world data analysts do, each team member can analyze a different type of housing, then come together to compare methodologies and results before reaching a conclusion together.

Once you’re done…

Share on Slack or the Forums for feedback and to see some other ways of solving this problem!

Hey Alyssa
Seems like the Slack link doesn’t work for me, as it says it’s expired :frowning:
Also, I was wondering where I could see others who have done this. I couldn’t find anything about it on the forums using the search function.

I can’t check the link either :confused:

But re: seeing other people’s progress, you can always leave a message on the SQL forums and see what comes up!

Hey @array1383908755 and @luis_domingos, thank you for this callout—it’s all fixed now! Please let me know if I can help. :slight_smile:

Hi, Alyssa. Thanks for fixing the link here.

Just a heads-up: the links set up for the other projects at around the same time are not working either, so if you could copy-paste this new link to your other posts, that would be great (and it would save you and some people the time).

Best,
Luís

that’s a very important callout, @luis_domingos!! thank you :pray:

I can’t seem to import a table from a CSV file into DB Browser. The menu option is greyed out.

Hello :slight_smile: Welcome to the forum!

In order to import a table from the CSV file, you first need to create a new database.

Thanks! What’s next? It’s asking me to create that database, but there are about 4 million rows in that CSV file from this project.

Oh, I guess that you can simply select Cancel and import the table from the file as you intended in the first place.

I canceled out of that box and imported the table from the CSV file. This time the selection was available and worked :slight_smile:
Thanks!

You’re very welcome :slight_smile:

Would it be right to suspect that my query isn’t working because the columns aren’t labeled? The column labels seem to be in the first row (see below):

And if that is the case: I can’t seem to edit the column labels where it says “field1, field2, etc.”

Yes, that is correct.

The fastest way to solve this is to import the table again, this time selecting the option “Column names in first line” in the Import CSV file window. Checking this option makes DB Browser treat the first row of the file as the column labels.