Data Science Independent Project #4 – Home Value Trends

Project: Trends in Estimated Home Values

some information has changed. please see the rest of thread to apply corrections

This project will take you off-platform and get you started in your own developer environment! Never done that before? Not to worry - we’ve shared some resources to help you down below. This project can be completed entirely on your own - or, you can join our Community Discord and find someone to work with! Jump to the community support section to hear more about this.

This project is broken down into key questions that your client or company is looking to answer. As a data scientist, you’ll often become a resource to help businesses answer the key questions about the efficacy of existing or potential strategies & projects.

Overview

Objective

You are asked by a company to help them make more informed decisions on real estate investments. Start by analyzing the data on median estimated values of single family homes by zip codes from the past two decades.

Pre-requisites

In order to complete this project, we suggest that you have familiarity with the content in the following courses or lessons on the Codecademy platform:

  1. Queries
  2. Aggregate Functions

Suggested Technologies

Depending on where you are on your Path, there may be multiple technology options you can use to complete this project - we suggest the following:

  1. DB Browser for SQLite

Project Tasks

Get started - hosting your project

DB Browser for SQLite is a visual tool for working with SQLite databases. Follow the link to download the application for your computer.

  • SQLite can store an entire database in a single file, which usually has a .sqlite or .db extension. To open a database, select Open Database at the top of the window and browse for the file. Alternatively, you can choose to create a New Database by saving a file with the .sqlite or .db extension.
  • To import data from a CSV file into a table, select “File > Import > Table from CSV file…” and browse for the CSV file. (Note: All fields imported from the CSV file will have a data type of TEXT. Be sure to convert fields to numeric type as needed. See here for how to do that.)
  • You can download the data you’ll be using for this specific project here.

There are several tabs near the top of the window for working with the data:

  • Database Structure: View the tables in your database and the columns they contain.
  • Browse Data: Browse the data for each table.
  • Execute SQL: Write and execute SQL queries.

Basic Requirements

Let’s break this project down into a couple different parts.

Exploration: Familiarize yourself with the dataset.

  • How many distinct zip codes are in this dataset?
  • How many zip codes are from each state?
  • What range of years are represented in the data?
    • Hint: The date column is in the format yyyy-mm. Try taking a look at using the substr() function to help extract just the year.
  • Using the most recent month of data available, what is the range of estimated home values across the nation?
    • Note: When we imported the data from a CSV file, all fields are treated as a string. Make sure to convert the value field into a numeric type if you will be ordering by that field. See here for a hint.

Analysis: Explore how home value differ by region as well as change over time.

  • Using the most recent month of data available, which states have the highest average home values? How about the lowest?
  • Which states have the highest/lowest average home values for the year of 2017? What about for the year of 2007? 1997?

Additional Challenges

Intermediate Challenge

  • What is the percent change in average home values from 2007 to 2017 by state? How about from 1997 to 2017?
    • Hint: We can use the WITH clause to create temporary tables containing the average home values for each of those years, then join them together to compare the change over time.
  • How would you describe the trend in home values for each state from 1997 to 2017? How about from 2007 to 2017? Which states would you recommend for making real estate investments?

Advanced Challenge

  • Join the house value data with the table of zip-code level census data. Do there seem to be any correlations between the estimated house values and characteristics of the area, such as population count or median household income?

Resources & Support

Project-specific resources

  1. SQLite Documentation
  2. SQLite Tutorial
  3. SQLite substr() Function
  4. Home Value Data

General Resources

  1. How to get set-up for coding on your computer
  2. What is a Relational Database Management System?
  3. What you need to know about Git, GitHub & Coding in Teams
  4. How developer teams work
  5. First steps in tackling a group project
  6. Resource on writing pseudocode to get started with off-platform projects

Community Support

Looking for additional help or someone to work with (or somewhere to brag about your finished project)? Join our Community Discord to meet other learners like yourself!

  • The data downloaded from here is for Single Family Residences - there is also data for Condo/Co-op as well as 1 bedroom, 2 bedroom, etc homes. If you’d like to collaborate with a peer on this project, as many real-world data analysts do, different team members can analyze a different type of housing, then come together and compare methodologies and results, before coming to a conclusion together.

Once you’re done…

Share in the Forums for feedback and to see some other ways of solving this problem!

7 Likes

8 posts were split to a new topic: Import table from CSV file into DB Browser

How can I calculate the percent change (intermediate challenge) based on the state and years? I have created a WITH table with 3 columns that represent the state, year and average value that is broken down into the 3 years. I can’t seem to figure out how to pull out the individual average value based on the year and state for a calculation while doing that repeatedly for every state. Here is the code I have so far

WITH average_value_per_year AS(
SELECT state, SUBSTR(date, 1, 4) AS years, ROUND(AVG(value),1) AS average_value
FROM home_value_data
WHERE years = ‘2017’
OR years = ‘2007’
OR years = ‘1997’
GROUP BY 1, 2
ORDER BY 1 ASC
)

1 Like

Hello.
I am trying to answer the question “Using the most recent month of data available, which states have the highest average home values? How about the lowest?”.

[INPUT]
SELECT state, ROUND(AVG(home_value), 0)
FROM home_value_data
WHERE date = 2017-11
ORDER BY state ASC;

[OUTPUT]
state AVG
state 234493


This is the data I took for the table.

What I want to do is take the average (AVG) of home_value, and list it on a DISTINCT state, but when I do it only outputs a single value with an output of average (AVG) of the entire database of home_value? I want to take the average (AVG) out of every individual state that is returned.
I am so confused and I need assistance!

False alarm:

1 SELECT DISTINCT field3, ROUND(AVG(field7), 0)
2 FROM home_value_data
3 WHERE field6 >= 2017-11
4 ORDER BY field3 ASC;

Line 4 actually required a GROUP BY, so the correct syntax goes:

SELECT DISTINCT field3, ROUND(AVG(field7), 0)
FROM home_value_data
WHERE field6 >= 2017-11
GROUP BY field3;

That’s how I made it work.

Where can I find solutions to this problem set? I’ve finished most of it but am having trouble with the intermediate and beyond questions (the information provided on WITH clause wasn’t sufficient enough to answer the question). Please help!

2 Likes

Hello!
Can somebody walk me through the solutions for these questions? I am stuck and having a hard time. I’m trying to do the first Intermediate Challenge question and I’m stuck on the WITH Clause portion. I tried googling but nothing I’m doing is working.

Intermediate Challenge

  • What is the percent change in average home values from 2007 to 2017 by state? How about from 1997 to 2017?
    Hint: We can use the WITH clause to create temporary tables containing the average home values for each of those years, then join them together to compare the change over time.

  • How would you describe the trend in home values for each state from 1997 to 2017? How about from 2007 to 2017? Which states would you recommend for making real estate investments?

Advanced Challenge

  • Join the house value data with the table of zip-code level census data. Do there seem to be any correlations between the estimated house values and characteristics of the area, such as population count or median household income?

Thank you in advance…

1 Like

Is there a solution for this project? I’m having a bit of trouble with the substr()

What do you need help with on substr()?

Can anyone share an example of how multiple temporary tables can be defined with one instance of the WITH keyword?

Hope this helps…
Was stuck here as well.

SELECT DISTINCT MIN(substr(date,1,4)) AS 'Start Year', MAX(substr(date,1,4)) AS 'End Year'
FROM home_value_data;
3 Likes

Thanks for your help.

Can you pls check these queries below about the question "How many zip codes are from each state?’’ is true or not.

Thanks a lot!

SELECT DISTINCT zip_code, state, COUNT()
FROM home_value_data
GROUP BY 2
ORDER BY 2 ASC;

You can try with link below

https://www.codecademy.com/learn/learn-sql/modules/learn-sql-multiple-tables/cheatsheet

Hope it helps.

Since we’re not asked for distinct zip codes per state,
That would be the answer:

SELECT state, COUNT(zip_code)
FROM home_value_data
GROUP BY state;

You can also, as you wrote in your example, exchange GROUP BY state to GROUP BY 1.

And,
if you’d like to find distinct zips per state:

SELECT state, COUNT(DISTINCT zip_code)
FROM home_value_data
GROUP BY state;

I believe (correct me if I’m wrong) the logic is that we first choose the columns that stand for themselves (without any manipulation to the data), then add the columns we want to manipulate.

2 Likes

Many thank, I will check it out.

And yes, I wrote this queries before but when I check the tables, there are many cities in a state and different zip-code from each city. That’s why I asked this question.

When I try to open the csv file in DB Browser it tells me I need a password to access the table. Anyone else have this problem?

I’m having some trouble with this as well. Is there anyone who has managed to figure this out?

Did you click open database in DB Browser then open the CSV file?

When I was starting, I came across this problem but it was because I was opening the file incorrectly within the DB browser.

Does anyone know the query for the question below?

What is the [percent change] in average home values from 2007 to 2017 by state? How about from 1997 to 2017?

  • Hint: We can use the WITH clause to create temporary tables containing the average home values for each of those years, then join them together to compare the change over time.

Codecademy should make a project solution…
Would help greatly! :slight_smile:

1 Like