Data Science Independent Project #2 – Explore a Sample Database

Project: Exploring a SQLite Sample Database

This project will take you off-platform and get you started in your own developer environment! Never done that before? Not to worry - we’ve shared some resources to help you down below. This project can be completed entirely on your own - or, you can join our Community Discord and find someone to work with! Jump to the community support section to hear more about this.

This project is broken down into key questions that your client or company is looking to answer. As a data scientist, you’ll often become a resource to help businesses answer the key questions about the efficacy of existing or potential strategies & projects.



You are the lead data analyst for a popular music store. Help them analyze their sales and service! Download the SQLite sample database provided here.


In order to complete this project, we suggest that you have familiarity with the content in the following courses or lessons on the Codecademy platform:

  1. Manipulation
  2. Queries
  3. Aggregate Functions
  4. Multiple Tables

Suggested Technologies

Depending on where you are on your Path, there may be multiple technology options you can use to complete this project - we suggest the following:

  1. DB Browser for SQLite

Project Tasks

Get started - hosting your project

DB Browser for SQLite is a visual tool for working with SQLite databases. Follow the link to download the application for your computer.

  • SQLite can store an entire database in a single file, which usually has a .sqlite or .db extension. To open a database, select Open Database at the top of the window and browse for the file. Alternatively, you can choose to create a New Database by saving a file with the .sqlite or .db extension.
  • To import data from a CSV file into a table, select “File > Import > Table from CSV file…” and browse for the CSV file. (Note: All fields imported from the CSV file will have a data type of TEXT. Be sure to convert fields to numeric type as needed. See here for how to do that.)
  • You can download the data you’ll be using for this specific project here.

There are several tabs near the top of the window for working with the data:

  • Database Structure: View the tables in your database and the columns they contain.
  • Browse Data: Browse the data for each table.
  • Execute SQL: Write and execute SQL queries.

Basic Requirements

Let’s break this project down into a couple different parts.

Analyzing the Data:

  • Which tracks appeared in the most playlists? how many playlist did they appear in?

  • Which track generated the most revenue? which album? which genre?

  • Which countries have the highest sales revenue? What percent of total revenue does each country make up?

  • How many customers did each employee support, what is the average revenue for each sale, and what is their total sale?

Additional Challenges

Intermediate Challenge

  • Do longer or shorter length albums tend to generate more revenue?
    • Hint: We can use the WITH clause to create a temporary table that determines the number of tracks in each album, then group by the length of the album to compare the average revenue generated for each.
  • Is the number of times a track appear in any playlist a good indicator of sales?
    • Hint: We can use the WITH clause to create a temporary table that determines the number of times each track appears in a playlist, then group by the number of times to compare the average revenue generated for each.

Advanced Challenge

  • How much revenue is generated each year, and what is its percent change from the previous year?
    • Hint: The InvoiceDate field is formatted as ‘yyyy-mm-dd hh:mm:ss’. Try taking a look at using the strftime() function to help extract just the year. Then, we can use a subquery in the SELECT statement to query the total revenue from the previous year. Remember that strftime() returns the date as a string, so we would need to CAST it to an integer type for this part. Finally, since we cannot refer to a column alias in the SELECT statement, it may be useful to use the WITH clause to query the previous year total in a temporary table, and then calculate the percent change in the final SELECT statement.

Resources & Support

Project-specific resources

  1. SQLite Date and Time Functions
  2. SQLite strftime() Function
  3. CAST expression
  4. SQLite Documentation
  5. SQLite Tutorial

General Resources

  1. How to get set-up for coding on your computer
  2. What is a Relational Database Management System?
  3. What you need to know about Git, GitHub & Coding in Teams
  4. How developer teams work
  5. First steps in tackling a group project
  6. Resource on writing pseudocode to get started with off-platform projects

Community Support

Looking for additional help or someone to work with (or somewhere to brag about your finished project)? Join our Community Discord to meet other learners like yourself!

Once you’re done…

Share in the Forums for feedback and to see some other ways of solving this problem!


Thanks for setting this project up, it was a great independent learning task!

I found that as I was going through it would have been handy to have an idea of the answers, so I could tell whether or not my solutions were correct or not.

I also misinterpreted “Is the number of times a track appear in any playlist a good indicator of sales?” because of the ambiguity of the word sales - when I attempted the task I had thought it meant unit sales, but when reviewing the hint it became clear it was average revenue rather than average unit sales.

I’ve shared my commented queries here - curious to see how others did it.


I did the first part where it asks for tracks that appear in the most playlists, but I was wondering how we can find out which playlists these tracks appear in, without manually looking up the track ID in the playlist_track table?

Thank you for the project.

If anyone is interested, here are my answers to the tasks.


You can use this script to check that:
select tracks.TrackId , tracks.Name as ‘Track Name’, playlists.PlaylistId, playlists.Name as ‘Playlist Name’
from tracks left join playlist_track on tracks.TrackId = playlist_track.TrackId left join playlists on playlist_track.PlaylistId = playlists.PlaylistId
order by playlists.PlaylistId;

The list is ordered by PlaylistId.

Here are my answers to the project.

Hey newtr, thanks for sharing.

I believe you made a mistake on the second intermediate challenge; you’ve divided the total revenue per playlist frequency (i.e. 2-5) with the total number of songs which generated revenue per playlist frequency.

Your code gives an answer of about 1.0 for everything; not surprising, since you’re basically calculating occurrence*AVG(revenue)/occurrence.

I believe you need to account for songs which are part of the different playlist frequencies (i.e. 2-5) but which don’t make any sales. This should show that the less playlists a song appears on, the more revenue it makes on average.

Just commenting so others can see.