Data Science Independent Project #2 – Explore a Sample Database

I just started working on this and I have some questions to discuss.
For question 2 of " Which track/album/genre generated the most revenue?", I used GROUP BY with trackName/albumName/genreName and the results are different.
May I know using trackName for GROUP BY for this question is correct? Or trackID? Appreciate the help.

You’re the first to show the solutions to the first part of the assignment. When it comes to “which track generated the most revenue”, I’m having trouble understanding where the calculation took place in your query. How does that SUM statement get you the “Total Revenue”? Shouldn’t that be multiplication?

Any help would be much appreciated!

I had a question about your answer to the “Do longer or shorter length albums generate more revenue?”
When joining ‘tracks’ with ‘album_length’ only I get there a 82 albums that have a 1 track count. When you join it with the ‘invoice_items’ table as well, that number drops to 41. All of the track counts drop in various amounts. I am trying to figure out why.

WITH album_length AS(
SELECT
tracks.albumId AS ‘AlbumId’,
COUNT(tracks.TrackId) AS ‘TrackCount’
FROM tracks
GROUP BY 1
ORDER BY 2)

SELECT
album_length.TrackCount as ‘Number of Tracks’,
COUNT(DISTINCT album_length.AlbumId) as ‘Number of Albums’
FROM tracks
JOIN album_length
ON tracks.AlbumId = album_length.AlbumId
GROUP BY 1;

(Without any invoice info produces this:

1 82
2 8
3 3
4 2
5 2
6 3
7 8
8 14
9 15
10 27
11 25
12 29
13 16
14 34
15 18
16 13
17 15
18 8
19 3
20 5
21 2
22 3
23 3
24 3
25 2
26 1
30 1
34 1
57 1

When you add the invoice_items table:

WITH album_length AS(
SELECT
tracks.albumId AS ‘AlbumId’,
COUNT(tracks.TrackId) AS ‘TrackCount’
FROM tracks
GROUP BY 1
ORDER BY 2)

SELECT
album_length.TrackCount as ‘Number of Tracks’,
COUNT(DISTINCT album_length.AlbumId) as ‘Number of Albums’
FROM tracks
JOIN album_length
ON tracks.AlbumId = album_length.AlbumId
JOIN invoice_items
ON tracks.TrackId = invoice_items.TrackId
GROUP BY 1;

It produces this:

Number of tracks | Number of Albums
|1|41|
|2|6|
|3|3|
|4|2|
|5|2|
|6|3|
|7|8|
|8|14|
|9|15|
|10|27|
|11|25|
|12|29|
|13|16|
|14|34|
|15|18|
|16|13|
|17|15|
|18|8|
|19|3|
|20|5|
|21|2|
|22|3|
|23|3|
|24|3|
|25|2|
|26|1|
|30|1|
|34|1|
|57|1|

Hallo! I am new here, is the following answer correct for the first question?
Thank you very much!

WITH temporary_playlist AS (
SELECT *
FROM tracks
JOIN playlist_track ON
playlist_track.Trackid = tracks.Trackid
)

SELECT Name , COUNT(Trackid)
FROM temporary_playlist
GROUP BY Trackid
HAVING COUNT(Trackid) > 1;

Hello everyone! Here is my attempt at the answers for basic requirements…
The answers are based on how I have understood the questions. So, please feel free to correct me if I’m wrong or ask for any clarifications if needed. Hope it is helpful!

Which tracks appeared in the most playlists? how many playlist did they appear in?
SELECT TrackId, count(*) FROM playlist_track
GROUP BY 1
ORDER BY 2 DESC;

There were 41 tracks that appeared in most playlists (i.e. 5 playlists)

Additional related query
WITH subset1 AS (
	SELECT TrackId, count(*) AS 'count' FROM playlist_track
	GROUP BY 1)
SELECT subset1.count, count(*) AS 'no_of_tracks' FROM subset1
GROUP BY 1
ORDER BY 1 DESC;

This query shows the output as follows:

count no_of_tracks
5 41
4 70
3 1446
2 1946

 

Which track generated the most revenue? which album? which genre?
  • Tracks generating the most revenue -
          SELECT TrackId, sum(UnitPrice*Quantity) FROM invoice_items
          GROUP BY 1
          ORDER BY 2 DESC;

There were 8 eight tracks generating the highest revenue ($3.98)

  • Album generating the most revenue -
          SELECT tracks.AlbumId, albums.Title, artists.Name AS 'Artist', sum(invoice_items.UnitPrice) FROM invoice_items
          JOIN tracks
          ON invoice_items.TrackId = tracks.TrackId
          JOIN albums
          ON tracks.AlbumId = albums.AlbumId
          JOIN artists
          ON albums.ArtistId = artists.ArtistId
          GROUP BY 1
          ORDER BY 4 DESC;

‘Battlestar Galactica (Classic), Season 1’ generated the most revenue ($35.82)

  • Genre generating the most revenue -
          SELECT tracks.GenreId, genres.Name, round(sum(invoice_items.UnitPrice),2) FROM invoice_items
          JOIN tracks
          ON invoice_items.TrackId = tracks.TrackId
          JOIN genres
          ON tracks.GenreId = genres.GenreId
          GROUP BY 1
          ORDER BY 3 DESC;

Rock genre generated the most revenue ($826.65)

 

Which countries have the highest sales revenue? What percent of total revenue does each country make up?
WITH
table1 AS (SELECT sum(invoices.Total) AS 'total_revenue' FROM invoices)
SELECT BillingCountry, sum(Total),
       round(((sum(Total)/table1.total_revenue)*100),2) AS '% of total revenue'
FROM invoices, table1
GROUP BY 1
ORDER BY 3 DESC;

USA and Canada have the highest sales revenue with 22.46% and 13.05% respectively.

 

How many customers did each employee support, what is the average revenue for each sale, and what is their total sale?

I did not understand what ‘avg revenue for each sale’ means in this case, so I have not included that in the query…

WITH
emp_data AS (
	SELECT CustomerId, SupportRepId AS 'emp_id' FROM customers),
	
subset1 AS (
	SELECT invoices.CustomerId, sum(invoices.Total) 'sales_total', emp_data.emp_id AS 'emp_id' FROM invoices
	JOIN emp_data ON invoices.CustomerId = emp_data.CustomerId
	GROUP BY 1),

final_set AS (
	SELECT emp_id, sum(sales_total) AS 'total_sale' FROM subset1
	GROUP BY 1)

SELECT EmployeeId, count(CustomerId) AS 'cust_count', final_set.total_sale FROM employees
LEFT JOIN customers ON employees.EmployeeId = customers.SupportRepId
LEFT JOIN final_set ON employees.EmployeeId = final_set.emp_id
GROUP BY 1;

Yup, your answer is correct!
Only addition that I would do is add an ORDER BY clause to the final query

1 Like

Hi Everyone

Where do I get the project information from to start? I’ve downloaded SQLite, just need the CSV file now.

Hi,
you can find the link to db at the top of this page:
"
#Overview

Objective

You are the lead data analyst for a popular music store. Help them analyze their sales and service! Download the SQLite sample database provided here .
"

Thanks for the help!

Hi! Thanks for the great opportunity to work with SQL. Here is my answers GitHub - halb-nat/musical-store.

1 Like

Has anyone successfully updated the table “tracks” to fix the repeated “2 Minutes To Midnight?” I tried:

UPDATE tracks
SET TrackID = 1221
WHERE name = “2 Minutes To Midnight”;

but I get the error:

“Result: UNIQUE constraint failed: tracks.TrackId”

@awkardarm , Hi.

Are you done with the project? I’m just looking with someone to do it with.

I finished the project and just accepted the fact that the repeated values altered the averages.

Do you do a review with projects 1-5?

For everyone struggling on the last part:

[[**Advanced Challenge

  • How much revenue is generated each year, and what is its percent change from the previous year?**]]

Do check out this video on youtube about self-join: It’s about Current, Next, and previous values

Also, please check my code if it works:

/*
WITH previous AS (
WITH previous_table AS(
SELECT CAST (strftime(‘%Y’, invoiceDate) AS int) AS ‘Year’,
SUM(Total) As ‘Revenue’
FROM invoices
GROUP BY 1)

SELECT curr.*,
prev.Year AS ‘Previous_Year’,
prev.Revenue AS ‘Previous_Revenue’,
(curr.Revenue-prev.Revenue)*100/prev.Revenue AS ‘Percent_Change’

FROM previous_table AS curr
LEFT JOIN previous_table AS prev
ON prev.Year=curr.Year-1)

SELECT previous.*,
(CASE WHEN Percent_Change >1 THEN ‘increase’
WHEN Percent_Change <1 THEN ‘decrease’
ELSE ‘NA’
END) AS ‘percentage_increase_or_decrease’
FROM previous;
*/

Really great project and it helped me understand sql even better

I used the jupyter notebook to visualize my sql query outputs

Here is my work

Here’s my solution to: Which tracks appeared in the most playlists? how many playlists did they appear in? @maxstephens358314469 and @elmulix

Code:

SELECT t.Name,
	COUNT(*)
FROM tracks t
JOIN playlist_track pt
	ON t.TrackId = pt.TrackId
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1;

And the result was:

  • 2 Minutes To Midnight
  • It appeared on 13 Playlists

Hi there @newtr, I really appreciate the effort you’ve taken to share your solution with the community. However, it seems you misinterpreted one of the questions: How many customers did each employee support, what is the average revenue for each sale, and what is their total sale?

The question here requires us to find or I believe (because this project’s questions have been really ambiguous) the total number of customers each employee supported, the Average revenue from the customer they supported and the total revenue brought in by the particular employee.

So the correct solution should be:

WITH Customers_expense AS (
SELECT c.Customerid,
	c.SupportRepId,
	Total
FROM customers c
JOIN invoices i
	ON c.CustomerId = i.CustomerId)

SELECT e.LastName, 
	e.FirstName, 
	e.Title,
	COUNT(DISTINCT CustomerId) "No. of Customers Supported",
	ROUND (AVG(Total), 2) "Average Revenue On Each Sale",
	ROUND (SUM(Total), 2) "Total Revenue"
FROM employees e
JOIN customers_expense ce
	ON e.EmployeeId = ce.SupportRepId
GROUP BY 1, 2;

I would love to hear the community’s opinion on this because I myself might be wrong with my interpretation. But I must use this opportunity to say, I really liked how you structured your query using comments and self-explanatory labels. I learned a lot from it and I will try implementing them in future projects.

It seems my solution here might be wrong, I should have grouped the TrackId which is the primary key of the table. There are different tracks on the table with the same name that made this solution wrong but the TrackId which is a Primary is Unique.

Here’s the correct solution:

-- Top 50 Tracks In Appearance in Playlist

SELECT t.TrackId,
	t.Name,
	COUNT(*)  "How many playlists"
FROM tracks t
JOIN playlist_track pt
	ON t.TrackId = pt.TrackId
GROUP BY 1
ORDER BY 3 DESC
LIMIT 50;

You are absolutely right @net3879063533. I believe the correct to Is the number of times a track appear in any playlist a good indicator of sales? should be:

WITH TrackPlaylist AS (
SELECT TrackId,
	COUNT(PlaylistId) No_Playlists
FROM playlist_track 
GROUP BY TrackId),

TrackValue AS (
SELECT TrackId,
	SUM(UnitPrice) TrackRevenue
FROM invoice_items
GROUP BY 1
ORDER BY 2 DESC)

SELECT No_Playlists,
	ROUND (AVG(TrackRevenue), 2) "Average Revenue By Appearance"
FROM TrackPlaylist tp
LEFT JOIN TrackValue tv
	ON tp.TrackId = tv.TrackId
GROUP BY 1
ORDER BY 2 DESC;