Data Science Independent Project #2 – Explore a Sample Database

I just started working on this and I have some questions to discuss.
For question 2 of " Which track/album/genre generated the most revenue?", I used GROUP BY with trackName/albumName/genreName and the results are different.
May I know using trackName for GROUP BY for this question is correct? Or trackID? Appreciate the help.

You’re the first to show the solutions to the first part of the assignment. When it comes to “which track generated the most revenue”, I’m having trouble understanding where the calculation took place in your query. How does that SUM statement get you the “Total Revenue”? Shouldn’t that be multiplication?

Any help would be much appreciated!

I had a question about your answer to the “Do longer or shorter length albums generate more revenue?”
When joining ‘tracks’ with ‘album_length’ only I get there a 82 albums that have a 1 track count. When you join it with the ‘invoice_items’ table as well, that number drops to 41. All of the track counts drop in various amounts. I am trying to figure out why.

WITH album_length AS(
SELECT
tracks.albumId AS ‘AlbumId’,
COUNT(tracks.TrackId) AS ‘TrackCount’
FROM tracks
GROUP BY 1
ORDER BY 2)

SELECT
album_length.TrackCount as ‘Number of Tracks’,
COUNT(DISTINCT album_length.AlbumId) as ‘Number of Albums’
FROM tracks
JOIN album_length
ON tracks.AlbumId = album_length.AlbumId
GROUP BY 1;

(Without any invoice info produces this:

1 82
2 8
3 3
4 2
5 2
6 3
7 8
8 14
9 15
10 27
11 25
12 29
13 16
14 34
15 18
16 13
17 15
18 8
19 3
20 5
21 2
22 3
23 3
24 3
25 2
26 1
30 1
34 1
57 1

When you add the invoice_items table:

WITH album_length AS(
SELECT
tracks.albumId AS ‘AlbumId’,
COUNT(tracks.TrackId) AS ‘TrackCount’
FROM tracks
GROUP BY 1
ORDER BY 2)

SELECT
album_length.TrackCount as ‘Number of Tracks’,
COUNT(DISTINCT album_length.AlbumId) as ‘Number of Albums’
FROM tracks
JOIN album_length
ON tracks.AlbumId = album_length.AlbumId
JOIN invoice_items
ON tracks.TrackId = invoice_items.TrackId
GROUP BY 1;

It produces this:

Number of tracks | Number of Albums
|1|41|
|2|6|
|3|3|
|4|2|
|5|2|
|6|3|
|7|8|
|8|14|
|9|15|
|10|27|
|11|25|
|12|29|
|13|16|
|14|34|
|15|18|
|16|13|
|17|15|
|18|8|
|19|3|
|20|5|
|21|2|
|22|3|
|23|3|
|24|3|
|25|2|
|26|1|
|30|1|
|34|1|
|57|1|

Hallo! I am new here, is the following answer correct for the first question?
Thank you very much!

WITH temporary_playlist AS (
SELECT *
FROM tracks
JOIN playlist_track ON
playlist_track.Trackid = tracks.Trackid
)

SELECT Name , COUNT(Trackid)
FROM temporary_playlist
GROUP BY Trackid
HAVING COUNT(Trackid) > 1;

Hello everyone! Here is my attempt at the answers for basic requirements…
The answers are based on how I have understood the questions. So, please feel free to correct me if I’m wrong or ask for any clarifications if needed. Hope it is helpful!

Which tracks appeared in the most playlists? how many playlist did they appear in?
SELECT TrackId, count(*) FROM playlist_track
GROUP BY 1
ORDER BY 2 DESC;

There were 41 tracks that appeared in most playlists (i.e. 5 playlists)

Additional related query
WITH subset1 AS (
	SELECT TrackId, count(*) AS 'count' FROM playlist_track
	GROUP BY 1)
SELECT subset1.count, count(*) AS 'no_of_tracks' FROM subset1
GROUP BY 1
ORDER BY 1 DESC;

This query shows the output as follows:

count no_of_tracks
5 41
4 70
3 1446
2 1946

 

Which track generated the most revenue? which album? which genre?
  • Tracks generating the most revenue -
          SELECT TrackId, sum(UnitPrice*Quantity) FROM invoice_items
          GROUP BY 1
          ORDER BY 2 DESC;

There were 8 eight tracks generating the highest revenue ($3.98)

  • Album generating the most revenue -
          SELECT tracks.AlbumId, albums.Title, artists.Name AS 'Artist', sum(invoice_items.UnitPrice) FROM invoice_items
          JOIN tracks
          ON invoice_items.TrackId = tracks.TrackId
          JOIN albums
          ON tracks.AlbumId = albums.AlbumId
          JOIN artists
          ON albums.ArtistId = artists.ArtistId
          GROUP BY 1
          ORDER BY 4 DESC;

‘Battlestar Galactica (Classic), Season 1’ generated the most revenue ($35.82)

  • Genre generating the most revenue -
          SELECT tracks.GenreId, genres.Name, round(sum(invoice_items.UnitPrice),2) FROM invoice_items
          JOIN tracks
          ON invoice_items.TrackId = tracks.TrackId
          JOIN genres
          ON tracks.GenreId = genres.GenreId
          GROUP BY 1
          ORDER BY 3 DESC;

Rock genre generated the most revenue ($826.65)

 

Which countries have the highest sales revenue? What percent of total revenue does each country make up?
WITH
table1 AS (SELECT sum(invoices.Total) AS 'total_revenue' FROM invoices)
SELECT BillingCountry, sum(Total),
       round(((sum(Total)/table1.total_revenue)*100),2) AS '% of total revenue'
FROM invoices, table1
GROUP BY 1
ORDER BY 3 DESC;

USA and Canada have the highest sales revenue with 22.46% and 13.05% respectively.

 

How many customers did each employee support, what is the average revenue for each sale, and what is their total sale?

I did not understand what ‘avg revenue for each sale’ means in this case, so I have not included that in the query…

WITH
emp_data AS (
	SELECT CustomerId, SupportRepId AS 'emp_id' FROM customers),
	
subset1 AS (
	SELECT invoices.CustomerId, sum(invoices.Total) 'sales_total', emp_data.emp_id AS 'emp_id' FROM invoices
	JOIN emp_data ON invoices.CustomerId = emp_data.CustomerId
	GROUP BY 1),

final_set AS (
	SELECT emp_id, sum(sales_total) AS 'total_sale' FROM subset1
	GROUP BY 1)

SELECT EmployeeId, count(CustomerId) AS 'cust_count', final_set.total_sale FROM employees
LEFT JOIN customers ON employees.EmployeeId = customers.SupportRepId
LEFT JOIN final_set ON employees.EmployeeId = final_set.emp_id
GROUP BY 1;

Yup, your answer is correct!
Only addition that I would do is add an ORDER BY clause to the final query

1 Like