Hi Everybody,
I am stuck on the very final exam of Advanced Databases.
The question was the following :
You’ve been given a database for an online streaming service that first released in 2020. The database has a table customers
with a column net_new
that records net new monthly subscriptions to the streaming service broken down by tiers: free, standard, and premium.
Write a SELECT
statement using windows functions to calculate two new columns from net_new
:
-
total
: this column should contain a running total count of customers partitioned by the three subscription tiers
-
year_over_year
: for each month, this column should subtract the net_new
value 12 months earlier from the current net_new
value. This calculation should also be partitioned by level
. The year_over_year
value for the first 12 months (2020) should be null.
Your final SELECT
query should have the columns year
, month
, level
, total
, and year_over_year
in that order.
Here’s my answer:
SELECT
year,
month,
level,
SUM(net_new) OVER (PARTITION BY level ORDER BY month) AS total,
CASE
WHEN LAG(net_new, 12) OVER (PARTITION BY level ORDER BY month) IS NULL THEN NULL
ELSE net_new - LAG(net_new, 12) OVER (PARTITION BY level ORDER BY month)
END AS year_over_year
FROM customers
ORDER BY year, month, level
It runs fine, giving I think the right answers but it won’t pass :-/
Any idea why ?
Thanks in advance.
Guillaume
Hi Guillaume! I’m having the same issue.
I was using practically the same code as you, but I noticed that with that code the first four months of every year have NULL values in the year_over_year column, which is not what the question is asking to resolve.
I then modified the code to this:
SELECT year, month, level,
SUM(net_new) OVER (
PARTITION BY level
ORDER BY year, month
) AS ‘total’,
CASE
WHEN net_new - LAG(net_new, 12) OVER (
PARTITION BY level
ORDER BY year, month
) IS NULL THEN NULL
ELSE net_new - LAG(net_new, 12) OVER (
PARTITION BY level
ORDER BY year, month
)
END AS ‘year_over_year’
FROM customers
ORDER by year
And that solved the issue, now all the year_over_year values for 2020 (for every level) have NULL values while all the other values (for all the other years, months and levels) are not NULL, but when I hit the check answer button the same message appears, asking me to check if I selected the right columns in the right order (I did) and if I’m using SUM and LAG correctly (this may be the problem I think?).
I noticed that this database has a ‘total’ column, maybe that is what is generating the conflict? Like, we are asked to name a column as ‘total’ but there is already a column with that name, maybe that is why the error message says to check if we are selecting the right columns.
I don’t know, I hope someone else can help.
Best regards!
Hey!
Just wanted to let you know that I figured out the problem.
Indeed, the error was the one I pointed out in my previous comment. The thing is I kept getting an error message because I had an ORDER by after the “FROM customers” line. I checked without that last ORDER BY clause and it passed.
I hope it helps.
Best regards!
Thanks Claudio
All good now indeed.