EXAM : Advanced Databases (part 2) - Data Scientist : Analytics - Career Path

Hi Everybody,
I am stuck on the very final exam of Advanced Databases.

The question was the following :
You’ve been given a database for an online streaming service that first released in 2020. The database has a table customers with a column net_new that records net new monthly subscriptions to the streaming service broken down by tiers: free, standard, and premium.

Write a SELECT statement using windows functions to calculate two new columns from net_new:

  1. total: this column should contain a running total count of customers partitioned by the three subscription tiers
  2. year_over_year: for each month, this column should subtract the net_new value 12 months earlier from the current net_new value. This calculation should also be partitioned by level. The year_over_year value for the first 12 months (2020) should be null.

Your final SELECT query should have the columns year, month, level, total, and year_over_year in that order.

Here’s my answer:

SELECT
year,
month,
level,
SUM(net_new) OVER (PARTITION BY level ORDER BY month) AS total,
CASE
WHEN LAG(net_new, 12) OVER (PARTITION BY level ORDER BY month) IS NULL THEN NULL
ELSE net_new - LAG(net_new, 12) OVER (PARTITION BY level ORDER BY month)
END AS year_over_year
FROM customers
ORDER BY year, month, level

It runs fine, giving I think the right answers but it won’t pass :-/

Any idea why ?

Thanks in advance.
Guillaume

Hi Guillaume! I’m having the same issue.

I was using practically the same code as you, but I noticed that with that code the first four months of every year have NULL values in the year_over_year column, which is not what the question is asking to resolve.

I then modified the code to this:

SELECT year, month, level,
SUM(net_new) OVER (
PARTITION BY level
ORDER BY year, month
) AS ‘total’,
CASE
WHEN net_new - LAG(net_new, 12) OVER (
PARTITION BY level
ORDER BY year, month
) IS NULL THEN NULL
ELSE net_new - LAG(net_new, 12) OVER (
PARTITION BY level
ORDER BY year, month
)
END AS ‘year_over_year’
FROM customers
ORDER by year

And that solved the issue, now all the year_over_year values for 2020 (for every level) have NULL values while all the other values (for all the other years, months and levels) are not NULL, but when I hit the check answer button the same message appears, asking me to check if I selected the right columns in the right order (I did) and if I’m using SUM and LAG correctly (this may be the problem I think?).

I noticed that this database has a ‘total’ column, maybe that is what is generating the conflict? Like, we are asked to name a column as ‘total’ but there is already a column with that name, maybe that is why the error message says to check if we are selecting the right columns.

I don’t know, I hope someone else can help.

Best regards!

Hey!

Just wanted to let you know that I figured out the problem.

Indeed, the error was the one I pointed out in my previous comment. The thing is I kept getting an error message because I had an ORDER by after the “FROM customers” line. I checked without that last ORDER BY clause and it passed.

I hope it helps.

Best regards!

Thanks Claudio :slight_smile: All good now indeed.