HI everyone,
I was struggling with an example given in the section “Analyzing User Churn”, its a logic problem.
This is the code given to test for march churn rates. Apologies for the incorrect format as I could not find an sql option.
SELECT COUNT(DISTINCT user_id) AS enrollments,
COUNT(CASE
WHEN strftime("%m", cancel_date) = '03'
THEN user_id
END) AS march_cancellations,
ROUND(100.0 * COUNT(CASE
WHEN strftime("%m", cancel_date) = '03'
THEN user_id
END) / COUNT(DISTINCT user_id)) AS churn_rate
FROM pro_users
WHERE signup_date < '2017-04-01'
AND (
(cancel_date IS NULL) OR
(cancel_date > '2
My issue lies in the last few lines beginning with the where statement.
My assumption is that a sign up date is being filtered in order to get accounts created prior, hence being less than April. However, what is “Cancel Date is null” doing? Why are we filtering for active subscriptions OR Cancel date > march. The cancel date section seems to be the end of the filter to gather the data we need. But why OR?
Could you link the the lesson? That would be useful.
I’m unsure of the details but is a cancel date of NULL suggesting it was never cancelled?
Part of the query is missing from there onwards but is it perhaps filtering things that have been cancelled before March? It’s a little hard to guess at present.
If you’re dropping code to the forums that cannot be executed then the markdown codeblock formatting is probably nicer than a codebyte (at least it’s not squashed in a tiny window).
Something like the following dentotes a codeblock-
```sql
SELECT * FROM table;
```
and it comes out like this when posted-
I’m guessing the last condition is (cancel_date > '2017-03-01'), in which case you need to account for people who cancelled their subscriptions but have prepaid for a certain amount of time and so their account is still active during the month of march. Like if you purchase Codecademy pro for a year but decide 10 months in you don’t want another year, so you cancel it. You’ll still get those last 2 months because you already paid for them.
SELECT COUNT(DISTINCT user_id) AS enrollments,
COUNT(CASE
WHEN strftime("%m", cancel_date) = '03'
THEN user_id
END) AS march_cancellations,
ROUND(100.0 * COUNT(CASE
WHEN strftime("%m", cancel_date) = '03'
THEN user_id
END) / COUNT(DISTINCT user_id)) AS churn_rate
FROM pro_users
WHERE signup_date < '2017-04-01'
AND (
(cancel_date IS NULL) OR
(cancel_date > '2
Ah yes, I get you now, query is looking for folks who are active subscribers at the start of the month (month in which churn is being calculated) so you’re absolutely right there. As you say folks who sign up afterwards are excluded and folks who’ve cancelled prior to the date of interest are excluded.
If I’m not mistaken the ORIS NULL check is added because the comparison operators >, < etc. don’t work well with NULL values. So if you have some users who never cancelled you risk failing to include them in your “active subscriber count” (giving you an incorrect churn value).
Dealing sensibly with NULLs is difficult in a lot of SQL problems and you should always keep them in mind, they often behave unexpectedly within certain functions and operations.
In this particular example I don’t think it matters (the selections you make mean there aren’t any NULLs) but it could matter on a different problem. This course goes into churn with a lot more detail later on so keep an eye out.
Right this finally clicked in my head. The query excludes people who signed up after 31st of march, as well as people who cancelled before march, because we are looking at the churn rate for March.
I will definitely keep an eye out for null values and churn rate. Thank you for the heads up.
One notice though, after querying is null values, it does in fact result in users that never cancelled like you said. However, how is it not being used? After the AND statement, the OR means one or both values need to be true. In this case I was under the impression that we are accounting for people that never unsubscribed.
Sorry you’ve lost me a bit. The query targets the users active at the start of March. To do this it-
Makes sure they were subscribed beforehand
Makes sure they didn’t unsubscribe beforehand
To do part 2 it checks the date, but checking the date using the comparsion operators (e.g. >) can go wrong when dealing with NULLs as the three-value logic is not fully accounted for.
The NULL example here returns NULL. If you use that in your WHERE clause it will not do what you might expect. Including OR makes sure you get those who: