https://www.codecademy.com/paths/analyze-data-with-sql/tracks/analyze-data-sql-get-started-with-sql/modules/what-is-sql/lessons/why-sql-ii/exercises/sql-churn
This exercise is seemingly meant to determine the churn rate for cancellations in a particular month - in this case, March 2017. This is a minor point, but I wonder why the WHERE clause selects the cancel_date when it’s GREATER THAN ‘2017-03-01’ instead of GREATER THAN OR EQUAL TO ‘2017-03-01’.
Why would we exclude March cancellations that take place on the 1st of the month?
CODE that comes with the exercise
SELECT COUNT(DISTINCT user_id) AS enrollments,
COUNT(CASE
WHEN strftime(“%m”, cancel_date) = ‘03’
THEN user_id
END) AS march_cancellations,
ROUND(100.0 * COUNT(CASE
WHEN strftime(“%m”, cancel_date) = ‘03’
THEN user_id
END) / COUNT(DISTINCT user_id)) AS churn_rate
FROM pro_users
WHERE signup_date < ‘2017-04-01’
AND (
(cancel_date IS NULL) OR
(cancel_date > ‘2017-03-01’)
);
greater than 3-1-2017 includes March cancellations, right?
Thank you for responding. Greater than 3-1-2017 includes MOST March cancellations, but excludes the cancellations that occur ON 3-1-2017. I was wondering if there’s a reason why we’d exclude the cancellations that specifically occur ON March 1
Gotcha. Did you try: >=
to see what happens?
SELECT COUNT(DISTINCT user_id) AS enrollments,
COUNT(CASE
WHEN strftime("%m", cancel_date) = '03'
THEN user_id
END) AS march_cancellations,
ROUND(100.0 * COUNT(CASE
WHEN strftime("%m", cancel_date) = '03'
THEN user_id
END) / COUNT(DISTINCT user_id)) AS churn_rate
FROM pro_users
WHERE signup_date < '2017-04-01'
AND (
(cancel_date IS NULL) OR
(cancel_date >= '2017-03-01')
);
>>enrollments march_cancellations churn_rate
>>16479 4209 26.0
Yes, I tried the greater than, which I believe gives the correct number, as far as I can tell.
I was just wondering if the existing code deliberately excluded March 1 for adding up the cancellations for March, or if it was just an oversight. I’m just learning this material now, so I wasn’t sure if I was missing a key point.
I wonder if it’s a typo on their part? I mean, I’d write the code how I had it above to include any cancellations on 3.1.2017 AND > than that date (as you said).