Exercise - Analyzing User Churn for March 2017 - Wondering why we're ignoring the cancellations ON 2017-03-01

https://www.codecademy.com/paths/analyze-data-with-sql/tracks/analyze-data-sql-get-started-with-sql/modules/what-is-sql/lessons/why-sql-ii/exercises/sql-churn

This exercise is seemingly meant to determine the churn rate for cancellations in a particular month - in this case, March 2017. This is a minor point, but I wonder why the WHERE clause selects the cancel_date when it’s GREATER THAN ‘2017-03-01’ instead of GREATER THAN OR EQUAL TO ‘2017-03-01’.

Why would we exclude March cancellations that take place on the 1st of the month?


CODE that comes with the exercise
SELECT COUNT(DISTINCT user_id) AS enrollments,
COUNT(CASE
WHEN strftime(“%m”, cancel_date) = ‘03’
THEN user_id
END) AS march_cancellations,
ROUND(100.0 * COUNT(CASE
WHEN strftime(“%m”, cancel_date) = ‘03’
THEN user_id
END) / COUNT(DISTINCT user_id)) AS churn_rate
FROM pro_users
WHERE signup_date < ‘2017-04-01’
AND (
(cancel_date IS NULL) OR
(cancel_date > ‘2017-03-01’)
);


greater than 3-1-2017 includes March cancellations, right?

Thank you for responding. Greater than 3-1-2017 includes MOST March cancellations, but excludes the cancellations that occur ON 3-1-2017. I was wondering if there’s a reason why we’d exclude the cancellations that specifically occur ON March 1

Gotcha. Did you try: >= to see what happens?

SELECT COUNT(DISTINCT user_id) AS enrollments,
	COUNT(CASE
       	WHEN strftime("%m", cancel_date) = '03'
        THEN user_id
  END) AS march_cancellations,
 	ROUND(100.0 * COUNT(CASE
       	WHEN strftime("%m", cancel_date) = '03'
        THEN user_id
  END) / COUNT(DISTINCT user_id)) AS churn_rate
FROM pro_users
WHERE signup_date < '2017-04-01'
	AND (
    (cancel_date IS NULL) OR
    (cancel_date >= '2017-03-01')
  );

>>enrollments	march_cancellations	churn_rate
>>16479	          4209	                26.0

Yes, I tried the greater than, which I believe gives the correct number, as far as I can tell.
I was just wondering if the existing code deliberately excluded March 1 for adding up the cancellations for March, or if it was just an oversight. I’m just learning this material now, so I wasn’t sure if I was missing a key point.

I wonder if it’s a typo on their part? I mean, I’d write the code how I had it above to include any cancellations on 3.1.2017 AND > than that date (as you said).

Thank you for your help!

1 Like