FAQ: Calculating Churn - Single Month II

I think that line is active user because they’re counting a user as ACTIVE during the month of January. Even if they cancelled on or after Jan 1st, they were still “active” in Jan?

I think that you missing parenthesis is going to give you a lot of trouble with the syntax. Try adding them. I got it to work with the following but I didn’t make an enrollment table.

WITH status AS
(SELECT
CASE
WHEN (subscription_end > ‘2017-01-31’)
OR (subscription_end IS NULL) THEN 0
ELSE 1
END AS canceled,
CASE
WHEN subscription_start < ‘2017-01-01’
AND ((subscription_end >= ‘2017-01-01’)
OR (subscription_end IS NULL)) THEN 1
ELSE 0
END AS active
FROM subscriptions)

SELECT 1.0 * SUM(canceled) / SUM(active)
AS churn
FROM status;

1 Like

Hello Guy´s

Can you help me understand this problem? Why I need use this condition on “is_active”
In my mind you only need select all clients until “207-01-01”, the total of users in the beging of the month.

AND ((subscription_end >= ‘2017-01-01’) OR (subscription_end IS NULL) ) THEN 1

image

1 Like

If you remove the AND ((subscription_end >= ‘2017-01-01’) OR (subscription_end IS NULL) ) THEN 1 statement, it will also count all subscriptions that started before 2017-01-01 who had ends also before 2017-01-01. For example, someone who purchased on 2016-12-02 and ended the subscription on 2016-12-20 would be picked up by your code, even though it wasn’t active in January. By adding the subscription_end statement, it means the end date was at least some time in January 2017. The IS NULL means it was still active through January as well.

3 Likes

Having a hard time sometimes understanding the parentheses logic.

I tried the following, it gave the correct answer (35) on is_canceled.
Wrong answer (877) on is_active.

WITH status AS 
(SELECT
CASE
  WHEN (subscription_end BETWEEN '2017-01-01'
        AND '2017-01-31') THEN 1
    ELSE 0
  END AS is_canceled,
CASE 
  WHEN (subscription_start < '2017-01-01')
    AND (subscription_end >= '2017-01-01')
    OR (subscription_end IS NULL) THEN 1
    ELSE 0
  END AS is_active
  FROM subscriptions
  )
  
  SELECT SUM(is_canceled), 
  SUM(is_active)
  FROM status;

After digging I realized I have misplaced some parentheses.
Fixed code:

WITH status AS 
(SELECT
CASE
  WHEN (subscription_end BETWEEN '2017-01-01'
        AND '2017-01-31') THEN 1
    ELSE 0
  END AS is_canceled,
CASE 
  WHEN subscription_start < '2017-01-01'
    AND (
      (subscription_end >= '2017-01-01')
      OR (subscription_end IS NULL)
    ) THEN 1
   ELSE 0
END AS is_active
FROM subscriptions
)
  
SELECT SUM(is_canceled), SUM(is_active)
FROM status;

How exactly it produces 877 active users in the first version of the code?
It isn’t clear to me.

2 Likes

Hi there, thank you for the working code, I was having a hard time to use WITH, and your code really helped :slightly_smiling_face:. For your question, the missing parenthesis in the AND and OR, it changes it completely. You can give it some thought and I’m sure you will figure out why.
Cheers :partying_face:

Hello,

Why is WITH function used for creation of the first ‘enrolment’ table, but not second ‘status’ table?

WITH status AS
(SELECT
CASE
WHEN (subscription_end BETWEEN ‘2017-01-01’
AND ‘2017-01-31’) THEN 1
ELSE 0
END AS is_canceled,
CASE
WHEN subscription_start < ‘2017-01-01’
AND (
(subscription_end >= ‘2017-01-01’)
OR (subscription_end IS NULL)
) THEN 1
ELSE 0
END AS is_active
FROM subscriptions
)

SELECT 1.0 * SUM(is_canceled)/ SUM(is_active)
FROM status;

I hope this will work.

I believe the code has a small error in the “is_active” line. Specifically, the
subscription_end >= ‘2017-01-31
vice
subscription_end >= ‘2017-01-01

We’re looking for the active subscriptions over the month of jan17
Am I right? See my commented code below the original code.

ORIGINAL CODE–

CASE
WHEN subscription_start < ‘2017-01-01’
AND (
(subscription_end >= ‘2017-01-01’)
OR (subscription_end IS NULL)
) THEN 1
ELSE 0
END as is_active
FROM enrollments
)

MY UPDATED CODE WITH COMMENTS–
CASE
WHEN subscription_start < ‘2017-01-01’
–Sx was started before 1jan17
AND (
(subscription_end >= ‘2017-01-01’)
–Sx ended after 31jan17
OR (subscription_end IS NULL)
–Sx never ended
) THEN 1
–1 means Sx was Active for jan17
ELSE 0
END AS is_active

1 Like

I understand all of the questions asked above. However, I don’t understand this:
Take a user who started his subscription on 2016-10-25 and his subscription will end on 2017-01-25. When this code analyzes this user, we will get the following:

CASE
  WHEN (subscription_end > '2017-01-31')
    OR (subscription_end IS NULL) THEN 0
    ELSE 1
  END as is_canceled,

It will be True, so he will go to is_canceled. However, when it goes through this

CASE
    WHEN subscription_start < '2017-01-01'
      AND (
        (subscription_end >= '2017-01-01')
        OR (subscription_end IS NULL)
      ) THEN 1
    ELSE 0
  END as is_active

Here it gets True as well, so it is placed in is_active. How can this be possible? Shouldn’t we put subscription_end >=‘2017-02-01’ since if his membership expires in January, he will stop being a subscriber?

Full code, for reference:

WITH enrollments AS
(SELECT *
FROM subscriptions
WHERE subscription_start < '2017-01-01'
AND (
  (subscription_end >= '2017-01-01')
  OR (subscription_end IS NULL)
)),
status AS 
(SELECT
CASE
  WHEN (subscription_end > '2017-01-31')
    OR (subscription_end IS NULL) THEN 0
    ELSE 1
  END as is_canceled,
CASE
    WHEN subscription_start < '2017-01-01'
      AND (
        (subscription_end >= '2017-01-01')
        OR (subscription_end IS NULL)
      ) THEN 1
    ELSE 0
  END as is_active
  FROM enrollments
  )
SELECT 1.0 * SUM(is_canceled) / SUM(is_active)
FROM status;
1 Like

Can anyone explain why the instructions indicate to “create another temporary table that contains an is_canceled status”? In my mind, it seems like we are adding columns or building on the aforementioned Enrollments table, not creating a new temporary table.

Why do we have to multiply with (1.0*) to calculate the churn rate?

the group of customers that are active going into December:
WITH enrollments AS
(SELECT *
FROM subscriptions
WHERE subscription_start < ‘2016-12-01’
AND (
(subscription_end >= ‘2016-12-01’)
OR (subscription_end IS NULL)
)),

I cant understand why we use “subscription_end >= ‘2016-12-01’”. If the subscription_end is >= than 2016-12-01, wasnt it supossed to include 2017-01-01, for example?

Hi gigabyte233. I asked that same question a while back and it is because the SQL engine inherintly rounds results of calculations up or down to the nearest whole number. Multiplying by 1.0 forces it to provide the result as a floating point number. :slight_smile:
Hope that helps.

3 Likes

IMHO - as I’m learning this just as much as anybody-

@java7682825630 is right on. The enrollments table prepares the field by culling subscriptions that ended in previous months – effectively taking them OFF the board. That’s a necessary condition for the is_canceled clause to work.

We’re looking at churn in January. Here is the codecademy clause:

WHEN (subscription_end > ‘2017-01-31’)
OR (subscription_end IS NULL) THEN 0
ELSE 1
END as is_canceled,

Without the enrollments table acting as a filter, a subscription that ended 2016-10-20 (random plug-in example) would be counted, pass through ELSE, and be assigned 1 – becoming part of our churn calculation, against our wishes. The enrollments clause cut off these sneaky buggers by earlier including (subscription_end >= ‘2016-12-01’) OR (subscription_end IS NULL) as a WHERE condition.

But @softpower is also right – there is no reason why the is_canceled clause cannot do this locally , without the “preparatory help” of the enrollments table. Consider:

CASE
WHEN (subscription_end BETWEEN ‘2017-01-01’ AND ‘2017-01-31’)
THEN 1
ELSE 0
END as is_canceled,

I also like this better because I just favor functions that calculate to 1 ELSE 0, rather than the inverse…it’s easier for humans to think about what they want and not everything else in the world they don’t want but maybe that’s just me.

The enrollments table does end up looking pretty redundant — but to do away with it you would need to make sure your is_canceled clause supplies the necessary “filter.”


By now you’re probably thinking "Aha, but I deleted the enrollments table, hooked up the status AS clause to take input FROM subscriptions directly, and got the same correct answer. So where are these “early subscription endings” that were supposed to sabotage our result?

TO which I say - you just got lucky. The subscriptions table HAS NO subscription_end prior to ‘2017-01-01.’ Try and see for yourself:

SELECT MIN(subscription_end) FROM subscriptions;

A different data set (one with earlier subscription endings) would not be so forgiving.

4 Likes

@elaol because people who cancelled in January are ALSO active for the month of January (they are “no longer subscribed” starting in February.)

Remember the churn formula is

Cancelling members / Active members (including those who are cancelling!)

When we know we’re dealing with a smaller group within a larger group, we can quickly common-sense check if we’ve set up a good “percentage of” formula. Just ask if we’ve set up counting conditions so that the denominator is bigger than the numerator.

Everyone who gets ‘1’ for cancelling MUST ALSO get ‘1’ for being active. Additionally, everyone who doesn’t cancels gets ‘0’ for cancelling and ‘1’ for just being active. Therefore, we know active members will be the bigger number, because there are only two kinds of people and they ALL get 1s for activity, but only a SUBSET gets 1s for cancelling.

My much easier way of doing this.
Whats wrong with doing code this way?

SELECT COUNT(subscription_end)
FROM subscriptions
WHERE subscription_end BETWEEN ‘2017-01-01’ AND ‘2017-01-31’
UNION
SELECT COUNT(subscription_start)
FROM subscriptions
WHERE subscription_start < ‘2017-01-01’;

SELECT 1.0* 35/277 *100;

As others have noted, it seemed like there was a redundancy here that could be eliminated. I made a slight alteration: since the first table already filtered down to those with an active subscription at the start of the month, for the second table, I just gave every row a value of 1 for ‘is_active’. That way, the SUM functions still operated as intended.

WITH enrollments AS
(SELECT *
FROM subscriptions
WHERE subscription_start < '2017-01-01'
AND (
  (subscription_end >= '2017-01-01')
  OR (subscription_end IS NULL)
)), -- creates a subset of data that only includes actives

status AS 
(SELECT
  CASE
    WHEN (subscription_end > '2017-01-31')
      OR (subscription_end IS NULL) THEN 0
      ELSE 1
    END as is_canceled, -- +1 for every cancellation

  1 as is_active -- +1 for every row

FROM enrollments)

SELECT 1.0 * SUM(is_canceled) / SUM(is_active) AS 'Churn'
FROM status;
1 Like

Although we only calculate one month for this specific exorcise, it states that churning rates are calculated over many months. I speculate that temporary enrollments table would come in handy to extend the single month calculation to a multi month calculation.

THE CORRECT SOLUTION:

WITH enrollments AS
(
SELECT *
FROM subscriptions
WHERE subscription_start<‘2017-01-01’
AND (
(subscription_end>=‘2017-01-01’)
OR (subscription_end IS NULL)
)
),
status AS
(SELECT
CASE
WHEN (subscription_end > ‘2017-01-31’)
OR (subscription_end IS NULL)
THEN 0
ELSE 1
END as is_canceled,
CASE
WHEN (subscription_start < ‘2017-01-01’
AND (
(subscription_end >= ‘2017-01-01’)
OR (subscription_end IS NULL)
) )
THEN 1
ELSE 0
END as is_active
FROM enrollments
)
SELECT 1.0 * SUM(is_canceled) / SUM(is_active)
FROM status;

1 Like