FAQ: Calculating Churn - Single Month II

the group of customers that are active going into December:
WITH enrollments AS
(SELECT *
FROM subscriptions
WHERE subscription_start < ‘2016-12-01’
AND (
(subscription_end >= ‘2016-12-01’)
OR (subscription_end IS NULL)
)),

I cant understand why we use “subscription_end >= ‘2016-12-01’”. If the subscription_end is >= than 2016-12-01, wasnt it supossed to include 2017-01-01, for example?

Hi gigabyte233. I asked that same question a while back and it is because the SQL engine inherintly rounds results of calculations up or down to the nearest whole number. Multiplying by 1.0 forces it to provide the result as a floating point number. :slight_smile:
Hope that helps.

3 Likes

IMHO - as I’m learning this just as much as anybody-

@java7682825630 is right on. The enrollments table prepares the field by culling subscriptions that ended in previous months – effectively taking them OFF the board. That’s a necessary condition for the is_canceled clause to work.

We’re looking at churn in January. Here is the codecademy clause:

WHEN (subscription_end > ‘2017-01-31’)
OR (subscription_end IS NULL) THEN 0
ELSE 1
END as is_canceled,

Without the enrollments table acting as a filter, a subscription that ended 2016-10-20 (random plug-in example) would be counted, pass through ELSE, and be assigned 1 – becoming part of our churn calculation, against our wishes. The enrollments clause cut off these sneaky buggers by earlier including (subscription_end >= ‘2016-12-01’) OR (subscription_end IS NULL) as a WHERE condition.

But @softpower is also right – there is no reason why the is_canceled clause cannot do this locally , without the “preparatory help” of the enrollments table. Consider:

CASE
WHEN (subscription_end BETWEEN ‘2017-01-01’ AND ‘2017-01-31’)
THEN 1
ELSE 0
END as is_canceled,

I also like this better because I just favor functions that calculate to 1 ELSE 0, rather than the inverse…it’s easier for humans to think about what they want and not everything else in the world they don’t want but maybe that’s just me.

The enrollments table does end up looking pretty redundant — but to do away with it you would need to make sure your is_canceled clause supplies the necessary “filter.”


By now you’re probably thinking "Aha, but I deleted the enrollments table, hooked up the status AS clause to take input FROM subscriptions directly, and got the same correct answer. So where are these “early subscription endings” that were supposed to sabotage our result?

TO which I say - you just got lucky. The subscriptions table HAS NO subscription_end prior to ‘2017-01-01.’ Try and see for yourself:

SELECT MIN(subscription_end) FROM subscriptions;

A different data set (one with earlier subscription endings) would not be so forgiving.

4 Likes

@elaol because people who cancelled in January are ALSO active for the month of January (they are “no longer subscribed” starting in February.)

Remember the churn formula is

Cancelling members / Active members (including those who are cancelling!)

When we know we’re dealing with a smaller group within a larger group, we can quickly common-sense check if we’ve set up a good “percentage of” formula. Just ask if we’ve set up counting conditions so that the denominator is bigger than the numerator.

Everyone who gets ‘1’ for cancelling MUST ALSO get ‘1’ for being active. Additionally, everyone who doesn’t cancels gets ‘0’ for cancelling and ‘1’ for just being active. Therefore, we know active members will be the bigger number, because there are only two kinds of people and they ALL get 1s for activity, but only a SUBSET gets 1s for cancelling.

My much easier way of doing this.
Whats wrong with doing code this way?

SELECT COUNT(subscription_end)
FROM subscriptions
WHERE subscription_end BETWEEN ‘2017-01-01’ AND ‘2017-01-31’
UNION
SELECT COUNT(subscription_start)
FROM subscriptions
WHERE subscription_start < ‘2017-01-01’;

SELECT 1.0* 35/277 *100;

As others have noted, it seemed like there was a redundancy here that could be eliminated. I made a slight alteration: since the first table already filtered down to those with an active subscription at the start of the month, for the second table, I just gave every row a value of 1 for ‘is_active’. That way, the SUM functions still operated as intended.

WITH enrollments AS
(SELECT *
FROM subscriptions
WHERE subscription_start < '2017-01-01'
AND (
  (subscription_end >= '2017-01-01')
  OR (subscription_end IS NULL)
)), -- creates a subset of data that only includes actives

status AS 
(SELECT
  CASE
    WHEN (subscription_end > '2017-01-31')
      OR (subscription_end IS NULL) THEN 0
      ELSE 1
    END as is_canceled, -- +1 for every cancellation

  1 as is_active -- +1 for every row

FROM enrollments)

SELECT 1.0 * SUM(is_canceled) / SUM(is_active) AS 'Churn'
FROM status;
1 Like

Although we only calculate one month for this specific exorcise, it states that churning rates are calculated over many months. I speculate that temporary enrollments table would come in handy to extend the single month calculation to a multi month calculation.

THE CORRECT SOLUTION:

WITH enrollments AS
(
SELECT *
FROM subscriptions
WHERE subscription_start<‘2017-01-01’
AND (
(subscription_end>=‘2017-01-01’)
OR (subscription_end IS NULL)
)
),
status AS
(SELECT
CASE
WHEN (subscription_end > ‘2017-01-31’)
OR (subscription_end IS NULL)
THEN 0
ELSE 1
END as is_canceled,
CASE
WHEN (subscription_start < ‘2017-01-01’
AND (
(subscription_end >= ‘2017-01-01’)
OR (subscription_end IS NULL)
) )
THEN 1
ELSE 0
END as is_active
FROM enrollments
)
SELECT 1.0 * SUM(is_canceled) / SUM(is_active)
FROM status;

1 Like

Would anybody happen to know where I went wrong? I used the video as a guide, but for some reason the same code doesn’t come back with any results:

WITH months AS
(
SELECT
‘2017-01-01’ AS first_day,
‘2017-01-31’ AS last_day
UNION
SELECT
‘2017-02-01’ AS first_day,
‘2017-02-28’ AS last_day
UNION
SELECT
‘2017-03-01’ AS first_day,
‘2017-03-31’ AS last_day
),cross_join AS
(SELECT * FROM subscriptions
CROSS JOIN months),
status as
(SELECT id, first_day AS month,
) ( CASE
WHEN (subscription_start<first_day) AND (subscription_end>first_day OR subscription_end IS NULL) AND (segment = 87) THEN 1
ELSE 0
END as is_active_87,
) CASE
WHEN (subscription_start<first_day) AND (subscription_end>first_day OR subscription_end IS NULL) AND (segment = 30) THEN 1
ELSE 0
END as is_active_30
CASE
WHEN (subscription_end BETWEEN first_day AND last_day) AND (segment = 87) THEN 1
ELSE 0
END AS is_canceled_87
CASE
WHEN (subscription_end BETWEEN first_day AND last_day) AND (segment = 30) THEN 1
ELSE 0
END AS is_canceled_30
) SELECT * FROM subscriptions;

I used the different approach to solve this problem.
Churn Rate is number of user canceled the subscription divided by total number of user.

Now to find total number of user for Jan month user need to be fulfilled :

  1. It need to subscribe before 1st Jan 2017.
  2. The subscription if cancelled then need to be cancelled after 31 Dec 2016 otherwise not cancelled.
    Now we are calling this table as enrollment table so all the member in the enrollment tables are active users.

To find the user who cancelled the subscription, he needs to cancelled the subscription before 31 Jan 2017.

so my query is like that

WITH enrollment as

(

select * from subscriptions

where subscription_start<‘2017-01-01’

and

(subscription_end >=‘2017-01-01’ or subscription_end is null)

),

status as

(

SELECT

CASE

WHEN (subscription_end <= ‘2017-01-31’)

THEN 1

ELSE 0

END as is_canceled,1 as is_active /*,subscription_start,subscription_end */

from enrollment

)

select 1.0*sum(is_canceled)/sum(is_active) from status;

1 Like

I think is because is using the >= for january 01. So as far as i understand the exercise is being applied just for the month of December, so if a subscription ended beyond january first the user is still active trough all the month of december, thats why its lumped together with the conditional “OR subscription is null”

I don’t really get why the first WITH enrollments is needed. Here is what I wrote and made sense, as a reference for you:

WITH status AS (

SELECT

CASE

WHEN (subscription_end BETWEEN ‘2017-01-01’ AND ‘2017-01-31’) THEN 1

ELSE 0

END AS ‘is_canceled’,

CASE

WHEN (

(subscription_start < '2017-01-01')

AND ((subscription_end > '2016-12-31')

OR (subscription_end IS NULL))

) THEN 1

ELSE 0

END AS ‘is_active’

FROM subscriptions

)

SELECT 1.0 * SUM (is_canceled) / SUM (is_active)

AS churn_rate

FROM status;

I’m sorry if I sound dumb, but what’s the difference between Single month I(previous lesson) code and single month II code ? they both return the churn every month. sorry if I have hard time to comprehend the lesson.

Nice! This shows that you can implicitly use the WITH clause within the FROM clause by basically declaring a temporary table with the SELECT statement. I think this is because SQL does not iterate code sequentially but compiles it and all the relations before running it. It must make a table called before running the code just as a WITH clause would explicitly do that.

Here’s how I solved it with a JOIN statement and removing the second CASE statement so that you aren’t repeating the code that’s already written to create . I’m not sure if there’s any downside to this code or which one I like “better.”

WITH enrollments AS

(SELECT *
FROM subscriptions
WHERE subscription_start < ‘2017-01-01’
AND (
(subscription_end >= ‘2017-01-01’)
OR (subscription_end IS NULL)
)),

status AS
(SELECT enrollments.id,
CASE
WHEN (subscription_end > ‘2017-01-31’)
OR (subscription_end IS NULL) THEN 0
ELSE 1
END as is_canceled
FROM enrollments
)

SELECT 1.0 * SUM(status.is_canceled) / COUNT(enrollments.id)
FROM status
JOIN enrollments
ON status.id = enrollments.id;

WITH status AS
(SELECT
CASE
WHEN (subscription_end BETWEEN ‘2017-01-01’
AND ‘2017-01-31’) THEN 1
ELSE 0
END AS is_canceled,
CASE
WHEN subscription_start < ‘2017-01-01’
AND (
(subscription_end >= ‘2017-01-01’)
OR (subscription_end IS NULL)
) THEN 1
ELSE 0
END AS is_active
FROM subscriptions
)

SELECT 1.0*SUM(is_canceled)/SUM(is_active)
FROM status;

My query is not returning anything - a log to troubleshoot errors would be nice… anyone see something I cannot? I appreciate any and all help, community!

WITH status AS
(
  SELECT
  CASE
    WHEN ((subscription_end > '2017-01-31')
      OR (subscription_end IS NULL)) THEN 0
      ELSE 1
  END as is_canceled,
  CASE
    WHEN (subscription_start < '2017-01-01')
      AND (
        (subscription_end >= '2017-01-01')
        OR (subscription_end IS NULL)
      ) THEN 1
    ELSE 0
  END as in_active
  FROM subscriptions;
)
SELECT 1.0 * SUM(is_canceled) / SUM(is_active)
FROM status;

Help!!! I dont understand the problem here. it’s coming out blank.

with enrollments as ( select (*)

from subscriptions

where subscription_start < ‘2017-01-01’

and (subscription_end >‘2017-01-01’)

or (subscription_end is NULL)

),

status as

(select

case

when (subscription_end > '2017-01-31')

or (subscription_end is null) 

then 0

else 1

end as is_canceled),

status as

(select case

when subscription_start < ‘2017-01-01’

and (subscription_end >= ‘2017-01-01’)

or (subscription_end is null)

then 1

else 0

end as is_active

from enrollments),

select 1.0 * sum(is_canceled) / sum(is_active)

from status;

I used this solution to get a response but, much like yourself, I am still not sure why mine came out blank. Good luck my friend!

You will not see the results of a CTE (WITH) because it’s a temp table.
You can break up that lengthy query into smaller parts.
Try SELECT *
in order to see results. (I believe that’s in the video)

thanks, perfectly explained, I had the same question as Murilolui. The thing is that there are no user in the table that abandoned before january because of the “one month of commitment” so if you remove the AND / OR statements you also get the correct Churn Ratio.