CASE Statement within a aggregate Function - Code Read through

Hi all,

Analyse Data with SQL - Pro Skill Path
Usage Funnels Project - Lesson - Usage Funnels
Question 4

I’m trying to understand the code here:

SELECT modal_text, COUNT(DISTINCT CASE
                              WHEN ab_group = "control" THEN user_id
                              END) AS "Control Group", 
                    COUNT(DISTINCT CASE
                                WHEN ab_group = "variant" THEN user_id
                                END) AS "Variant Group"
FROM onboarding_modals
GROUP BY modal_text
ORDER BY 1;



SELECT modal_text COUNT (DISTINCT ab_group = "control")
FROM onboarding_modals
GORUP BY 1;

This is my reading of it :

We couldn’t use the WHERE statement simply because we needed to perform another calculation, where there is a different condition to be met (upon which the execution would happen) - If we had put WHERE, the result would be zero - because the data for the second operation itself is not there.

I’m okay with the above, but I’m trying to understand the CASE statement.

In simple english, this is how I’m reading it:

COUNT the DISTINCT user_Id column WHEN the condition for ab_group column is “control” (as an example from first operation)

Now - What would you say is happening in the background? Is SQL filtering the data for where the ab_group is “Control” and counting the user_id column

Secondly, why didn’t we put ELSE here? Does this mean that its okay if we don’t put ELSE in a CASE statement (i.e its not a strict rule in SQL that ELSE should be there always when we are applying CASE statement?)

Third - I was wondering if there could be multiple condtions within this case statement in a hypothetical table with extensive data - a CASE statement like

COUNT(DISTINCT CASE
   WHEN ab_group = "Control2" THEN user_id2
   WHEN ab_group = "Control3" THEN user_id3
  ELSE user_id) AS "Example"

What I think is happening above is - the COUNT function will count a colum - yes, but that’s dependent on which condition gets met, if ab_group = Control 2 then what gets counted is user_id2, if nothing gets met then it just counts the user_id column.

Is that a correct understanding? (I do think there’s a flaw here, but unable to pinpoint)

Until this exercise, I was only aware of using CASE separately, that too to put a text where condition is met. Like,

CASE
    WHEN "Condition" THEN "Display This"
    WHEN "COndition 2" THEN "Display this"
    ELSE "Display this"
END AS "Name of the column in output screen"

So would appreciate any guidance here.

I think the first bit is easier to read when you understand that COUNT counts non-null rows (you could replace user_id with 1 or any other non-null value for a similar result). So with this particular expample it’s like you’re just counting how many rows match ab_group = "control" and so on.

If you want to see the filtering run it without COUNT and you can simply view the returned data (consider LIMIT for large tables).

The CASE basically goes through all the rows in and returns a value based on the conditions you set, if you’re familiarly with other languages it’s like an lengthy if/elif chain (switch might be closer).

Second: Yes, you can do without the ELSE, you’ll get NULLs but COUNT does not count them.

Third: Yes, multiple conditions is fine and used often.

2 Likes

Thanks for your answer.

I tried writing without the COUNT function, and it helped in understanding what actually was happening.

I have one doubt regarding the syntax though. Here’s the query I wrote, that worked:

SELECT user_id, modal_text, ab_group, (CASE
          WHEN ab_group = "control" THEN "Control"
          END) AS "Column 1", 
    (CASE
          WHEN ab_group = "variant" THEN "Variant"
          END) AS "Column 2"
FROM onboarding_modals
LIMIT 10;

However, until this question, I was aware of different placing of brackets and how/and where to write the AS clause when in relation to the CASE statement. This was my original code before I fixed it, and arrived at the above:

SELECT user_id, modal_text, ab_group, CASE(
          WHEN ab_group = "control" THEN "Control")
          END AS "Column 1", 
    CASE
          (WHEN ab_group = "variant" THEN "Variant")
          END AS "Column 2"
FROM onboarding_modals
LIMIT 10;

This didn’t quite work for me. Are the brackets always enclosing the CASE (i.e starting before it) and closing at END - and then AS statement?

Next Question:

Third: Yes, multiple conditions is fine and used often.

Would there be any example where multiple conditions are used in the same manner? If there’s a situation you might be aware of that’d be fine, I just need something to understand the context of when it would be used as against the actual code as of now.

Next Question:

This is from Question 7/10 of the same path, comes later. But its sort on the lines of earlier question as well i.e sorting/filtering in the SELECT statement itself.

Heres the correct query for it:

SELECT DISTINCT(b.browse_date), b.user_id, c.user_id IS NOT NULL AS "is_checkout", p.user_id IS NOT NULL AS "Is Purchase"

FROM browse AS "b"
LEFT JOIN checkout AS "c"
      ON b.user_id = c.user_id
LEFT JOIN purchase AS "p"
      ON p.user_id = b.user_id
LIMIT 50;

The output is something like this:

Screenshot 2022-02-20 at 6.12.55 PM

I have a little gap in understanding here.

The first gap is the use of IS NOT NULL in the SELECT statement itself. Could you please help me understand…why…how is that possible? The reason this gap appears is because until this question, I was only familiar with using it in the WHERE clause, and not here.

I still moved a step ahead with an assumption, which as : IS NOT NULL is there so that we can filter out the NULL values. But this contradicts the fact that we now have a table where NULL values are displayed as 0…

How did we even get here? There’s a learning gap here. Any reading material/explanation/links?

I understand what’s happening, but I can’t understand the logic of getting there.

Thanks for the help.