Warby Parker Exercise: help me understand my code? which of two codes is better?

In the context of the Warby Parker exercise, on funnels, I have two questions.

  1. I get the same result with both codes. Is one preferable to the other?

  2. CODE 2 is adapted from an earlier exercise in the series. However, I do not understand the two instances of COUNT(DISTINCT CASE in it.

Earlier in the course, a ‘semantic breakdown’ was offered: an explanation that allowed me to understand each part of the code, what it does, and why. I’d very much appreciate a guide to that same understanding in this case.

Many thanks in advance.

CODE 1:

  SELECT
  quiz.user_id,
  home_try_on.user_id IS NOT NULL AS 'is_home_try_on',
  home_try_on.number_of_pairs,
  purchase.user_id IS NOT NULL AS 'is_purchase'
FROM quiz
  LEFT JOIN home_try_on
    ON quiz.user_id = home_try_on.user_id
  LEFT JOIN purchase
    ON home_try_on.user_id = purchase.user_id
)
SELECT 
  number_of_pairs,
  SUM(is_purchase)
  FROM base
  GROUP BY number_of_pairs;

CODE 2

WITH base AS (
  SELECT
  quiz.user_id,
  home_try_on.user_id IS NOT NULL AS 'is_home_try_on',
  home_try_on.number_of_pairs,
  purchase.user_id IS NOT NULL AS 'is_purchase'
FROM quiz
  LEFT JOIN home_try_on
    ON quiz.user_id = home_try_on.user_id
  LEFT JOIN purchase
    ON home_try_on.user_id = purchase.user_id
)
SELECT 
COUNT(DISTINCT CASE
WHEN number_of_pairs = '3 pairs' 
AND is_purchase = 1
THEN user_id
END) AS '3 pair buy',
COUNT(DISTINCT CASE
WHEN number_of_pairs = '5 pairs' 
AND is_purchase = 1
THEN user_id
END) AS '5 pair buy'
  FROM base;

@ole.koksvik,

In the first query, you are taking data from your temporary table base and grouping it by number_of_pairs This will group your data into three rows based on the values in number_of_pairs: NULL, 3 pairs and 5 pairs. Then from that grouped data you are selecting the column number_of_pairs as well as the sum of the is_purchase column. Notice that is_purchase contains only 0 or 1 representing True or False based on this line:

purchase.user_id IS NOT NULL AS 'is_purchase'

This query will return this result:
image

Your second query also pulls data from your base temporary table, but rather than using existing columns and grouping, you create new temporary columns with your CASE statements. By using the CASE statements you are returning the user_id only if that person made a purchase and then you are counting the total number of rows with a user_id. You are doing this for all the rows where number_of_pairs is “3 pairs” first, selecting that column with the alias 3 pair buy. Then you are doing the same on rows where number_of_pairs is “5 pairs” and selecting that column with the alias 5 pair buy. Notice that because you only had CASE statements for those two number_of_pairs values, any other value (in this case just NULL) will not have its own column.

This second query returns this result:
image

Both of these queries work fine if you are just trying to find out the number of people who purchased after trying on 3 or 5 pairs of glasses. However, this doesn’t tell you a whole lot. You might find more insights if you also take into account the number of people who didn’t purchase in each group.

Anyway, hope this helps you wrap your head around the difference. Happy coding!

Thank you so much, @el_cocodrilo. I am moved by your generosity in taking the time to explain this.

I’m on a 3-month free PRO membership because I am unemployed due to COVID-19. Encountering such generosity is really wonderful. Thank you.

This was the key for me – it was what I had not grasped.

And point well taken about the analytic (lack of) value of these numbers.

Thanks!!