FAQ: Usage Funnels - Build a Funnel from Multiple Tables 2

If the same user, on the same day, got to the same point in the funnel that they did earlier that day there would be a duplicate row in the table which we wouldn’t count using DISTINCT.

2 Likes

Thank you very much. I got it.

c.user_id IS NOT NULL AS ‘is_checkout’,
CAN WE USE THE SYNTAX 'CASE’INSTEAD OF THE ABOVE?AND HOW?THANK YOU!

1 Like

Hi,

The exercise suggests joining c to b, then p to c, as below:

SELECT *
FROM browse AS ‘b’
LEFT JOIN checkout AS ‘c’
ON c.user_id = b.user_id
LEFT JOIN purchase AS ‘p’
ON p.user_id = c.user_id
LIMIT 50;

Does the choice of tables to join and the order you join them in matter here? I imagine it does, as it’s a LEFT JOIN so you might lose rows if you’re not careful, but in that case why not just join both c and p to b directly?

SELECT *
FROM browse AS ‘b’
LEFT JOIN checkout AS ‘c’
ON c.user_id = b.user_id
LEFT JOIN purchase AS ‘p’
ON b.user_id = p.user_id
LIMIT 50;

Thanks,

Also, what is the purpose of the ‘DISTINCT’ in the second part (copied below)?

Thanks,

"But we don’t want all of these columns in the result!

Instead of selecting all columns using * , let’s select these four:

  • DISTINCT b.browse_date
  • b.user_id
  • c.user_id IS NOT NULL AS 'is_checkout'
  • p.user_id IS NOT NULL AS 'is_purchase'

Edit your query so that you select these columns."

The purpose of using DISTINCT is that you don’t get duplicates of the data for a given set of columns you are selecting. So if two rows have the same values for all four columns you have selected, DISTINCT would make sure that the combination of values is only displayed once instead of multiple times.

I have a question on the query since a customer may have multiple items in the same day, wouldn’t the query in the join lead to many to many join. Shouldn’t we create the final table at a cm11 level, item_id and browse_date. This will give us the information for that specific item, what was the customer journey of the customer like and at which stage did he pause.

1 Like

Thanks for the explanation !

hi, do you know if using group by instead of distinct would have give the same results?

my query is this:

SELECT

b.browse_date,

b.user_id,

CASE WHEN c.user_id IS NOT NULL THEN ‘True’ ELSE ‘False’ END AS ‘is_checkout’,

CASE WHEN p.user_id IS NOT NULL THEN ‘True’ ELSE ‘False’ END AS ‘is_purchase’

FROM browse b

LEFT JOIN checkout c

ON b.user_id = c.user_id

LEFT JOIN purchase p

ON c.user_id = p.user_id

GROUP BY 1,2

LIMIT 50;

what about a group by?
would have been correct as well?
my query is:

SELECT

b.browse_date,

b.user_id,

CASE WHEN c.user_id IS NOT NULL THEN ‘True’ ELSE ‘False’ END AS ‘is_checkout’,

CASE WHEN p.user_id IS NOT NULL THEN ‘True’ ELSE ‘False’ END AS ‘is_purchase’

FROM browse b

LEFT JOIN checkout c

ON b.user_id = c.user_id

LEFT JOIN purchase p

ON c.user_id = p.user_id

GROUP BY 1,2

LIMIT 50;

How does b.browse_date, b.user_id, c.user_id, p.user_id in this query work if these aren’t defined until the Alias?

SELECT DISTINCT b.browse_date, b.user_id, c.user_id IS NOT NULL AS ‘is_checkout’, p.user_id IS NOT NULL AS ‘is_purchase’
FROM browse AS ‘b’
LEFT JOIN checkout AS ‘c’
ON c.user_id = b.user_id
LEFT JOIN purchase AS ‘p’
ON p.user_id = c.user_id
LIMIT 50;

To my understanding SQL doesn’t run line by line like some coding languages do, but instead runs all at once after it validates. Thus, defining the aliases after they’re referenced doesn’t matter (i.e. “p.user_id” as “p” can be used in the select statement before we’ve created an alias to define purchase as p).

I’m confused as to why the ‘is_checkout’ and ‘is_purchase’ output binary (1 or 0) and don’t show a user_id string instead?

whoops. Step 8 answered my own question… IS NOT NULL is a true/false function.

None able to help here??

Hello
It seems that this exercise makes the assumption that each user only completes one visit during the entire timeframe of the data recorded in the data base. Is that correct?
I’m interested in knowing how we should adapt the query in case a given user can have browsed the website multiple times.

Can we use WHERE instead of c.user_id IS NOT NULL AS 'is_checkout'? and why do use this c.user_id IS NOT NULL AS 'is_checkout' anyway? is there any other way to achieve this?

SELECT DISTINCT b.browse_date, b.user_id, c.user_id IS NOT NULL AS 'is_checkout', p.user_id IS NOT NULL AS 'is_purchase'
FROM browse AS 'b'
LEFT JOIN checkout AS 'c'
ON b.user_id = c.user_id
LEFT JOIN purchase AS 'p'
ON p.user_id = c.user_id
LIMIT 50;

1 Like

Would also be interested in this - I’ve used CASE but didn’t get any result. Curious to hear why

That’s what I wrote using CASE and got results:

SELECT DISTINCT b.browse_date AS 'Browse Date',
   b.user_id AS 'User ID',
   CASE
    WHEN c.user_id IS NOT NULL THEN 'True'
    ELSE 'False'
    END AS 'is_checkout',
   CASE
    WHEN p.user_id IS NOT NULL THEN 'True'
    ELSE 'False'
    END AS 'is_purchase'
FROM browse AS 'b'
LEFT JOIN checkout AS 'c'
  ON b.user_id = c.user_id
LEFT JOIN purchase AS 'p'
  ON c.user_id = p.user_id
LIMIT 50;

I’ve used WHERE as well and I get no result. As far as I might understand, the member who used CASE and got result means he/she is able to tell the program to distinguish between customers who did check-out and purchase the product and who did not. Meanwhile, using WHERE only CHOOSE people who did check-out and purchase (value is not null) while ignore who did not (value is null). For example, my code using WHERE did not return result I wanted.

SELECT b.browse_date,
b.user_id,
c.user_id AS 'is_checkout',
p.user_id AS 'is_purchase'
FROM browse as 'b'
LEFT JOIN checkout as 'c'
ON b.user_id = c.user_id
LEFT JOIN purchase as 'p'
ON c.user_id = p.user_id
WHERE c.user_id IS NOT NULL
AND p.user_id IS NOT NULL
LIMIT 50;