FAQ: Usage Funnels - Build a Funnel from Multiple Tables 2

This community-built FAQ covers the “Build a Funnel from Multiple Tables 2” exercise from the lesson “Usage Funnels”.

Paths and Courses
This exercise can be found in the following Codecademy content:

FAQs on the exercise Build a Funnel from Multiple Tables 2

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply (reply) below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply (reply) below!

Agree with a comment or answer? Like (like) to up-vote the contribution!

Need broader help or resources? Head here.

Looking for motivation to keep learning? Join our wider discussions.

Learn more about how to use this guide.

Found a bug? Report it!

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

Can you really refer to an alias before the AS statement has created the alias?

select
    distinct b.browse_date,
    b.user_id,
    c.user_id is not null as 'is_checkout',
    p.user_id is not null as 'is_purchased'
from browse as 'b'
left join checkout as 'c'
on c.user_id = b.user_id
left join purchase as 'p'
on p.user_id = c.user_id
limit 50;
7 Likes

I’m not a SQL professional, but it seems that yes, you can.
I’t looks like that SQL, the first, creates the table (using aliases) and the second, selects solumns from the table.

Would you tell me, please, the meaning of using DISTINCT in the code below (when we create sales funnel).
And did it apply to all the columns after the expression or to the only?

Thank you.

 SELECT DISTINCT b.browse_date, 
b.user_id, c.user_id IS NOT NULL AS 'is_checkout', 
p.user_id IS NOT NULL AS 'is_purchase' 
 FROM browse AS 'b'
 LEFT JOIN checkout AS 'c'
 ON b.user_id = c.user_id
 LEFT JOIN purchase AS 'p'
 ON c.user_id = p.user_id
 LIMIT 50;
3 Likes

I had the same question, a little searching turned up the answer:

https://www.sqlitetutorial.net/sqlite-select-distinct

[T]he DISTINCT clause must appear immediately after the SELECT keyword.
… [Y]ou place a column or a list of columns after the DISTINCT keyword. If you use one column, SQLite uses values in that column to evaluate the duplicate. [If] you use multiple columns, SQLite uses the combination of values in these columns to evaluate the duplicate.

So in our case, DISTINCT is looking for unique combinations of the 4 columns, not just unique user_id’s.

4 Likes

jpritcha, thank you.
I understood, that all DISTINCT is applied on all the columns.
The question I still have, why do we use DISTINCT at all?

1 Like

If the same user, on the same day, got to the same point in the funnel that they did earlier that day there would be a duplicate row in the table which we wouldn’t count using DISTINCT.

2 Likes

Thank you very much. I got it.

c.user_id IS NOT NULL AS ‘is_checkout’,
CAN WE USE THE SYNTAX 'CASE’INSTEAD OF THE ABOVE?AND HOW?THANK YOU!

1 Like

Hi,

The exercise suggests joining c to b, then p to c, as below:

SELECT *
FROM browse AS ‘b’
LEFT JOIN checkout AS ‘c’
ON c.user_id = b.user_id
LEFT JOIN purchase AS ‘p’
ON p.user_id = c.user_id
LIMIT 50;

Does the choice of tables to join and the order you join them in matter here? I imagine it does, as it’s a LEFT JOIN so you might lose rows if you’re not careful, but in that case why not just join both c and p to b directly?

SELECT *
FROM browse AS ‘b’
LEFT JOIN checkout AS ‘c’
ON c.user_id = b.user_id
LEFT JOIN purchase AS ‘p’
ON b.user_id = p.user_id
LIMIT 50;

Thanks,

Also, what is the purpose of the ‘DISTINCT’ in the second part (copied below)?

Thanks,

"But we don’t want all of these columns in the result!

Instead of selecting all columns using * , let’s select these four:

  • DISTINCT b.browse_date
  • b.user_id
  • c.user_id IS NOT NULL AS 'is_checkout'
  • p.user_id IS NOT NULL AS 'is_purchase'

Edit your query so that you select these columns."

The purpose of using DISTINCT is that you don’t get duplicates of the data for a given set of columns you are selecting. So if two rows have the same values for all four columns you have selected, DISTINCT would make sure that the combination of values is only displayed once instead of multiple times.

I have a question on the query since a customer may have multiple items in the same day, wouldn’t the query in the join lead to many to many join. Shouldn’t we create the final table at a cm11 level, item_id and browse_date. This will give us the information for that specific item, what was the customer journey of the customer like and at which stage did he pause.

1 Like

Thanks for the explanation !

hi, do you know if using group by instead of distinct would have give the same results?

my query is this:

SELECT

b.browse_date,

b.user_id,

CASE WHEN c.user_id IS NOT NULL THEN ‘True’ ELSE ‘False’ END AS ‘is_checkout’,

CASE WHEN p.user_id IS NOT NULL THEN ‘True’ ELSE ‘False’ END AS ‘is_purchase’

FROM browse b

LEFT JOIN checkout c

ON b.user_id = c.user_id

LEFT JOIN purchase p

ON c.user_id = p.user_id

GROUP BY 1,2

LIMIT 50;

what about a group by?
would have been correct as well?
my query is:

SELECT

b.browse_date,

b.user_id,

CASE WHEN c.user_id IS NOT NULL THEN ‘True’ ELSE ‘False’ END AS ‘is_checkout’,

CASE WHEN p.user_id IS NOT NULL THEN ‘True’ ELSE ‘False’ END AS ‘is_purchase’

FROM browse b

LEFT JOIN checkout c

ON b.user_id = c.user_id

LEFT JOIN purchase p

ON c.user_id = p.user_id

GROUP BY 1,2

LIMIT 50;

How does b.browse_date, b.user_id, c.user_id, p.user_id in this query work if these aren’t defined until the Alias?

SELECT DISTINCT b.browse_date, b.user_id, c.user_id IS NOT NULL AS ‘is_checkout’, p.user_id IS NOT NULL AS ‘is_purchase’
FROM browse AS ‘b’
LEFT JOIN checkout AS ‘c’
ON c.user_id = b.user_id
LEFT JOIN purchase AS ‘p’
ON p.user_id = c.user_id
LIMIT 50;

To my understanding SQL doesn’t run line by line like some coding languages do, but instead runs all at once after it validates. Thus, defining the aliases after they’re referenced doesn’t matter (i.e. “p.user_id” as “p” can be used in the select statement before we’ve created an alias to define purchase as p).

I’m confused as to why the ‘is_checkout’ and ‘is_purchase’ output binary (1 or 0) and don’t show a user_id string instead?

whoops. Step 8 answered my own question… IS NOT NULL is a true/false function.