FAQ: First- and Last-Touch Attribution - The Attribution Query III

This community-built FAQ covers the “The Attribution Query III” exercise from the lesson “First- and Last-Touch Attribution”.

Paths and Courses
This exercise can be found in the following Codecademy content:

FAQs on the exercise The Attribution Query III

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply (reply) below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply (reply) below!

Agree with a comment or answer? Like (like) to up-vote the contribution!

Need broader help or resources? Head here.

Looking for motivation to keep learning? Join our wider discussions.

Learn more about how to use this guide.

Found a bug? Report it!

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

Can someone please explain why we need to create a new table then join it to the old one instead of adding utm_source to the initial query that is grouped by user_id and includes max(timestamp) ?

11 Likes

I have been wondering about the same thing. Hopefully someone can answer.

I wonder the same thing but I also wonder how the system knows that ‘lt’ is the abbreviation for the last_touch table…

I think I get it now, though. Don’t think they actually give the same result, because first you need to link the first touches to user ID and later to campaign. You can’t do both at the same time so need to create a first_touch table. Try it in the project and I think you’ll understand.

1 Like

I think they actually give the same result.

If you select the specific columns from your table like;

SELECT user_id, MAX(timestamp), utm_source
FROM page_visits;

You get all the last_touch tables, if you point it to a particular user with ‘WHERE’ like:

SELECT user_id, MAX(timestamp), utm_source
FROM page_visits
WHERE user_id = 10069;

You will get the result for a particular user

Conclusion, I also don’t understand why we have to create that long table in the exercise to find the utm_sources.

the first query you gave does not work as intended. You need the create a temporary first_touch table if you want to get the utm_source along with the fist time stamp for each user.

I have the same question. For the sake of clarity, the solution according to the exercise should be

WITH last_touch AS (
  SELECT user_id,
    MAX(timestamp) AS 'last_touch_at'
  FROM page_visits
  GROUP BY user_id)
SELECT lt.user_id,
  lt.last_touch_at,
  pv.utm_source
FROM last_touch AS 'lt'
JOIN page_visits AS 'pv'
  ON lt.user_id = pv.user_id
  AND lt.last_touch_at = pv.timestamp;

But this other query gives the same result, without creating a temporary table to JOIN.

SELECT user_id,
  MAX(timestamp) AS 'last_touch_at',
  utm_store
FROM page_visits
GROUP BY user_id;

I compared the results given by the two queries and they were identical. So what is the difference?

My guess, but I am not sure about this, is that with a user that has multiple entries in the table with different values for utm_source, the short query might not return the utm_source that corresponds to the correct last_touch_at.

1 Like

SQL knows that that lt is the abbreviation for the last_touch table because in your query you should have added:

FROM last_touch AS 'lt'
select user_id,timestamp,utm_source from page_visits
group by user_id
having timestamp = max(timestamp);

will this query work against


WITH last_touch AS (
    SELECT user_id,
        MAX(timestamp) AS 'last_touch_at'
    FROM page_visits
    GROUP BY user_id)
SELECT lt.user_id,
    lt.last_touch_at,
    pv.utm_source
FROM last_touch AS 'lt'
JOIN page_visits AS 'pv'
    ON lt.user_id = pv.user_id
    AND lt.last_touch_at = pv.timestamp

Hi , can anyone explain to me how SQL will read the below code ?As in what sequence? I know seems like a basic question at this stage . Thanks!!

WITH first_touch AS (
SELECT user_id,
MIN(timestamp) AS ‘first_touch_at’
FROM page_visits
GROUP BY user_id)
SELECT ft.user_id,
ft.first_touch_at,
pv.utm_source
FROM first_touch AS ‘ft’
JOIN page_visits AS ‘pv’
ON ft.user_id = pv.user_id
AND ft.first_touch_at = pv.timestamp;

SELECT user_id ,MAX(timestamp),utm_source FROM page_visits
WHERE user_id = 10069;

WHY NEED THIS LONG SQL BELOW?

WITH last_touch AS (
SELECT user_id,
MAX(timestamp) as last_touch_at
FROM page_visits
GROUP BY user_id)
SELECT lt.user_id,
lt.last_touch_at,
pv.utm_source
FROM last_touch lt
JOIN page_visits pv
ON lt.user_id = pv.user_id
AND lt.last_touch_at = pv.timestamp
WHERE lt.user_id = 10069;

1 Like

SELECT user_id ,MAX(timestamp),utm_source FROM page_visits
WHERE user_id = 10069;

WITH last_touch AS (
SELECT user_id,
MAX(timestamp) as last_touch_at
FROM page_visits
GROUP BY user_id)
SELECT lt.user_id,
lt.last_touch_at,
pv.utm_source
FROM last_touch lt
JOIN page_visits pv
ON lt.user_id = pv.user_id
AND lt.last_touch_at = pv.timestamp
WHERE lt.user_id = 10069;

For the final task, why does the WHERE have to be lt.user_id? With the JOIN being partially on user_id, why does that identifier not work?

Hello CodeAcademy

I did a query to obtain both attributions in the same query. My issue here is that i will not get the utm_source for the last_touch_at, it will only show the first_touch’s utm_source.
How could i edit my query so it will show the different utm_source’s that X user_id registered.

"
WITH attribution AS (
SELECT user_id, MIN(timestamp) AS ‘first_touch_at’, MAX(timestamp) as ‘last_touch_at’
FROM page_visits
GROUP BY user_id)

SELECT a.user_id,
a.first_touch_at
a.last_touch_at,
pv.utm_source
FROM attribution AS ‘a’
JOIN page_visits AS ‘pv’
ON a.user_id = pv.user_id
AND a.first_touch_at = pv.timestamp
AND a.last_touch_at = pv.timestamp;

            "

I would imagine because since there are two user_ids, the system needs you to specify which one you are referring to, even though they are referring to the same records.

I have the same question as most of the people here. What’s the reason behind writing a long code with joins when the same question can be answered by MAX(timestamp) and group by user_id? I couldn’t find any answers here or did I miss it out somewhere?

1 Like

why do we have to use the join on this? Is it bcause SQL does not allow the group by with the UTM source included?

Question regarding the data itself. The below query is not correct because some users have multiple landing page entries. What causes this? Ive looked at the data and some of these users have two landing page entries in a row (like user_id = 59264)

SELECT utm_source, utm_campaign, count(distinct user_id), page_name

FROM page_visits

WHERE page_name LIKE ‘%landing%’

GROUP BY 1,2,4;

Hi, I also have the same question as you.
And I don’t think they will have any change even building temp table, because,t in last_touch, only one latest record will be extracted in each user due to the ‘groupby user_id’.
I am not sure… but it will be great if anyone can tell us the purpose of this method.