Hi all.
I don’t understand the solution of this exercise.
https://www.codecademy.com/paths/analyze-data-with-sql/tracks/analyze-data-sql-analyze-real-data/modules/analyze-data-sql-marketing-attribution/projects/analyze-data-sql-attribution-queries
The task is to count all first touches of each campaign. There are 8 campaigns in total but the official solution only shows 4 campaigns.
Here’s my code that brings up all 8 campaigns and their first touches:
WITH first_touch AS(
SELECT user_id, MIN(timestamp) as first_touch_at
FROM page_visits
GROUP BY user_id
)
SELECT pv.utm_campaign, COUNT(ft.first_touch_at) AS total_first_touch
FROM first_touch AS ft
JOIN page_visits AS pv
ON ft.user_id = pv.user_id
GROUP BY pv.utm_campaign
ORDER BY 2 DESC;
And here the official solution that brings up only 4 campaigns.
WITH first_touch AS(
SELECT user_id, MIN(timestamp) as first_touch_at
FROM page_visits
GROUP BY user_id
),
ft_attr AS (
SELECT ft.user_id,
ft.first_touch_at,
pv.utm_source,
pv.utm_campaign
FROM first_touch ft
JOIN page_visits pv
ON ft.user_id = pv.user_id
AND ft.first_touch_at = pv.timestamp
)
SELECT ft_attr.utm_source,
ft_attr.utm_campaign,
COUNT(*)
FROM ft_attr
GROUP BY 1, 2
ORDER BY 3 DESC;
Here a screenshot of what my solution looks like vs the official one below.
Is my code wrong? Is Codecademys code wrong?
Any insights that really explain what’s going on here would be appreciated.
Cheers.
EDIT: I’ve already noticed that my code is not correct, but I’m still wondering why the official solution only shows 4 campaigns instead of 8.
EDIT 2:
I got it now! There are four “first touch” campaigns" and four “last touch” campaigns.
That’s why it’s important to join the tables correctly. My mistake was to only join them on id.
So this here works, too, and seems to be a shorter/simpler solution:
WITH first_touch AS(
SELECT user_id, MIN(timestamp) as first_touch_at
FROM page_visits
GROUP BY user_id
)
SELECT pv.utm_source, pv.utm_campaign, COUNT(ft.first_touch_at) AS total_first_touch
FROM first_touch AS ft
JOIN page_visits AS pv
ON ft.user_id = pv.user_id
AND ft.first_touch_at = pv.timestamp
GROUP BY 2
ORDER BY 3 DESC;