I already get the point "Not every campaign contains “first touches” " and the reason why we should not add the utm_campaign column to that first query, but still in confusion that, how JOIN corrects that issue?
I already get the point "Not every campaign contains “first touches” " and the reason why we should not add the utm_campaign column to that first query, but still in confusion that, how JOIN corrects that issue?
I feel like most of the confusion actually stems from how row values are select with MIN/MAX
(and other aggregate functions) and GROUP BY.
Let’s look at the following query:
SELECT
user_id,
utm_source,
utm_campaign,
MIN(timestamp)
FROM page_visits
GROUP BY user_id;
Here the row values for utm_source and utm_campaign will NOT be the corresponding values associated
with each grouped MIN(timestamp) but rather the values from the first row it encounters.
Therefore a JOIN (or a subquery) is needed to find the right corresponding values.
The reason is for you to be able to incorporate both first_touch and last_touch into one table in the future. See, first_touch has a different source. Once you create temporary tables for first_touch and last_touch you could join them to have an overview of sources for each user.