Discrepancy In Purchase Results, Marketing Attribution Project

Hello! This question is about the Marketing Attribution project in the Analyze Real Data chapter of the Data Science course.

Just curious if anyone else noticed a discrepancy between the total number of purchases made, and the number of purchases made from the last touch attribution?

When I ran a query to find the total number of users to make a purchase, I got 361:

SELECT distinct user_id FROM mark_att WHERE page_name LIKE '%4%'

But when I ran a query to find the number of users who made a purchase from their last touch, I got 316:

/*CREATE TABLE last_touch AS*/
WITH last_touch AS (
SELECT distinct user_id, MAX(timestamp) AS 'last_touch_at'
FROM mark_att
GROUP BY user_id)
SELECT distinct lt.user_id, lt.last_touch_at, ma.utm_campaign, ma.page_name
FROM last_touch AS 'lt'
JOIN mark_att AS 'ma'
	ON lt.user_id = ma.user_id
	AND lt.last_touch_at = ma.timestamp;

SELECT COUNT(distinct user_id) FROM  last_touch
WHERE page_name like '%4%';

Did anyone else notice this difference? When I counted the number of users to make a purchase from their first touch, I got 2.

Does this mean the other 43 purchases happened during other webpage access occurrences (not first or last touch)? Or am I writing my query incorrectly?

Any insight is helpful, thank you!

@heatherkushner975531,

Can you post a link to the exercise please?

The marketing attribution project that I see in the Data Science path looks different and has a table named page_visits, not mark_att.

Sure! Here is the link:
https://www.codecademy.com/paths/data-science/tracks/analyze-data-sql-analyze-real-data/modules/f94562dec290c4af13db325a9083cf51/projects/analyze-data-sql-attribution-queries

I changed the table name to mark_att when I imported the dataset to DB Browser.

I see. And what was the exact query that resulted in 316? The query you posted above doesn’t work because this code…

SELECT COUNT(distinct user_id) FROM  last_touch
WHERE page_name like '%4%';

…is trying to query from the CTE last_touch, which no longer exists after you have called this query:

SELECT distinct lt.user_id, lt.last_touch_at, ma.utm_campaign, ma.page_name
FROM last_touch AS 'lt'
JOIN mark_att AS 'ma'
	ON lt.user_id = ma.user_id
	AND lt.last_touch_at = ma.timestamp;

I tried different combinations of what you might have done, but I couldn’t get 316. Also, there shouldn’t be any users who purchased on their first touch.

Here are the queries that I wrote to see (1) how many users made a purchase, and (2) the number of users whose last touch was on each page. As you can see, there were 361 purchasers, but only 358 of them stopped browsing the website after the purchase page:
(click to reveal)

Ah thanks for the response! I suspect I did a few things here that really made mine a confusing post. While I was in DB Browser I created a separate table called last_touch, which is what you were seeing referenced in the attached code above.

Anywho, I’ve rewritten the code directly into codecademy and screengrabbed it:

When I run the same exact queries directly in Codecademy, I’m getting 358 like you. It seems like I was getting 316 because I messed up when pulling the data into DB Browser, which is a whole separate topic.

Still, any ideas why we have this discrepancy between the 361 and 358? I can’t seem to get it out of my head that they ought to match, since both are pulling data for user_id’s that accessed the purchase page.

Apologies if this is an obvious question!

1 Like

Glad to see we’re getting the same thing on the codecademy platform at least!

I suspect the reason last_touch_purchase has 3 fewer people is because those users went on to look at the site for a little longer after their purchase.

For instance, sometimes after you order something on Amazon you might continue browsing for a while. This would mean they made it to the purchase page, but the purchase page was not their “last touch.”

1 Like

Brilliant! Thanks so much for the time and help!

No problem.

Happy coding!