Help! I don't understand the usefulness of the result of this SQL Multiple tables CROSS JOIN challenge

Exercise link:
https://www.codecademy.com/paths/analyze-data-with-sql/tracks/analyze-data-sql-join-data/modules/analyze-data-sql-learn-joins/lessons/sql-multiple-tables-code-challenge/exercises/cross-join

I successfully completed this coding challenge on sql cross join. However, I don’t see how the resulting table of the CROSS JOIN helps us to see which users were active or inactive in the various months.
I feel that just displaying the premium_users table as it is will tell us which users were active in each month. I don’t see the essence of displaying the months.months as an extra column.

Kindly help me out. What am I missing here?

You know, I agree with you. I think having a months table makes zero sense b/c all the information on a user’s purchase date and cancel date is in the premium_users table. One can determine how many months one was a user from the information that is contained in that table.
The months table has 1 column & 3 rows: Jan, Feb, March of 2017.
So, I must be missing something too.

Hi Princesnam!

Just to cover the principle first: a CROSS JOIN will join every item in your first table with every item in your second table. So say you have one table which contains all possible t-shirts you sell (1,red shirts; 2,white shirts, 3,blue shirts), and another containing all possible months (Jan;Feb;March), your table will look like this:

ID Shirt Month
1 Red Jan
1 Red Feb
1 Red Mar
2 White Jan
2 White Feb
... ... ...

In this exercise, you’re joining a month table with a users table. Your users table have joined at a particular date, and if they have stopped paying, they left at a particular date.

So by doing a cross-join, you get a table that is sort of like this:

[u].User ID [u].Joined [u].Cancelled [m].Month
1 03/02/2020 03/02/2021 Jan
1 03/02/2020 03/02/2021 Feb
1 03/02/2020 03/02/2021 Mar
2 04/01/2020 {NULL} Jan
2 04/01/2020 {NULL} Feb
... ... ...

The CROSS JOIN just puts the two tables next to each other – you’re right that this isn’t useful in its current form!

In exercise 7/10, you don’t display the Month column. Instead, you do a test (CASE):

  • WHEN the start date is BEFORE the join date, AND the user has either: (a) cancelled after the month, or (b) cancelled IS NULL, THEN the user is active;
    ELSE the user is inactive.

Note, your CASE is dependent on the CROSS JOIN data table. Now, your table looks as follows:

[u].User ID [m].Month Status that Month
1 Jan inactive
1 Feb inactive
1 March active
2 Jan inactive
2 Feb active
... ... ...

This table would show you when each user is active or inactive, for any given month!

Finally, you might want to group this data together by each month, to count how many people are active in each month, to get a table like:

[m].Month Active Users
Jan 1001
Feb 2033
March 3127
... ...

As you get further through the course, you can also see how to e.g. calculate churn using this information, and so on. This also has real-world applications. For example, I use a very similar table to calculate my email marketing results per-month and per-week:

  • I can see every open event from users (unique ID), which user ID generated that open, and when that open event took place (DATETIME).
  • I use a CROSS JOIN to translate that open event DATETIME to a ‘week’ and ‘month’ (e.g. 01-01-1970 12:00:00 GMT becomes “Week 1”, and “Jan”).`*
  • I can then use GROUP BY to join data together to calculate (a) how many opens I have each week or month, (b) calculate average open rates, (c) see which emails had the best open rate, (d) which emails need improvement, and so on.

Hope this helps! The specific answer to your question is: that data is not helpful as a visible table, but it allows you to do cool things as you do further calculations.

`* You can also use STRFTIME() for this, depending on use case.

P.S:

I think I addressed this partially in my answer, but to make it explicit:

A ‘months’ table, I would think, lets you do things like aggregates much more easily. You might also have ‘custom’ dates in your table. Consider the following use case for a florist, who wants to know how much impact holidays have on their order numbers. They might have the following ‘months’ table:

id month holidays
0 Jan
1 Feb Valentines
2 Mar Mothers Day
... ... ...
11 Dec Christmas

There are three immediate benefits of this table:

  • Lots of languages will use month as a number ranging 0-11 to define Jan-Dec. This table uses the ID as the ‘month’, which makes platform integration a bit easier for some. You’d also be able to do calcs on that month field (e.g. Feb+1 = ERR; 2+1 = 3, i.e. March)
  • You can do a calculation where holidays IS NOT NULL to work out how many more / fewer sales one gets on holiday months compared to non-holiday months.
  • Suppose you later also need to evaluate not just sales but refunds for your business too. You just need to reference the months table; not reconstruct a whole new query. If you need to do it for payroll, this works too. And order tracking, staff allocation, flower subscriptions, etc. etc.

Hope this makes sense! Obviously there’s other approaches, but given this is an exercise to show how a CROSS JOIN might work in practice, it makes sense as an example of why one’d use CROSS JOIN.

1 Like

This topic was automatically closed 41 days after the last reply. New replies are no longer allowed.