FAQ: Multiple Tables - Cross Join

I am stuck on step 3 of this exercise:

code

SELECT *
FROM newspaper
CROSS JOIN months
where start_month <= month
AND end_month >= month;

generates the message
“Did you restrict your cross join to months that were between start_month and end_month”

and I can’t move on to step 4, help?

1 Like

You’re not wrong, I personally think “including” would be a better word than “during” since month is always between start_month and end_month.

SELECT start_month, month, end_month, COUNT(*) 
FROM newspaper
CROSS JOIN months
WHERE start_month <= month
  AND end_month >= month
GROUP BY month;

Or better with BETWEEN and AND

SELECT start_month, month, end_month, COUNT(*) 
FROM newspaper
CROSS JOIN months
WHERE month 
  BETWEEN start_month AND end_month
GROUP BY month;

Capture

1 Like

SELECT month,

COUNT(*)

FROM newspaper

CROSS JOIN months

WHERE start_month <= month

AND end_month >= month

GROUP BY month;

I found it hard because I can’t get any visualization of it. Well if anyone confuses with CROSS JOIN then this is for you:

Source: SQL CROSS JOIN with examples

Hopefully this is the correct place to ask this question. In the SQL cross-join exercise with the following code:

SELECT month, COUNT(*) FROM newspaper
CROSS JOIN months
  WHERE (start_month<=month) & (end_month>=month)
GROUP BY month;

How can I then retrieve the row of the max COUNT(*) from this table?
({month:6, COUNT( * ):30}) ?

I tried the following which doesn’t work:

SELECT * FROM 
  (SELECT month, COUNT(*) FROM newspaper
    CROSS JOIN months
      WHERE (start_month<=month) & (end_month>=month)
    GROUP BY month)
WHERE COUNT(*)=(
  SELECT MAX(COUNT(*)) FROM
   (SELECT month, COUNT(*) FROM newspaper
    CROSS JOIN months
      WHERE (start_month<=month) & (end_month>=month)
    GROUP BY month)
    );

Preferably, I would to this to work without renaming COUNT(*).

Have you covered sub-queries or CTEs? The simplest option might be to use the original solution and simply query the table created with the additional filtering you want.

-- perhaps something like this
WITH temporary_name AS (original query) 
* commands to get max row *

-- or this
* commands to get max row *
FROM (original squery)

Earlier join exercises gave the full address for the column we were looking for, does this exercise not do that because the names are different?

aka:

SELECT months.month,
  COUNT(*)
FROM newspaper
CROSS JOIN months
WHERE newspaper.start_month <= months.month
AND newspaper.end_month >= months.month
GROUP BY months.month;
1 Like

Can someone please explain where and how this month column appeared?

The whole concept of this exercise is very confusing.

2 Likes

You could just add ORDER BY 2 DESC LIMIT 1; where 2 is COUNT(*) if I got your question right.

I had this same question.

The query for the first part of this section says that the answer should be:

SELECT COUNT(*)

FROM newspaper

WHERE start_month <= 3

AND end_month >= 3;

but shouldn’t the WHERE clauses be WHERE start_month = 3?

The requirement here is that the user had an active subscription during March. So they could have, for example, started in February month <= 3 (less than or equal to) and as long as they didn’t cancel that month they’d be considered active in March (even if they cancelled during or after March month >= 3).

Answer for question 3.

SELECT COUNT(*)

FROM newspaper

WHERE start_month <= 3

AND end_month >= 3;

SELECT *

FROM newspaper

CROSS JOIN months;

SELECT *

FROM newspaper

CROSS JOIN months

WHERE start_month<=month AND end_month>=month;

Can someone please explain where this ‘month’ column came from?

SELECT month,
COUNT(*)
FROM newspaper
CROSS JOIN months
WHERE start_month <= month
AND end_month >= month
GROUP BY month;

There is no month column in newspaper, so where did it come from?

The month column comes from the months table!

My issue with the logic in this example are that the subscription could have started in December (12) and ended in April (4) so the “where” statement wouldn’t be satisfied, but that is an actual scenario we would want to count. Could you explain to me how the query could be fixed to work this way?

2 Likes

It’s a good question. I think the data here is given for just a single year so the dates don’t wrap around and you don’t need to worry about it (at no point is any record’s end_date > start_date in this dataset). If you had dates that crossed years you’d probably want a more complex way of dating each subscription event.

Most SQL flavours have one or more date and/or time types and you’d likely be using them in that case. I think some of the lessons on Churn have dates from multiple years so you will get some practice on them at some point (I’m not sure if those lessons are on the course you’re on but you can find more SQL lessons in the catalogue, they’re in there somewhere).

SQLite which is what most of the early SQL lessons use doesn’t actually have any specific date or time types- Datatypes In SQLite. However it is built to allow various dates/times to be stored as TEXT along with tools to make working with them easier- Date And Time Functions and certain operators like <, > and things like BETWEEN work with proper “dates” and not just the small range of integers you have here when the text data is correctly formatted. Then you should be able to correctly filter for dates that span years :slightly_smiling_face:.

2 Likes

Ah yes, that makes a lot more sense. I did make a note of myself that this only works for a single year once I thought about it a bit more. Would you be able to do:

WHERE months.month BETWEEN start_month AND end_month – I think between is inclusive?

Thanks again for answering!

2 Likes

I think it sort of works but there is some issue in that '2022-08-15' is treated like 2022-08-15 00:00 ... etc. so '2022-08-15 00:01' or indeed any time behind the very stroke of midnight is not between!

So if your data happened to have any other units in its timestamp then you could be silently missing or returning erroneous data.

If you weren’t sure your input always had, and always will have, a date-only format then you might need something like WHERE date(months.month) BETWEEN start_month AND end_month to strip everything but the date. However that comes with its own issues in calling a function (which may be a problem for query efficiency at large scales due to overheads and possibly missing indexes).

This might be a decent intro to the potential issue in the wild (it’s SQL server but the datetime issue still applies)- https://stackoverflow.com/a/16347680

Something like the following is suggested for instances where you may have inconsistent input where the end_date is incremented by one day. This then represents midnight the next day so that a < less than operation would find also find timestamps during the day of interest-

WHERE start_month >= start_date
      AND end_month < end_date_plus_day
-- target end_date of '2022-08-15' would instead use '2022-08-16'
2 Likes

Hello!

I am not sure what I am doing wrong. I have tried entering the code as is and commenting out the previous code, but everything I do I keep getting an error message and can’t move forward. The error is “Did you add a new query which cross joins based on the and ?”

I think it’s quite obvious but just to be sure; this query to find the month when a customer was subscribed wouldn’t work if someone subscribed in December (12) and canceled in April (4). I just wanted to point it out. :smiley:

image