Detailed explanation of the code

This is how I understood it.

The inner query works out the relative position of an id value by using COUNT(*) to get how many id values are below it.

As an example, just focusing on the inner query and manually adding the id as “815” and the carrier as “AA

SELECT carrier, id
	FROM flights
	WHERE id < 815 
	AND carrier = "AA"
	ORDER BY id;

You get this returned;

carrier		id
AA			75
AA			194
AA			361
AA			391

id815” is the (5th) next value, using “<=” would be better;

SELECT carrier, id
 	FROM flights
	WHERE id <= 815 
	AND carrier = "AA"
	ORDER BY id;

You get this returned;

carrier		id
AA			75
AA			194
AA			361
AA			391
AA			851

If you then apply COUNT(*) you get its (relative) sequence number

SELECT carrier, id, COUNT(*)
 	FROM flights
	WHERE id <= 815 
	AND carrier = "AA"
	ORDER BY id;
carrier		id		COUNT(*)
AA			815		5

In the example they used “<” and it gives a value one less, hence the “+ 1” is added to fix the sequence otherwise it starts at zero.

SELECT carrier, id, COUNT(*) + 1
 	FROM flights
	WHERE id < 815 
	AND carrier = "AA"
	ORDER BY id;
carrier		id		COUNT(*) + 1
AA			391		5

The inner query is just to work out the sequence number, the outer query iterates through the whole table and feeding its current “rowid value (and carrier) into the inner query to find sequence value COUNT(*) for the id/carrier.

35 Likes

I just don’t understand what this question was ? " Using the same pattern, write a query to view flights by origin, flight id, and sequence number. Alias the sequence number column as `flight_sequence_number .". I had to get the solution to see what the code was doing. I seriously did not follow this chapter so far. There should be better examples.

Thank you
Sivarama Krishnan (kvsrkrishnan)

24 Likes

I had the same problem, and have been having similar problems with the next few lessons as well.

5 Likes

SELECT origin, id,
(SELECT COUNT(*)
FROM flights f
WHERE f.id < flights.id
AND f.origin=flights.origin) + 1
AS flight_sequence_number
FROM flights;

Here we query based on origin and id rather than carrier and id as shown in the example.

3 Likes

But what does that achieve? What are you trying to query here? The Correlated subqueries are used for row-by-row processing, so for every orgin and id in the outer query… what are you executing in the inner query?

2 Likes

This is a very bad example of a Correlated subquery. It’s too complicated (because of lack of explanation and level of difficulty )to get the point across. Copy/pasting and tweaking the code to arrive at the solution is not practical as in the real world there’s no such luxury.
I think the SQL lesson falls apart in this chapter; everything else is very well explained and makes sense.

25 Likes

How does the query know that the ID for each carrier should be sorted. When I run
SELECT carrier, id
from flights
where carrier=‘MQ’
LIMIT 4;
I get
|MQ|17107|
|MQ|7869|
|MQ|2205|
|MQ|14979|

How does the query know to tag the lowest ID in the entire set as 1?
For example, carrier MQ, the first ID is 1878 and there are a total of 32 IDs. So 1878 is less then 31 IDs, why is it tagged as “1”.
|carrier|id|flight_sequence_number|
|MQ|1878|1|
|MQ|2205|2|
|MQ|3068|3|

I think I understand but I am getting the inverse. If I count the number of times the Lowest ID is less than the rest of the IDs and Add 1, so I get the highest number, and the inverse for the highest number which is Count of 0 since it is never lower than any ID, so plus 1 gives me 1. The query returns the opposite where the highest ID has the highest count and the lowest ID is a count of 1… Why?

1 Like

I just figured it out now. I reversed the tables I was comparing. It looks at the outer table ‘flights’ and for each ID, it sees if it is BIGGER than the inner table ‘f’. So for the smallest ID in ‘flight’ it is NEVER BIGGER than any ID in ‘f’ so the count is 0. We then ADD 1 to it and get it the sequence number of 1. And so on and so forth… and that is how the world works

1 Like

Can someone tell me why we’re adding 1? I’m losing my mind trying to figure this out.

Let’s try to think this through, and hopefully be able to relate back to the narrative. I’m going in blind right now, since I don’t have that in front of me, only a post from above.

Consider,

(SELECT COUNT(*)
FROM flights f
WHERE f.id < flights.id
AND f.carrier=flights.carrier)

is a number. The adding of one must be creating an incremented value, suggestive of extending a sequence.

We need to refer back to the narrative for some hint as to why the above is implemented, but at least we’ve given it some thought, to this point. Now to find the real explanation!

1 Like

Agree, this exercise doesn’t make any sense… Just put ‘origin’ instead of the ‘carrier’

1 Like

Hi,

First of all, I think your message helped me understand why we were incrementing by 1, which felt really weird to me, so thank you.

This whole lesson about subqueries has been a mess, no explanations at all anywhere, but I think your code does the exact same thing. I couldn’t find anything different when comparing the results.

3 Likes

The “f.id < flights.id” part of the code did not exactly make sense to me. It was not exactly clear why we need to compare the same column from two identical databases.

After searching online and looking over the code several times, I realized that the goal of code is creating “flight sequence number” based on the id. With the same carrier, the flight with the smallest id would have “1” as the flight sequence number, the flight with the second smallest id would have “2”, etc. So “COUNT (*) … WHERE f.id < flights.id and f.carrier = flights.carrier” just counts how many flights with the same carrier have smaller id numbers.

This video provides a nice explanation using a different example, which might help understand the code better. https://www.youtube.com/watch?v=fwhSEqMnbgI

12 Likes

If we assume that we have just five flights and we want to count them we can do this in two ways:
I)

1<1 - FALSE count is 0
1<2 - TRUE count is now 1
1<3 - TRUE count is now 2
1<4 - TRUE count is now 3
1<5 - TRUE count is now 4

To have an accurate number of flights we have to + 1,

II)
or to just write a logical expression with <=:

1<=1 - TRUE count is now 1, in that situation, we don’t have to add 1 at the end.

So in this example,
we can just write logical expression without plus 1:

WHERE f.id <= flights.id
AND f.carrier = flights.carrier)

Hope it helps! :slight_smile:

7 Likes

I don’t understand much about Correlated Subqueries II , why plus 1 in this select.

Very good clarifation from nessie2015 couple post earlier. If you just past whole example code and use ORDER BY 1,3, you can see very clearly how this query works.

I think the most unclear thing here is how loops use parameter from outer queries. Inner query looks like basic loop, where outer query gives parameter (id) and inner query does own stuff this based on it. What is not well covered in these courses is how SQL references work.

In subquery f.id and f.carrier goes through all rows. Flight.id is constant during every loop, because it refers to outer query. After looking is done like nessie described, flight.id flight.carrier moves to next row. Then same procedure runs again and correct number found.

image

4 Likes

So the way we’re creating the sequence number is to make a statement that is always false equal to zero and then add 1 to it? And because this loops through the inner table to do it it generates a sequence number one larger each time it assess?

Am I reading this right? what a weird and unexplained step.

This is totally the explanation that Codecademy need to have in the lesson and never bothered with. :slight_smile:

Nice one, @nessie2015 ! Especially with the example and the added point that they could just be using <= rather than adding 1 all the time.

your comment is very helpful, especially youtube link:) Thanks!

1 Like

@nessie2015 you’re a lifesaver!! :smile: