Correlated subqueries II - understanding code logic


#1

Hi
in the code example below there are two conditional statements that i do not understand. Would someone with knowledge of SQL please explain these in plain text.

SELECT origin, id,
(SELECT COUNT(*)
FROM flights f
WHERE f.id < flights.id
AND f.origin=flights.origin) + 1
AS flight_sequence_number
FROM flights;

We select "origin", "id" and the count of all posts from the table "flights" and alias the table data "f". That much I understand. But later on we have the condition
f.id < flights.id <-- I would say tis is equal to f.id < f.id or flights.id < flights.id
The same argument goes for the next line in the code example. I am obviously wrong but I do not understand what else this code means.


#2

Seconded. Those conditionals seem redundant. Both lessons 5/7 and 6/7 do this with no explanation. e.g

FROM flights
WHERE carrier = f.carrier);

Is saying where 'carrier' in 'flights' is equal to 'flights.carrier.'


#3

I also got confused here.

WHERE f.id < flights.id
AND f.origin=flights.origin) + 1

But here's how i understand this logic.
Suppose our flight ID is 10 right now... so this clause is going to do search on the table as long as Subquery ID is <10.

So you could say will scan 0-9 rows in table.
And for those 0-9 rows if the origin matches it will increment the count.
So assume for our example it found 5 matches with the origin, Count field will get value of 5+1 = 6 , hence generating sequence for flight with id=10


#4

Where used here is to join two tables together, under the condition that all the rows in two tables have the same carrier field equal to each other.


#5

This had me stumped for ages too. I have reached the conclusion that since the inner query is run separately for each line/row of the outer query, then 'f' actually only refers to the data in the row currently being processed.

So in 5/7, when it says in the inner query:

SELECT AVG(distance)
FROM flights
WHERE carrier = f.carrier

it means 'calculate the average flight distance for the carrier currently being processed in f', a bit like a 'for' loop in JS. That way, the flight distance is always compared with the average for that carrier and not the overall average flight distance across all carriers.

Still at a loss trying to work out what's going on in 6/7 though! :confused: I'm finding that the explanations really could do with more actual exposition - they seem to simply throw in a heap of new elements in each screen with just a couple of lines describing the desired outcome, making it incredibly difficult to work out what each part of the code is doing.


#6

I'm still squinting sideways at 6/7 but here's my interpretation (not sure if it's the same as what @yeshus is saying, though I think it may be):

SELECT origin, id,
(SELECT COUNT(*)
FROM flights f
WHERE f.id < flights.id
AND f.carrier=flights.carrier) + 1
AS flight_sequence_number
FROM flights;

'f' is a variable, and could be any row from the 'flights' table. This allows us to compare every row of 'flights' to every other row, in order to place them into a sequence.

The inner query is counting how many instances there could be where the (variable) value of an id in 'f' is less than the value of the current id being assessed in the 'flights' table, for that same carrier.

For example, where flight.id is 75 and carrier is AA, it will find 0 instances of a possible lower value of f.id among other AA flights as 75 is the smallest number in the id list.

It then adds 1 to that number. We do this because the first item in a list is usually numbered 1, not 0. Thus, the flight with id of 75 will be assigned the sequence number '1'.

Similarly, where flight.id is 391 and carrier is AA, it will find 3 instances where the variable f.id could be lower (flight ids 75, 194 and 361). 3 + 1 = 4, so flight.id 391 will be assigned flight_sequence_number 4 among the AA flights. And so on.

Note that if you delete the line:

AND f.carrier=flights.carrier

then it will assign flight_sequence_numbers from 1 right through to 499 and not restart the numbering for each carrier.

I hope my reasoning here is sound and that it helps someone else out there struggling with the somewhat scant explanations for this module! (No doubt they'll improve those over the coming months.)


Lesson 6/7
#7

I agree! The questions asked and the lessons to be learned are out of scope. Very few seem to understand this lesson. It's one thing to just "get the code" from pressing the button, but we are here to learn and someone forgot to teach us the lesson here..better explanations would be greatly appreciated to a full (Not fragmented) understanding of each lesson.

kinda complaining but still very great full for Codecademy! Best free online course out there!


#8

"SELECT AVG(distance)
FROM flights
WHERE carrier = f.carrier)"

means calculate the ave by carrier, for example by carrier='MQ', we get a average. we select the id with distance greater than this average for carrier is "MQ", different carrier has different average.

You can use following code to test,

SELECT id, carrier, distance
FROM flights AS f
WHERE distance > (
SELECT AVG(distance)
FROM flights
WHERE carrier = f.carrier);

SELECT AVG(distance)
FROM flights
WHERE carrier = 'MQ';

SELECT id, carrier, distance
FROM flights
WHERE carrier = 'MQ';


#9

I understand the code but it doesn't seem to fully address the query criteria.

"It would also be interesting to order flights by giving them a sequence number based on time, by carrier...we could use the following query to view flights by carrier, flight id, and sequence number:"

Shouldn't the code have included an ORDER BY for the airline and then the sequence number? As an example we would have all AA (carrier) flights listed together and the flight id column would go from 1-75 for the 75 flights in the database.

Does anyone know the answer to this?


#10

Thank you for the explanation but my confusion came with the directions "Using the same pattern"..
I think anything with SELECT, FROM WHERE...etc has the same pattern.

I feel the question could have been asked much better than what is written....


#11

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.


#12