FAQ: Subqueries - Correlated Subqueries II

The “f.id < flights.id” part of the code did not exactly make sense to me. It was not exactly clear why we need to compare the same column from two identical databases.

After searching online and looking over the code several times, I realized that the goal of code is creating “flight sequence number” based on the id. With the same carrier, the flight with the smallest id would have “1” as the flight sequence number, the flight with the second smallest id would have “2”, etc. So “COUNT (*) … WHERE f.id < flights.id and f.carrier = flights.carrier” just counts how many flights with the same carrier have smaller id numbers.

This video provides a nice explanation using a different example, which might help understand the code better. https://www.youtube.com/watch?v=fwhSEqMnbgI

1 Like

If we assume that we have just five flights and we want to count them we can do this in two ways:
I)

1<1 - FALSE count is 0
1<2 - TRUE count is now 1
1<3 - TRUE count is now 2
1<4 - TRUE count is now 3
1<5 - TRUE count is now 4

To have an accurate number of flights we have to + 1,

II)
or to just write a logical expression with <=:

1<=1 - TRUE count is now 1, in that situation, we don’t have to add 1 at the end.

So in this example,
we can just write logical expression without plus 1:

WHERE f.id <= flights.id
AND f.carrier = flights.carrier)

Hope it helps! :slight_smile:

1 Like

I don’t understand much about Correlated Subqueries II , why plus 1 in this select.

Thank for explaining, this was very helpful!
I am also still strugglng since the beginning of this “Table Transformation with SQL” course, I think it is a huge jump from the previous one and everything is not explained well. Your addition of the “ORDER BY” line made much more sense then the whole exercise attempt.

2 Likes

These questions just seem to get more and more vague. Very poor course!

This is how I understood it.

The inner query works out the relative position of an id value by using COUNT(*) to get how many id values are below it.

As an example, just focusing on the inner query and manually adding the id as “815” and the carrier as “AA

SELECT carrier, id
	FROM flights
	WHERE id < 815 
	AND carrier = "AA"
	ORDER BY id;

You get this returned;

carrier		id
AA			75
AA			194
AA			361
AA			391

id815” is the (5th) next value, using “<=” would be better;

SELECT carrier, id
 	FROM flights
	WHERE id <= 815 
	AND carrier = "AA"
	ORDER BY id;

You get this returned;

carrier		id
AA			75
AA			194
AA			361
AA			391
AA			851

If you then apply COUNT(*) you get its (relative) sequence number

SELECT carrier, id, COUNT(*)
 	FROM flights
	WHERE id <= 815 
	AND carrier = "AA"
	ORDER BY id;
carrier		id		COUNT(*)
AA			815		5

In the example they used “<” and it gives a value one less, hence the “+ 1” is added to fix the sequence otherwise it starts at zero.

SELECT carrier, id, COUNT(*) + 1
 	FROM flights
	WHERE id < 815 
	AND carrier = "AA"
	ORDER BY id;
carrier		id		COUNT(*) + 1
AA			391		5

The inner query is just to work out the sequence number, the outer query iterates through the whole table and feeding its current “rowid value (and carrier) into the inner query to find sequence value COUNT(*) for the id/carrier.

1 Like