FAQ: Subqueries - Correlated Subqueries I

This community-built FAQ covers the “Correlated Subqueries I” exercise from the lesson “Subqueries”.

Paths and Courses
This exercise can be found in the following Codecademy content:

SQL: Table Transformation

FAQs on the exercise Correlated Subqueries I

There are currently no frequently asked questions associated with this exercise, so that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

I don’t understand the importance of
WHERE carrier = f.carrier

Could someone explain please?

4 Likes

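The average is calculated per carrier: the subquery only averages the flights whose carrier matches the carrier of the current outer row, and that average is then compared with the row’s “distance”.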

2 Likes

This was confusing for me too.

I think it’s because SQL is reading the same table, “flights”, twice, so it needs a way to tell the two uses apart. The outer query reads the table as “f”, so “f.carrier” is the carrier of the flight currently being checked, while the plain “carrier” inside the subquery belongs to the rows being averaged. The condition WHERE carrier = f.carrier ties the two together, so AVG(distance) only covers flights from that same carrier. That per-carrier average is what the < or > operator compares each flight’s distance against, which gives us the ids of the flights that are above or below average for their carrier.

This is basically what @smilexdrus has stated and what I think I understood from it. I’m just trying to be more explanatory about it.

3 Likes
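To make the effect of WHERE carrier = f.carrier visible, here is a minimal sketch using Python’s built-in sqlite3 module. The rows are made up for illustration (they are not the exercise data), and the table is assumed to have only the id, carrier, and distance columns. The first query puts the subquery in the SELECT list so you can see which average each flight is compared against.

```python
import sqlite3

# Hypothetical sample data, not the real exercise table.
sample = [(1, "AA", 800), (2, "AA", 1000), (3, "FL", 500), (4, "FL", 590), (5, "FL", 660)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (id INTEGER, carrier TEXT, distance REAL)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", sample)

# For each outer row f, the subquery averages only the rows with the same carrier.
rows = conn.execute(
    """
    SELECT f.id, f.carrier, f.distance,
           (SELECT AVG(distance) FROM flights WHERE carrier = f.carrier) AS carrier_avg
    FROM flights AS f
    ORDER BY f.id
    """
).fetchall()
for row in rows:
    print(row)

# The same subquery used as a filter: flights below their own carrier's average.
below_avg = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM flights AS f
    WHERE distance < (SELECT AVG(distance) FROM flights WHERE carrier = f.carrier)
    ORDER BY id
    """
)]
print(below_avg)
```

With this sample data the AA flights are compared against 900 and the FL flights against roughly 583.33, so each flight is measured against its own carrier’s average and not against one overall number.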

Why is there the following?
f.origin = flights.origin

Aren’t they both referring to the same table and data?

1 Like

My understanding is that this will calculate the average distance for each carrier every time it comes up in the flight list.
i.e. the average for each carrier is calculated multiple times.

Is that true?
If so, is this a wasteful/slow way of doing it?
If so, how would you go about doing it otherwise? Would you create a table of carrier names and averages then look up the value in that? Would that actually speed things up?

Thanks in advance,
Alex
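One possible answer to the “how would you do it otherwise” question, as a sketch only: compute the per-carrier averages once in a grouped subquery and join the flights against it. The example below uses Python’s built-in sqlite3 module with made-up rows, and the names a and avg_distance are invented for illustration. It only checks that both forms return the same ids; whether the join is actually faster depends on the database and its query planner, so that part would need measuring.

```python
import sqlite3

# Hypothetical sample data, not the real exercise table.
sample = [(1, "AA", 800), (2, "AA", 1000), (3, "FL", 500), (4, "FL", 590), (5, "FL", 660)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (id INTEGER, carrier TEXT, distance REAL)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", sample)

# Correlated form: the subquery is written in terms of the current outer row.
correlated = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM flights AS f
    WHERE distance < (SELECT AVG(distance) FROM flights WHERE carrier = f.carrier)
    ORDER BY id
    """
)]

# Join form: one average per carrier is computed up front, then looked up per flight.
joined = [r[0] for r in conn.execute(
    """
    SELECT f.id
    FROM flights AS f
    JOIN (SELECT carrier, AVG(distance) AS avg_distance
          FROM flights
          GROUP BY carrier) AS a
      ON a.carrier = f.carrier
    WHERE f.distance < a.avg_distance
    ORDER BY f.id
    """
)]

print(correlated, joined)
```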

I don’t understand this:
SELECT id
FROM flights AS f
Why “as f”?
I tried omitting the “as f” part and used “WHERE carrier = flights.carrier” instead of “WHERE carrier = f.carrier”. Does it mean the same thing?

The aliased table (f) is used to distinguish between the two times that the flights table is read. Simply put, the query refers to the same table twice:

Once to select ids where the distance is greater than…
Once to select the average distance.

The two are combined to create the result.

The query is confusing because it mixes two styles (an aliased table and a non-aliased table). Personally I would write the query as follows:

SELECT a.id
FROM flights AS a
WHERE a.distance > (
  SELECT AVG(b.distance)
  FROM flights AS b
  WHERE b.carrier = a.carrier);

This gives a much clearer picture of what is actually being done.

Coming back to my earlier explanation:

Once to select ids where the distance is greater than… <- comes from aliased table a.
Once to select the average distance. <- comes from aliased table b.

2 Likes
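On the question of whether dropping “AS f” and writing WHERE carrier = flights.carrier means the same thing: as far as I can tell it does not, at least in SQLite. Inside the subquery the name flights resolves to the subquery’s own copy of the table, so the condition compares each row with itself and the average covers every carrier. Below is a small sketch with Python’s built-in sqlite3 module and made-up rows, using the “below average” comparison from the exercise text.

```python
import sqlite3

# Hypothetical sample data, not the real exercise table.
sample = [(1, "AA", 800), (2, "AA", 1000), (3, "FL", 500), (4, "FL", 590), (5, "FL", 660)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (id INTEGER, carrier TEXT, distance REAL)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", sample)

# With the alias: f.carrier refers to the outer row, so the average is per carrier.
with_alias = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM flights AS f
    WHERE distance < (SELECT AVG(distance) FROM flights WHERE carrier = f.carrier)
    ORDER BY id
    """
)]

# Without the alias: flights.carrier resolves to the inner table, so the
# condition is true for every row and the average is taken over all flights.
without_alias = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM flights
    WHERE distance < (SELECT AVG(distance) FROM flights WHERE carrier = flights.carrier)
    ORDER BY id
    """
)]

print(with_alias)     # compared against each carrier's own average
print(without_alias)  # compared against the overall average (710 for this data)
```

So the alias is what lets the subquery reach out to the outer row; without it the two uses of the table cannot be told apart.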

Reading the instructions of the exercise:
“Find the id of the flights whose distance is below average for their carrier”,
I thought I had to calculate the average for every distinct carrier and then compare that result with the distance of each flight (id). To me, the expression “for their carrier” meant running a “GROUP BY carrier” in the subquery.

So, first I wrote this query separately:
SELECT carrier, AVG(distance)
FROM flights
GROUP BY carrier
to get a clear picture of the average distance for every carrier,

and then I wrote the code I thought was the solution:
SELECT id, carrier, distance
FROM flights
WHERE distance < (
  SELECT AVG(distance)
  FROM flights
  GROUP BY carrier)
ORDER BY carrier;

Apart from id, I also selected the columns “carrier” and “distance” to make the results easier to analyze, and I ordered the outer query by carrier so I could compare the results with my first “experimental” query.

Unfortunately, my solution turned out to be wrong because: 1) it was different from the Codecademy solution code, and 2) some results were not logical.
For example, for the FL carrier the AVG(distance) is 583.16.
My solution included flight id 12038, whose distance is 590. This shouldn’t be the case: I was looking for ids whose distance is lower than average, not higher!

In the official solution code (to which I also added the columns carrier and distance, and ORDER BY carrier for the outer query), this flight id is, as expected, not included.

I wonder what is wrong with my “GROUP BY carrier” approach in the subquery. Why are some of the calculated results wrong?

Finally, what does the line “WHERE carrier = f.carrier” actually contribute in the official solution code? Which calculations are made, and in what order, when the code is executed, so that the results come out as expected?

1 Like
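A possible explanation for the GROUP BY result, assuming the exercise runs on SQLite: the grouped subquery returns one row per carrier, but a comparison like distance < (...) needs a single value. SQLite uses only the first row of such a subquery, so every flight ends up compared with the first group’s average and not its own carrier’s. Many other databases reject the query with an error instead. The sketch below uses Python’s built-in sqlite3 module and made-up rows to show the difference.

```python
import sqlite3

# Hypothetical sample data, not the real exercise table.
sample = [(1, "AA", 800), (2, "AA", 1000), (3, "FL", 500), (4, "FL", 590), (5, "FL", 660)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (id INTEGER, carrier TEXT, distance REAL)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", sample)

# On its own, the grouped subquery yields one average per carrier (several rows).
group_avgs = [r[0] for r in conn.execute(
    "SELECT AVG(distance) FROM flights GROUP BY carrier"
)]
print(group_avgs)

# Used as a scalar value, only the first of those rows takes part in the comparison.
grouped = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM flights
    WHERE distance < (SELECT AVG(distance) FROM flights GROUP BY carrier)
    ORDER BY id
    """
)]

# The correlated version recomputes the average for the carrier of each outer row.
correlated = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM flights AS f
    WHERE distance < (SELECT AVG(distance) FROM flights WHERE carrier = f.carrier)
    ORDER BY id
    """
)]

print(grouped)
print(correlated)
```

If the first group’s average happens to be higher than the FL average, an FL flight that is above its own carrier’s average can still pass the grouped filter, which would match the 590 vs. 583.16 case described above. In the correlated version, WHERE carrier = f.carrier restricts the average to the current flight’s carrier, so no grouping is needed.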