Lesson 5 Correlated Subqueries SQL "f.carrier"


Ok, so I get why the code works, that's great and dandy. What I don't get is where table "F" came from or what that means in the last bit of code. Here's the code:
SELECT id FROM flights AS f
WHERE distance < (
SELECT AVG(distance) FROM flights
WHERE carrier = f.carrier);
I get that you need to define an average distance for each carrier so you can then select those below average. I suppose I do not understand how "WHERE carrier = f.carrier" accomplishes this task. Any explanation would be great. Thanks!

Correlated Subquery Lesson: Is there a name for what "f.carrier" is doing?

Hi @biteswhenbitten,

I will try my best to see if i can shed some light :). Let's look at the code you have.

FROM flights AS f
WHERE distance < ( SELECT AVG(distance) 
                   FROM flights
                   WHERE carrier = f.carrier);

The first part then is what is "F"?

Well if we review the code we see that "F" appears in two places. and the most significant of the two is the first appearance here:

FROM flights AS f

This is the "ALIAS" for the table flights.
It's exactly what it says, an alternative way to name this table. With that in mind what matters now is that you should reference this table as "F" from this point on.
So that means when we see "F" in the subquery we are basically stating this first table. We could just as easily have labelled it an alias of

FROM flights AS firstOuterTableNotInSubquery

(Ok not that easy but it is a lot more descriptive :slight_smile: )
That means essentially we have this

FROM flights AS firstTableNotInSubquery

This would do exactly what it says on the tin; give us a list of Id's from the table Flights.

Now, we don't just want this list of Id's, we need a list that meets a specific criteria, which we filter in the 'WHERE' clause.
In this case we want the Id's only for those Flights that have a recorded distance that is less than each carriers AVGERAGE. That there is the crunch!
So let's break this down, we know the first outer-query returns all Id's.
Well the subquery (inner-query) returns the AVERAGE(Distance) for each carrier from the same table.

There's caveats here though, first SQL doesn't like objects with the same name, so we can't just have TWO tables both called flights, we have to give one of them an alias by which to be recognized by, which is great we already did that we called it "F".
The second problem is if we just run the first part of the subquery

SELECT AVG(distance) 
FROM flights AS secondTableThisIsInSubquery

We would end up with an average for all carriers, but we just want to know the average for each carrier on their own.

So we use the 'WHERE" clause again to filter our answer only this time our filter goes thus:
Give me the AVERAGE(Distance) for each carrier individually, where the carrier I am asking you to calculate the average right now in the subquery, is the same carrier in the outer query.
So I am asking for a correlation by carrier; a match on a carrier in the inner table subquery to equal the carrier in the outer table query, so that I only accurately calculate and compress for the average of the carrier I am working on.

What you get then is

FROM flights AS firstTableNotInSubquery 
WHERE distance < ( SELECT AVG(distance) 
                   FROM flights AS secondTableThisIsInSubquery
                   WHERE secondTableThisIsInSubquery.carrier = firstTableNotInSubquery .carrier);

Hopefully that helps abit, even if it it just to raise another question which is a good sign too :slight_smile:

If you do need some more depth on this, or this is not clear please feel free to reply and someone will try to help as soon as possible.


That clears everything up! I did not pin together that the subquery had the
ability to A. Call back to the original query or B. Call back to an alias
for a table. I find codecademy gives a lot of assumptive instructions that
you would not even think SQL would be capable of that.
Does this mean that SQL has created a temporary table of some sort when I
refer to F? Especially given matching f.carrier to flights.carrier. Even
though it is not explicitly told that the first carrier match was from
flights the SELECT statement it came from was, so I feel it's a safe
Also, by stipulating that SQL can only Avg distance for matching carriers
for each row of information between the two tables, is this what creates
averages for all of the carriers? Thank you again. Very helpful :slight_smile:


There are a few tweaks that could probably be added that would help with some of these lessons I agree! Always feel free to suggest that where possible, it would benefit many individuals :slight_smile:

That aside then, you could certainly think of the temp table creation in that context: SQL made me a temp table with the new 'alias' I gave it, and so I am now querying this new temp aliased table. (As I write that I found that actually makes sense so it's not a bad stance to start on :slight_smile: )

As to the AVG() sum, I'm not sure if I read that last question right:

You get the average for all carriers, split up by that carrier's details that match the carrier in the outer query, and that is all you get, for the way the query is written right now. A summed average of all the distance for carrier X inside, where X is the reference of the carrier I am looking at outside.

To get the sum of ALL carriers, you would simply remove the 'WHERE' clause, so you are not filtering on anything at all.

If I didn't quite get that right, my apologies, please feel free to elaborate on the question.


Yes that makes sense :slight_smile: In my last question I meant that WHERE carrier =
f.carrier is how we filter for a avg of only matching carriers. That my
understand is that the last clause is how it compiles an avg for each


Bingo! you got it @biteswhenbitten :slight_smile:


hi @mike_in_training why didn't we use GROUP BY carrier instead of WHERE carrier = f.carrier?
What is the difference between the both? I would like to know if this code is correct?

FROM flights
WHERE distance < (
SELECT AVG(distance)
FROM flights
group by carrier);

If this is correct then I think there is no need for an alias of the table.


@mike_in_training could probably shed more light on this topic than I- but
I believe the answer is that would not display the information correctly. Q


Yes, in fact I just ran both versions now with entirely different id


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.