Non-Correlated Subqueries III (what is the 'a')?

I don’t understand why the “a” is there and what the ‘a’ does and what the ‘a’ means.

SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_distance) AS average_distance
FROM (
SELECT dep_month,
dep_day_of_week,
dep_date,
sum(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3
) a
GROUP BY 1,2
ORDER BY 1,2;

https://gist.github.com/b99936ea23c720b14b4aa24c7d5581c3.

Thank You In Advance :grinning:.

3 Likes

Me neither. I looked at the other answers but I don’t understand. Also why group by 1,2,3?

2 Likes

“a” is the identifier/name assigned to the sub query:

(
SELECT dep_month,
    dep_day_of_week,
    dep_date,
    sum(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3
) a

So if you reference “a.” like here:

You are referencing the result of the aforementioned sub query.

This dep_month:

Is the dep_month from the result of this:

It’s almost the same as naming columns (the AS):

3 Likes

I’m still new to this so someone please correct me if I misspeak.

You’re making a table inside as part of a query. So how the information is presented in the table (inner query) will have an effect on how you reference it with the second query (outer query).

If you only do GROUP BY 1: It will show each unique entry for dep_month:

|dep_month|dep_day_of_week|dep_date|flight_distance|
|2000-01|Sunday|1/23/2000|10414|
|2000-02|Saturday|2/26/2000|8963|
|2000-03|Saturday|3/18/2000|2641|
|2000-04|Wednesday|4/12/2000|3377|

If you do GROUP BY 1, 2: It will sort like it does above except it’s also going to create extra entries for each unique entry in dep_day_of_week. This means you will now have multiple entries for each unique dep_month as long as it has more than one dep_day_of_week (basically, no longer making it unique to itself) since we’re also including each unique entry of dep_day_of_the_week (results explanation below the data):

|dep_month|dep_day_of_week|dep_date|flight_distance|
|2000-01|Monday|1/31/2000|1302|
|2000-01|Saturday|1/1/2000|328|
|2000-01|Sunday|1/23/2000|2074|
|2000-01|Thursday|1/13/2000|3610|
|2000-01|Tuesday|1/18/2000|2820|
|2000-01|Wednesday|1/12/2000|280|
|2000-02|Friday|2/4/2000|1449|
|2000-02|Monday|2/21/2000|2704|
|2000-02|Saturday|2/26/2000|1095|
|2000-02|Sunday|2/20/2000|1173|
|2000-02|Tuesday|2/15/2000|1515|
|2000-02|Wednesday|2/23/2000|1027|

So if you look above, you’ll see that 2000-01 now has 6 unique days showing, no repeats (it’s missing Friday because 2000-01 had no Friday departures).

If you ONLY do GROUP BY 2 (dep_day_of_week), it will only show 7 entries, one for each day of the week as that’s how many unique options there are for this column.

So the point of using GROUP BY 1, 2, 3 inside of the Inner Query is to present the information properly that you’re going to need for the outer query to pull data from it.

In non-correlated sub-queries, the inner query is always able to create a table on its own. This means you can enter in just the inner query and play with it to see what the data is doing.

3 Likes