Hi, I don’t understand the meaning of ‘a.’ in the query
why I have to add this letter to complete the exercize?
“a.dep_month”
“a.dep_day_of_week”
“a.flight_distance”
“a
GROUP BY 1,2
ORDER BY 1,2”
Non-Correlated Subqueries III
SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_distance) AS average_distance
FROM (
SELECT dep_month,
dep_day_of_week,
dep_date,
sum(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3
) a
GROUP BY 1,2
ORDER BY 1,2;
Please excuse all the edits (I’m a SQL newb) and I could be wrong but I believe the inner query…
SELECT dep_month,
dep_day_of_week,
dep_date,
SUM(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3
…is being set to a variable of “a”. In turn, the outer queries runs a query on the “a” variable or table.
The inner query is used to create an organized table sorted by month, day of the week, and then I believe for cosmetic reasons(?), month date. The ‘flight_distance’ is a sum of all miles flown across all flights for that day in the month.
The outer query queries this cleaned up and organized temporary table ‘a’ and groups the final output by departure month, then all of the Mondays, Tuesdays, Wednesday’s, etc. for that month and then the average flight distances for each of those sets of days of the week.
In the end, we have the average total distance flown by day of week in each month.
To understand what was happening, it was helpful to run the inner query in the example first and see the table results and then add the outer query around it and again, see the results.
Neutechnica, thanks for the info. I’m also a SQL newb to this, and struggled to understand this setup until I started looking through forums.
There doesn’t seem to be enough lessons covering subqueries or nested queries (though, please let me know if there’s a section I’m overlooking).
danyal1414 - The “a” is being used as the name of the copied table version of “flights”. In the outer query you have:
SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_count) AS average_flights
The “a” is the name of the table and after the period (.) is the name of the column
What I’m still wondering is what is the difference between adding that “a” and using “as xxxx” to, I suppose, alias the entire query. So instead of
SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_distance) AS average_distance
FROM ( … ) a
it would be
SELECT inner_query.dep_month,
inner_query.dep_day_of_week,
AVG(inner_query.flight_distance) AS average_distance
FROM ( … ) as inner_query
Is it just shorthand? Is the “as” superfluous in this case? It seems to work either way. Maybe I missed it, but I feel like this “a” notation was kind of sprung on us in the lesson and not well explained.
I agree this was a pretty huge jump. I think this part of the lesson needs to be revised. Unless I am mistaken I don’t think setting variables has ever been discussed in any of the previous SQL lessons (excuse me if I am wrong or have forgotten). I also did not understand the “a.” I only understood this once it was explained here in the forum. I could only recognize this as setting a variable or “temporary table” from prior knowledge of this from programming languages that would set variables.
I was also stumped in how the outter query was able to AVG(a.flight_distance) being that the flight table did not have a “flight_distance” only “distance”. I later noticed it was set AS flight_distance in the subquery. It seems as if everything is set first in the subquery then calculated above in the outter query. Just like when we set the variable “a”.
I think it would be awesome if there could be links to previous lessons that pertain to what is being taught in the advanced lessons. It would help solidify what we have already learned if it has been forgotten and lead us to faster eureka! moments.
I agree. The FROM( … ) a felt totally sprung on us. I had the same thought that we could have temporarily named this AS inner_query. I only understood this notion as setting it to a temporary table because I have some minor experience programming and in programming we sometimes set variables or “temporary” numbers to a letter. In this case we set a temporary table to a letter.
For the solution, the inner query has “Sum(distance)”. However, to find the average distance travelled for a single day, I think using “Sum(distance)” is a mistake because then the outer query will only have 1 distance to average, which is the total distance travelled that day. Shouldn’t we take out the “sum” and simply puts “distance”–in this way we can average the distances in the outer query?
I thought the subquery could be run as it’s own independent query, but when i entered it and tried to run it as its own, just to mentally put the peices together, it kept giving me a syntax error? What’s happening?
Hi guys, as a fellow newb, I spent some time to figure this out.
Here’s my 2 cents.
1- The reason we can’t AVG in one big query is SQL doesn’t accept AVG(COUNT) or AVG(SUM).
Try it, you’ll get a syntax error (wrong use of aggregate function AVG).
Hence to find the average of Total we must run 2 queries:
1st- SUB query for the COUNT() or SUM() – which is basically treated as a table within a table (if that makes it easier to understand).
2nd- Outer query for the AVG (table of SUM()) or AVG (table of COUNT()) and you find your correct answer.
2- Why do we need to use an alias (a. or x. etc) I tried running the same command without an alias, it worked fine.
Tried running a word instead of a letter (alias. instead of a.) for some reason it didn’t work, I suppose there’s a function for alias.
kwanhui1990: the exercise isn’t very well explained, but:
You’re right, for any given day, the inner query only produces a single value: total distance travelled that day. So the first saturday in February will have 1 value. The second Saturday in February will have 1 value. The third Saturday will have 1, and the fourth Saturday will have 1. But all together, that makes 4 values. 4 different values for total distance, across 4 different Saturdays in February.
The outer query would then take the average of those 4 values. The idea is to be able to say that “On an average Saturday in February 2016, the total distance flown by all flights was…”
I don’t know why you’d want to quote a statistic like that, but that’s what they’ve asked for. I think that’s the real problem with this exercise: it’s hard to get your head around what they’re asking for because what they’re asking for is pretty weird.
Hey,
You need it to reference the rows you selected in the subquery.
in your main (outer) query you select a.dep_month. the a. tells your select to use the dep_month from the inner query and not the one right from the table.
I’m just adding here - since it seems like it’s been a while in the 2 years since this topic was started. The lesson doesn’t explain the “a.” prefix - it shows it in the example, but gives no reference to why it’s added. It looks like this has created confusion for more than just me - so I the replies here have helped me understand what the lesson clearly has jumped over and missed. I don’t know if admins give feedback to those that do the lesson design, but this was extremely frustrating - I thought I’d just brain-dumped something, but… no.
SELECT dep_month, dep_day_of_week, dep_date,
SUM(distance) AS 'flight_distance'
FROM flights
GROUP BY 1, 2, 3
ORDER BY 2, 1;
It will give you that result:
If you query the inner query like this :
SELECT dep_month, dep_day_of_week,
SUM(distance) AS 'flight_distance'
FROM flights
GROUP BY 1, 2
ORDER BY 2, 1;
You will get that result:
What it shows that in the inner query, every column has a function. In the first picture, there are two raws of 5.month and 6 months. So, maybe you can ask how does it affect the outer query?
For the first code, the main result will be:
For the second code:
If you pay attention to the months 5 and 6 on both pictures, it will give you an idea.
The variable ‘a’ used in the Query is not compulsory (Which means if you write the query with it or without it, you still get a result). But in some cases could be useful. For example, you might want to join that nested table with another table, adding an identifier/variable will prevent you from having to repeat your long code. It is simply as though you are given the nested table a name.