Stuck in this exercise: SQL: Data Transformation (4/7)


#1

In this exercise, I do not understand the answer for this answer. Below is the correct code.

SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_count) AS average_flights
FROM (
SELECT dep_month,
dep_day_of_week,
dep_date,
COUNT(*) AS flight_count
FROM flights
GROUP BY 1,2,3
) a
GROUP BY 1,2
ORDER BY 1,2;

I don't understand:
1) the 'a' in a.dep_month and a.dep_of_week
2) GROUP BY 1,2,3 --- why 1 2 3? What does that mean? In the previous SQL exercise it was grouped in categories such as price, etc.
3) ORDER BY 1, 2 --- I also don't quite get this.
4) And the 'a' at the end of the closing bracket. What does that 'a' do? Why is it there?

I am struggling with this particular problem because it doesn't really explain why these are there. Please help me. Thank you.


#2

First of all, I don't think that's the correct answer, cause it's exactly the same as the sample above the instructions.

To answer your question,
1 & 4 question: The big part inside first "FROM" is set to be "a" by the code:

Above part is 'a'. And that is what you should refer to when dealing with

or

2nd question: GROUP BY 1, 2, 3 means GROUP BY dep_month, dep_day_of_week, dep_date.1, 2, 3 here are 1st column, 2nd column, & 3rd column.

3rd question: Similar to GROUP BY 1, 2, 3.

Hope this helps!


#3

Here is the answer to this exercice:

SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_distance) AS average_distance
FROM (
SELECT dep_month,
dep_day_of_week,
dep_date,
sum(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3
) a
GROUP BY 1,2
ORDER BY 1,2;


#4

hello i had the same question :slight_smile: thank you for helping, i get the group by 1,2,3 now in terms of columns now .... but I'm still stuck on why its called a... is that like a default?
like could i just use anything instead of the a???
thanks
G


#5

I was wondering the same thing and "a" is an alias used to shorten the code. So instead of writing out the entire table name you can use "AS" earlier in the code then set it to a, then use a. asdfadfsdf later on. I'm still a little foggy on the whole idea too.


#6

I appreciate the help on this one. This exercise seems like a big jump from what I've previously learned- am I missing something? I did go through the intro to SQL course...


#7

Hey according what I understood from the first SQL course when you perform a select what you get is a new table, so in this example as you can see the outer query says FROM(....."the subquery")
as you know,there,you must type the name of the table you want to query, and in fact you are doing it
because inside/subquery, you are performing a select so as consequence you get a new table,
the question now is, how do we refer to that table? and that's why you see the (..."The subquery") a<--
you are naming the table as a, so you can reference to the information of that table.


#8

Is there a reason dep_date is included in this solution? So far as I can tell, it isn't a deliverable, isn't part of the calculation, and removing that line doesn't change the outcome as long as I update the first group by function (delete ",3").


#9

@masternuna
Why do you use SUM(distance) and not AVG(distance) in the inner query?


#10

Apologies for the long explanation. Let us understand it step by step. Be patient. Read it fully to understand.

First of all, run the following code contained in inner query:-

SELECT dep_month,
               dep_day_of_week,
               COUNT(*) AS flight_count
FROM flights;

The output is only one record. But we want no. of flights per day for now month by month.

Now add GROUP BY statement at the end and run it:-

SELECT dep_month,
               dep_day_of_week,
               COUNT(*) AS flight_count
FROM flights
GROUP BY dep_month;

Now we can see only one day(eg. sunday,monday..) per month. But we can't see all days in a 
month

Now add one more GROUP BY statement at the end and run it:-

SELECT dep_month,
           dep_day_of_week,
           COUNT(*) AS flight_count
FROM flights
GROUP BY dep_month,
                    dep_day_of_week;

The output table now shows all the days of each month. Let us call this output table as 'a' and we 
will run an outer query on this output table referring to it as 'a'. "a.flight_count" means we are 
referring to column "flight_count" of table 'a'

We will now see average no. of flights per day in each month. Let us find average no. of flights per
day by running "SELECT AVG(a.flight_count) AS average_flights" on the inner query present in
step 3. Recall 'a' from step three and observe it in step 4 and 5. Add the following outer query and
run it:-

SELECT AVG(a.flight_count) AS average_flights
FROM (
SELECT dep_month,
           dep_day_of_week,
           COUNT(*) AS flight_count
FROM flights
GROUP BY dep_month,
                    dep_day_of_week) a;

We will get only one record. But we want average_flights for all days of each month

So we will add "GROUP BY a.dep_month" to see result of each month:-

SELECT AVG(a.flight_count) AS average_flights
FROM (
SELECT dep_month,
           dep_day_of_week,
           COUNT(*) AS flight_count
FROM flights
GROUP BY dep_month,
                    dep_day_of_week) a
GROUP BY dep_month;

But we get result for only one day of each month. And wrong result also. So we will group by 
dep_day_of_week also

So we will add "GROUP BY a.dep_day_of_week" to see result for all the days of each month:-

SELECT AVG(a.flight_count) AS average_flights
FROM (
SELECT dep_month,
           dep_day_of_week,
           COUNT(*) AS flight_count
FROM flights
GROUP BY dep_month,
                    dep_day_of_week) a

GROUP BY a.dep_month,
                    a.dep_day_of_week;

This will give the average no. of flights per every day for each month. But looking at only one 
column, we can't tell which day each record belongs to. So for visualization purpose, we will add 
"a.dep_month" and "a.dep_day_of_week" in the outer query in SELECT clause

We will add "a.dep_month" and "a.dep_day_of_week" in the outer query in SELECT clause:-

SELECT a.dep_month,
               a.dep_day_of_week,
               AVG(a.flight_count) AS average_flights
FROM (
SELECT dep_month,
           dep_day_of_week,
           COUNT(*) AS flight_count
FROM flights
GROUP BY dep_month,
                    dep_day_of_week) a

GROUP BY a.dep_month,
                    a.dep_day_of_week;




This is the only requirement we had to fulfil. The codeacademy has added dep_date which is not 
required but we have to add it to go to the next lesson. They have also not given result for fridays 
which we can see by adding "WHERE a.dep_day_of_week = Friday" at the bottom before GROUP 
BY clause like:-

SELECT a.dep_month,
               a.dep_day_of_week,
               AVG(a.flight_count) AS average_flights
FROM (
SELECT dep_month,
           dep_day_of_week,
           COUNT(*) AS flight_count
FROM flights
GROUP BY dep_month,
                    dep_day_of_week) a

WHERE a.dep_day_of_week='Friday'
GROUP BY a.dep_month,
                    a.dep_day_of_week;

This will give average_flights of fridays of each month. The result will be same even if we remove 
"a.dep_day_of_week" from GROUP BY clause.(Note: WHERE clause comes before GROUP BY 
clause otherwise we will see syntax error)

1, 2, 3 in GROUP BY caluse shown in codeacademy example is used to save typing time. 1 refers 
to the first element in SELECT clause(here inner 1 resefrs to dep_month in inner quesry and outer 
1 refers to a.dep_month in outer query. Similary 2 and 3.)

#11

I feel exactly the same. I feel like it jumped 10 steps ahead w/o explaining the ‘a’ concept of the subquery. The rest of the class sessions seem to progress nicely and then one was just a sucker punch! Thank you for helping us understand–those of you with more experience.


#12

There was far too big of a jump in content for this course, There has been no mention about a. before here and for this reason I’m stumped at this point and it appears many more before me are also.

It may be a good idea to alter this section for clarity.


#13

Thanks everyone for explanation of main points. That really helped me, when I thuoght I lost my mind.


#14

First off, a big huge thank you to lavajiit, for providing the “missing” lesson. shakes fist at codecademy

As a a couple of extra resources for people like me who benefit from reading documentation, I found these:

https://community.modeanalytics.com/sql/tutorial/sql-subqueries/ which I think provides a decent plain-text description of what’s going on with the aliasing syntax.

https://dev.mysql.com/doc/refman/5.7/en/select.html If you scroll down to the part that reads:

The AS keyword is optional when aliasing a select_expr with an identifier. The preceding example could have been written like this:

it gets into the use of “as” vs just aliasing by pure syntax (as in this example), which was confusing me quite a bit.


#15

This solution passes the test case, but it has a few incorrect results. It fails when more than one flight departs on the same exact day. In this case the sub-query groups all flights leaving on the same day into one row, then the parent query counts the rows as individual flights.

As proof, try this:

SELECT dep_month, 
	   dep_day_of_week, 
       dep_date, 
       distance 
FROM flights
ORDER BY 1,2,3;

check out 2000-02 Saturday, there are two flights departing on 2/26/2000, one with a distance of 550, and one with a distance of 545. But if you try the solutions sub-query:

SELECT dep_month,
	   dep_day_of_week,
	   dep_date,
	   sum(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3;

And check out 2000-02 Saturday, you will notice it has one row (designating one flight to the parent-query) with a distance of 1095. Thus the accepted answer is incorrect! And oddly enough, a correct answer (though not accepted) can be done without any sub-queries:

SELECT dep_month, 
	   dep_day_of_week, 
       SUM(distance) / CAST( COUNT(*) AS REAL) AS 'average_distance'
FROM flights
GROUP BY 1, 2;

#16

Thank you that really helps! If I understand now - we are naming our subquery as ‘a’

FROM (
SELECT dep_month,
dep_day_of_week,
dep_date,
COUNT(*) AS flight_count
FROM flights
GROUP BY 1,2,3
) a

So that is why in the beginning of the code, we add ‘a.’ to the beginning of each column we are trying to pull

SELECT a. dep_month,
a. dep_day_of_week,
AVG( a. flight_count) AS average_flights

I agree with a few others - it seems that there are a couple gaps in teaching through this module.


#17

Thanks for the explanation!


#18

I did this, except that I loaded the “dep_day_of_week” field before the “dep_month” field in both quieries. Does the aggregation have to be in a particular order of granularity in SQL, or is it just that the tutorial has one specific answer in mind, therefore giving an error when I do it as I said?

Thanks in advance.