4. Non-Correlated Subqueries III


Question is:

1.Using a subquery, find the average total distance flown by day of week and month.
Be sure to alias the outer query as average_distance and the inner query as flight_distance.

Why is my code not working?

my code based on class:

SELECT a.dep_month, a.dep_day_of_week, AVG (a.flight_distance) AS average_distance FROM ( SELECT dep_month, dep_day_of_week, dep_date, COUNT(*) AS flight_distance FROM flights GROUP BY 1, 2, 3) a GROUP BY 1,2 ORDER BY 1,2;


You forgot to select distance in the subquery.

        SELECT dep_month,
               distance, /* you need to select distance, or else your a.distance wont find anything, since it's only looking at data in the table with alias 'a' */
               COUNT(*) AS flight_distance
          FROM flights
         GROUP BY 1,2,3
       ) a      /* this is where the alias is set, in this case the alias is set to 'a' */


Thanks! My mistake! Your comment was very useful.


Hi. We need to recover the distance traveled on the flight, and then make an average of this value. Therefore, it is a sum (not a row count).

 SELECT dep_month,
               sum(distance) AS flight_distance
          FROM flights
         GROUP BY 1,2,3


Hey everybody, this is a possible solution:

SELECT query.dep_month,
AVG(query.flight_distance) AS average_distance
SELECT dep_month,
sum(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3
) query

You must SUM not COUNT
Best Regards.


Can you (or anyone else) please explain me why do you need to select dept_date as well? As for counting the day-of-the-week and month average I do not see the reason to select dep_date...


i have the same doubt @aleksziev. Did you find an answer for that?


Not yet.

///*post must be at least 20 characters*


i think it is just for additional information


there is a difference in calculation and the results. for example from the table,

dep_month   dep_day_of_the_week   dep_date      distance
2000-01	    Sunday	          1/10/2000	1043
2000-01	    Sunday	          1/23/2000	335
2000-01	    Sunday	          1/9/2000	696

there will be three distinct total of distance for this query

select dep_month, 
               sum(distance) as flight_distance 
               from flights 
               group by 1,2,3;

so, here we can get the exact of the average because the calculation will be

Average for Saturday, 2000-01 = (1043 + 335 + 696) / 3
                              =  691.333333


What is the calculation when dep_date is not included? I'm not understanding what it adds at all.


When included;

Average for Saturday, 2000-01 = (1043 + 335 + 696) / 3
                              =  691.333333

Avg = total distance / no of row selection of(dep_date, dep_month, dep_day_of_week)

Not included;

Average for Saturday, 2000-01 = 2074 / 1
                              =  2074

Avg = total distance / no of row selection of(dep_month, dep_day_of_week)


Okay, so when you don't include the departure date to group by, is adds all of those Saturday 2000-01 flights together and returns them in the alias table as one row. That gives you only one to divide by. Grouping them by date gives you the three to divide by.

Honestly, my grasp of GROUP BY is still pretty slippery, but this helped.



I don't have a great grasp on GROUP BY thing, so please bear with me and let me know if I totally misunderstood- I think both what you posted (and what this exercise accepts) is incorrect.

I don't believe that including dep_date actually returns average by day and month. However I believe that including id instead does. If you run your code, look for the result of Monday in 2003-10. You will see it is 4301. If you run the same code if id instead, it returns 2150.5.

Now, if you run a query to return all the Mondays in 2003-10, you get two rows: id 12038 & 3711. The distances of these two flights are 590 and 3711- the SUM of which is 4301, and the average of which is 2150.5. I believe this is because by including dep_date, it is not taking into account flights that occurred on the same date, and because it is grouping by the date, it ends up adding it and not returning the sub-query as distinct lines. Including and grouping by id instead of dep_date when creating the alias table, it takes flights which occurred on the same date into account. There are 499 entries and only 441 distinct departure dates, so this is presumably affecting other entries as well.

Please let me know if this makes sense or if I've completely misunderstood. Thank you!

Non-Correlated Sub queries III

Hi merendis! Thanks for your feedback, you are right that values are not correct, nevertheless the reason i thinks is:
the inner query calculates sum distance and return one record per day (dep_date) , then outer query calculates average, for the instance 4301 (total count) is the same average value because we have only ONE record ,anyway i consider dep_date is usefull because you need count all flights by day (GROUP BY dep_date let you associate all the flights by one especific day) .. maybe we could find another solution hmm..


Hi! Just noticed this, sorry, I didn't see a notification!

I agree and see what you mean, about why it's returning the incorrect values (you're much better at explaining than me!)

I suppose I'm still not seeing why dep_date is useful, because the prompt doesn't ask for anything about counting flights by day/date, just by day of the week. I'd think we'd want to explicitly avoid grouping flights by specific day in order to not count multiple flights on the same day as one.


Hey merendis! No problem, it happen ( :smile: ) i think that field allows you to differentiate flights by day (is not the same say: flights in monday ( any Monday of the year) that say ( Febrary's Monday) , but to be honest with you, the requirement is a little confuse.


Agree with you orlax, the requirement of the question is confusing.


Yeah unfortunately ! :confused:


4 posts were split to a new topic: 4. I don´t understand why it begins with "a"