SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_count) AS average_flights
FROM (
SELECT dep_month,
dep_day_of_week,
dep_date,
COUNT(*) AS flight_count
FROM flights
GROUP BY 1,2,3
) a
GROUP BY 1,2
ORDER BY 1,2;
The inner query provides the count of flights by day, and the outer query uses the inner query’s result set to compute the average by day of week of a given month.
instructions:
Using a subquery, find the average total distance flown by day of week and month.
Be sure to alias the outer query as average_distance and the inner query as flight_distance.
solution:
SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_distance) AS average_distance
FROM (
SELECT dep_month,
dep_day_of_week,
dep_date,
sum(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3
) a
GROUP BY 1,2
ORDER BY 1,2;
my question: In the inner query, why do I need to use
SUM(distance) AS flight_distance
instead of
COUNT() AS flight_distance
like what was used in the example?
I thought I had it done right, but turns out I needed to change the COUNT() to SUM() and not sure why?
Why would count work for the number of flights but not the number of miles/distance?
Not sure if I’m having trouble understanding the code or the math …
From the code the lesson provides, we are looking for average flights on day of week, like Monday, Tuesday, Wednesday,…etc. However, for example, it may not has same amount of flights on every Monday. Therefore, we count the flights on every day basis, like 2010-01-01, 2010-01-09, 2010-01-10,…etc, in the inner query. You can actually run only the inner query to see that. Then group them up by day of week and calculate the average at outer query. The final result is Monday, Tuesday,…, Sunday in each month in average, and you can see some of them are not integer because of the amount of flights is not the same.
Now, in the practice, we are looking for average distance. We need to sum up the distance of flights on each day instead of counting how many flights we have. Therefore, we simply use Sum(distance) at inner query. Then group them up by day of week and calculate the average at outer query. The structure and concept are basically the same.
2/23/2000 shows two flights on Wednesday, 1 flight on Tuesday. Ha, wat?
This exercise was confusing at first because the flight_count was so low. Why is there an average of only 1 flight per day?? I wasn’t sure I understood what the data meant at first.
Why do we group by 1,2,3 in the subquery? Aren’t we just finding the total distance flown each day in the inner query? Why doesn’t simply grouping by date work?
Can i ask about the ‘Group by’?
I have just learn about grouping by a column and don’t know about multicolumns yet.
Can anyone explain more?
p/s: i can find it by myself but i want to put a question here for the learners after can understand the content more. Thanks