Getting off track in SQL lesson

This problem needs fixing!

1.The question asks for the average distance flown by day and by month. It is not clear what is being asked. I took it to ask for average distance flow on all Mondays etc… and then average distance flown all Januaries… etc.

  1. I also don’t understand why the code groups by dep_date along with dep_month and day columns. It seems redundant, but when only grouping by dep_date, I get a different average. Am I missing something here?

SELECT dep_month,
dep_day_of_week,
dep_date,
sum(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3 --why group by all three columns?

I tried converting the date to show only month and see the if there are any discrepancies but the date is input as text which is annoying. I hope future lessons show how to convert this to datetime.

I ran a query to find errors in the month because I was getting different values of average distance base on grouping by dep_date and dep_month

select id, dep_date, dep_month from
(select id, dep_date, dep_month,
case
when dep_date like ‘1/%’ then ‘01’
when dep_date like ‘2/%’ then ‘02’
when dep_date like ‘3/%’ then ‘03’
when dep_date like ‘4/%’ then ‘04’
when dep_date like ‘5/%’ then ‘05’
when dep_date like ‘6/%’ then ‘06’
when dep_date like ‘7/%’ then ‘07’
when dep_date like ‘8/%’ then ‘08’
when dep_date like ‘9/%’ then ‘09’
when dep_date like ‘10%’ then ‘10’
when dep_date like ‘11%’ then ‘11’
when dep_date like ‘12%’ then ‘12’
end as dep_date_month

,

substr(dep_month, 6,2) as dep_month2
from flights)
where dep_month2 != dep_date_month;

Two flights have opposing months. I believe the day of the week entry has errors as well. This could all be avoided if the dep_date was a datetime field. How to you convert text to datetime?

|9405|9/1/2001|2001-08|
|7027|1/1/2005|2004-12|

Below is my code for average total distance each day of week and each month.
I realized something is very wrong though. The dataset does not include data on when days have no flghts. the AVG function will on average days on which there is a flight not including a 0 value for days with no flights.

–Selects average distance by day
select dep_day_of_week, avg(total_distance)
from

–Subquery to sum distance from every flight grouped by day. The CASE is to order by day correctly.
(select case
when dep_day_of_week = ‘Monday’ then 1
when dep_day_of_week = ‘Tuesday’ then 2
when dep_day_of_week = ‘Wednesday’ then 3
when dep_day_of_week = ‘Thursday’ then 4
when dep_day_of_week = ‘Friday’ then 5
when dep_day_of_week = ‘Saturday’ then 6
when dep_day_of_week = ‘Sunday’ then 7
else null
end as daysnumbered, dep_day_of_week, dep_month,dep_date, sum(distance) as total_distance
from flights
Group by dep_date)

–grouped by days to get average of miles for every day
group by daysnumbered
order by daysnumbered
;

select month, avg(total_distance)
from

–SUBSTR converts dep_month into month only

(select dep_month, substr(dep_month,6,2) as month, sum(distance) as total_distance
from flights

group by 1
order by 1)

group by 1
order by 1;