Experimenting with Non-Correlated Subqueries III


#1

Exercise Link:
https://www.codecademy.com/en/courses/sql-table-transformation/lessons/subqueries/exercises/non-correlated-iii

Problem:
In the learn portion, there is a code that is supposed to be used to discern how many flights can be expected on any given Friday. However, I feel the table inefficiently returns Flight Counts for EVERY day of the week. So, I wanted to make sure the table displayed only the rows with Friday as the value for the a.dep_day_of_week column. I am pretty close. but for some reason I am showing NULL values along with Friday values. No error messages are shown. How do I change this?

Here is the table:

Coding:

SELECT a.dep_month,
       (SELECT DISTINCT a.dep_day_of_week
       WHERE a.dep_day_of_week = 'Friday') AS 'Day of Week',
       AVG(a.flight_count) AS average_flights
  FROM (
        SELECT dep_month,
              dep_day_of_week,
               dep_date,
               COUNT(*) AS flight_count
          FROM flights
         GROUP BY 1,2,3
       ) a
 GROUP BY 1,2
 ORDER BY 1,2;

#2

SQL noob reporting in. (Imagine I'm your classmate sharing what I see, but not knowing any more than you)

It seems to me like you only select dep_day_of_week when it is friday, but you still select the row for all days.
You'd instead need to put the where condition so that it decides whether the row should be included, not just that field.

Also, the condition could be put earlier, in the other query that you are selecting results from.
I don't know just how clever SQL databases are, they might just do this optimization for you (this could be tested with a large database and timing it)


#3

Yeah I get what you're saying, almost like its in the wrong order.

I'm not sure how to change it, but yeah I seem to be communicating to the code "Get me all of the rows with these 3 columns for it, but only show me the Day of Week information if it is Friday.

Hmmm... wonder what kind of moving around I am going to have to do to fix it. I tried putting the where clause in front of the select clause, but that offered my a syntax WHERE error... I was hoping the Select Distinct clause would work for eliminating the rows which didn't include Friday..


#4

Okay, I figured it out:

SELECT a.dep_month AS Month,
       a.dep_day_of_week AS 'Day of Week',
       AVG(a.flight_count) AS average_flights
  FROM (
        SELECT dep_month,
              dep_day_of_week,
               dep_date,
               COUNT(*) AS flight_count
          FROM flights
         GROUP BY 1,2,3
       ) a
       WHERE a.dep_day_of_week = 'Friday'
 GROUP BY 1,2
 ORDER BY 1,2;

I had to bring the WHERE clause outside of SELECT clause and after the Subquery.

So what the code is doing is saying...

Show me a table with the columns: Month, Day of Week, and Average Flights

Collect the flights with matching information for all of these: "Month, Day of Week, Date." Then, count the numbers of flights that match the Month, Day, and Date with each other, label those numbers as Flight Count (Flights on this given day). This creates a new column, which is used higher up in the coding AVG(a.flight_count). The flights are listed per flight, we have to change it to flights per day, so that the program can calculate an average EARLIER in the code, I believe. Confusing, I know!

Then, only take the flights that are departing on Friday. Group them by day and month as well (I think so that it doesn't show each flight on each day, but just that there are flights on said day.) Sort from first flights to last flights.

Screenshot of results below, where it shows how many flights on Average you had per month for Fridays only. So, in December 2000 there was an average of 2 flights per Friday within that month:

In order to see what the Average is for Friday, period:

SELECT 
a.dep_day_of_week AS 'Day of Week',
       AVG(a.flight_count) AS average_flights
  FROM (
        SELECT
    					dep_month,
              dep_day_of_week,
               dep_date,
               COUNT(*) AS flight_count
          FROM flights
         GROUP BY 1,2,3
       ) a
       WHERE a.dep_day_of_week = 'Friday'

I took off the Month SELECT.

So now I am counting the flights per date, day of the week, and month. I am then taking only the day of the week that is for Friday, and averaging them together, as I am not separating them by month as well anymore.

I think it's hard to explain and understand because it seems to run off of a continuous loop, not a single set of commands. So, although at the top it tells me to average all of the counts, I don't even define the counts until later and that's how it works.


#5

I'd want the Friday condition to be in the query that runs first, so that there is less for the second one to do:

SELECT a.dep_month,
       a.dep_day_of_week,
       AVG(a.flight_count) AS average_flights
FROM (
    SELECT dep_month,
           dep_day_of_week,
           dep_date,
           COUNT(*) AS flight_count
    FROM flights
    WHERE dep_day_of_week = 'Friday'
    GROUP BY 1,2,3
) a
GROUP BY 1,2
ORDER BY 1,2;

The only change from the code in the instructions is to add the WHERE part.

I'm not sure if it matters. It might be super important to do it that way or it might be something that the database happily does on its own. It's probably insignificant here either way, even for a large database. But if there's any quadratic complexity or worse then one would want to prune as early as possible.


#6

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.