FAQ: Subqueries - Non-Correlated Subqueries III

This community-built FAQ covers the “Non-Correlated Subqueries III” exercise from the lesson “Subqueries”.

Paths and Courses
This exercise can be found in the following Codecademy content:

SQL: Table Transformation

FAQs on the exercise Non-Correlated Subqueries III

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply (reply) below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply (reply) below!

Agree with a comment or answer? Like (like) to up-vote the contribution!

Need broader help or resources? Head here.

Looking for motivation to keep learning? Join our wider discussions.

Learn more about how to use this guide.

Found a bug? Report it!

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

Hi, I don’t understand the meaning of ‘a.’ in the query :blush:

why I have to add this letter to complete the exercize?
“a.dep_month”
“a.dep_day_of_week”
“a.flight_distance”
“a
GROUP BY 1,2
ORDER BY 1,2”

Non-Correlated Subqueries III

SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_distance) AS average_distance
FROM (
SELECT dep_month,
dep_day_of_week,
dep_date,
sum(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3
) a
GROUP BY 1,2
ORDER BY 1,2;

19 Likes

Please excuse all the edits (I’m a SQL newb) and I could be wrong but I believe the inner query…

SELECT dep_month,
dep_day_of_week,
dep_date,
SUM(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3

…is being set to a variable of “a”. In turn, the outer queries runs a query on the “a” variable or table.

The inner query is used to create an organized table sorted by month, day of the week, and then I believe for cosmetic reasons(?), month date. The ‘flight_distance’ is a sum of all miles flown across all flights for that day in the month.

The outer query queries this cleaned up and organized temporary table ‘a’ and groups the final output by departure month, then all of the Mondays, Tuesdays, Wednesday’s, etc. for that month and then the average flight distances for each of those sets of days of the week.

In the end, we have the average total distance flown by day of week in each month.

To understand what was happening, it was helpful to run the inner query in the example first and see the table results and then add the outer query around it and again, see the results.

11 Likes

Thanks for this. Sheesh, huge jump from Non-Correlated Subqueries 2.

3 Likes

Can someone explain why the ‘a’ is there? I don’t understand why we need it? And can someone explain what the ‘a’ does?

2 Likes

Neutechnica, thanks for the info. I’m also a SQL newb to this, and struggled to understand this setup until I started looking through forums.

There doesn’t seem to be enough lessons covering subqueries or nested queries (though, please let me know if there’s a section I’m overlooking).

danyal1414 - The “a” is being used as the name of the copied table version of “flights”. In the outer query you have:
SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_count) AS average_flights

The “a” is the name of the table and after the period (.) is the name of the column

What I’m still wondering is what is the difference between adding that “a” and using “as xxxx” to, I suppose, alias the entire query. So instead of

SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_distance) AS average_distance
FROM ( … ) a

it would be

SELECT inner_query.dep_month,
inner_query.dep_day_of_week,
AVG(inner_query.flight_distance) AS average_distance
FROM ( … ) as inner_query

Is it just shorthand? Is the “as” superfluous in this case? It seems to work either way. Maybe I missed it, but I feel like this “a” notation was kind of sprung on us in the lesson and not well explained.

1 Like

I agree this was a pretty huge jump. I think this part of the lesson needs to be revised. Unless I am mistaken I don’t think setting variables has ever been discussed in any of the previous SQL lessons (excuse me if I am wrong or have forgotten). I also did not understand the “a.” I only understood this once it was explained here in the forum. I could only recognize this as setting a variable or “temporary table” from prior knowledge of this from programming languages that would set variables.

I was also stumped in how the outter query was able to AVG(a.flight_distance) being that the flight table did not have a “flight_distance” only “distance”. I later noticed it was set AS flight_distance in the subquery. It seems as if everything is set first in the subquery then calculated above in the outter query. Just like when we set the variable “a”.

I think it would be awesome if there could be links to previous lessons that pertain to what is being taught in the advanced lessons. It would help solidify what we have already learned if it has been forgotten and lead us to faster eureka! moments. :slightly_smiling_face:

4 Likes

I agree. The FROM( … ) a felt totally sprung on us. I had the same thought that we could have temporarily named this AS inner_query. I only understood this notion as setting it to a temporary table because I have some minor experience programming and in programming we sometimes set variables or “temporary” numbers to a letter. In this case we set a temporary table to a letter.

Hi, I don’t understand why we use a non correlated subquery to calculate the average distance flown per day each month.

In 2003-10, Monday it says the average distance flown is 4301, however if we look at all flights flown in 2003-10 on a Monday, it says the distances flown are: 3711 and 590. Therefore isn’t the average distance flown on a Monday (3711+590)/2 = 2150.5? 4301 appears to be the total distance flown?

I used the following query to work out the averages instead:

SELECT dep_month, dep_day_of_week, sum(distance) / count(id) AS average_distance
FROM flights
GROUP BY 1,2
ORDER BY 1,2;

Is this query right or am I completely misunderstanding the question?

1 Like

Hey. I have the same question. Why need nested queries when you can simply do this:

SELECT dep_month, 
    dep_day_of_week, 
    AVG(distance) AS average_distance
FROM flights
GROUP BY 1,2
ORDER BY 1,2;

Can anybody explain if i missunderstood the exercise or why the subquerry is neccessary in any way?
Thanks in advance & greetings
Tim

Hi Tim, I think I’ve figured it out what we got wrong :slight_smile:. The dataset isn’t large enough to see it but there is a slight difference in our queries.

The question was “Find the average total distance flown each day of every month”. Our query only calculates the average distance flown on each day and not the average of the total distance flown!

Take the following example dataset:

Month Day of Week Date Distance
Feb Saturday 2019-02-23 100
Feb Saturday 2019-02-23 500
Feb Saturday 2019-02-23 1000
Feb Saturday 2019-02-16 100

Our query averages all of the Saturdays together, 100,500,1000,100 = 425.

The nested query first adds the total of each date together 100,500,1000 = 1600 then averages the total of each date 1600,100 = 850.

On average, a single flight travels 425km on a Saturday.
vs
On average, the total distance flown across all flights is 850km each Saturday.

Hope that helps!

1 Like

Hi Paul.

That really helped. Thank you very much. Maybe I was too tired to understand the exercise correctly… :wink: :smiley:

For the solution, the inner query has “Sum(distance)”. However, to find the average distance travelled for a single day, I think using “Sum(distance)” is a mistake because then the outer query will only have 1 distance to average, which is the total distance travelled that day. Shouldn’t we take out the “sum” and simply puts “distance”–in this way we can average the distances in the outer query?

I thought the subquery could be run as it’s own independent query, but when i entered it and tried to run it as its own, just to mentally put the peices together, it kept giving me a syntax error? What’s happening?

Hi guys, as a fellow newb, I spent some time to figure this out.
Here’s my 2 cents.

1- The reason we can’t AVG in one big query is SQL doesn’t accept AVG(COUNT) or AVG(SUM).

Try it, you’ll get a syntax error (wrong use of aggregate function AVG).

Hence to find the average of Total we must run 2 queries:
1st- SUB query for the COUNT() or SUM() – which is basically treated as a table within a table (if that makes it easier to understand).
2nd- Outer query for the AVG (table of SUM()) or AVG (table of COUNT()) and you find your correct answer.

2- Why do we need to use an alias (a. or x. etc) I tried running the same command without an alias, it worked fine.
Tried running a word instead of a letter (alias. instead of a.) for some reason it didn’t work, I suppose there’s a function for alias.

Try it out and let me know if this works for you.

Ali

This problem needs fixing!

1.The question asks for the average distance flown by day and by month. It is not clear what is being asked. I took it to ask for average distance flow on all Mondays etc… and then average distance flown all Januaries… etc.

  1. I also don’t understand why the code groups by dep_date along with dep_month and day columns. It seems redundant, but when only grouping by dep_date, I get a different average. Am I missing something here?

SELECT dep_month,
dep_day_of_week,
dep_date,
sum(distance) AS flight_distance
FROM flights
GROUP BY 1,2,3 --why group by all three columns?

I tried converting the date to show only month and see the if there are any discrepancies but the date is input as text which is annoying. I hope future lessons show how to convert this to datetime.

I ran a query to find errors in the month because I was getting different values of average distance base on grouping by dep_date and dep_month

select id, dep_date, dep_month from
(select id, dep_date, dep_month,
case
when dep_date like ‘1/%’ then ‘01’
when dep_date like ‘2/%’ then ‘02’
when dep_date like ‘3/%’ then ‘03’
when dep_date like ‘4/%’ then ‘04’
when dep_date like ‘5/%’ then ‘05’
when dep_date like ‘6/%’ then ‘06’
when dep_date like ‘7/%’ then ‘07’
when dep_date like ‘8/%’ then ‘08’
when dep_date like ‘9/%’ then ‘09’
when dep_date like ‘10%’ then ‘10’
when dep_date like ‘11%’ then ‘11’
when dep_date like ‘12%’ then ‘12’
end as dep_date_month

,

substr(dep_month, 6,2) as dep_month2
from flights)
where dep_month2 != dep_date_month;

Two flights have opposing months. I believe the day of the week entry has errors as well. This could all be avoided if the dep_date was a datetime field. How to you convert text to datetime?

|9405|9/1/2001|2001-08|
|7027|1/1/2005|2004-12|

Below is my code for average total distance each day of week and each month.
I realized something is very wrong though. The dataset does not include data on when days have no flghts. the AVG function will on average days on which there is a flight not including a 0 value for days with no flights.

–Selects average distance by day
select dep_day_of_week, avg(total_distance)
from

–Subquery to sum distance from every flight grouped by day. The CASE is to order by day correctly.
(select case
when dep_day_of_week = ‘Monday’ then 1
when dep_day_of_week = ‘Tuesday’ then 2
when dep_day_of_week = ‘Wednesday’ then 3
when dep_day_of_week = ‘Thursday’ then 4
when dep_day_of_week = ‘Friday’ then 5
when dep_day_of_week = ‘Saturday’ then 6
when dep_day_of_week = ‘Sunday’ then 7
else null
end as daysnumbered, dep_day_of_week, dep_month,dep_date, sum(distance) as total_distance
from flights
Group by dep_date)

–grouped by days to get average of miles for every day
group by daysnumbered
order by daysnumbered
;

select month, avg(total_distance)
from

–SUBSTR converts dep_month into month only

(select dep_month, substr(dep_month,6,2) as month, sum(distance) as total_distance
from flights

group by 1
order by 1)

group by 1
order by 1;