Is 4. Non-Correlated Subqueries III Bugged? My Best Explanation


#1

I used many methods from the tools codecademy provided in order to run some tests on this exercise. What I came to conclude is:

  1. The example that's provided is just a template, you do not copy it word by word (compare my code below to the example).

  2. The function of GET BY and ORDER BY and how they're being used in this exercise.

  3. The debate for the use of dep_date and the different outputs we could get. (identifying this as a bug)

There was also a debate on if the code needs to only post dep_day_of_week to Friday, but I considered it as a "What if..." condition before the example they provided, and that it's not part of the instructions. I tried to have it only post for Friday regardless, just have a clear mind.

I've seen a few people make the mistake of using the example's COUNT(*) function. If you read the instructions, it states it wants to know the AVERAGE distance traveled, and have real number stored in flight_distance, yet people do COUNT(distance) AS flight_distance. The COUNT(distance) only counts how many rows have a value if it's not NULL. We don't want to know how many rows it's using, we need the value in the row. So, simply put distance AS flight_distance so we can average out flight_distance in the OUTER SELECT. AVG(a.flight_distance) AS average_distance

With GROUP BY it'd be best to test yourself and see what the values do to your output. Between SELECT and FROM are the columns that have data. For this exercise, when you call GROUP BY 2 it just arranges the data (in this column) from other identical data into groups. To understand what I mean, go to your exercise page, at the top type SELECT dep_day_of_week FROM flights LIMIT 20; next, replace LIMIT 20; to GROUP BY dep_day_of_week; (or 1 if you don't want to type it out) and notice the difference. So, we now know GROUP BY gets rid of duplicate values and organizes the column.

In the INNER SELECT, I grab the month (1), days of the week (2), date (3), and distance (4) from the flights table. Stored distance into flight_distance and removed any duplicated values from the months, days of the week, and date. To test the output, I'd copy the INNER SELECT and put it at the top (over your current code) and see what happens. If you remove 1, 2, and/or 3 from GROUP BY you'll see how the data gets organized. A good example is SELECT dep_month, dep_day_of_week, dep_date, distance FROM flights GROUP BY 2; you'll see it only shows the only 7 days worth of information. ORDER BY is self explanatory, just orders the selected columns by ascending or descending order.

With understanding FROM ( condition ) a how I see it, the a is a data point, or a variable where it stores the condition and can be called in the OUTER SELECT, it's what bridges the INNER SELECT to the OUTER SELECT. You could use it like this: SELECT fly.dep_month FROM ( condition ) fly GROUP BY 1; or whatever makes it easier for you.

This is how I'm understanding everything from what I've been taught up until now, so I might be incorrect on a lot of things. I stand by my line of coding, and I know it can be improved, but I believe what I have it what any other beginner would have, and it follows the instructions. If I'm way off and it's completely wrong, then please tell me and explain why it's completely off. Thank you!


Here is what I think the code should be as "passable". Please read my submitted bug report at the bottom of this page to understand why I think this course is bugged and that this code should have worked. Here are the two screen captures I took when trying to justify my points:
dep_date can't organize
passable code
(links are from puu.sh, a quick-link screen-capture page)

SELECT a.dep_month, a.dep_day_of_week, AVG(a.flight_distance) AS average_distance
FROM (
  SELECT dep_month, dep_day_of_week, dep_date, distance AS flight_distance
  FROM flights
  GROUP BY 1,2,3
) a
GROUP BY 1,2
ORDER BY 1,2;

https://www.codecademy.com/en/courses/sql-table-transformation/lessons/subqueries/exercises/non-correlated-iii?action=resume


Using dep_date as a text will throw off organizing the table, it's data should be varchar(255) like dep_month (http://puu.sh/qF4gL/2b256814c7.png).
Also, even without the use of dep_date to organize the table, I know I'm using the correct code to get the correct output, but I believe since the list isn't organized properly, it won't let me pass the course (http://puu.sh/qF5eE/b67de81cdf.png).
You can clearly see in this screen capture that the dep_month is organized, and every day with data for each month is posted (not duplicated), and the correct values on the average_distance is posted to the corresponding day/month. The only issue here is dep_day_of_week isn't organized, which I believe is the issue on why it won't let me pass this exercise.
I checked the forums for an answer, but it seems like people were stuck and gave up on this course. I find it a bit of surprising that not one person could provide a hint on the solution.


#2

SELECT a.dep_month,
a.dep_day_of_week,
AVG(a.flight_distance) AS average_distance
FROM (
SELECT dep_month,
dep_day_of_week,
dep_date,
SUM(distance) as flight_distance
FROM flights
GROUP BY 1,2,3
) a
GROUP BY 1,2
ORDER BY 1,2;

Works


#3

Ironically this gets the same output as my code, I did get the checkmark to pass this exercise so thank you. I didn't test if my code still worked after 24 hours to see if it was a bug or not, because theoretically it's passable but you found a working solution, so I'll just put this as the resolve to save everyone the frustration! Thank you again!


#4

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.