This particular lesson of codeacademy’s isn’t very good. Everything is being told and nothing is being taught, so I decided I would try to make a guide to help out other users like myself learn what is going on with SQL in this section. This is more of an approach of how to play with SQL in order to learn it. I encourage the user to play around to see things for themselves - the editor allows this and we should take advantage of that. It’s long, but hopefully clearer.
TL;DR it doesn’t actually sort by time, there is no time clause in the code. f.id < flights.id is more about the ordering in the new table rather than the value of the truth statement, and so the inner subquery won’t work at all without the outer one. Subquery essentially acts as a for loop. Follow this guide, it takes like 5 minutes and you can see for yourself.
First, the basic
SELECT * FROM flights ORDER BY carrier;
The result you get should be all of the columns in the table, ordered by carrier.
SELECT (col_name, col_name)i.e.
SELECT carrier, id, originand now change the order of of columns to
SELECT origin, id, carrier- the difference in order shows up in the result.
SELECT COUNT(*)to see the total number of rows
SELECT COUNT(id)to retrieve the total of non-null rows with ids
- You can change
ORDER BY carrierto another column name, i.e.
ORDER BY id
- Add a
WHEREclause under FROM flights:
WHERE id < 300or any number
- Play with count, select, and where to see different results in the table.
Now I have something like this:
SELECT COUNT(*) FROM flights WHERE id < 200 ORDER BY id;
Now I want to figure out what’s going on with
flights f and
f.id < flights.id so let’s see how we can play with this “
f” to see what’s going on.
id < 200to
flights.id < 400(change the number to verify a new result), run and you should see a new number count come up. Leave this alone.
- Don’t use
f, pick a random letter or word, I choose “
pickle” and change the line
FROM flights pickleand run the code again.
You should receive an error that says “no such column flights.id” when flights.id clearly worked before. This means that in the line FROM flights f, the SQL is essentially renaming the table “flights” to “pickle”
- Now try changing
flights.id < 400to
pickle.id < 100and run. It should work, returning the count of how many flights with an id below the number given.
- Now change the clause to
WHERE pickle.carrier = 'AA'to test.
- Again if you have
FROM flights picklethen
WHERE flights.carrier = 'AA'wouldn’t work, go ahead and try again to see. flights has been renamed.
- Lastly, change the clause to
WHERE pickle.id < flights.idand run. Then try
WHERE pickle.carrier = flights.carrierand there will be an error both times saying that flights.id and flights.carrier do not exist.
This is because we renamed “flights” to pickle and now when the code is referencing “flights” it needs to come from the outer query, the next “FROM flights” in the code, so we need to go ahead and look at the outer query.
Now, the code on the left of the page works, and we’ve got only the code inside the subquery, but errors come up when it runs on its own. We know that what is in the subquery is essentially a count, a number. So let’s look at the code in the higher structure with a random number to see what happens. Comment out with “
/*” and try this:
SELECT carrier, id, 200 +1 AS flight_sequence_number FROM flights ORDER BY carrier;
Notice that on the left of the page it says carrier, id, (SELECT COUNT(*) … ) +1. Notice that comma after “id” and know that the result in the subquery is a number when it’s working, and also part of the selection, i.e. when our WHERE clause isn’t
pickles.id < flights.id or
pickles.carrier = flights.carrier, so I went ahead and just put in a number and left the “+1”. Now run the code and and you will see a table of carrier, id, and a flight sequence number of 201 for EVERY entry. This wasn’t what we wanted, but we know this runs without errors.
Since the flights are being sequenced by time, let’s add dep_time to the selection and to see the sorting by carrier, I’m going to add an extra WHERE clause:
SELECT carrier, id, dep_time, 200 +1 AS flight_sequence_number FROM flights WHERE flights.carrier = 'AA' ORDER BY flight_sequence_number;
and you will see a result of all AA flights with the same sequence number. Now let’s throw in our subquery from earlier, replacing the “200 +1” which gives us:
SELECT carrier, id, dep_date, dep_time, (SELECT COUNT (*) FROM flights pickle WHERE pickle.id < flights.id AND pickle.carrier=flights.carrier) +1 AS flight_sequence_number FROM flights WHERE flights.carrier = 'AA' ORDER BY flight_sequence_number;
Now what you will see is a table of AA flights, ordered by ascending id number. I believe the instructions on the left said it wanted it to be ordered by time, but there is nothing in the code selecting that anything be ordered by time, instead it is where pickle.id < flights.id and pickle.carrier = flights.carrier. Plus if you look at the times in the table’s results it’s all over the place. This leads me to believe that what pickle.id < flights.id is saying, is to put the numbers in the new table, pickle, in ascending order. It’s about order, not about the value of the truth statement.
- change it to flights.id < pickle.id or just change the “<” to “>”. Run it. id order switches.
- change it to
pickle.carrier = 'AA'and lower
WHERE flights.carrier = 'DL'and you’ll get a completely messed up flight sequence chart.
*change it to
flights.carrier = 'AA'and
WHERE flights.carrier = 'DL'and again, you’ll get a messed up sequence chart with the only value being ‘1’
- Get rid of the “+1” and you will see the exact same chart, in the same order, except the sequence will start with “0” instead. That +1 isn’t telling it to sequence, the +1 is just telling it what index to start at - that subquery essentially acts as a for loop.
- Get rid of the second WHERE clause with a comment out line " --" in front.
My final code:
SELECT origin, id, dep_date, dep_time, (SELECT COUNT (*) FROM flights pickle WHERE pickle.id < flights.id AND pickle.carrier=flights.carrier) +1 AS flight_sequence_number FROM flights -- WHERE flights.carrier = 'DL' ORDER BY carrier, flight_sequence_number;
And what I get is a list of flights, sequenced by their id number, ordered by carrier. Now, for the solution, just change “carrier” to “origin” and you have:
SELECT origin, id, dep_date, dep_time, (SELECT COUNT (*) FROM flights pickle WHERE pickle.id < flights.id AND pickle.origin=flights.origin) +1 AS flight_sequence_number FROM flights -- WHERE flights.carrier = 'DL' ORDER BY origin, flight_sequence_number;
You can keep playing around and switching out variables to get a better sense of the subquery, and what works and what doesn’t. Ultimately, I think because the subquery is a count, SQL knows to treat it like a sort of for loop, which is why it doesn’t work when you just throw in a random number without all of the conditions of the subquery’s WHERE clause.
Hope this helps someone. This exercise actually did help me figure it out as I was stuck for quite some time on this section.