Hello! I’m currently working on the Multiple Tables with Reddit exercise and have a question about the answer to Q9. I have the correct answer, but I don’t fully understand how the code is working. The prompt is to find the highest-scoring post for each subreddit using INNER JOIN with posts as the left table. Here is my code, which based on the hint is correct:
I don’t understand how the post title is being associated with the post score. From my understanding, this code is scanning through the database and returning the highest score for each subreddit as max_score as an aggregate calculation. Do aggregate calculations like MIN and MAX automatically make the rest of the query reference the row with the MIN or MAX value? If not, how is the post title being tied to the post score when it is being pulled by the MAX function? I’m not sure if I explained my question clearly, so please let me know if this doesn’t make sense. Thanks for the help!
Have you find the answer?
Your grouping means that several rows are going to be considered at once for each group, with MAX it is effectively returning the row with the greatest matching
p.score value from each of your groups. You can think of
MAX as more of a filtering tool in that case, out of the several grouped up rows the one where the maximum is found is returned. If you then query columns you’ll get the data only from the row that MAX selected.
In a simplified form using
SELECT MAX(score), name FROM ... the table shown below would return
12, 'Eddy' (we see the details from the matching row, the others are filtered out).
With GROUP BY you’re basically doing the same thing over several groups.
If it helps wrapping your head around it you’ll sometimes see things like MAX used in the
WHERE clause and similar situations where you don’t actually need the maximum value itself, just some data associated with that maximum value (selectively filter that row basically).