Any tips/tricks for how to decide what to use to PARTITION BY and GROUP BY? I can’t seem to get the hang of it yet. Thanks!
PARTITION BY is used for window functions, while GROUP BY is for aggregates.
Specifically, GROUP BY puts aggregates together in one or more rows; PARTITION BY sits inside of an OVER() clause and helps you write a window function.
Taking the Climate Change exercise for example here:
SELECT state, MAX(tempc) FROM state_climate GROUP BY 1; will work, because GROUP BY applies to aggregates. But replace it with PARTITION BY, and it will not. Likewise, the code:
PARTITION BY state ORDER BY tempc ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS '2019_temp', state, year, tempc FROM state_climate;
Will work, but replacing PARTITION BY with GROUP BY and it will fail.
Or to take two similar queries:
GROUP BY year;
Will give you a table that shows you the average temperature for every year in the state_climate table.
However, the query:
SELECT AVG(tempc) OVER ( PARTITION BY year ) AS avg_temp, year FROM state_climate;
Will give you a separate result for each row in your table. This can be very confusing. You can see the results here:
Hope this helps clarify the difference! You use GROUP BY when you need to find an aggregate for multiple items, but you use PARTITION BY when you need to put a potential result against each row.
e.g. if I want to compare the highest temperature for a state in every row, to its current temperature for that year, I would use OVER ( PARTITION BY …), but if I just want the highest temperature for every state, I would use MAX(x) … GROUP BY state.