Like other posters have noted, not many seem to have completed this challenge.
I’m not very confident with my results. I get something that looks reasonable, but wouldn’t mind some feedback. I posted all 4 task here.
On task 6 my code is:
SELECT RANK() OVER (ORDER BY MIN(tempc) ASC) AS ‘coldest_rank’, *
FROM state_climate
GROUP BY year;
Result: I get mainly North Dakota from 1970 and before, with some Maine in between. Sounds reasonable.
On 7 my code is:
SELECT RANK() OVER (PARTITION BY state
ORDER BY MAX(tempc) DESC) AS ‘warmest_rank’, *
FROM state_climate;
Result: From this I get no result. What is wrong?
On task 8 my code is:
SELECT NTILE(4) OVER (PARTITION BY state ORDER BY AVG(tempc) ASC) as ‘quartile’, [year], state, AVG(tempc) avg_yearly_temp
FROM state_climate
GROUP BY year, state
Result: This gives me a long list of 1st quartiles for Alabama, then 2nd quartiles for Alabama etc. Looks reasonable, and the coldest quartiles are generally more historic.
On 9 my code is:
SELECT NTILE(5) OVER (PARTITION BY AVG(tempc) ORDER BY AVG(tempc) DESC) as ‘quintile’, [year], state, AVG(tempc) avg_yearly_temp
FROM state_climate
GROUP BY year, state
Result: Again, I get mostly North Dakota with a dash of Maine.
9: SELECT NTILE(5) OVER (PARTITION BY AVG(tempc) ORDER BY AVG(tempc) DESC) as ‘quintile’, year, state, AVG(tempc) avg_yearly_temp FROM state_climate GROUP BY year, state
I think, but I can’t guarantee that q.6 and q.7 are just a difference of interpretation. I’ve added my interpretations below (in text, not code) which are generally simpler queries than you’ve attempted. You might not agree with me in all these cases. Since it’s an open ended project I don’t suppose it matters and just practising SQL is worthwhile so you can decide whether to try and alter your own code or not. Afraid I don’t know the right answer for these questions.
My interpretation of step 6 was that you just wanted to RANK all data by tempc, that is from “any state or year”. As such there’s no need for grouping, partitioning or aggregate functions. Re-reading it and I’m on the fence a little since it mentions the coldest ranked years but ranking a cold year by the coldest temperature in any state (rather than say an average of all states) is a bad metric since a single state with a cold spell suddenly makes it seem like the entire US was cold for that year (which isn’t guaranteed). Since it’s an open ended project you can solve it both ways if you like.
For 7 it’s ranking again but partitioned by state, once again I’d interpret this as supposedly returning multiple values per state which are then ranked or the RANK serves no benefit. It would be a repeat of just getting MIN or MAX like in step 3/4 otherwise. You could group by state and then rank but the RANK still seems pointless since grouping and ORDER BY could do the same without window functions.
If this one isn’t clear then have a look at the execution order of SQL queries as it’s very useful to know. As more general advice it’d be good to get out of the habit of using * and instead query the actual rows you want, it’s easier to read and edit and more practical for larger tables.
My interpretation of step 8 was to split the temperature into quartiles (ordered by year) for each state. In this case you don’t need an aggregate as tempc itself is already an average for that year in that state. There’s also no call for pre-window grouping in this case as you want the window to operate on every year in each state anyway, the partition allows for this.
No. 9 is simpler than 8 in this case as all you do is change the function and skip the partitioning so the query itself skips any and all paritions or groups and aggregates.