Using GROUP BY with column names gives incorrect results with CASE block

So, I was talking to someone with experience with SQL, and they told me that when using GROUP BY statement, I should refer to aggregated columns by their name, rather than by number.

For example, I should use the first lot of code, not the second:

SELECT imdb_rating, COUNT(name)
FROM movies
GROUP BY imdb_rating;
SELECT imdb_rating, COUNT(name)
FROM movies
GROUP BY 1;

They said that it makes the code more readable and maintainable, and avoids issues when the order of columns is changed later.

However, using this convention breaks down when I attempt to solve this puzzle: Https://www.codecademy.com/paths/data-science/tracks/sql-intermediate/modules/dspath-sql-aggregates-projects-and-applications/projects/sql-hackernews

If I wanted to avoid using numbers in the following code using a CASE block, what should I put the the GROUP BY clause? If I use source, it returns all rows as if they matched the ELSE statement.

SELECT CASE
	WHEN url LIKE '%github.com%' THEN 'GitHub'
	WHEN url LIKE '%medium%' THEN 'Medium'
        WHEN url LIKE '%nytimes%' THEN 'New York Times'
	ELSE 'irrelevant'
END AS 'source',
COUNT (*)
FROM hacker_news
GROUP BY 1;

I can see the point with this convention, and i agree. But of course, they are exceptions. Here i would absolute use number for column.

1 Like