Do column references have to follow the order the columns are listed in the SELECT?

Question

Do column references have to follow the order the columns are listed in the SELECT?

Answer

No, once you list the columns after the SELECT, they can be referenced by the order they appeared, starting from 1 for the first listed column.

You are not limited to referencing them in the exact order they were listed, like

GROUP BY 1, 2, 3

You can freely use the references in any order, like you would normally without using references.

GROUP BY 3, 1, 2

However, when using references, it is important to always keep in mind what numbers referenced which column, as it can become confusing as you list more columns in the SELECT. It is a convenient shortcut, but not necessarily always the best choice.

9 Likes

in the example:

SELECT ROUND(imdb_rating),
COUNT(name)
FROM movies
GROUP BY 1
ORDER BY 1;

Why we are using GROUP BY 1 twice?

2 Likes

1 represents the column you are querying…i.e. imdb_rating
GROUP BY 1 means GROUP BY imdb_rating and
ORDER BY 1 means ORDER BY imdb_rating
Assuming you have more than one column in your query, then the preceding columns will be
numbered as well i.e. imdb_rating = 1, downloads = 2, price = 3, id =4

Another example below
SELECT imdb_rating, downloads, price, id
COUNT(name)
FROM movies
GROUP BY 2
ORDER BY 3;

Note that
GROUP BY 2 means GROUP BY downloads
ORDER BY 3 means ORDER BY price

21 Likes

Thanks for your explanation is very clear.
I understood the concept.

3 Likes

SELECT category,
price,
AVG(downloads(round 1))
FROM fake_apps
GROUP BY 1, 2;

In this example, I’m trying to get the price to round to 1 dp but it’s not working. Could anyone help please?

I am new to this, but try this and see if it is what you are going for:

SELECT category,

price,

ROUND(AVG(price),1)

FROM fake_apps

GROUP BY 1, 2;

2 Likes

i interpreted the question in a way that i encountered my error, hopefully it will be useful for future peeps and or, give a fuller understanding of the SQL language

column references HAVE to follow the order based on the columns you listed in SELECT but listing columns based on the order of the table will result into an error

Try this

SELECT category, ROUND(price, 1), AVG(downloads)
FROM fake_apps
GROUP BY 1, 2;

This will do the following:

  1. Select each category
  2. Select each price (within each category) and round it to 1 decimal place
  3. Select the average download at each price
  4. The entire result table will first be grouped by categories listed in the first column
  5. In the second column, each price present in each category will be listed
  6. In the third column, the average download of apps at each price will be listed.
1 Like

Hey, there!

Looking at your code, it seems you are trying to get the download average to round to 1 decimal point. If that’s the case, try this:

SELECT category, price,
ROUND(AVG(downloads),1) AS ‘Average Downloads’
FROM fake_apps
GROUP BY 1, 2;

However, if you really are trying to get the price to round to 1 decimal point, follow the advice from @illuminatusprimi.

1 Like

Why this query does not retrieve any result?

SELECT category, price, AVG(downloads)

FROM fake_apps

GROUP BY 1, 2, 3;

In this instance, 1 is category, 2 is price, and 3 is the average of the downloads. I don’t think you can group average downloads since they are all unique numbers and there is nothing to group…

1 Like

Good Day Lisa,

It is I would say one of those edge cases, you certainly can GROUP BY the average downloads and it would work, but of course the chances of any of them averaging to the same number is very slim. So it does work but would also appear like it is not working as visually you would likely get the same results as if you hadn’t used GROUP BY.

This does get quite confusing and is easy to think it is wrong.

I just wanted to add my thoughts to this topic.

In the lesson it refers to the following:

SELECT ROUND(imdb_rating),
   COUNT(name)
FROM movies
GROUP BY ROUND(imdb_rating)
ORDER BY ROUND(imdb_rating);

And I would agree using the reference to the column you want to GROUP BY or ORDER BY is easier and less error prone it can also be confusing when you are selecting a large number of columns from a table.

Imagine you are requesting 10 columns, you now to have to work out the correct reference to the column you want and then maybe you come back later and add an extra column before the one you are grouping by, your GROUP BY will now be using the wrong reference.

There is however a easier way and more verbose way. If we take the above query we can change it to the below:

SELECT ROUND(imdb_rating) AS Rounded_Rating,
   COUNT(name)
FROM movies
GROUP BY Rounded_Rating
ORDER BY Rounded_Rating;

This will achieve exactly the same result as the first query but you can see at a glance which column you are grouping and ordering by.

If you also take into account the scenario where you later come back and add a new column you will still be using GROUP BY on the correct column.

This is called an ALIAS.

You can find some documentation about it here: W3 Schools Alias Docs

4 Likes

@willypt @lisahuang99 @shortbottom
I think @lisahuang99 is correct in this case though, @willypt is overgrouping in this case. AVG is an aggregate function based on the groups but by grouping every selected column (including itself) it has nothing to AVG.