When using GROUP BY, do we always have to group by a selected column?

Question

When using the GROUP BY clause, do we always have to group by one of the selected columns listed after SELECT?

Answer

No, you can GROUP BY a column that was not included in the SELECT statement.

For example, this query does not list the price column in the SELECT, but it does group the data by that column.

SELECT name, downloads
FROM fake_apps
GROUP BY price;

However, usually we do include the grouped by column in the SELECT for the sake of clarity, so that it’s easier to see what rows belong to which group.

15 Likes

I tire it and it did not work out… in the example project on CAmy it gave me only the SUM.

In your example, what name and download will be then displayed in each line?

The display will start with price = 0.0, but what name and downloads will it display then?

In my experience, it will display the name, downloads of the last item having a price = 0.0

This question is confusing.

6 Likes

I tried running i and it gives out a result. Not sure how is it calculating?

can anyone help?

1 Like

The above code should not be used. Every non aggregate column used in SELECT statement must be added to GROUP BY clause.

2 Likes

Hi All,

I have tried, That wont work for me,

My Query was given below,

-- 'GROUP BY' Clause
-- It is Clause in SQL that is used with aggregate functions. It is used in collaboration with the SELECT statement to arrange
-- identical data into groups.
-- This 'GROUP BY' Statement comes after any 'WHERE' Statement, but before 'ORDER BY' OR 'LIMIT'

IF EXISTS(SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='GroupClause')
BEGIN
	TRUNCATE TABLE GroupClause
	DROP TABLE GroupClause
END
IF NOT EXISTS(SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='GroupClause')
BEGIN
	CREATE TABLE GroupClause
	( 
		ID INT,
		MovieName VARCHAR(200),
		ReleaseYear VARCHAR(200),
		UserRating FLOAT,
);
	INSERT INTO GroupClause(ID, MovieName,ReleaseYear,UserRating)
				   VALUES (1,'Iron Man','2009',86.2),
					(2,'Win Man','2018',86.1),
					(3,'Heartin Man','2008',25.2),
					(4,'Lucky Man','2009',34.45),
					(5,'Warner Man','2008',25.23),
					(6,'Evil Man','2010',67.1),
					(7,'Spider Man','2010',78.98),
					(8,'Lone Ranger','2015',67.11),
					(9,'Green Lanrern','2011',87.22),
					(10,'Wonder Woman','2012',34.45),
					(11,'X Men','2012',99.12),
					(12,'Fast and Furious','2012',100),
					(13,'Failure Man','2013',86.1),
					(14,'Successful Man','2013',55.35),
					(15,'Pain Man','2014',12.12),
					(16,'Cry Man','2018',24.24),
					(17,'Vanish Man','2015',36.35),
					(18,'Torque Man','2010',06.87),
					(19,'Stupid Man','2018',55.98),
					(20,'Gaint Man','2008',56.76),
					(21,'Faithful Man','2014',57.90);
END

IF EXISTS(SELECT * FROM GroupClause)
BEGIN
	SELECT ROUND(AVG(UserRating),2) AS AvgRating, ReleaseYear FROM GroupClause
	GROUP BY ReleaseYear

	SELECT SUM(ID), MovieName,ReleaseYear FROM GroupClause
	GROUP BY UserRating
	
END

My SQL Version :

Microsoft SQL Server 2019 (RTM) - 15.0.2000.5

Thanks,
Thiruanvukkarasu G

2 Likes

Basically, GROUPBY performs the same as DISTINCT? How are they different exactly?

2 Likes

I tried running it as well, but added the ‘price’ category.
Here’s the code:

SELECT name, downloads, price
FROM fake_apps
GROUP BY price;

The output was:

Screen Shot 2021-01-23 at 8.56.12 PM

So, what can we observe? It has indeed been grouped by price, by why does it show only the name of one app?

Let’s verify by looking at the entire table again:

SELECT *
FROM fake_apps;

It does appear that it took the first app that used each price and listed them in our ‘GROUP BY’ query result. So the first app that has a $0.0 price shows up, the first one that has a $0.99 price shows up, etc. etc. From what I understand, it depends on the order they were initially entered in the table.

Not a very practical example, but now we know it’s doable.

9 Likes

How can we order by function we aggregated?
For example:

SELECT category, SUM(downloads)
FROM fake_apps
GROUP BY category;

How can we order by SUM(downloads)?

(I tried

SELECT category, SUM(downloads)
FROM fake_apps
GROUP BY category
ORDER BY 2;

but it didn’t work.)

This is working for me.

1 Like

Thanks. i wasn’t paying attention and probably missed something.

Also works

SELECT category, sum(downloads) 
FROM fake_apps
group by category
order by sum(downloads);
2 Likes

By using

are you ordering by columns from the SELECT? Being 1 → category and 2 → sum(downloads) ??

2 Likes

@ja_saldivar YES. when you use column numbers with ORDER BY, the first column we pass after SELECT is 1 and the second is 2 and so on.

If we SELECT all columns using *, then first column id would be 1.

2 Likes

@sebastienbaillargeon
Sorting by SUM(downloads).
This code works exactly as expected, sorting the resulting table by the sum of downloads in ascending order

SELECT category, SUM(downloads) AS 'Total Downloads'
FROM fake_apps
GROUP BY category
ORDER BY 2 ASC;

But it gives the wrong order when I use the alias ‘Total Downloads’ in the ORDER BY portion of the code.

SELECT category, SUM(downloads) AS 'Total Downloads'
FROM fake_apps
GROUP BY category
ORDER BY 'Total Downloads' ASC;

Can anyone explain why?

I’m pretty sure it’s because aliases aren’t stored as values that can be accessed by subsequent code, even if it’s in the very same query like you have here (as far as i know, SQL doesn’t have variables).

I tried running

SELECT category, SUM(downloads) AS 'Total Downloads'
FROM fake_apps
GROUP BY category
ORDER BY '' ASC;

and got the same unordered result as when you did ORDER BY ‘Total Downloads’.

It seems like an interesting loophole of sorts, where ordering by any string will still give you a query result, while attempting to query using a blank/incomplete ORDER BY statement will return nothing at all.

It may interest you to know that there’s another language, PL/SQL, that allows for variable creation and use, in addition to other more robust language features. I just learned about it recently and found it quite cool!

No, the SQL query SELECT name, downloads FROM fake_apps GROUP BY price; is not valid because the GROUP BY clause is grouping the rows by the “price” column, but the “name” and “downloads” columns are not included in either the GROUP BY clause or an aggregate function.

When using GROUP BY, you must include all selected columns in the GROUP BY clause or in an aggregate function. Otherwise, the database system will raise an error.

If you want to group the rows by “price” and include the “name” and “downloads” columns in the result, you can use an aggregate function on these columns, such as SUM, AVG, or MAX. For example, the following query calculates the total downloads and maximum name for each price:

SELECT price, SUM(downloads) AS total_downloads, MAX(name) AS max_name
FROM fake_apps
GROUP BY price;

This query groups the rows in the “fake_apps” table by the “price” column and calculates the sum of downloads and the maximum name for each group. The resulting table will have three columns: “price”, “total_downloads”, and “max_name”.

@supriya.falle @guiwald it’s a poor example. When you don’t specify the aggregate function it’s the same and for strings just the first name that fits the group by column (in this case, first name from each price) is displayed.

@g.thirunavukkarasu I think you need to ask this elsewhere. There’s not enough information shared from your code - the best I can think of is that you are summing ID (which is a weird choice) and the columns you are grouping by does not make logical sense. Movie name and release year are not really something you would aggregate by group, especially by user rating. What are you trying to achieve here?

@rinshad somewhat but not quite. GROUPBY does return distinct rows of the column you are grouping by, but better than DISTINCT in that it would also perform aggregate functions at the same time, like taking the sum, average, or find the max or min values within that distinct entry among all rows.

1 Like