Can a WHERE clause be applied with a HAVING statement in the same query?

Question

Can a WHERE clause be applied with a HAVING statement in the same query?

Answer

Yes, you can absolutely apply a WHERE clause in a query that also utilizes a HAVING statement.

When you apply a WHERE clause in the same query, it must always be before any GROUP BY, which in turn must be before any HAVING.

As a result, the data is essentially filtered on the WHERE condition first. Then, from this filtered data, it is grouped by specified columns and then further filtered based on the HAVING condition.

Example

/* 
This will first filter the movies with a box_office > 500000.
Then, it will group those results by genre, and finally restrict
the query to genres that have more than 5 movies.
*/

SELECT genre, ROUND(AVG(score))
FROM movies
WHERE box_office > 500000 
GROUP BY genre
HAVING COUNT(*) > 5;
17 Likes

This information really helps! :partying_face:

7 Likes

This helped a lot Thanks

2 Likes

@jephos249 Thanks a lot … :v:

1 Like

why would COUNT (*) > 5 works , when * calls for all columns?

is it because the data is already restricted to ‘genre’ because of the ‘GROUP BY’ line?

Would 'HAVING COUNT (genre) > 5 provide the same result?

4 Likes

Hi.

  1. From my point of view, “COUNT(*)” is properly working especially because of its asterisk.

Because COUNT only shows us rows with not empty values. And if in some column there had been an empty value, and we used that column as an argument to COUNT (as in “COUNT(some_column)”), then that row wouldn’t have been counted and our result would have been spoiled. But with asterisk we would check all the columns for values. And add +1 for the whole row if we find some.

  1. Yes, also it is working because of our use of grouping.

Without “GROUP BY”, in output there would be only one row with some not expected values.

  1. Yes, as i’ve mentioned before, it would provide the same result only if in the “genre” column there wouldn’t be any empty values in the cells. Otherwise, the program will just not see those rows.
6 Likes

@jephos249 it helped a lot

Does anyone know why " The total number of apps at each price point would be given by COUNT(*) . "?

COUNT just counts the number of rows, since you’ve grouped them up already you’re simply counting the number of rows at each grouped price point, giving you the total number of apps at each price point.

2 Likes

Just to make sure I understand - a genre with a movie of less than 500,000 would not be counted toward the at least 5 movies per genre right? becaue it already filtered those out from the where statement. Thank you!

Useful information … :slightly_smiling_face:

@dancemaster that’s correct. The order as they said in the OP goes WHERE, GROUP BY, HAVING. Any rows filtered out by WHERE will not be included in the aggregate functions performed through GROUP BY.

hello dear Friends, i think in my P.O.V COUNT() > 10 doesn t make sense here (because the number of 10 has no meaning : 10 of what?). Alternatively, according to the given “fake_apps” example, since we wnat to exclude groups that have less then 10 apps, i think we should rather use COUNT(name) >10 insteed of COUNT() >10.

SELECT price,
ROUND(AVG(downloads)),
COUNT(*)
FROM fake_apps
GROUP BY 1
HAVING COUNT(name) > 10;

Hi @abdouhaji,

For that specific exercise, your idea of using HAVING COUNT(name) > 10 works great, and you’re right that you will get the exact same result, but only because all apps in the fake_apps table have a NON-NULL name:

SELECT * FROM fake_apps WHERE name IS NULL;

--Total rows: 0

If there was an app stored in that table with a NULL value for the name column, your solution would lead to a result that is different from when using HAVING COUNT(*) > 10. But why?

This is because COUNT(*) will tell us the total number of rows, independently of any NULL values, whereas COUNT(name) will tell us how many NON-NULL values are stored for the name column (which could be different than the total number of rows of the fake_apps table).

I just wanted to add this comment here to highlight that the results could be different if the dataset of that exercise was different. In our projects, the best solution will always depend on what we want to analyze. It is up to us to decide the better approach: “should I use HAVING COUNT(name) > 10, or HAVING COUNT(*) > 10? Does it make a difference?”.

Regards from Brazil

1 Like