COUNT(*) AS ‘count’ is the second column in the output table.
We want to sort base on popularity, so we sort based on the count, descending order.
To do this, we use ORDER BY. And now we need to reference the second column, which has all the count.
We can reference it with the number 2 - second column. Try to replace this with 1, and you will see it is sorted alphabetically because 1 refers to the first output column, or the names.
Also try to replace 2 with ‘count’ (without the quotes) - it still works. This is because the column is named ‘count’. Remember the first line, AS ‘count’. This is the name of the column.
Hope this helps.
Edit: It is called a column reference. Read more online.
As long as each ‘first_name’ belongs to a real user and at the same time each user in the table ‘users’ has a unique id, why in the solution of the exercise we don’t simply write :
COUNT (id) FROM users instead of COUNT(*) AS ‘count’ FROM users ?
I suppose it would be simplier and more precise. Moreover I think that actually, this is the usefullness of the ‘id’ parameter , to count the number of users, no matter what the statement is.
What I mean is that COUNT(*) always gives us the total number of rows in a table (correct me if I’m wrong) while COUNT(id) may not (where id is supposed to be the primary key).
Whenever we get a new dataset, we should first check if both COUNT(*) and COUNT(id) give the same number of rows. If not, id is not really the primary key. Check the data codebook and/or assign unique id numbers to the rows where id is NULL.
Let me know if I miss something here. I’m eager to learn from a different viewpoint.