FAQ: Code Challenge: Aggregate Functions - Code Challenge 2

This community-built FAQ covers the “Code Challenge 2” exercise from the lesson “Code Challenge: Aggregate Functions”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Web Development
Data Science

FAQs on the exercise Code Challenge 2

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply (reply) below!

Agree with a comment or answer? Like (like) to up-vote the contribution!

Need broader help or resources? Head here.

Looking for motivation to keep learning? Join our wider discussions.

Learn more about how to use this guide.

Found a bug? Report it!

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

What is the count variable and why is it a 2 in this case?

4 Likes

COUNT(*) AS ‘count’ is the second column in the output table.
We want to sort base on popularity, so we sort based on the count, descending order.
To do this, we use ORDER BY. And now we need to reference the second column, which has all the count.
We can reference it with the number 2 - second column. Try to replace this with 1, and you will see it is sorted alphabetically because 1 refers to the first output column, or the names.
Also try to replace 2 with ‘count’ (without the quotes) - it still works. This is because the column is named ‘count’. Remember the first line, AS ‘count’. This is the name of the column.
Hope this helps.

Edit: It is called a column reference. Read more online.

3 Likes

As long as each ‘first_name’ belongs to a real user and at the same time each user in the table ‘users’ has a unique id, why in the solution of the exercise we don’t simply write :

COUNT (id) FROM users instead of COUNT(*) AS ‘count’ FROM users ?

I suppose it would be simplier and more precise. Moreover I think that actually, this is the usefullness of the ‘id’ parameter , to count the number of users, no matter what the statement is.

1 Like

Hi, when we are told:
What are the most popular first names on Codeflix?

Use COUNT() , GROUP BY , and ORDER BY to create a list of first names and occurrences within the users table.

Order the data so that the most popular names are displayed first.

I did:

SELECT COUNT(first_name), first_name
FROM users 
GROUP BY first_name
ORDER BY first_name DESC;

But the answer is:

SELECT first_name, COUNT(*) AS 'count'
FROM users
GROUP BY first_name
ORDER BY 2 DESC; 

Is there something wrong with what I did?

there seems to be a difference:

https://stackoverflow.com/questions/3003457/count-vs-countcolumn-name-which-is-more-correct

but i don’t think you have NULL values.

3 Likes

As discussed in this Stackoverflow thread on “In SQL, what’s the difference between count(column) and count(*)?”, COUNT(*) will include rows with NULL values. Just in case id contains NULL values (it shouldn’t, but it could happen due to the mistake in data entries), using COUNT(*) is safer. That’s my humble opinion.

How so? I think the business logic is going to determine which count operation to use.

What I mean is that COUNT(*) always gives us the total number of rows in a table (correct me if I’m wrong) while COUNT(id) may not (where id is supposed to be the primary key).

Whenever we get a new dataset, we should first check if both COUNT(*) and COUNT(id) give the same number of rows. If not, id is not really the primary key. Check the data codebook and/or assign unique id numbers to the rows where id is NULL.

Let me know if I miss something here. I’m eager to learn from a different viewpoint.

lets say we want the number of people who own a car, so lets say we have column named car, which is nullable, we can do:

select count(car) from persons;

this way, we get how many people own a car.

my point, we can do more with count then just * or id. That is what you are missing.

1 Like

That’s true. Thank you for your answer!

1 Like

and then we haven’t covered relations yet.

this answered my question perfectly. Thanks. so it is Order BY the second column in the new table not the original table. Thanks again

What’s the difference between using DISTINCT and GROUP BY?

There certainly are differences:

https://stackoverflow.com/questions/164319/is-there-any-difference-between-group-by-and-distinct

Why is it incorrect if we count the first name instead of all columns?

This has already been covered in this topic? Like here:

and here:

Capture 04

Porque preciso renomear a contagem de usuários? Isso vale para toda vez que utilizar o COUNT?