Can we apply DISTINCT to a SELECT query with multiple columns?

Question

Can we apply DISTINCT to a SELECT query with multiple columns?

Answer

Yes, the DISTINCT clause can be applied to any valid SELECT query. It is important to note that DISTINCT will filter out all rows that are not unique in terms of all selected columns.

Feel free to test this out in the editor to see what happens!

Example

Let’s assume that in the Codecademy database there is a table bugs which stores information about opened bug reports. It might have columns like course_id, exercise_id, reported_by, reported_date, report_url, etc. For the purpose of this example, let’s say that this is our table:

id			course_id	exercise_id	reported_by
1			5			4			Tod
2			5			4			Alex
3			5			3			Roy
4			5			4			Roy
5			7			4			Alex
6			7			8			Tod
7			14			2			Alex
8			14			4			Tod
9			14			6			Tod
10			14			2			Roy

Community Manager would like to know the names of the users who have reported bugs in order to send them a special Thank You note. We can use a SELECT query with DISTINCT keyword to pick unique values from the reported_by column:

> SELECT DISTINCT reported_by FROM bugs;

reported_by
Alex
Tod
Roy

Awesome! Exactly what we were expecting!

Our coworker would like to know in which exercises bugs have been reported. This gets trickier because now we have to query two columns: course_id and exercise_id. Let’s try to use the same approach as before:

> SELECT DISTINCT course_id, exercise_id FROM bugs;

course_id	exercise_id
14			2
5			4
14			4
14			6
5			3
7			4
7			8

Is this the result we were hoping for? Yes. It is true that there are duplicated values in the course_id and exercise_id, but every row is unique (there are no two rows with the same value in course_id and exercise_id).

19 Likes

Lukerab is right. I’m testing in this exercise and here’s what happens.

SELECT DISTINCT year, genre
FROM movies;

I have a 2010 for action and a 2010 for comedy.
I have duplicates of the genre, and duplicates of the year, but only if the combination of genre/year is distinct.

This leads me to wonder how I can get additional columns without having the DISTINCT apply to them.

I tried variations of the following query and can’t get any results:

SELECT name, imdb_rating, DISTINCT genre
FROM movies;

I’m not sure why it’s not working to use DISTINCT in this format.

If do this:
SELECT DISTINCT year, genre, name
FROM movies;

Then I get multiple 2008 Action movies because they have different names.
The combination of all 3 columns is distinct because the name differs.

Does anyone know how to display an additional column, like name, without it being part of the “distinct”? Like if I want to see all the names of movies that have a distinct combination of year/genre for example?

2 Likes

Does anyone know how to display an additional column, like name, without it being part of the “distinct”? Like if I want to see all the names of movies that have a distinct combination of year/genre for example?

For that, you would just make another query, because you don’t want the data to be distinct.

3 Likes

Hello Myke. I had the same doubt you have. You can’t do what you want to do, because when you use “distinct”, you group the rows, so we can’t know from wich row to take the data, when you try to ungroup it. eg:

you have the following table:

name, year

test name, 2010
other name, 2010
another name, 2011
another, 2012

if you select distinct by year, you’ll have:
2010, 2011, 2012

if you select distinct by name and year, you’ll have all the data, since there are no duplicates.

Now, let’s imagine we can do what you’re trying to. Select distinct only for one column, like year. The return would be something like this:

test name, other name, 2010
another name, 2011
another, 2012

That’s not possible, because you can’t have multiple values for a single field value. You can’t have two names.

I don’t know if you still have this doubt, but I hope it helps someone.

I think for this you need to use GROUP BY statement
like:

SELECT *
FROM movies
GROUP BY YEAR, GENRE

It outputs the same as the distinct except you can include other columns without it having distinct feature.
With this you can see all other columns, you can even select specific columns like:

SELECT name, imdb_rating
FROM movies
GROUP BY genre