Paths/analyze-data-with-sql/tracks/analyze-data-sql-calculate-and-summarize-data/modules/analyze-data-sql-practice-aggregate-functions/projects/sql-the-met

https://www.codecademy.com/paths/analyze-data-with-sql/tracks/analyze-data-sql-calculate-and-summarize-data/modules/analyze-data-sql-practice-aggregate-functions/projects/sql-the-met

here is my query:
SELECT * FROM met WHERE department = ‘American Decorative Arts’

whereas the suggested answer is
SELECT COUNT(*)
FROM met;

Where is my misunderstanding coming from? The second outputs less entries/rows.

Which step are you talking about?

Step 2, mainly curious why the second answer has around 3.9k compared to my first query using WHERE, which contains 4.2k rows or entries

There’s only 3948 rows in the database so 4K+ seems a little off. Your query doesn’t actually utilise a count so what are you basing that number on?

I think you might be mixing up the id numbers which I’d assume are not guaranteed to be incremental, so you have id’s like 1, 2, 5, 6, 10 or something along those lines. Stick with count for the actual row count :slight_smile:.

My first query should only return a row which contains the string specified in the WHERE. So I’m curious how it is generating extra rows? Ok, I checked and it seems IDs are not preferential (it seems they are not the PRIMARY key, or the database is generating extra entries somehow). Seems weird but yea, I should definitely use COUNT.

To verify, I ran:
‘SELECT COUNT(*) FROM met WHERE department = ‘American Decorative Arts’;’

which returned the 3.948k expected. I’m still curious why the database has extra entries however using the first query.

You could modify this table and insert a new row with an id of 50,000 and it’d be perfectly valid. As your test showed, when you actually use COUNT you get the correct number of rows. There are no “extra entires” but the id column is probably not a simple incremental sequence.

I scrolled through and it seemed sequential. I believe it is due to COUNT not counting NULL values, so if any data was null or not entered, it would likely be treated as null (that still doesn’t really make sense but I’ll accept that for now cause I’m spending a bit too much time on this simple problem).

You can have a wee web search if you like on how to check properly but even in the first few rows returned you have very visible gaps, be very careful when making assumptions about data-

14 	American Decorative Arts 	Coin 	Five-dollar Liberty Head Coin 	Christian Gobrecht 	1907 	Gold 	
33 	American Decorative Arts 	Bust 	Bust of Abraham Lincoln 	James Gillinder and Sons 	1876 	Pressed glass 	United States
1 Like

Good point. I was thinking of writing a python script that analyzes the id’s but I suppose I should think smarter, not harder lol. Thanks for the help!

1 Like