Unexpected output when querying using a LIKE operator

Hi there,

I came across an unexpected output when writing a query for the last question of Module 3 Project 2 of Analyze Data with SQL

In the Image below you can see that I’ve added a GROUP BY clause so that I view the total amount of rows for each keyword rather than DISTINCT rows that just contain the word ‘gold’ OR ‘silver’.

What I don’t understand is how it has given me both the gold and silver groups in the output from just adding [ medium LIKE ‘gold’]. I probably would overlook this but since the syntax of coding languages normally has to be so specific, I cant help but think I’ve overlooked something

Sorry for the weird question, its late my brain is quite foggy :sweat_smile:.

I seem to be getting to a point where the amount of of different clauses I’ve learnt is making the process of understanding all the possible combinations they could go together incredibly difficult.

When you use two % signs (wildcards) at the beginning and end of the item in the LIKE clause that will match any values that have the word gold in any position.

Also, the question prompts the learner to write a CASE statement, as noted in the Hint:

SELECT CASE
   WHEN medium LIKE '%gold%'   THEN 'Gold'
   WHEN medium LIKE '%silver%' THEN 'Silver'
   ELSE NULL
  END AS 'Bling',
  COUNT(*)
FROM met
WHERE Bling IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC;

And, yes, I totally know what you mean about the process of understanding everything that you’re learning about SQL. It can get to be overwhelming at times. Just try to break it down into smaller bits and really understand what the questions are asking of you. Make a drawing too if that helps you understand the query.

1 Like

Adding LIKE to the GROUP BY is probably not what you want. @lisalisaj covers what I believe is the correct way to approach this.

I’m not 100% sure how this query actually works tbh. I think that what happens is the LIKE expression returns a boolean value based on whether a medium contains a match or not (so 0 or 1). So matching gold for example happily matches all the mediums containing '%gold%' somewhere. I think every medium that doesn’t match gets grouped together under an effective False or 0. Since you’ve already removed most of the items with the WHERE clause the items have silver or gold in them somewhere and the COUNT is roughly correct.

So your set of grouped mediums might be-

`gold`
`gold with diamonds`
`silver with gold inlay`
`bronze, silver artwork`

The LIKE effectively turns these into 0 or 1, so LIKE '%gold%' would instead group-

1
1
1
0

Which is only a two group set, 1 or 0.

So you have a result consisting of two groups and the fact you have two is due to LIKE producing 1 or 0 and nothing to do with your WHERE clause (your WHERE clause could be removed entirely and there would still be two group assuming there was at least one match and one non-match).

I think the medium name simply uses the first matching column (you can see this behaviour with a normal GROUP BY) and it just so happens to be silver on its own (the filtering makes this more likely).

For the following query-

SELECT
  medium, COUNT(*)
FROM
  met
GROUP BY medium LIKE '%gold%'
ORDER BY COUNT(*) DESC;

the result is:

medium COUNT(*)
Pressed glass 3907
Gold 41

If you change your WHERE filter, e.g WHERE medium LIKE '%gold%' OR medium LIKE '%silver%' OR medium LIKE '%iron%' you’ll find the gold count remains identical whereas the other count increases but the name is not longer 'gold' (at least in the example above). The fact there are still two columns is, I believe, down to LIKE being 0 or 1.

Output from example in which WHERE clause was changed…

medium COUNT(*)
Gold 41
Brass, iron 459

Note that the medium isn’t ‘iron’ on its own but whatever the first matching medium happened to be.

There’s one quirky bit in this query normally in that a few items contain both gold and silver so the numbers can change based on how you organise the query. For exactly how GROUP BY works with LIKE you’d need to search the web or dig deep in the docs- https://www.sqlite.org/lang_select.html#resultset because I’m kinda guessing at this point (although I think it’s a half-decent guess).

When I run your code, I get the following:

You can see that the count is incorrect.
(The results above it are using the CASE statements).
So, with the WHERE clause as you have it written—the items are grouped by gold AND gold + other mediums (and using ORDER BY w/the aggregate)