SQL Course - Hacker News Project - Challenge 7

Hi All,

Course Link: Analyse Data with SQL
Section 3, Extra Practice, How to Hack Hacker News, Challenge 6.

Added a screenshot of the solution to this challenge. Can someone please explain the logic behind the COUNT function being there - in that specific line? Haven’t we finished the CASE statement using END AS - What exactly is it counting there?

I understand theoretically it’s supposed to be counting the number of occurrences where either of the conditions is fulfilled - but why there? What’s the logic behind COUNT function being placed there - and what it does?

The CASE statement goes through the values in the rows and filters them by the logic/conditions provided then they(each occurrence) are counted.

Source	              COUNT(*)
Github	                27
Medium	                12
New York Times	13
Other	               3948
3 Likes

Thanks for your reply here.

Would you suggest there’s an alternative way to write this…something that shows the logical flow of things?

Or any alternative way to write at all which would show it more explicitly as to what its counting?

Right I’m looking at the statement and thinking that the computer is going like this…

Case 1 fulfilled? No – Move to next.
Cas 2 fulfilled? Yes - Done! Display output for this condition…

Then all of a sudden count everything.

Is there more explicit way of showing that the computer is counting the outcomes in the CASE STATEMENTS?

Hope my request make sense.

But the question asks one to create 2 columns. One is via CASE statement logic and the other column is count the number of records in the table that have that logic applied to it.

“Add a column for the number of stories from each URL using COUNT() .
Also, GROUP BY the CASE statement.
Remember that you can refer to a column in GROUP BY using a number.”

Each row is evaluated by the condition (if the URL matches) and there’s a running total I guess in the background. COUNT(*) is applied after all the conditions from the CASE statement have run. I hope that makes sense(?)

Thus,

Summary
SELECT CASE
   WHEN url LIKE '%github%'
THEN 'Github'
   WHEN url LIKE '%medium.com%'
THEN 'Medium'
    WHEN url LIKE '%nytimes.com%'
THEN 'New York Times'
    ELSE 'Other'
  END AS 'Source', COUNT(*)
FROM hacker_news
GROUP BY 1;

Is there another way of writing the COUNT statement then?

Usually, instead of writing COUNT(*) all - I write COUNT(the specific column where it needs to be counted)

Is it possible to give a name to the coulmn generated using the CASE statement - suppose give it a name and refer it into our COUNT function?

Basically, is there a way to refer the column generated by the case statement and count the number of values…where its GITHUB WHERE = URL etc

Irrespective of the question - I was hoping to write this better to explicitly say its counting the values in the column generated by the CASE statement (And of course, what specifically I am counting)

the CASE statement & COUNT(*) aggregate function are two separate operations.

1 Like

yes, i re-read multiple times and looked at the syntax. I think I’m understanding where I’m getting wrong.

Can you have a look at this code I wrote - It’s not working - but could you explain why it’s not?

Explaining in english as to what my syntax says, display a list of titles GROUPED by the case statement.

I thought this would give me a column 1 = which is source and the second column as a count of titles matched with the corresponding value which case statement gave to them in the table (by looking at their URL)

What went wrong here?

Screenshot 2022-02-04 at 12.43.53 AM

You’re also using WHERE instead of WHEN.
The way you have that written, you’re looking at the title column and then there’s a CASE statement with conditions applied to it (applied to the items in the title col). So, your results will be the titles of the stories and the link of the stories classified as such…

title	                                                                                             Source
                                                                                                     Other
"Climate Calamity" Rejected In Favor Of "Climate Disruption     Other
McConnelling	                                                                            Other
'Oldest' computer music unveiled [2008]	                            Other
10 Marketing Hacks To Boost Your Startup	                           Other
10 best iPad games we can't wait to play                                   Other
1024 by 2012	Other
11 Untranslatable Words From Other Cultures	                 Medium
15 years of Ars: The time-shifts tweets and top speeds changing the world	Other
2015 Ford Galaxy – Release Date	Other

That’s not counting the total number of stories that result from each URL

Screenshot 2022-02-04 at 1.05.25 AM

If I replace title with url - it still won’t work.

By my understanding, I would be looking at the URL column - and then CASE statement applied to it, no?

Applied new code snippet. Where would it be going wrong?

In fact - I just restructured the correct query and it still didn’t work.

Snippet here:

Screenshot 2022-02-04 at 1.07.20 AM

The SELECT statement will have the resultant columns. Why would you want to list the URL and then the CASE statement in the results? Unless you wanted to list the URL for some reason?

Sure, you could use COUNT(*), and then the CASE statement:

SELECT COUNT(*), CASE
   WHEN url LIKE '%github%'
THEN 'Github'
   WHEN url LIKE '%medium.com%'
THEN 'Medium'
    WHEN url LIKE '%nytimes.com%'
THEN 'New York Times'
    ELSE 'Other'
  END AS 'Source'
FROM hacker_news
GROUP BY 2;

Which results in:

COUNT(*)	Source
27	Github
12	Medium
13	New York Times
3948	Other

But, you need to use “WHEN” rather than “WHERE” in your CASE statement…

So sorry there - I think I did a grave syntax mistake there. Thanks for pointing it out.

I’m missing something further - please check the code snippet and the resultant output. The below one is for Codeacademy and the above one is what I typed.

Why do you think I’m getting different values - and moreover NY Times getting missed altogether.

It skipped NYT b/c your syntax is missing the closing “%” at the end of the LIKE:
Should be:
WHEN url LIKE '%nytimes.com%' THEN 'New York Times'

Also, if you do a quick search and look at the URL column, you can see that there isn’t always a URL there.
Ex:

SELECT url
FROM hacker_news
LIMIT 10;

There’s lots of 0’s.

Your number counts are off b/c you’re not counting all rows. Just ones where there’s a url populating that row. Whereas using COUNT(*) includes all values in the rows including duplicates and NULL values.

Also, sometimes when debugging code, it helps to read the code from the bottom, up to the top. Your brain processes it differently and you can catch typos. A second set of eyes also helps. :slight_smile:

2 Likes

Hey,

So I went back and restudied all the topics - turns out there were some understandings that were missing. Thanks for pointing those out - I redid the entire project and also saw the video.

You’re right - debugging code from the bottom helps. And having an extra pair of eyes too - and hence this forum and discord server.

Thanks for the help. I’ve moved to the next project now.

2 Likes