Multiple Tables with Reddit

Hi everyone!
I have a pretty dumb question, but I really need help to unstuck. I suppose I don’t get the general idea of multiple join, so I can’t understand ON which attributes I have to join tables here: https://www.codecademy.com/paths/analyze-data-with-sql/tracks/analyze-data-sql-join-data/modules/analyze-data-sql-practice-joins/projects/sql-multiple-tables-with-reddit

My code:

WITH popular_posts AS (
SELECT *
FROM posts
WHERE score >= 5000
)
SELECT subreddits.name, popular_posts.title, popular_posts.score
FROM subreddits
INNER JOIN popular_posts
ON ??? = ???
ORDER BY popular_posts.score DESC;

Hi and welcome to the forums!

The general idea with a properly normalised database is that every table, usually except one should have a foreign key that allows it to link up correctly the the base tables primary key. This means that we can just join on these ID’s and guarantee that we’re getting the right info into the right place. So if we do a quick inspection of the two tables you are trying to join here, subreddits and posts, we see the column names as follows:

  1. subreddits
    • id
    • name
    • created_date
    • subscriber_count
  2. posts
    • id
    • title
    • user_id
    • subreddit_id
    • score
    • created_date

We can see that in posts, we have a column named subreddit_id. This is the foreign key we require for linking to the subreddits table, and the idea is that if there’s a post in the posts table, and it has a subreddit_id of 7, then this tells us that the subreddit with id of 7 in the subreddits table is where that post came from. So when performing joins, we want to find these common columns and join on those. Therefore our join query would be:

FROM subreddits
INNER JOIN popular_posts
ON subreddits.id = popular_posts.subreddit_id

This tells the SQL query builder if it finds a number 7 in subreddit_id, this corresponds to the number 7 in subreddits.id. And thus the join should perform correctly. This is the first 5 results of that query:

name title score
aww Picture of a kitten 149176
programming Running DOOM on a toaster 143728
news Promising advances made toward cure for cancer 136532
programming Codecademy releases their new database courses 133728
IAmA I am Gill Bates, founder of Macrohard. Ask me Anything. 96731

We can see that “Picture of a kitten” has matched to “aww” and “Running DOOM on a toaster” has matched to “programming”, so this gives us a good idea our join has performed correctly. It’s not always as easy as that in the real world (I work with medical data and it’s really not that simple without a microbiology background) but in this dataset it would be pretty obvious if there was a mismatch.

Hopefully that explains the join and helps you understand it a bit better, any other questions let me know!

4 Likes

Thank you so much for such a detailed answer and explanation!

1 Like

I am having the same problem this is what I came up with and its still not returning any results
WITH popular_posts AS(

SELECT *

FROM posts

WHERE score >= 5000

)

SELECT subreddits.name, popular_posts.title, popular_posts.score

FROM subreddits

INNER JOIN popular_post

ON subreddits.name = popular_posts.title

ORDER BY posts.score DESC;

Even when I remove the with clause it shows the query however it doesnt show any results what am i doing wrong

I’d be careful with table names here, is it popular_post or popular_posts (check each instance) and same for ordering (is the posts table actually included in this part of the query or just a temporary table with a different name?).

If you’re posting code to the forums it’s well worth viewing- How do I format code in my posts?. It’ll stop the forums marking up your code (which can mess up alignment and remove certain characters).