VR Startup Company - Q4

Exercise

Hello! I have a question about Question 4 in the VR Startup Company exercise. The prompt is to list the names of projects that were not chosen by any employees. My code is:

I don’t understand why the code needs the “WHERE current_project IS NOT NULL” statement to run correctly. There are no null values for project_id, so why does it matter if null is included in the list of values that the project ids are being compared against? The code does not run at all without that statement.

I was also wondering if there is a log somewhere that I’m missing, that would really be helpful… Thanks for the help!

5 Likes

The solution is essentially worked backwards. This code is doing two things:

  1. The first WHERE clause is finding all of the projects that have been taken by an employee. The WHERE current_project IS NOT NULL clause is finding all of the projects which have an employee; the IS NOT NULL makes sure only the projects that are being worked on are selected.
  2. All the projects which do not meet the above description are the selected (WHERE project_id NOT IN…); all the projects which have not been chosen by an employee.

I hope this helps!

I was wondering the same thing. When I change the code to say ‘WHERE project_id IN (…)’ I don’t have to include it. Why is it needed for NOT IN but not for IN?

I’m confused as well. Here’s my code:

SELECT project_name
FROM projects
LEFT JOIN employees
ON projects.project_id = employees.current_project
WHERE employee_id IS NULL;

I got the same answer, but I’m assuming my logic is faulty. I didn’t even think to use a subquery. What type of questions should I ask myself to determine if an IN/NOT IN would be a suitable clause to use?

Your logic is correct.

Subqueries often achieve what joins do. But they work in different ways is all.

If we try to query whether project_id exists in current_project, then SQL will try to evaluate, say:
is 3 = NULL
which will return False, and SQL will move onto the next row.

On the other hand, asking whether given project_id DOES NOT exist in current_project runs into the following problem:
is 3 <> NULL
which is not knowable - and SQL does not return anything back.

2 Likes

Your response makes the most sense among the answers here for me, but honestly it’s still unclear.
I mean, why does only “NOT IN” run into this issue?

If I write a simple code, e.g.

SELECT *
FROM employees
WHERE current_project <> 3;

or

SELECT *
FROM employees
WHERE current_project > 3;

…then shouldn’t this run into the same problem as the “NOT IN” case you explained? Shouldn’t in that case it also be uncertain whether the null value is bigger/smaller/not equal etc.?
However, with the above code I still get an output… why?

You’re right, any comparison with Null will be uncertain and should give a False always, i.e, 1>Null, 1=Null etc., should be a False. But 1 <> Null will be a special kind of a problem! (The short version is that the Not operator will force the evaluation to True )

Consider a <> comparison with Null: is 1 <> Null?
let’s rewrite this as: is Not 1 = Null?

We know that 1 = Null should give us a False. We also know that ‘Not False’ evaluates to ‘True’. And since any comparison with Null should ideally only be giving us a False, SQL simply does not return anything back.

Note that the way SQL handles Null values can be changed, but I think this explanation holds for this example atleast. Hope that helps!

2 Likes