How to Use NOT IN With Subquery in PostgreSQL

Bilal Shahid Feb 02, 2024
  1. Use the NOT IN Operator With Subquery in PostgreSQL
  2. Use the NOT EXISTS Operator as a Better Alternative
How to Use NOT IN With Subquery in PostgreSQL

The NOT in the NOT IN reverses what would result from simply using the IN operator. The right-hand side of the NOT IN operator has to have a subquery where more than one column is returned to check if the expression matches the data or not.

NOT IN tends to return true if the expression is not found in the subquery data returned.

Let us go ahead and try to understand some common problems and issues faced by users when using the standard SQL NOT IN in PostgreSQL.

Use the NOT IN Operator With Subquery in PostgreSQL

In PostgreSQL, if you use NOT IN to ensure that none of your expressions matches a particular set of data, you must ensure no NULL value is in the subquery data returned.

What does that mean exactly?

Let’s go ahead and try to understand this with the help of an example. We will be creating a table HORSE with two columns; ID and Color.

CREATE TABLE horse (
    ID int PRIMARY KEY,
    Colour TEXT
);

Now let us insert some values as well.

INSERT INTO horse values (1, 'black'), (2, 'brown'), (3, 'white');

Let us go ahead and CREATE another table for RIDER.

CREATE TABLE rider (
    ID int PRIMARY KEY,
    horse_id int
);

You can run any of the codes given above in PGADMIN and PSQL.

Let us insert some values in both tables:

INSERT INTO horse values (1, 'black'), (2, 'brown'), (3, 'white');

INSERT INTO rider values (1, 1), (2, 2), (3, 4)

Here, you can see that the id 3 in the RIDER has the horse 4, which does not exist in the HORSE table. We have done this to ensure the use of NOT IN in our example.

Suppose we want to trim out this RIDER 3 from our RIDER table. How do we do that?

select * from rider
where horse_id not in (select id from horse)

Output:

sample_tab

Now, what if, instead of the clean values in the HORSE table, we also had some nulls. Let us modify our INSERT statement for the HORSE table.

INSERT INTO horse values (1, 'black'), (2, 'brown'), (3, 'white'), (NULL, NULL);

On a side note, when INSERTING NULL values in the PRIMARY KEY column, remove the PRIMARY KEY constraint from the table to allow NULL insertion.

Here, our NULL HORSE has a NULL Color, so when we run the query for NOT IN as above, we get the following:

Output:

sample_tab2

So what just happened? Wasn’t it supposed to return the ID 4 as that doesn’t exist in the HORSE table?

Well, let’s understand how the NOT IN works. The NOT IN operator works using the AND operator. If all the rows to be searched return true, it will return true.

So something like this would substitute the NOT IN:

NOT IN (ROW 1) AND NOT IN (ROW 2) AND NOT IN (ROW 3) .....

In our case, where the NOT IN returns true for all the first three data sets, it won’t return any value for the NULL column as the PostgreSQL documentation quotes:

If all the per-row results are either unequal or null, with at least one null, then the result of `NOT IN` is null

Having a NULL return from the NOT IN would render all other true’s false, so our table returns nothing.

How do we solve this?

The first methodology would be to prevent any NULL insertions into the table. Still, that is useless if we already have tables created in the database and now want to run queries to get data.

Hence, we have to look at other solutions to solve this effectively.

Use the NOT EXISTS Operator as a Better Alternative

Let’s use the following query:

select horse_id from rider r
where not exists
(select id from horse h
where h.id = r.horse_id)

This tends to finally return the value 4 to us even though nulls are present and is an effective strategy. Let’s see how it works.

The EXISTS clause returns true if the subquery returns anything, meaning any single row, and false otherwise. So when we want to find the HORSE that is missing, we tend to return the values from the HORSE table equal to the IDs in the RIDER table.

The subquery returns more than one row, and EXISTS becomes true, making NOT EXISTS FALSE.

Eventually, our final query selects the HORSE_ID from the RIDER that is not equal to the IDs provided from the HORSE table. In our case, that is 4, and thus our query works perfectly.

However, NOT EXISTS has caused performance losses when used against the NOT IN operator.

select horse_id from rider
full join horse on rider.horse_id = horse.ID
where horse.ID is null

Output:

sample_tab3

So it returns a NULL and our value that is not found; 4. How?

When we do a full join on the condition that both IDs are similar, it also tends to return the rows that aren’t similar. It will return the set (NULL, NULL) from HORSE and (3, 4) from RIDER as they both are unmatched.

Hence we can use this to our benefit and write the NULL condition at the end to return these unmatched rows.

When we write that the Horse.ID is NULL, it will select the HORSE_ID from the rider that is NULL. In this case, the first set (NULL, NULL) is included; so is the set (3,4). Why?

This set does not contain a NULL but is also unmatched. So our FULL JOIN also sets NULL values for this in its returning table.

Hence, we get this value output to us as a result.

Today we looked at implementing the NOT IN operator for values that contain NULL. We hope that this helps you and expands your knowledge base.

Author: Bilal Shahid
Bilal Shahid avatar Bilal Shahid avatar

Hello, I am Bilal, a research enthusiast who tends to break and make code from scratch. I dwell deep into the latest issues faced by the developer community and provide answers and different solutions. Apart from that, I am just another normal developer with a laptop, a mug of coffee, some biscuits and a thick spectacle!

GitHub