How to Insert on Duplicate Update in PostgreSQL

Bilal Shahid Feb 02, 2024
  1. the ON CONFLICT Clause in PostgreSQL
  2. Use Either a Custom Made Trigger or Function to Implement Upsert in PostgreSQL
  3. Use Sequential Statements (Less - Recommended) to Implement Upsert in PostgreSQL
  4. the COMMON TABLE EXPRESSIONS (CTE) to Implement UPSERT in PostgreSQL
How to Insert on Duplicate Update in PostgreSQL

This article will introduce the INSERT ON DUPLICATE UPDATE in PostgreSQL. What does INSERT ON DUPLICATE UPDATE mean? And why is it used?

Whenever you INSERT records into a table, you add a data set with a unique PRIMARY KEY. It could be an ID or an auto-generated number that is different for each data set.

Let’s suppose you insert a record with a PRIMARY KEY that conflicts with a data set already present with that PRIMARY KEY.

In that case, you have two options. Either update that row with your new data set or leave the update and keep the original.

What you tend to choose depends on how you want to sort your data and keep it in your database.

the ON CONFLICT Clause in PostgreSQL

Upsert, the short term for UPDATE ON INSERT VIOLATION, can be implemented in PostgreSQL in the following query. First, let’s suppose we make a table for CAR and insert a row.

create table car(
	id int PRIMARY KEY,
	owner TEXT
);

insert into car values (1, 'John');

Now let’s go ahead and try inserting a value (1, 'Mark') into this table. What do you think will happen?

The first thing that will happen is that there will be a VIOLATION error like the one given below.

Output:

ERROR: duplicate key value violates unique constraint "car_pkey" DETAIL: Key (id)=(1) already exists.

Now, do you understand why the PRIMARY KEY violation happens? Because Mark also has key 1 already present for John.

To avoid this, we’ll use the ON CONFLICT clause.

insert into car values (1, 'Mark')
on conflict (id) do update
set id = excluded.id,
	owner = excluded.owner;

Now, whenever our query finds a conflict, it will UPDATE the row present in the table for that conflicting PRIMARY KEY and set the ID and the OWNER of the pre-existing data to these new keys.

But, what is that EXCLUDED table after all? The EXCLUDED table represents the rows proposed for insertion, as specified by the PostgreSQL documentation.

Thus, we use the EXCLUDED table to get the rows pushed from the INSERT query and then use them.

Suppose you don’t want to do anything about the violation and go about keeping the original. You are better off using this syntax.

insert into car values (1, 'Mark')
on conflict (id) do nothing;

The DO NOTHING ignores any changes possible.

To perform these same queries in psql, use the above statements but capitalize the KEYWORDS. Never write INSERT as insert, or you’ll run into an error.

Use Either a Custom Made Trigger or Function to Implement Upsert in PostgreSQL

Here, we will explain how to make a FUNCTION that will fire off whenever INSERT or UPDATE is called for implementing UPSERT. Similarly, you can make a TRIGGER by changing the return type to TRIGGER.

create or replace function upsert_imp (idt int, ownert TEXT) returns void as
$$
	Begin
		loop
			update car set owner = ownert where id = idt;
		if found then
			return;
		end if;
		begin
			insert into car values(ownert, idt);
			return;
		end;
		end loop;
	end;
$$ language plpgsql;

The function is pretty simple, with a loop that runs for checking the ID for each row; if it matches the parameter given, then it returns the table after UPDATE; else, it INSERTS.

You can call the above like this.

select * from upsert_imp(1, 'Mark');

While making a TRIGGER, make sure to replace LOOP with a FOR or a valid IF check so that it does not spin indefinitely in violation of a condition.

You can use the UPDATE call but combine it with the following INSERT statement to work.

insert into car values(4, 'Toyota Supra') on conflict do nothing;

You may even use a NOT EXISTS clause, but this easy statement works fine. It will either INSERT the row if there is no duplication or skip entirely.

insert into car select 2, 'Toyota Supra' where not exists (Select 1 from CAR where id=2);

However, there’s a condition. In many systems, there could be a RACE condition.

If someone DELETES a row while you INSERT a row, your row will be lost. For that, you can use a TRANSACTION.

Enclose the INSERT statement with a BEGIN and COMMIT to ensure it is now a TRANSACTION.

begin;
insert into car select 2, 'Toyota Supra' where not exists (Select 1 from CAR where id=2);
commit;

However, PostgreSQL already has implicit BEGIN and COMMIT added to each statement, so there is no need for explicit declarations.

the COMMON TABLE EXPRESSIONS (CTE) to Implement UPSERT in PostgreSQL

First of all, what are COMMON TABLE EXPRESSIONS?

The CTE works as a temporary table inside a QUERY to store values that may be used later on. It behaves like the NEW table used in TRIGGER.

The query can be written as follows. We first define WITH, which creates a temporary table with the values 2, Jackson and 5, Maria.

Then these values are passed into the UPSERT query where it creates NV from the NEW_VALUES table, and if they turn out to be already present in the CAR table, it updates the values accordingly.

WITH new_values (id, owner) as (
  values
     (2, 'Jackson'),
	 (5, 'Maria')

),
upsert as
(
    update car
        set id = nv.id,
            owner = nv.owner
    FROM new_values nv
    WHERE car.id = nv.id
    RETURNING car.*
)

This returns a CAR.*, meaning all the rows from the CAR table back.

To call the above:

INSERT INTO car (id, owner)
SELECT id, owner
FROM new_values
WHERE NOT EXISTS (SELECT 1
                  FROM upsert up
                  WHERE up.id = new_values.id);

Here, we check if the table created in the UPSERT when it returns the CAR.* contains the value already; if it doesn’t, the INSERT INTO works. However, if it does, the UPSERT already handles the modification inside.

a Note on ON CONFLICT DO UPDATE in PostgreSQL

As with each solution, some problems tend to interfere either with performance or space or the coding efficiency in general.

In our queries, whenever we call the ON CONFLICT DO UPDATE, we can assume that behind this, in our system, there can be multiple users accessing the database and issuing the commands at the same time.

So if USER 1 issues INSERT DO UPDATE simultaneously as USER 2, there will be a VIOLATION error again. Why?

Because when USER 2 calls the INSERT, it does not know whether the row exists or not at that time; hence it comes into a conflict. Both users issue the queries simultaneously, and the database runs them concurrently.

This sort of issue is called a RACE CONDITION, and you can read up on them later if you like.

Along with the RACE CONDITION come different methods to solve this issue. People argue that a TRANSACTION could be a possible answer.

But even if the TRANSACTION guarantees separating the query, it won’t ensure that the query is safe from VIOLATION. Why?

If USER 1 TRANSACTS is its query, you can’t guarantee that once the TRANSACTION runs, it will lock the query and won’t let the others proceed. Because TRANSACTIONS only prevent the changes from being shown to other users and just the option of abandoning them and rolling back.

Even if you use SERIAL TRANSACTIONS, which claims to run QUERIES in order, there can be errors such as the one below.

commit failed: ERROR:  could not serialize access due to read/write dependencies among transactions

Hence, you are better off using ADVISORY LOCKS and SAVE POINTS. ADVISORY LOCKS tend to lock, preventing your queries from interfering and working effectively.

Even though they might reproduce errors, but still have a very low tendency to do so.

SAVE POINTS, on the other hand, are contained inside TRANSACTIONS. You can use these SAVEPOINTS to revert to your changes whenever you want to abandon a change during a TRANSACTION and rollback.

So in case an INSERT fails, for example, you just don’t run past the error, you can now roll back to the SAVE POINT, perform the UPDATE, and then go about working as usual.

We hope you’re now clear on the basics of UPSERT and have an in-depth understanding of how it works and the possible CONSTRAINTS.

Our team works effectively to deliver the best of both worlds to you in terms of knowledge and efficient coding algorithms. We believe that the better you learn the nooks and crannies of something, the more effective you will be in delivering solutions and working around them.

Author: Bilal Shahid
Bilal Shahid avatar Bilal Shahid avatar

Hello, I am Bilal, a research enthusiast who tends to break and make code from scratch. I dwell deep into the latest issues faced by the developer community and provide answers and different solutions. Apart from that, I am just another normal developer with a laptop, a mug of coffee, some biscuits and a thick spectacle!

GitHub

Related Article - PostgreSQL Upsert