PostgreSQL SQL Patterns | AdaptiveGears

Upserting Data Without Duplicates in PostgreSQL

Preferred approach with unique index:

INSERT INTO documents (id, content, created)
VALUES ('abc123', 'content here', NOW())
ON CONFLICT (id) DO NOTHING;

ON CONFLICT requires a unique index or primary key on the conflict column. When no unique constraint exists, use WHERE NOT EXISTS:

INSERT INTO documents (id, content, created)
SELECT id, content, created
FROM source_table s
WHERE NOT EXISTS (
  SELECT 1 FROM documents d WHERE d.id = s.id
);

Deleting Duplicate Rows with ctid

ctid is the physical tuple ID. When rows have no unique identifier, use ctid to keep one and delete the rest:

DELETE FROM documents a
USING documents b
WHERE a.slug = b.slug
AND a.ctid < b.ctid;

Keeps the row with the highest ctid (typically the most recently inserted).

Non-blocking Index Operations with CONCURRENTLY

CREATE INDEX locks the table. CONCURRENTLY avoids the lock but requires AUTOCOMMIT:

DROP INDEX CONCURRENTLY IF EXISTS documents_slug_idx;
CREATE UNIQUE INDEX CONCURRENTLY documents_slug_idx ON documents(slug);

In SQLAlchemy, set isolation_level="AUTOCOMMIT" on the engine.

Checking Index State from pg_index

Query pg_index to check if an index is unique or valid:

SELECT i.indisunique, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE c.relname = 'documents_slug_idx';

indisvalid = false means the index is broken (e.g., failed CONCURRENTLY operation). Drop and recreate.

References

PostgreSQL INSERT Documentation