Microsoft open sources pg_durable: Durable workflows inside PostgreSQL

I once spent a three-day weekend building a custom job queue. I had a cron job that ran every minute, selected rows with status = 'pending', hit an external API, updated the status to running, then completed, and tried to handle retries by incrementing an integer. It worked beautifully until the database server restarted on Monday night. Half my jobs were stuck in running forever. The other half ran twice because the API timeout happened right as the server went down.

That is the kind of architectural scar tissue that makes you look at a project like pg_durable and feel a weird mix of relief and terror.

Yesterday, coffeemug submitted a GitHub repository to Hacker News with a simple title: “pg_durable: Microsoft open sources in-database durable execution.” Within hours, the thread reached 341 points and 79 comments. It is a Microsoft project, but not the kind that lives behind Azure enterprise firewalls: it is open source, written in Rust, and built to run on stock PostgreSQL 17 or 18.

The story in one sentence

pg_durable is a PostgreSQL extension that lets you write long-running, fault-tolerant workflows in SQL, checkpointing execution state between steps so that after a database crash or restart a workflow resumes from its last completed step, with no external queues, workers, or orchestrators like Redis, Celery, or Temporal.

A Postgres workflow runs two steps and checkpoints, the server crashes, then it restarts and resumes at the next step from the saved history.

Why this hit the front page

The Hacker News audience has a long-standing, volatile love affair with PostgreSQL. For the last few years, the dominant trend in web development was to split everything into stateless containers, external job workers, and specialized message queues. Your database was supposed to be a dumb, passive storage engine.

But this architecture came with a tax. If you wanted to fetch some rows, call an API, and update the database, you had to write app code, set up Celery or Temporal, configure Redis, write a retry loop, and handle network failures by hand.

pg_durable appeals to the part of the developer brain that wants to throw away that entire stack. If the state is in the database anyway, why not run the control flow there too? It fits neatly into what one commenter called “the year of the Postgres queue”: a quiet rebellion against the complexity of modern cloud architecture.

Under the hood: how pg_durable executes

Unlike a PL/pgSQL function, which runs inside your session and ties up that connection until it finishes, pg_durable runs as a native PostgreSQL background worker. It exposes a custom SQL domain-specific language (DSL) with operators like ~> (sequential execution) and |=> (fan-out/parallel execution).
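The difference between the two operators can be pictured with ordinary function composition. The Python sketch below is only an analogy. It assumes the semantics the description above gives (“sequential” versus “fan-out/parallel”). sequence and fan_out are hypothetical helpers, not pg_durable's DSL, and nothing in them is checkpointed or durable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

# A "step" is any one-argument function (hypothetical model, not pg_durable's API).
Step = Callable[[Any], Any]


def sequence(*steps: Step) -> Step:
    """Sequential composition: run steps in order, feeding each output to the next."""
    def run(value: Any) -> Any:
        for step in steps:
            value = step(value)
        return value
    return run


def fan_out(*branches: Step) -> Step:
    """Fan-out/fan-in: run every branch on the same input concurrently,
    then gather the results in branch order."""
    def run(value: Any) -> List[Any]:
        with ThreadPoolExecutor(max_workers=max(1, len(branches))) as pool:
            futures = [pool.submit(branch, value) for branch in branches]
            return [future.result() for future in futures]
    return run


# Usage: keep the even ids, then compute three summaries of them in parallel.
pipeline = sequence(
    lambda ids: [i for i in ids if i % 2 == 0],
    fan_out(len, sum, max),
)
print(pipeline([1, 2, 3, 4, 6]))  # [3, 12, 6]
```

Sequential composition threads one value through each step in turn. Fan-out hands the same input to every branch and waits for all of them (fan-in) before continuing.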

Internally, pg_durable builds on two lower-level Rust libraries:

duroxide - a durable task framework that provides deterministic replay, state checkpointing, timers, and sub-orchestrations.
duroxide-pg - a PostgreSQL-backed state provider that persists this state in a dedicated duroxide.* schema inside the database.
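Durable task frameworks generally use "deterministic replay" to mean the following. After a restart, the orchestration code re-runs from the beginning. Every step that already completed returns its recorded result from history instead of executing again. The Python sketch below illustrates that idea with a generator-based orchestrator. run_with_replay, the activity names, and the in-memory history list are hypothetical and are not duroxide's API.

```python
from typing import Any, Callable, Dict, Generator, List, Tuple

# An orchestration is a generator: it yields (activity_name, argument) requests
# and receives each activity's result back from the driver.
Orchestration = Callable[[], Generator[Tuple[str, Any], Any, Any]]


def run_with_replay(orchestration: Orchestration,
                    activities: Dict[str, Callable[[Any], Any]],
                    history: List[Tuple[str, Any]]) -> Any:
    """Drive an orchestration from the start, replaying recorded results.

    `history` holds (activity_name, result) for calls that already completed.
    Replayed calls return the recorded result without running the activity;
    new calls run once and are appended to `history` (the checkpoint).
    """
    gen = orchestration()
    position = 0
    try:
        name, arg = next(gen)
        while True:
            if position < len(history):
                recorded_name, result = history[position]
                if recorded_name != name:
                    # Replay only works if the code makes the same calls in the same order.
                    raise RuntimeError(
                        f"non-deterministic replay: expected {recorded_name!r}, got {name!r}")
            else:
                result = activities[name](arg)  # first execution of this step
                history.append((name, result))  # checkpoint before moving on
            position += 1
            name, arg = gen.send(result)
    except StopIteration as done:
        return done.value


calls: List[Tuple[str, int]] = []
crash_pending = [True]


def fetch(n: int) -> List[int]:
    calls.append(("fetch", n))
    return list(range(n))


def mark(ids: List[int]) -> int:
    calls.append(("mark", len(ids)))
    if crash_pending[0]:
        crash_pending[0] = False
        raise RuntimeError("simulated server crash")
    return len(ids)


def workflow():
    ids = yield ("fetch", 3)
    marked = yield ("mark", ids)
    return marked


activities = {"fetch": fetch, "mark": mark}
history: List[Tuple[str, Any]] = []
try:
    run_with_replay(workflow, activities, history)
except RuntimeError:
    pass  # the "server" died after fetch was checkpointed
result = run_with_replay(workflow, activities, history)
print(result, calls)  # 3 [('fetch', 3), ('mark', 3), ('mark', 3)]
```

After the simulated crash, fetch is not called again because its result comes from history. mark runs a second time because it never completed. The name check shows why such frameworks require deterministic orchestration code: if the code took a different path on replay, the recorded history would no longer line up with the calls being made.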

Starting a workflow looks like this:

SELECT df.start(
    'SELECT id FROM documents WHERE processed = false LIMIT 100' |=> 'batch'
    ~> 'UPDATE documents SET processed = true WHERE id = ANY($batch)'
);

The background worker runs the query and captures its output, which the next step references as $batch. It then checkpoints that output to the duroxide.* schema and schedules the next step. If the database server restarts mid-batch, the worker reads the duroxide history, sees which steps completed, and resumes from there. It also makes external API calls durable via df.http(), so network failures don’t leave your workflow in limbo.
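Here is a minimal Python sketch of that checkpoint-and-resume loop, using the standard-library sqlite3 module as a stand-in for PostgreSQL. The history table, the run_workflow helper, and the step functions are hypothetical; they are not pg_durable's duroxide.* schema or API. One assumption is worth flagging. This sketch commits each step's database writes and its checkpoint in the same transaction, which is only possible because both live in the same database. The source does not say whether pg_durable does exactly this.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, List, Tuple

# A step takes the DB connection plus earlier outputs and returns JSON-serializable data.
StepFn = Callable[[sqlite3.Connection, Dict[str, Any]], Any]


def run_workflow(conn: sqlite3.Connection, workflow_id: str,
                 steps: List[Tuple[str, StepFn]]) -> Dict[str, Any]:
    """Run named steps in order, checkpointing each output in a history table.

    On a rerun, steps already recorded for `workflow_id` are skipped and their
    saved outputs reloaded, so execution resumes at the first unfinished step.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        " workflow_id TEXT, step TEXT, output TEXT,"
        " PRIMARY KEY (workflow_id, step))"
    )
    outputs = {
        step: json.loads(saved)
        for step, saved in conn.execute(
            "SELECT step, output FROM history WHERE workflow_id = ?", (workflow_id,))
    }
    for name, fn in steps:
        if name in outputs:
            continue  # finished before the crash; reuse the checkpoint
        with conn:  # the step's writes and its checkpoint commit (or roll back) together
            result = fn(conn, outputs)
            conn.execute("INSERT INTO history VALUES (?, ?, ?)",
                         (workflow_id, name, json.dumps(result)))
        outputs[name] = result
    return outputs


# Demo data loosely mirroring the example above: mark unprocessed documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, processed INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO documents (id) VALUES (?)", [(i,) for i in range(1, 6)])
conn.commit()

crash_pending = [True]


def select_batch(db: sqlite3.Connection, outputs: Dict[str, Any]) -> List[int]:
    rows = db.execute("SELECT id FROM documents WHERE processed = 0 ORDER BY id LIMIT 100")
    return [row[0] for row in rows]


def mark_processed(db: sqlite3.Connection, outputs: Dict[str, Any]) -> int:
    ids = outputs["batch"]
    db.executemany("UPDATE documents SET processed = 1 WHERE id = ?", [(i,) for i in ids])
    if crash_pending[0]:
        crash_pending[0] = False
        raise RuntimeError("simulated crash mid-step")  # the UPDATE is rolled back
    return len(ids)


steps = [("batch", select_batch), ("mark", mark_processed)]
try:
    run_workflow(conn, "wf-1", steps)
except RuntimeError:
    pass
final = run_workflow(conn, "wf-1", steps)
print(final)  # {'batch': [1, 2, 3, 4, 5], 'mark': 5}
```

A crash during the second step rolls back its UPDATE, and the rerun reuses the saved batch instead of selecting a new one. An HTTP call cannot be rolled back this way, which is why the job queue in the opening anecdote ran some jobs twice. The source does not describe how df.http() handles that case.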

The thread, honestly

The comment section quickly split into a debate on where business logic belongs. The classic application engineering perspective, voiced by junto, was immediate:

“This smells like stored procedures. You can’t unit test it. You can’t version it. Business logic in the database, (hidden brain problem), harder to isolate noisy workloads, no observability, scaling pressure lands solely in Postgres, lack of IO, especially API calls.”

This is the standard grievance of anyone who has inherited a legacy system with 10,000-line PL/SQL packages that nobody understands. When logic lives in the database, it feels hidden from your git history, your linters, and your unit tests.

But the database pragmatists pushed back. Commenter dpark wrote a detailed counterpoint:

“You can unit test stored procedures in exactly the same way you could test any other SQL. You have to spin up a DB to do it. But if you can’t test your stored procedures, you’re admitting you have no way to test your SQL which is your real problem… Stored procedures often drastically reduce IO when used correctly and thereby improve scalability.”

The compromise, as moomoo11 suggested, might be to keep business logic in the app layer but use pg_durable for database-level operations:

“same but this could be useful for db level things that are not business logic related. i have always had maintenance packages for this type of stuff. if i could deploy them alongside the database itself that could be kind of cool.”

There is also a valid concern about scaling. If you run workflows in Postgres, you trade cheap, easily scaled CPU cycles in your application tier for expensive, hard-to-scale CPU cycles on your primary database. If a workflow spends hours processing data, it can peg the database CPU at 100% and starve your user-facing queries.

Decision checklist

Before you install pg_durable, consider where your bottlenecks live:

Use pg_durable if…	Skip pg_durable if…
Your workflows are mostly database-centric (e.g., data staging, ETL, pgvector embedding updates).	Your workflows require heavy external service orchestration (like calling multiple non-HTTP APIs).
You want to simplify your stack and delete Redis, Celery, or Airflow.	You cannot install custom Rust-based extensions or run background workers in your environment (e.g., standard AWS RDS).
You want runbooks and background jobs to survive a database restart natively.	Your database is already CPU-bound and cannot handle the extra load of background worker processes.

Getting started with pg_durable

To run pg_durable locally, you need PostgreSQL 17 or 18, Rust, and the cargo-pgrx toolchain. Microsoft publishes Debian packages, but for local development you build from source:

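# Clone the repository
git clone https://github.com/microsoft/pg_durable.git
cd pg_durable

# One-time setup: install cargo-pgrx (its version should match the pgrx
# dependency in Cargo.toml) and let it download and build PostgreSQL 17
cargo install --locked cargo-pgrx
cargo pgrx init --pg17 download

# Build the extension and open psql inside a pgrx-managed PostgreSQL 17
cargo pgrx run pg17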

At the psql prompt, create the extension:

CREATE EXTENSION pg_durable;

One detail the README notes: the background worker role (pg_durable.worker_role, which defaults to postgres) must be a superuser, because it needs to bypass Row-Level Security (RLS) to manage workflow execution state across different database users.

pg_durable is currently in preview, but it is already integrated into Microsoft’s new cloud service, Azure HorizonDB. It is a fascinating bet. In a world that spent a decade moving compute as far away from the data as possible, Microsoft is betting that some developers would rather just write SQL and go home early.

Discussion on Hacker News · Source: github.com · Submitted by coffeemug