PostgreSQL HA with Patroni: Failover, PgBouncer & Restore

PostgreSQL at Scale: HA, Patroni, PgBouncer, and DR

Reusable production assets included: 4 downloadable templates · MIT licensed

Practical guide scope

Who this is for

Database owners, SREs, backend leads, and CTOs operating PostgreSQL under production load

Where it applies

PostgreSQL systems with replication, Patroni, PgBouncer, strict latency targets, or business-critical restore requirements

Problems this guide helps solve

Connection storms and long transactions create latency before CPU looks saturated.
Replication and failover exist but promotion behavior is not rehearsed.
Backups report success without a measured restore.
Scaling decisions are based on instance size instead of workload evidence.

PostgreSQL at scale is a reliability problem before it is a hardware problem. Replication, failover, connection pressure, query plans, locks, autovacuum, checkpoints, storage latency, backups, and application retry behavior interact. A database can show normal CPU while p99 latency grows, sessions wait on locks, replicas fall behind, and customer requests time out.

This guide is for database owners, SREs, backend leads, and CTOs operating PostgreSQL under real production load. It explains how the parts fit together, which problems each control solves, and how to turn architecture theory into a measured implementation and validation plan.

The practical goal is not “maximum throughput.” It is predictable behavior: bounded latency, controlled concurrency, safe failover, recoverable data, enough capacity headroom, and clear operating decisions during pressure.

Start with workload and failure model

Before choosing bigger hardware, Kubernetes, Patroni, a managed database, read replicas, or sharding, define the workload.

Capture:

Peak and normal requests per second.
Read/write ratio.
p95 and p99 query latency for critical operations.
Active, idle, waiting, and idle-in-transaction sessions.
Long transactions and lock waits.
WAL generation rate and replica replay lag.
Checkpoint frequency and storage latency.
Table and index growth.
Autovacuum lag, dead tuples, and bloat indicators.
Backup duration, recovery point, and last successful restore test.

Also document the failure model:

Can the application tolerate stale reads?
What is the maximum acceptable data loss?
How long may writes be unavailable?
What happens when the primary disappears?
What happens when a replica is hours behind?
What happens when all application pods reconnect at once?
Which migrations make rollback unsafe?

Without this baseline, scaling work becomes expensive guessing.

Build a PostgreSQL connection budget

Many PostgreSQL incidents are connection incidents. Application replicas scale out, every process opens its own pool, a deploy restarts all workers, and the database spends memory and scheduler time managing sessions instead of executing useful work.

Create a connection budget from the database inward:

Reserve sessions for administration, monitoring, backups, replication, and maintenance.
Decide the safe application server connection count.
Divide that budget across PgBouncer pools or application instances.
Add a small controlled reserve pool.
Test deployment and autoscaling behavior against the budget.
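The budget steps above reduce to simple arithmetic. The sketch below is a minimal illustration, not a sizing rule: the class name, field names, and every number are hypothetical, and the safe application count for a real server may be well below what `max_connections` allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionBudget:
    """Hypothetical budget inputs; all values are illustrative assumptions."""
    max_connections: int    # PostgreSQL server limit
    reserved_sessions: int  # admin, monitoring, backups, replication, maintenance
    reserve_pool: int       # small controlled reserve kept out of the normal pools
    pool_count: int         # PgBouncer pools or application instances sharing the budget

    def application_budget(self) -> int:
        """Server connections left for normal application traffic."""
        budget = self.max_connections - self.reserved_sessions - self.reserve_pool
        if budget <= 0:
            raise ValueError("reserved sessions consume the whole server limit")
        return budget

    def per_pool_size(self) -> int:
        """Even split, rounded down so the pools never exceed the budget."""
        if self.pool_count <= 0:
            raise ValueError("pool_count must be positive")
        return self.application_budget() // self.pool_count


budget = ConnectionBudget(max_connections=200, reserved_sessions=20,
                          reserve_pool=10, pool_count=4)
print(budget.application_budget(), budget.per_pool_size())
```

Rounding down is deliberate: a budget that is exceeded by one connection per pool under autoscaling is no longer a budget.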

Useful checks:

select state, count(*)
from pg_stat_activity
group by state
order by count(*) desc;

select wait_event_type, wait_event, count(*)
from pg_stat_activity
where wait_event is not null
group by 1, 2
order by 3 desc;

select pid, usename, application_name, now() - xact_start as age, state, left(query, 120)
from pg_stat_activity
where xact_start is not null
order by xact_start;

A high max_connections value is not a capacity strategy. It can postpone rejection while increasing memory pressure and making overload slower to recover from.

PgBouncer: choose pool mode from application behavior

PgBouncer can protect PostgreSQL from connection storms, but its pool mode changes session semantics.

Pool mode	Best for	Main risk
Session	Applications requiring session state	Fewer multiplexing benefits
Transaction	Most stateless web requests	Session-level settings and prepared-statement behavior need review
Statement	Narrow specialized workloads	Multi-statement transactions are not supported

Before using transaction pooling, review:

Session variables and SET behavior.
Temporary tables.
Advisory locks.
Prepared statements and driver behavior.
LISTEN/NOTIFY usage.
Long transactions.
Connection reset behavior.

The implementation example below provides a conservative starting point, but pool size must come from the real connection budget.
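Part of the review list above can be approximated by scanning the SQL an application emits, for example from a statement log sample. The sketch below is a rough heuristic under stated assumptions: the pattern names and regular expressions are hypothetical, they will miss driver-level behavior such as protocol-level prepared statements, and a clean scan does not prove that transaction pooling is safe.

```python
import re

# Hypothetical patterns for statements that rely on session state.
# SET LOCAL, SET TRANSACTION, and pg_advisory_xact_lock are transaction-scoped
# and are intentionally not flagged.
SESSION_PATTERNS = {
    "session SET": re.compile(
        r"^\s*SET\s+(?!(?:LOCAL|TRANSACTION)\b)\S", re.IGNORECASE),
    "temporary table": re.compile(
        r"\bCREATE\s+(?:GLOBAL\s+|LOCAL\s+)?TEMP(?:ORARY)?\s+TABLE\b", re.IGNORECASE),
    "session advisory lock": re.compile(
        r"\bpg_(?:try_)?advisory_lock(?:_shared)?\s*\(", re.IGNORECASE),
    "SQL-level PREPARE": re.compile(
        r"^\s*PREPARE\s+(?!TRANSACTION\b)\S", re.IGNORECASE),
    "LISTEN": re.compile(r"^\s*LISTEN\b", re.IGNORECASE),
}


def session_features(statement: str) -> list[str]:
    """Return the names of session-dependent features found in one statement."""
    return [name for name, pattern in SESSION_PATTERNS.items()
            if pattern.search(statement)]


sample = [
    "SET search_path TO tenant_a",
    "SET LOCAL statement_timeout = '5s'",
    "SELECT pg_advisory_xact_lock(42)",
    "SELECT pg_advisory_lock(42)",
    "CREATE TEMP TABLE scratch (id int)",
]
for statement in sample:
    print(statement, "->", session_features(statement))
```

Anything the scan flags still needs a decision from the owning team: move it into a transaction, route it through a session-mode pool, or remove it.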

Query latency, locks, and autovacuum

Adding replicas or larger instances does not fix poor query plans, long transactions, lock contention, or tables that autovacuum cannot keep up with.

A practical performance review includes:

pg_stat_statements by total time, mean time, calls, and rows.
Query plans with representative parameter values.
Long-running and idle-in-transaction sessions.
Blocked and blocking process chains.
Dead tuple growth and autovacuum timestamps.
Index usage and write amplification.
Checkpoint timing and WAL volume.
Storage latency during peak write windows.
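As one illustration of the dead-tuple item in the list above, the sketch below ranks tables by dead-tuple ratio. It assumes rows shaped like `(relname, n_live_tup, n_dead_tup)`, as could be selected from `pg_stat_user_tables`; the function names, the sample rows, and the 20% threshold are hypothetical, and a high ratio is a prompt to inspect autovacuum timestamps, not proof of bloat.

```python
def dead_tuple_ratio(n_live_tup: int, n_dead_tup: int) -> float:
    """Share of dead tuples among all tuples; 0.0 for an empty table."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0


def tables_needing_review(rows, threshold=0.2):
    """Return (table, ratio) pairs at or above the threshold, worst first."""
    scored = [(name, dead_tuple_ratio(live, dead)) for name, live, dead in rows]
    flagged = [item for item in scored if item[1] >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Illustrative numbers only.
rows = [
    ("orders", 900_000, 300_000),
    ("users", 50_000, 500),
    ("events", 1_000_000, 1_000_000),
]
print(tables_needing_review(rows))
```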

Example blocking-chain query:

select
  blocked.pid as blocked_pid,
  blocker.pid as blocker_pid,
  now() - blocked.query_start as blocked_for,
  left(blocked.query, 120) as blocked_query,
  left(blocker.query, 120) as blocker_query
from pg_stat_activity blocked
join pg_locks blocked_locks on blocked_locks.pid = blocked.pid and not blocked_locks.granted
join pg_locks blocker_locks
  on blocker_locks.locktype = blocked_locks.locktype
 and blocker_locks.database is not distinct from blocked_locks.database
 and blocker_locks.relation is not distinct from blocked_locks.relation
 and blocker_locks.page is not distinct from blocked_locks.page
 and blocker_locks.tuple is not distinct from blocked_locks.tuple
 and blocker_locks.virtualxid is not distinct from blocked_locks.virtualxid
 and blocker_locks.transactionid is not distinct from blocked_locks.transactionid
 and blocker_locks.classid is not distinct from blocked_locks.classid
 and blocker_locks.objid is not distinct from blocked_locks.objid
 and blocker_locks.objsubid is not distinct from blocked_locks.objsubid
 and blocker_locks.pid != blocked_locks.pid
join pg_stat_activity blocker on blocker.pid = blocker_locks.pid
where blocker_locks.granted;

Use this as evidence for incident analysis, not as a reason to kill sessions automatically without understanding the transaction and business impact.
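A blocking query returns pairs, but incident analysis usually needs the session at the head of each chain. The sketch below, assuming `(blocked_pid, blocker_pid)` pairs exported from such a query, groups waiters under their root blocker. The function name and sample PIDs are hypothetical. A pure cycle has no root and will not appear in the result.

```python
from collections import defaultdict, deque


def root_blockers(edges):
    """Map each root blocker pid to the sorted pids waiting on it, directly or indirectly."""
    waiters = defaultdict(set)  # blocker pid -> pids waiting directly on it
    blocked = set()
    for blocked_pid, blocker_pid in edges:
        waiters[blocker_pid].add(blocked_pid)
        blocked.add(blocked_pid)

    result = {}
    # A root blocks others but is not itself waiting on anyone.
    for root in sorted(set(waiters) - blocked):
        seen, queue = set(), deque([root])
        while queue:
            for pid in waiters.get(queue.popleft(), ()):
                if pid not in seen:
                    seen.add(pid)
                    queue.append(pid)
        result[root] = sorted(seen)
    return result


# Illustrative pairs: 102 waits on 101, which waits on 100.
edges = [(101, 100), (102, 101), (103, 100), (201, 200)]
print(root_blockers(edges))
```

The output identifies which transaction to investigate first; whether to cancel it remains a decision for someone who understands what it is doing.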

Patroni, consensus, and failover discipline

Patroni coordinates leader election and PostgreSQL promotion through a distributed configuration store such as etcd or Consul. It is effective when the failure model, consensus health, routing, watchdog or fencing assumptions, and application reconnect behavior are understood.

Useful checks:

patronictl list
etcdctl endpoint health --cluster
curl -fsS http://127.0.0.1:8008/cluster
psql -X -c "select pg_is_in_recovery();"
psql -X -c "select client_addr, state, sync_state, write_lag, flush_lag, replay_lag from pg_stat_replication;"

The runbook must define:

When automatic failover is allowed.
When promotion must be blocked because lag exceeds RPO.
Who approves manual failover.
How clients discover the new primary.
How split-brain risk is controlled.
How the old primary is fenced or rebuilt.
How replicas, backups, and monitoring continue afterward.
Which application transaction proves write recovery.

A patronictl success message is not enough. Measure client reconnect time and verify that only one intended primary accepts writes.
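Two of the runbook rules above can be expressed as small checks: exactly one node should accept writes, and only replicas within the agreed lag limit should be promotion candidates. The sketch below assumes the inputs were already collected, for example `pg_is_in_recovery()` per node and a lag figure per replica. The function names, node names, and the byte threshold are hypothetical and do not represent Patroni's own decision logic.

```python
def writable_nodes(in_recovery_by_node: dict[str, bool]) -> list[str]:
    """Nodes reporting pg_is_in_recovery() = false, i.e. accepting writes."""
    return sorted(node for node, in_recovery in in_recovery_by_node.items()
                  if not in_recovery)


def single_primary(in_recovery_by_node: dict[str, bool]) -> str:
    """Return the only writable node, or raise if there are zero or several."""
    primaries = writable_nodes(in_recovery_by_node)
    if len(primaries) != 1:
        raise RuntimeError(f"expected exactly one writable node, found {primaries}")
    return primaries[0]


def failover_candidates(lag_bytes_by_replica: dict[str, int],
                        max_lag_bytes: int) -> list[str]:
    """Replicas whose lag is within the limit derived from the documented RPO."""
    return sorted(name for name, lag in lag_bytes_by_replica.items()
                  if lag <= max_lag_bytes)


# Illustrative observations after a drill.
print(single_primary({"pg1": True, "pg2": False, "pg3": True}))
print(failover_candidates({"pg1": 1_024, "pg3": 50_000_000}, max_lag_bytes=1_048_576))
```

Raising on zero or multiple writable nodes keeps the check strict: both outcomes are failures of the drill, even when the tooling reports success.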

Bare metal, VMs, Kubernetes, or managed PostgreSQL

The correct platform depends on workload, team maturity, storage, compliance, recovery objectives, and desired control.

Platform	Best for	Strength	Main responsibility
Bare metal PostgreSQL	Predictable I/O and critical databases	Direct control over storage and kernel behavior	Provisioning, HA, backup, patching, hardware failure
VM-based PostgreSQL	Teams needing isolation and familiar operations	Good balance of control and manageability	HA, storage, patching, recovery automation
Kubernetes operator	Mature platform teams with tested storage	Consistent declarative automation	Operator behavior, storage, node failure, backup, upgrades
Managed PostgreSQL	Teams prioritizing operational offload	Strong baseline automation	Limits, cost, provider failure model, application resilience

Kubernetes does not remove database engineering. It adds scheduling, storage, network, and operator behavior that must also be tested during failover and recovery.

Monitoring before users feel pain

A useful PostgreSQL dashboard should show leading indicators, not only CPU and disk percentage.

Monitor:

p95/p99 query latency for critical operations.
Active, waiting, and idle-in-transaction sessions.
Connection usage against the documented budget.
Lock waits and long transactions.
Replica write, flush, and replay lag.
WAL generation rate and archive failures.
Checkpoint duration and frequency.
Buffer cache behavior and storage latency.
Dead tuples, autovacuum progress, and table growth.
Backup success, last recoverable point, and restore test age.

Alerts should express impact and action. “Database CPU high” is weak. “Primary connection budget is 90% consumed, checkout p99 doubled after deploy, and queue depth is rising” is useful.
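To make the contrast concrete, the sketch below builds an alert only when connection pressure and customer-facing latency degrade together, and puts the numbers and a next action in the message. The function name, thresholds, and suggested action are hypothetical; a real rule would live in the monitoring system and use its own baselines.

```python
def connection_alert(used, budget, p99_ms, baseline_p99_ms,
                     usage_threshold=0.9, latency_factor=2.0):
    """Return an impact-oriented message, or None when no alert is warranted."""
    usage = used / budget
    ratio = p99_ms / baseline_p99_ms
    # Require both signals so that a busy but healthy pool does not page anyone.
    if usage < usage_threshold or ratio < latency_factor:
        return None
    return (
        f"Primary connection budget is {usage:.0%} consumed ({used}/{budget}); "
        f"p99 is {ratio:.1f}x baseline ({p99_ms:.0f} ms vs {baseline_p99_ms:.0f} ms). "
        "Action: pause deploys and check pool sizes and long transactions."
    )


print(connection_alert(used=190, budget=200, p99_ms=80, baseline_p99_ms=40))
print(connection_alert(used=100, budget=200, p99_ms=80, baseline_p99_ms=40))
```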

Backup and restore testing

Backups become a recovery strategy only after a clean restore proves the path.

A restore drill should answer:

Which backup and recovery data were used?
What exact point was recovered?
How long did restore take?
Which credentials and infrastructure were required?
Did PostgreSQL start cleanly?
Did consistency and application smoke tests pass?
Did backup, WAL archive, replication, and monitoring resume?
Which gaps became owned follow-up actions?

Store the drill record with timestamps, commands, selected outputs, recovered point, duration, and validator approval. This evidence is more valuable than a green backup job alone.
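A drill record is easier to compare across runs when the RPO and RTO checks are computed from its timestamps and not judged by eye. The sketch below is one possible shape for such a record; the class name, field names, timestamps, and objectives are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RestoreDrill:
    """Hypothetical drill record; field names are illustrative."""
    incident_time: datetime      # simulated moment of failure
    recovered_point: datetime    # latest data visible after the restore
    restore_started: datetime
    service_validated: datetime  # consistency and smoke tests passed
    rpo: timedelta
    rto: timedelta

    def data_loss(self) -> timedelta:
        """Window of changes that the restore did not recover."""
        return self.incident_time - self.recovered_point

    def duration(self) -> timedelta:
        """Elapsed time from starting the restore to a validated service."""
        return self.service_validated - self.restore_started

    def meets_objectives(self) -> bool:
        return self.data_loss() <= self.rpo and self.duration() <= self.rto


drill = RestoreDrill(
    incident_time=datetime(2026, 1, 10, 12, 0),
    recovered_point=datetime(2026, 1, 10, 11, 58),
    restore_started=datetime(2026, 1, 10, 12, 5),
    service_validated=datetime(2026, 1, 10, 12, 50),
    rpo=timedelta(minutes=5),
    rto=timedelta(hours=1),
)
print(drill.data_loss(), drill.duration(), drill.meets_objectives())
```

Measuring duration to the validated service, not to the moment PostgreSQL starts, matches the drill questions above: a database that starts but fails smoke tests has not been recovered.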

PostgreSQL scaling decision matrix

Approach	Best for	Reliability impact	Complexity
Query and transaction tuning	Slow critical operations and lock contention	Often the safest highest-value improvement	Medium
PgBouncer and pool budgets	Connection storms and high application concurrency	Protects the server and improves overload behavior	Medium
Vertical scaling	Storage, memory, or CPU pressure with simple architecture	Fast capacity increase with a finite ceiling	Low/Medium
Read replicas	Read-heavy workloads with acceptable consistency rules	Reduces read pressure but adds lag and routing complexity	Medium
Patroni HA	Business-critical primary availability	Controlled promotion when consensus and routing are correct	High
Sharding or service decomposition	Workloads beyond one primary design	Large scaling ceiling with major application complexity	Very high

Related SteadyOps reading

Disaster Recovery Runbook Template: recovery ownership, RPO/RTO, commands, and validation.
Kubernetes Production Readiness Checklist: storage, secrets, rollout, and operational controls around stateful workloads.
Load Balancing: Comparative Architectures (routing and health checks for primary or service failover).
Security Evidence Operations Model: access, backup, change, and incident evidence.

Key takeaways

PostgreSQL scaling begins with workload and failure evidence.
Connection budgets and pooling often matter before larger hardware.
Query plans, locks, autovacuum, WAL, and storage behavior need direct measurement.
Patroni improves HA only when consensus, routing, fencing, and client reconnect are tested.
Monitoring must connect database saturation to customer impact.
Backups are credible only after a clean restore meets RPO and RTO.

Operational takeaway

Treat PostgreSQL as a reliability system: baseline the workload, control concurrency, measure query and lock behavior, rehearse failover, test restore, and keep enough headroom for failure and deployment—not only normal traffic.

Implementation blueprint

Use this sequence to turn the theory into an auditable production change. Adjust commands, thresholds, and ownership to the real environment before execution.

Baseline the workload

Capture peak connections, top queries, lock waits, WAL rate, checkpoint behavior, autovacuum health, disk latency, table growth, and replica lag.
- Peak window is included
- p95/p99 query latency is visible
- Long transactions have owners

Control concurrency before adding hardware

Set application pool budgets, use PgBouncer where appropriate, add statement and transaction timeouts, and stop deploys from opening unbounded sessions.
- Connection budget is documented
- Pool mode matches application behavior
- Timeouts are tested

Design and test failover

Validate Patroni and consensus health, promotion rules, client routing, fencing assumptions, replica rebuild, backup continuity, and application retry behavior.
- Only one writable primary exists
- Client reconnect time is measured
- Replica rebuild procedure is known

Prove backup restore and capacity headroom

Restore into a clean environment, record recovered point and duration, then load-test critical queries with realistic concurrency.
- Restore meets RTO
- Recovered point meets RPO
- Disk and connection headroom remain after failover

Configuration and command examples

Examples are conservative starting points. Review security, version compatibility, failure behavior, and rollback before production use.

PgBouncer transaction-pooling baseline

Adjust pool sizes to the real PostgreSQL connection budget and test session-dependent features before using transaction mode.

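[databases]
app = host=postgres-primary port=5432 dbname=app

[pgbouncer]
; Listens on all interfaces; restrict the address or network path as required.
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction
; Client-side limit; clients beyond the pool size wait instead of opening server sessions.
max_client_conn = 1000
; Server connections per database/user pair, plus a small reserve
; that is used once a client has waited reserve_pool_timeout seconds.
default_pool_size = 40
reserve_pool_size = 10
reserve_pool_timeout = 3
; Close idle server connections after 60 s; cancel queries running longer than 30 s.
server_idle_timeout = 60
query_timeout = 30
; Disconnect clients idle for more than 300 s; application pools must tolerate this.
client_idle_timeout = 300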
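Because PgBouncer sizes pools per database/user pair, the server-side worst case is the per-pool limit multiplied by the number of pools. The sketch below parses a configuration fragment with Python's `configparser` and computes that figure so it can be compared with the documented budget. The function name, the sample fragment, and the pool count of 3 are hypothetical, and per-database or per-user overrides are ignored.

```python
import configparser


def worst_case_server_connections(ini_text: str, pool_count: int) -> int:
    """Upper bound on server connections: pools x (default_pool_size + reserve_pool_size)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["pgbouncer"]
    per_pool = (section.getint("default_pool_size")
                + section.getint("reserve_pool_size", fallback=0))
    return per_pool * pool_count


# Illustrative fragment; pool_count is the number of database/user pairs in use.
CONFIG = """
[pgbouncer]
pool_mode = transaction
default_pool_size = 40
reserve_pool_size = 10
"""

worst_case = worst_case_server_connections(CONFIG, pool_count=3)
print(worst_case)
```

If the result exceeds the application share of the connection budget, reduce the pool sizes or the number of pools before raising `max_connections`.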

Read-only saturation and lag checks

Use these queries during baseline reviews and after deploy or failover events.

select state, count(*)
from pg_stat_activity
group by state
order by count(*) desc;

select pid, now() - xact_start as age, state, wait_event_type, wait_event, left(query, 120)
from pg_stat_activity
where xact_start is not null
order by xact_start;

select client_addr, state, sync_state, write_lag, flush_lag, replay_lag
from pg_stat_replication;

Production validation checklist

Connection usage stays below the documented server budget during peak load.
The top slow queries have plans, owners, and a remediation decision.
Replication lag alerts before the replica becomes unsafe for failover.
A controlled failover preserves writes and reconnects clients within the target.
A clean restore meets the documented RPO and RTO.
Backups, WAL archive, PgBouncer, and monitoring continue after promotion.

Reusable assets

Download templates and validation files

Use these files as reviewed starting points. Keep the source link and version when sharing or adapting them.

PostgreSQL failover drill (Markdown)

Preconditions, failure injection, promotion decision, routing, client reconnect, replication recovery, and evidence.

PostgreSQL health checks (SQL)

SQL checks for role, replication, sessions, waits, long transactions, and recovery validation.

HAProxy routing example (HAProxy)

A health-check-based primary and replica routing starting point for Patroni.

PgBouncer configuration example (INI)

A conservative transaction-pooling starting point with explicit limits and timeouts.

Templates are provided under the MIT License. Production use still requires environment-specific review and testing.

Stable reference

Version, testing scope, and citation

Version: 1.0.0
Last reviewed: Jul 10, 2026
Tested with: PostgreSQL 12–16 · Patroni 3.x · etcd 3.5 · HAProxy 2.x · PgBouncer 1.22+
License: CC BY 4.0 for the article; MIT for downloadable templates
Permanent URL: https://steadyops.best/articles/postgresql-at-scale/

Yuri Osipov. "SteadyOps PostgreSQL Failover Drill." SteadyOps, version 1.0.0, reviewed 2026-07-10. https://steadyops.best/articles/postgresql-at-scale/
