Battling with SQLite in a Concurrent Environment

Battling with SQLite in a Concurrent Environment

How we migrated our SSO platform's test suite to SQLite, broke it, and fixed it by understanding database locking and connection pooling.

Mike Chumba Mike Chumba
3 min read
592 words

SQLite is often dismissed as a “toy” database, suitable only for mobile apps or local development. But when you push it into a high-concurrency server environment, you quickly discover that it is a robust, battle-hardened engine — provided you respect its physics.

We recently migrated our SSO platform’s test suite to run on SQLite to improve local development speed. What started as a simple configuration change turned into a deep dive into database locking, connection pooling, and the harsh reality of SQLITE_BUSY.

Here is the story of how we broke SQLite, and how we put it back together.

Phase 1: The Locking Nightmare

Our initial approach was naive: we simply swapped the Postgres driver for SQLite and kept our existing connection pool configuration (100 max connections). We fired up our integration tests with 8 parallel workers, expecting blazing fast results.

Instead, we got a sea of red.

Tests were failing with 500 Internal Server Error and generic Database busy messages. The root cause? SQLite allows multiple readers, but only one writer at a time.

When 8 concurrent workers tried to write to the database simultaneously, they created a bottleneck. Thread A would grab a lock. Thread B would try to write, find the file locked, and panic.

SQLite Lock Contention

Phase 2: The Band-Aid (Retries)

We tried to brute-force our way out. If the database is busy, surely we just need to wait longer?

  1. Increased SQLite timeout: We bumped the busy_timeout to 30 seconds.
  2. Application-level Retries: We wrapped our transaction logic in a retry loop with exponential backoff.

This improved things slightly, but it introduced a new problem: Latency Spikes.

Requests that should take 50ms were taking 15 seconds. The retry queue became a storm. Workers were holding open transactions while waiting for locks, further starving other readers. We hadn’t solved the contention; we had just queued it up.

Retry Storm

Phase 3: The “Raw SQL” Trap

We realized standard deferred transactions (BEGIN) were part of the problem. They don’t acquire the write lock until the first write statement. By then, another connection might have already grabbed it, leading to a deadlock.

We switched to BEGIN IMMEDIATE. This forces the transaction to acquire the write lock upfront.

However, our ORM (SeaORM/SQLx) runs on a connection pool. When we executed:

db.execute("BEGIN IMMEDIATE").await?;
db.query("...").await?;
db.execute("COMMIT").await?;

There was no guarantee that the query or COMMIT would run on the same connection that started the transaction! The pool would reclaim the connection, and we ended up with errors like cannot start a transaction within a transaction.

The Breakthrough: Reader/Writer Split

The solution was to stop fighting SQLite’s nature and design our architecture around it.

If SQLite only supports one writer, our application should only have one writer.

We implemented a Split Connection Architecture:

  1. Reader Pool (db): A standard pool for read-only operations. Multiple connections, fully concurrent.
  2. Writer Pool (db_writer): A specific pool logically restricted to max_connections = 1.

We refactored our critical with_retrying_transaction helper to conditionally use this dedicated writer connection for SQLite builds:

// SQLite-specific path
#[cfg(feature = "db_sqlite")]
{
    // Acquire the exclusive writer connection
    let txn = db_writer.begin().await?; 
    // ... perform operations ...
    txn.commit().await?;
}

This ensures that all write operations are serialized in the application layer before they even hit the database. No more fighting for locks. No more retry storms.

Reader/Writer Architecture

The Result

Our test suite now passes consistently with 8 workers. Timeouts are gone.

SQLite is an incredibly capable engine, but it is not PostgreSQL. By acknowledging its single-writer constraint and architecting our application to respect it, we achieved the high-performance, stable local environment we wanted.