Case Study: Optimizing SQLite SSO for 25M MAUs

Case Study: Optimizing SQLite SSO for 25M MAUs

AuthOS is a comprehensive, open-source identity platform engineered to provide a self-hosted, database-agnostic alternative to services like Auth0 and it was largely inspired by the speed of Netlify’s GoTrue. AuthOS supports 3 database flavors: SQLite, PostgreSQL, and MySQL.

Mike Chumba Mike Chumba
7 min read
1447 words

Executive Summary: Over the last few months, I have been working on a Rust based authentication API that finally got a name: AuthOS. It has everything you’ll ever need from an authentication solution. AuthOS supports 3 database flavors: SQLite, PostgreSQL, and MySQL. The optimization journey, this case study, focuses on SQLite backend despite the tweaks improving speed across the other two backends. I successfully worked around the limitations of SQLite bringing the system to a state I believe can handle 40 Million daily requests. The benchmarks were not so surprising because the app is built with Rust, but some architectural decisions had limited the Ferrari engine to a tuktuk performance. I have been experimenting with Rust and even did a benchmark for device authorization on the article “Anatomy of a Bottleneck” but later decided why not go full blown authentication platform… I have always loved the GoTrue authentication system from Netlify for its speed, and yet, for an authentication solution, I have never liked the idea of limiting the user to a single database engine - Netlify’s GoTrue uses MariaDB, I wanted more!


1. The Starting Point: “The Death Spiral”

I started with a functional, compliant single-sign-on (SSO) API. It handled the complex user journeys I needed, such as Platform Owners approving organizations, admins managing webhooks, and end-users logging in via standard OAuth flows. But when I pushed it, the reality was harsh. Under synthetic load testing, my “Ferrari” collapsed at just 130 requests per second.

  • Throughput: ~130 req/s (Saturated)
  • Latency (P95): 17.69 seconds (Unusable)
  • Diagnosis: The system was suffering from Resource Starvation.
    1. CPU Blocking: The heavy lifting of cryptography (specifically the Argon2 password hashing used during user registration and login) was running on the async event loop, choking the runtime.
    2. Write Contention: Every single API request was fighting for a lock. Whether it was the auth_callback handler updating a “last seen” timestamp or the audit logger recording a user.login event, synchronous writes were locking the SQLite database, forcing every read operation to queue up indefinitely.

Abstract Architecture Comparison Fig 1: Evolution from a blocking, synchronous monolith to a non-blocking, event-driven architecture.


2. The Architectural Transformation

I wasn’t about to give up on SQLite. I knew the raw power was there, I just had to unlock it. I applied a series of optimizations, systematically identifying and decoupling the bottlenecks that were holding the engine back.

Phase 1: Unblocking the Runtime (CPU)

  • The Problem: One user logging in was freezing the server for everyone else. When password verification hit the server, the Argon2 verification halted the entire event loop for 50ms.
  • The Solution: I moved the heavy cryptographic work to tokio::task::spawn_blocking and guarded it with a Semaphore limited to 2x my CPU cores.
  • The Result: Authentication became an isolated, throttled process. A spike in logins could no longer bring down the system for users who were already authenticated.
// BEFORE: Blocking the async runtime
async fn verify_password(password: &str, hash: &str) -> bool {
    // ✗ Runs on the main thread, freezing the server for 50ms
    argon2::verify(password, hash) 
}

// AFTER: Offloaded with concurrency limits
static CPU_LIMIT: Semaphore = Semaphore::new(8); // Max 8 concurrent hashes

async fn verify_password(password: &str, hash: &str) -> Result<bool> {
    // ✓ Acquire permit (wait if too busy)
    let _permit = CPU_LIMIT.acquire().await?; 
    
    // ✓ Run on a background thread, unblocking the API
    tokio::task::spawn_blocking(move || {
        argon2::verify(password, hash) 
    }).await?
}

Phase 2: The “Indestructible” Audit Actor (I/O)

  • The Problem: My requirement for strict audit trails was killing performance. Every request triggered a disk write. In SQLite, that means exclusive access.
  • The Solution: I implemented an Asynchronous Buffered Actor.
  • The Trade-off: I traded strict immediate durability (logs sit in RAM for ~1s) for massive throughput.
  • The Result: User requests return immediately. The actor handles the heavy lifting of batch writing and retries in the background, absorbing those pesky SQLite BUSY errors without the user ever noticing.
// BEFORE: Synchronous Writer
async fn log_event(event: Event) {
    // ✗ API request hangs here until disk I/O completes
    db.insert(event).await; 
}

// AFTER: Asynchronous Actor
async fn log_event(event: Event) {
    // ✓ Instant return! Pushes to an in-memory channel
    audit_sender.send(event).await; 
}

// Background Worker (The Actor)
async fn run_audit_actor(mut receiver: Receiver<Event>) {
    let mut buffer = Vec::new();
    while let Some(event) = receiver.recv().await {
        buffer.push(event);
        if buffer.len() >= 100 || timeout_elapsed() {
            // ✓ Efficient batch write in background
            db.insert_batch(buffer).await; 
            buffer.clear();
        }
    }
}

Abstract Audit Actor Pattern Fig 2: The Buffered Actor pattern decouples high-volume writes from the critical latency path.

Phase 3: Traffic Hygiene (Logic)

  • The Problem: My code was too eager. Writing “Last Seen” timestamps on every request was flooding the Write-Ahead Log (WAL) for no good reason.
  • The Solution: I implemented Debouncing. I taught the system to chill out, and it now only updates the database if more than 5 minutes have passed since the last write.
  • The Result: Write IOPS dropped by ~90%.

Traffic Hygiene Diagram Fig 3: Smart debouncing prevents the database from drowning in identical updates.

// BEFORE: Write on every request
async fn track_activity(user_id: Uuid) {
    // ✗ Floods the WAL with identical updates
    db.update_last_seen(user_id, now()).await; 
}

// AFTER: Smart Debouncing
async fn track_activity(user_id: Uuid) {
    let last_seen = cache.get(user_id);
    
    // ✓ Only write if > 5 minutes have passed
    if last_seen < now() - Duration::from_mins(5) {
        db.update_last_seen(user_id, now()).await;
        cache.set(user_id, now());
    }
}

Phase 4: Fail Fast Concurrency (Stability)

  • The Problem: When I really hammered the system (1000 concurrent users), the queue just kept growing until requests timed out after 30 seconds.
  • The Solution: I implemented TryAcquire semantics on the Login Semaphore and set the SQLite Writer timeout to a strict 3 seconds.
  • The Result: The system now knows its limits. It sheds load (returning 429 Too Many Requests) instantly when saturated, ensuring that active users, those inside the club, still get sub-5ms responses.

Fail Fast / Load Shedding Diagram Fig 4: The ‘Bouncer’ (Semaphore) keeps the club from getting overcrowded, so everyone inside can dance.

// BEFORE: Unbounded Queue
async fn login(creds: Credentials) -> Result<Token> {
    // ✗ Queues forever if system is overloaded (30s timeouts)
    let _permit = CPU_LIMIT.acquire().await?; 
    process_login(creds).await
}

// AFTER: Fail Fast Protection
async fn login(creds: Credentials) -> Result<Token> {
    // ✓ Fails instantly if queue is full
    let _permit = CPU_LIMIT.try_acquire().or_else(|_| {
        Err(Error::TooManyRequests("System saturated, try again later"))
    })?;
    
    process_login(creds).await
}

3. The Final Scoreboard

The results were validation of the “SQLite First” philosophy.

MetricInitial StateOptimized State (Moderate Load)Optimized State (Saturation)
Throughput~135 req/s (Unstable)210 req/s (Stable)467 req/s (Maxed)
P95 Latency17,690 ms4.88 ms2,720 ms
Error Rate0% (but unusable)0.00%0.01%
BottleneckCPU & DB LocksNone (Server Bored)DB Connection Pool Limit

Abstract Performance Metrics Fig 3: Latency dropped by 4 orders of magnitude while throughput tripled.


4. Key Engineering Takeaways

  1. SQLite is not a toy: With proper tuning (WAL mode, dedicated Writer Actor, IMMEDIATE transactions), I proved that SQLite can sustain 467 requests/second mixed with heavy writes. This isn’t just a prototype; it’s an engine capable of serving ~25 Million MAUs on a single $5 VPS.
  2. Async $\neq$ Non-blocking: Just because I marked a function async didn’t make it fast. Offloading the CPU-bound work was mandatory.
  3. ACID is a Spectrum: I didn’t break ACID; I just chose where to apply it. Critical user data remains strictly consistent, while telemetry and logs became eventually consistent.
  4. The “5ms” Victory: Achieving a 4.88ms P95 latency on a full authentication flow, database, logic, and crypto included, is something I’m incredibly proud of. It’s the kind of performance usually reserved for expensive in-memory systems.

Conclusion

I have engineered a system capable of handling ~40 Million requests per day on a single node using a local file database. This architecture proves that complexity, shifting to Postgres, Redis, or Microservices, should only be introduced when you actually hit physical limits. For 99% of us, the Ferrari engine of SQLite is more than enough.


Appendix: The 25M MAU Estimation

To back up the claim of supporting 25 Million Monthly Active Users (MAUs) on a single $5 node, here is the math derived from the load test results:

The Math:

  1. Benchmark Throughput: The optimized system sustains 210 requests/second at moderate load.
  2. Daily Capacity: 210 RPS $\times$ 86,400 seconds = ~18.1 Million requests/day.
  3. Requests per User: I assume an average of 4 authentication requests per Daily Active User (DAU).
  4. DAU Capacity: 18.1M / 4 = 4.5 Million DAUs.
  5. DAU/MAU Ratio: Assuming a standard engagement ratio of 20%: $$ 4.5\text{M DAU} / 0.20 = \textbf{22.5 Million MAUs} $$

I rounded this to 25M given that the “saturation” limit was actually higher (467 RPS), suggesting even more headroom.

References:

  • Estimating MAU from DAU: “The DAU/MAU ratio helps estimate the total active user base from daily traffic. For SaaS, a 13–20% ratio is a standard benchmark.” (Mixpanel: What is DAU/MAU? )