Is Your API Gateway Burning 800ms on Every Request?

What if a perfectly working API gateway was quietly fetching the same valid service token thousands of times a day? This field guide shows how to find the hidden token tax before it becomes normal.

Mike Chumba

Correct software can still be wasteful software.

That was the irritating lesson hiding inside one API gateway route. It passed traffic. It attached the right Authorization header. The auth server returned valid service tokens. The dashboards stayed green.

And every request still paid an extra 800ms for work the gateway had already done.

The bug was not dramatic enough to wake anybody up. It did not throw 500s. It did not corrupt data. It just made every caller wait while the gateway fetched another token it did not need.

The root cause was simple:

The gateway was fetching a brand new service token on every single request, even though each token was actually valid for hours.

This is how latency becomes normal. Each component works, so everyone stops looking. The waste lives between the components, where ownership is vague and timing is easy to ignore.

[Figure: Baseline API gateway flow where every client request triggers a fresh service token callout before reaching the backend.]
Don’t just measure your backend. Measure the auth tax you’re paying in front of it.

The Token Tax

Enterprise APIs often carry two kinds of authorization:

  • Client auth: The user proves they’re allowed to talk to the API.
  • Service auth: The gateway proves to the backend that it’s allowed to forward the request.

That second layer is where the tax hides. The gateway calls an authorization server, gets a service-level bearer token, and attaches it to the upstream request.

In our setup, the token response looked like this:

{
  "access_token": "eyJraWQiOiIwYzFj...",
  "refresh_token": "VrjLhXmOH3dX...",
  "scope": "read write",
  "token_type": "Bearer",
  "expires_in": 38399
}

Look at expires_in. At 38399 seconds, the token lived for more than ten and a half hours. The gateway treated it like a single-use receipt.

For every single incoming request, our policy chain was doing this:

  1. Fire an HTTP Callout to the auth server.
  2. Parse the JSON to grab the access_token.
  3. Inject the Authorization: Bearer <token> header.
  4. Route the request to the backend.
  5. Throw the token away.

At low traffic, you can get away with this. At real traffic, it is a latency leak and an auth-server load generator.
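
A back-of-the-envelope sketch in Python shows how the tax scales. The 800ms callout cost and the expires_in value come from our setup. The request volumes are invented for illustration, and the model assumes a single gateway node that refreshes the token only at expiry.

```python
CALLOUT_SECONDS = 0.8        # observed cost of one token callout
EXPIRES_IN_SECONDS = 38399   # token lifetime from the auth response
SECONDS_PER_DAY = 24 * 60 * 60


def daily_token_tax(requests_per_day: int) -> dict:
    """Compare one callout per request with one callout per token lifetime."""
    # Ceiling division: how many tokens are needed to cover a full day.
    tokens_needed = -(-SECONDS_PER_DAY // EXPIRES_IN_SECONDS)
    return {
        "callouts_now": requests_per_day,
        "callouts_needed": tokens_needed,
        "caller_wait_hours": requests_per_day * CALLOUT_SECONDS / 3600,
    }


# Hypothetical traffic levels, purely for illustration.
for volume in (5_000, 50_000, 500_000):
    tax = daily_token_tax(volume)
    print(
        f"{volume:>7} req/day: {tax['callouts_now']} callouts instead of "
        f"{tax['callouts_needed']}, "
        f"{tax['caller_wait_hours']:.1f} hours of added caller wait"
    )
```

The point is the gap between the two callout counts. The number of callouts the token lifetime requires stays flat while the number actually made grows with every request.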

Measure Before Touching Policy

Do not start by changing gateway policy. Start by proving the waste.

The metric we care about here is time to first byte (TTFB), measured from the client side.

A boring curl command is enough:

curl -s -o /dev/null \
  -w "dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n" \
  https://api.example.com/some-route

Run that a few times and compare:

  • The direct backend speed (if you can bypass the gateway safely).
  • The gateway route with the auth callout turned on.
  • Repeated calls just a few seconds apart.
  • The raw response time of the auth server itself.

You’re trying to answer one specific question:

Is the gateway doing useful, unique work on every request, or is it just doing the exact same work over and over?

In our case, the answer was blunt. The gateway was doing repeated work. The callout cost around 800ms, and the token returned across nearby requests was identical.
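
Once you have a handful of those curl lines, a few lines of Python can summarize them. This is a minimal sketch: it assumes the exact -w format shown above, and the sample timings are invented for illustration, not taken from our measurements.

```python
import statistics


def parse_timing_line(line: str) -> dict:
    """Turn 'dns=0.004 connect=0.021 ...' into {'dns': 0.004, 'connect': 0.021, ...}."""
    fields = (pair.split("=", 1) for pair in line.split())
    return {name: float(value) for name, value in fields}


def median_ttfb(lines: list) -> float:
    """Median time to first byte, in seconds, across several curl runs."""
    return statistics.median(parse_timing_line(line)["ttfb"] for line in lines)


# Invented sample output, one line per curl run.
gateway_runs = [
    "dns=0.004 connect=0.021 tls=0.065 ttfb=0.912 total=0.915",
    "dns=0.003 connect=0.020 tls=0.061 ttfb=0.905 total=0.909",
    "dns=0.004 connect=0.022 tls=0.066 ttfb=0.930 total=0.934",
]
direct_runs = [
    "dns=0.004 connect=0.021 tls=0.064 ttfb=0.101 total=0.104",
    "dns=0.003 connect=0.019 tls=0.060 ttfb=0.098 total=0.100",
    "dns=0.004 connect=0.022 tls=0.067 ttfb=0.110 total=0.113",
]

overhead = median_ttfb(gateway_runs) - median_ttfb(direct_runs)
print(f"gateway overhead: {overhead * 1000:.0f} ms")
```

The median is used rather than the mean so that one slow run does not distort the comparison.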

The Clean Fix We Wanted

The fix should have been boring: cache the token.

If it’s a cache miss:

  1. Hit the auth server.
  2. Store the token in the cache.
  3. Attach the backend Authorization header.

If it’s a cache hit:

  1. Read the token directly from the cache.
  2. Completely skip the auth server.
  3. Attach the backend Authorization header.

The flow we wanted was dead simple:

Cache lookup
  -> hit: set Authorization header
  -> miss: HTTP Callout -> store token -> set Authorization header
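
Outside the gateway, that hit-or-miss flow fits in a few lines. The Python sketch below is not Gravitee configuration. ServiceTokenCache, the injected fetcher, and the 60-second safety margin are hypothetical names and values chosen for illustration, and the class is not thread-safe.

```python
import time
from typing import Callable, Optional, Tuple

# A fetcher returns (access_token, expires_in_seconds). In a real gateway this
# would be the HTTP callout to the auth server; here it is injected so the
# sketch stays self-contained.
TokenFetcher = Callable[[], Tuple[str, int]]


class ServiceTokenCache:
    def __init__(self, fetch: TokenFetcher, margin_seconds: int = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._margin = margin_seconds  # assumed safety margin before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is not None and now < self._expires_at:
            return self._token  # cache hit: skip the auth server
        # Cache miss: if the fetch raises, nothing is stored, so a failed
        # callout cannot poison the cache.
        token, expires_in = self._fetch()
        self._token = token
        self._expires_at = now + max(expires_in - self._margin, 0)
        return token


def authorization_header(cache: ServiceTokenCache) -> dict:
    return {"Authorization": f"Bearer {cache.get()}"}


# Usage with a stand-in fetcher that counts how often it is called.
calls = []


def fake_fetch() -> Tuple[str, int]:
    calls.append(1)
    return f"token-{len(calls)}", 38399


cache = ServiceTokenCache(fake_fetch)
authorization_header(cache)
authorization_header(cache)
print(len(calls))  # 1: the second request reused the cached token
```

Storing the expiry next to the token lets the TTL follow whatever expires_in the auth server returns, so no lifetime is hard-coded.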

Gravitee had all the pieces that sounded right: a Cache resource, a Cache policy, Expression Language (EL) conditions, an HTTP Callout policy, Transform Headers, and Groovy scripts for the cases where the policy chain runs out of road.

Our intended setup looked like this:

| Step | Policy            | What it does                                                    |
|------|-------------------|-----------------------------------------------------------------|
| 1    | Cache lookup      | Check the cache using a stable service-token key.               |
| 2    | HTTP Callout      | Fetch a new token only if the cache was empty.                  |
| 3    | Cache store       | Save the fresh token to the cache with a TTL just under expiry. |
| 4    | Transform Headers | Attach Authorization: Bearer <token> to the request.            |

We figured a simple condition on the HTTP Callout would do the trick:

{#context.attributes['access_token'] == null}

If the access_token is already in the context, skip the callout. If it’s missing, go fetch it.

Clean. Reasonable. Wrong for this gateway version.

[Figure: Expected cache lookup behavior compared with Gravitee response cache behavior.]
Our plan failed because we thought we were getting a key-value store, but the policy was built to be a response cache.

The Cache Was the Wrong Tool

The older Gravitee Cache policy was not a general-purpose key-value store sitting politely in the middle of a request pipeline.

Its natural job is response caching: it stores the upstream response content, status codes, and headers, and then aggressively returns that cached response the next time a matching request comes in. The official Gravitee Cache policy documentation even describes it this way: it caches upstream responses to completely avoid calling the backend.

That is not a small distinction. It changes the whole design.

We wanted the cache to behave like this:

cache hit -> grab token -> put token in context -> keep going down the policy chain

But the policy’s internal logic did this:

cache hit -> return the cached response immediately to the caller -> stop processing

We wanted a storage primitive. We had a response shortcut.
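
The difference is easier to see in a toy model. The Python sketch below is not how Gravitee is implemented. It is a hypothetical policy chain in which every lookup is assumed to be a hit, written only to contrast the two behaviors.

```python
from typing import Callable, List

Step = Callable[[dict], None]


def run_chain(request: dict, steps: List[Step]) -> List[str]:
    """Run steps in order; a step ends the chain by setting request['response']."""
    trace = []
    for step in steps:
        step(request)
        trace.append(step.__name__)
        if "response" in request:
            break
    return trace


def key_value_lookup(request: dict) -> None:
    # Key-value semantics: a hit only adds data for later steps to use.
    request["access_token"] = "cached-token"


def response_cache(request: dict) -> None:
    # Response-cache semantics: a hit answers the caller directly.
    request["response"] = "cached upstream response"


def set_auth_header(request: dict) -> None:
    request["Authorization"] = f"Bearer {request.get('access_token')}"


def call_backend(request: dict) -> None:
    request["response"] = "fresh upstream response"


print(run_chain({}, [key_value_lookup, set_auth_header, call_backend]))
# ['key_value_lookup', 'set_auth_header', 'call_backend']
print(run_chain({}, [response_cache, set_auth_header, call_backend]))
# ['response_cache']
```

In the second chain the header step never runs, which is the behavior that broke our plan.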

(Quick note: If you’re running a newer version of Gravitee, look at the Data Cache policy. Gravitee’s newer Data Cache documentation describes arbitrary key-value operations, including storing auth tokens before an HTTP Callout. That is the feature we wished we had.)

For our production system at the time, though, the only tool we had was that older, response-oriented Cache policy.

The Sandbox Closed the Back Doors

The next instinct was Groovy.

If declarative policies could not read and write to the cache the way we needed, maybe a script could:

def cache = context.getComponent('authTokens')
def cachedToken = cache.get('auth_token_general')

if (cachedToken != null) {
    request.headers.set('Authorization', "Bearer ${cachedToken}")
} else {
    // call auth server, store token, set header
}

The gateway rejected it immediately:

{
  "message": "Failed to resolve method [ class io.gravitee.policy.groovy.utils.AttributesBasedExecutionContext getComponent java.lang.String ]",
  "http_status_code": 500
}

The error message names the reason. Our script was handed an AttributesBasedExecutionContext, a restricted wrapper, not the full execution context, so getComponent could not be resolved.

That door was closed.

Our next attempt used static fields to hold the token in memory:

@groovy.transform.Field static String cachedToken = null
@groovy.transform.Field static long tokenExpiresAt = 0L

The sandbox blocked that annotation.

Then we tried cheating with JVM properties:

System.setProperty('example.auth.token', token)

The sandbox blocked that too.

[Figure: Three blocked persistence routes in the Gravitee Groovy sandbox: cache component access, static fields, and system properties.]
Every obvious way to save state in-memory was blocked by the sandbox.

The Sandbox Was Right

It is easy to treat the sandbox as the villain. That is usually lazy thinking.

Gateway scripts sit on the request path. If they can do anything, they eventually will: filesystem access, strange network calls, JVM internals, unsafe shared state, memory leaks, and logic that only one person can explain.

The failures gave us useful constraints:

  • We couldn’t depend on hidden gateway internals.
  • We couldn’t hack together persistent state inside Groovy.
  • We couldn’t demand that our operations team change the gravitee.yml whitelist.
  • We needed a solution that our API team could configure without restarting the gateway.

That changed the question.

Not:

How do we force the Cache policy to behave like a generic token store?

But:

How do we use the Cache policy’s default “response caching” behavior to solve our problem?

That question led to the Loopback Pattern.

Do This Before You Fix It

Before changing your own gateway policies, gather the facts that make the fix defensible:

| Question | Why it matters |
|----------|----------------|
| How long does the service token actually live? | You’ll need to set a cache TTL that is shorter than the token’s total lifespan. |
| Is the exact same token shared across multiple routes? | If you’re using a shared service account, you can use one static cache key. |
| Is the auth response identical if you make requests back-to-back? | Identical tokens prove that reusing them is actually safe. |
| How much latency does the callout add? | This gives you the actual business case to spend time fixing it. |
| Can your gateway cache arbitrary key-value pairs? | If it can’t (like in our v3 setup), you’ll need an architectural workaround. |
| Can your team actually change gateway config? | Altering sandbox rules or adding plugins might require admin approvals you don’t have. |
| What happens if auth fails? | A cache miss needs to fail gracefully without poisoning your cache with bad data. |

If you can answer those, you are not hand-waving about caching. You are identifying a safe piece of repeated work and removing it from the hot path.

Next: Use The Tool You Actually Have

The next post builds the workaround: a tiny internal “Loopback API.” Instead of fighting response caching, we use it.

No Groovy hacks. No gateway restarts. No custom plugins.

One internal API returns exactly the thing we wanted to cache in the first place: the service token.

Part of the "Gravitee API Caching" series