Stop Fighting the Gateway. Cache the Response It Wants.

The gateway refused to be our token store.

That was annoying, but it was also useful. It forced us to stop treating the platform like a programmable blank slate.

In the previous post, we found the waste: every request through the gateway triggered a fresh service-token callout even though the token was valid for hours. The obvious fix was to cache the token. The gateway said no in the usual enterprise way: rigid policy behavior, sandbox boundaries, and configuration options that almost do what you want.

The breakthrough came from a different question:

If the Cache policy insists on returning a full, cached HTTP response… why don’t we just make that cached response exactly the thing we need?

That is the Loopback Pattern.

[Figure: Loopback cache flow showing a cache miss going to the auth server and a cache hit returning the stored token response.] The trick is to cache the whole token response behind a tiny internal API, and let your business APIs call that instead.

Stop Trying to Build a Token Store

In our initial, failed design, we were trying to force Gravitee’s Cache policy to act like this:

read token from cache -> put token into a context attribute -> continue the request flow

But the older Cache policy we had access to was hardwired to do this:

read response from cache -> return that response immediately to the caller -> stop processing

So we changed the shape of the caller:

What if the caller was supposed to receive that cached response?

Think about it: the real auth server is already returning a perfectly formatted JSON response:

{
  "access_token": "eyJraWQiOiIwYzFj...",
  "refresh_token": "VrjLhXmOH3dX...",
  "scope": "read write",
  "token_type": "Bearer",
  "expires_in": 38399
}

And that JSON response is exactly what our existing HTTP Callout policy already knows how to parse!

So, instead of having each of our business APIs call the real auth server directly, we decided to have them call a tiny, internal API instead:

Business API -> HTTP Callout -> Auth Loopback API -> real auth server

That “Auth Loopback API” exists for one reason only: to cache and return the auth response.

The Architecture Is Two APIs

The final architecture uses two APIs inside the gateway.

The first is your standard Business API. It handles the incoming client request, fires off a callout to get a service token, injects the Authorization header, and routes traffic to the backend.

The second is our Internal Loopback API. It’s not a real product API, and it’s definitely not exposed to the public. It’s just a tiny proxy sitting in front of the real auth server. Its only job is:

Return a valid service-token response, preferably straight from the cache.

[Figure: Configuration map showing the business API HTTP Callout target changed from the auth server to the localhost loopback API.] The Business API keeps its exact same policy shape. We just point the callout target to localhost.

With this setup, the request path looks like this:

Client
  -> Business API
  -> HTTP Callout to http://localhost:8082/loopback
  -> Auth Loopback API
  -> Cache policy
      -> hit: return cached auth JSON immediately
      -> miss: call real auth server, store the response, return auth JSON
  -> Business API parses out the access_token
  -> Business API sets the Authorization header
  -> Backend
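The hit and miss branches in that path can be modelled in a few lines of Python. This is a minimal sketch of the behaviour, not Gravitee's implementation: `LoopbackCache`, `fake_auth_server`, and the injectable `clock` are hypothetical names introduced only for illustration.

```python
import time
from typing import Callable, Optional


class LoopbackCache:
    """Single-entry response cache in front of a token endpoint (illustrative model)."""

    def __init__(
        self,
        fetch_token_response: Callable[[], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_token_response
        self._ttl = ttl_seconds
        self._clock = clock
        self._body: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> tuple[str, bool]:
        """Return (response body, was_cache_hit)."""
        now = self._clock()
        if self._body is not None and now < self._expires_at:
            return self._body, True           # hit: return cached auth JSON immediately
        body = self._fetch()                  # miss: call the real auth server
        self._body = body                     # store the response
        self._expires_at = now + self._ttl
        return body, False


# Hypothetical stand-in for the real auth server; counts how often it is called.
calls = []


def fake_auth_server() -> str:
    calls.append(1)
    return '{"access_token": "token-%d"}' % len(calls)


cache = LoopbackCache(fake_auth_server, ttl_seconds=28800)
first = cache.get()    # miss: the auth server is called once
second = cache.get()   # hit: same body, no second call
```

In this model the caller always receives a complete token response and cannot tell a hit from a miss unless it inspects the returned flag.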

Nothing downstream knows this happened: the backend receives the same Authorization header it always did. That is why the change is safe.

Make the Loopback Boring

The loopback API should be boring.

Setting	Value
API name	`Auth Loopback`
Entrypoint	`/loopback`
Internal URL	`http://localhost:8082/loopback`
Backend target	`http://example.com/authorization-server/token`
Public exposure	none
Main policy	Cache

Notice that the backend target is the exact same auth endpoint our business APIs used to call directly.

There’s no Groovy scripting here. No Transform Headers. No tricky response manipulation. No custom plugins.

It is a transparent proxy with a Cache policy in front of it. That is the whole trick.

Cache Configuration Is a Security Decision

You want to use a cache time-to-live (TTL) that is safely shorter than your actual token lifetime.

In our case, the token lifetime was roughly ten hours, so we set the cache to eight hours. That buffer gives you plenty of room for clock drift, delayed requests, and generally conservative expiry behavior.
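To make the buffer concrete, here is a small sketch that computes the safety margin. It assumes the `expires_in` value from the sample token response (38399 seconds) is the full token lifetime; `expiry_margin_seconds` is a hypothetical helper, not part of any gateway API.

```python
def expiry_margin_seconds(token_lifetime_seconds: int, cache_ttl_seconds: int) -> int:
    """Return how long before token expiry the cache entry is dropped."""
    margin = token_lifetime_seconds - cache_ttl_seconds
    if margin <= 0:
        # A TTL at or beyond the token lifetime can serve expired tokens.
        raise ValueError("cache TTL must be shorter than the token lifetime")
    return margin


# expires_in from the sample response, and the eight-hour TTL used in this post.
margin = expiry_margin_seconds(38399, 28800)  # 9599 seconds, about 2 h 40 min
```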

Cache setting	Example value	Why we set it
Time to live	`28800` seconds	A hard eight-hour ceiling.
Time to idle	`28800` seconds	Evict the token if it goes unused for eight hours; equal to the TTL, so idle eviction never fires early.
Cache key	`service-token-general`	We use one shared service credential.
Scope	API-level, if safe	All callers share the same service token.
Methods	include `POST`	The token endpoint uses POST.
Response condition	2xx only	Do not cache auth failures!

The crucial detail is the cache key.

Because every single business API in our setup used the exact same service credentials, the key could be completely static:

service-token-general

If your setup is more complex—say, you have different credentials per environment, per tenant, or per scope—you need to build that into the key:

service-token-{environment}-{backend}-{scope}

Never use a single static key if the token you get back differs based on the caller or the permission set.
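As a rough illustration of the composite key, the sketch below builds it from its parts. The function name and the scope normalisation are assumptions made for this example; in the gateway itself the key would be assembled in the Cache policy configuration rather than in Python.

```python
def service_token_cache_key(environment: str, backend: str, scope: str) -> str:
    """Build a cache key that differs whenever the issued token would differ."""
    # Sort the space-separated scopes so "read write" and "write read" share one entry.
    normalised_scope = "+".join(sorted(scope.split()))
    return f"service-token-{environment}-{backend}-{normalised_scope}"


key = service_token_cache_key("prod", "orders", "write read")
# key == "service-token-prod-orders-read+write"
```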

Wiring Up the Business APIs

Before we made the change, the HTTP Callout URL looked like this:

HTTP Callout URL:
http://example.com/authorization-server/token

After the change, we simply pointed it to our new internal API:

HTTP Callout URL:
http://localhost:8082/loopback

Literally everything else stayed the same:

The same request method.
The same JSON body.
The same Content-Type.
The same JSONPath extraction rules.
The same Transform Headers step.

The extraction still looked exactly like this:

access_token = {#jsonPath(#calloutResponse.content, '$.access_token')}

And attaching the header to the backend request still looked like this:

Authorization: Bearer {#context.attributes['access_token']}
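For readers less familiar with the expression language, those two steps amount to roughly the following Python. This is an equivalent sketch for illustration only; the function names are hypothetical and nothing here runs inside the gateway.

```python
import json


def extract_access_token(callout_body: str) -> str:
    """Equivalent of the JSONPath '$.access_token' applied to the callout response."""
    return json.loads(callout_body)["access_token"]


def authorization_header(access_token: str) -> dict[str, str]:
    """Equivalent of the Transform Headers step that sets the Bearer token."""
    return {"Authorization": f"Bearer {access_token}"}


# The body is the same shape whether it came from the cache or the auth server.
headers = authorization_header(extract_access_token('{"access_token": "abc"}'))
```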

This is why the pattern was so safe to introduce. The business API didn’t need to learn a single thing about caching. It still asked for a token, and it still got a token response.

Prove It With Two Requests

The proof is easy to see.

Fire one request against your business API with a cold cache and note the time to first byte (TTFB, in seconds):

TTFB: 1.416641

Now, fire the exact same request again a few seconds later:

TTFB: 0.635187

In the backend logs, the access token matched across both requests: the jti and issued-at (iat) claims were identical. That means the second request never reached the auth server, and its first byte arrived about 780 ms sooner.
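If you would rather check the claims programmatically than read logs, a sketch like the one below can compare two captured tokens. It assumes the access token is a JWT whose payload carries `jti` and `iat`, and it deliberately does not verify the signature, so use it for debugging only. Both helper names are hypothetical.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload_segment = token.split(".")[1]
    # base64url segments in JWTs omit padding; restore it before decoding.
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def same_issued_token(first: str, second: str) -> bool:
    """True when both tokens carry the same jti and iat claims.

    Raises KeyError if either claim is missing from a token.
    """
    a, b = jwt_claims(first), jwt_claims(second)
    return a["jti"] == b["jti"] and a["iat"] == b["iat"]
```

If `same_issued_token` returns True for the tokens seen on two consecutive requests, the second one was served from the cache rather than freshly issued.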

[Figure: Two horizontal latency bars showing a 1416ms cold cache request and a 635ms warm cache request.] The first request paid the auth round-trip tax. The second request grabbed the cached response.

Failure Modes Decide Whether This Is Production-Ready

A loopback API is small, but it is still production architecture. Treat it that way.

Don’t cache failures

Make absolutely sure you only cache successful token responses.

If the auth server starts returning 401, 403, 429, or 500 errors, your loopback API should pass that error right back to the caller without storing it. If you accidentally cache a failure, you’ll turn a temporary five-second auth blip into an eight-hour total outage.
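The rule can be stated as a small model: store a response only when its status is 2xx, and pass everything else straight through. This is an illustrative sketch with hypothetical names (`SuccessOnlyCache`, `is_cacheable`); TTL handling is left out to keep the focus on the status check.

```python
from typing import Callable, Optional

Response = tuple[int, str]  # (HTTP status, body)


def is_cacheable(status: int) -> bool:
    """Only successful (2xx) token responses may be stored."""
    return 200 <= status < 300


class SuccessOnlyCache:
    def __init__(self, call_auth_server: Callable[[], Response]) -> None:
        self._call = call_auth_server
        self._stored: Optional[Response] = None

    def get(self) -> Response:
        if self._stored is not None:
            return self._stored            # serve the cached success
        response = self._call()
        if is_cacheable(response[0]):
            self._stored = response        # store successes only
        return response                    # failures pass through uncached
```

With this shape, a 500 from the auth server is returned to the caller once, and the next request tries the auth server again instead of replaying the failure.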

Keep your TTL shorter than token expiry

If your token is good for ten hours, don’t cache it for ten hours. Cache it for eight or nine. Setting a conservative expiry is so much cheaper than spending a week debugging weird, intermittent 401 errors from upstream servers.

Protect your internal endpoints

Your loopback API should only be reachable from the gateway itself or from trusted internal network paths. Remember: it’s a token endpoint now. Treat it like one, even if it’s “just running on localhost.”

Log your cache hits and misses

You need enough visibility to answer these questions at a glance:

Did the gateway actually contact the auth server?
How often is the cache missing?
Are cache misses clustering around your deploys or container restarts?
Are auth failures getting cached by accident?
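One way to get answers to the first two questions is to count hits and misses and log each decision. The sketch below is an assumption about how you might do that in your own tooling or log processing; `CacheStats` and the logger name are hypothetical, and the gateway's own logging options may already cover this.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("auth_loopback")


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        """Count one cache decision and log it."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        logger.info(
            "auth-loopback cache %s (hits=%d misses=%d)",
            "HIT" if hit else "MISS",
            self.hits,
            self.misses,
        )

    @property
    def hit_ratio(self) -> float:
        """Fraction of lookups served from the cache (0.0 when nothing was recorded)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```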

Plan for multiple gateway nodes

If you’re running a cluster of gateway nodes, you need to know whether your cache is local to each node or distributed across them. A local cache is usually fine, but remember that each node will have to pay the “first miss” penalty to warm up its own memory.
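A back-of-the-envelope model of that penalty: within one TTL window, each node with a local cache calls the auth server once, while a shared cache needs only one call in total. The function below is a simplified sketch that ignores concurrent first misses; its name and parameters are hypothetical.

```python
def upstream_token_calls(requests_per_node: list[int], distributed_cache: bool) -> int:
    """Estimate auth-server calls in one TTL window for a cluster of gateway nodes."""
    active_nodes = sum(1 for count in requests_per_node if count > 0)
    if distributed_cache:
        # One shared entry: only the very first request in the cluster misses.
        return 1 if active_nodes else 0
    # Local caches: every node that serves traffic pays its own first miss.
    return active_nodes


local = upstream_token_calls([100, 50, 0], distributed_cache=False)   # 2
shared = upstream_token_calls([100, 50, 0], distributed_cache=True)   # 1
```

Either way, the number of auth calls scales with the number of nodes rather than the number of requests, which is why a local cache is usually acceptable.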

Why This Works

The Loopback Pattern works because it stops fighting the platform.

The Cache policy wanted to cache and return a full HTTP response. The loopback API made that exact behavior incredibly useful.

The Groovy sandbox wanted to block us from hiding persistent state in memory. The loopback pattern completely avoids hidden persistence.

Our business APIs wanted to keep their existing HTTP Callout and header transformation logic. The loopback pattern let them keep it without changing a single line of JSONPath.

The result worked because it used boring machinery where it naturally belonged.

The Lesson

The best gateway fix often makes the abstraction you already have useful.

In this case, the gateway wanted to cache full responses. So we gave it a full response worth caching.

Part of the "Gravitee API Caching" series

Related Articles

Is Your API Gateway Burning 800ms on Every Request?
