FCA register API rate limit explained, and what to do when you hit it
25 April 2026
The FCA Financial Services Register API is free, useful, and rate-limited at 50 requests per 10 seconds per developer key. The official documentation mentions this limit in passing and offers no further detail. In practice the limit is rolling, not bucketed, and breaching it returns an HTTP 429 with a vague message and no Retry-After header.
For the small workloads typical of an early integration (a handful of lookups per minute), the limit never fires. As soon as you start backfilling, doing bulk verification at onboarding, or polling for change detection, it fires constantly. This post is a working reference for what the limit actually does and four patterns for designing around it.
The exact behaviour
Through trial and error against the live API, the limit behaves as follows:
- The window is rolling, not aligned to a clock boundary. If you make 50 requests at 10:00:00.000 and one more at 10:00:09.999, you get the 51st rejected. Wait until 10:00:10.001 and you can make another.
- The window is per developer key, not per IP. Rotating IPs does nothing.
- Rejections come back as HTTP 429 with a JSON body of `{"Status":"FSR-API-02-04-32","ResponseMessage":"...","Data":[]}`. The status code is reliable; the message string varies.
- There is no `Retry-After` header. The smart move is to assume one second and back off exponentially from there.
- Sustained breaches do not appear to escalate to a longer ban; you can resume immediately once the window clears. This is unusual and pleasant.
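If you want client code to distinguish this rejection from other failures, a minimal sketch of checking the body above. The interface and helper names are mine, not from the FCA docs; the field names are taken from the observed body, so treat them as an assumption rather than a documented contract:

```ts
// Shape of the 429 body shown above, as observed rather than documented.
interface FsrError {
  Status: string;          // e.g. "FSR-API-02-04-32"
  ResponseMessage: string; // varies between rejections; do not match on it
  Data: unknown[];
}

async function isRateLimited(r: Response): Promise<boolean> {
  if (r.status !== 429) return false; // the HTTP status is the reliable signal
  const body = (await r.clone().json().catch(() => null)) as FsrError | null;
  return body?.Status?.startsWith('FSR-API') ?? true;
}
```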
Strategy 1: cap your client to 4 requests per second
The simplest design is to never send more than 4 requests per second. This sits comfortably under 50 per 10s without needing to track the rolling window precisely. A token-bucket implementation in 20 lines:
```ts
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private readonly capacity: number, private readonly refillPerSec: number) {
    this.tokens = capacity;
  }

  async take() {
    while (true) {
      // Refill continuously based on elapsed time, capped at capacity.
      const now = Date.now();
      const elapsed = (now - this.lastRefill) / 1000;
      this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
      this.lastRefill = now;
      if (this.tokens >= 1) { this.tokens -= 1; return; }
      await new Promise(r => setTimeout(r, 250)); // no token yet: wait and re-check
    }
  }
}

const bucket = new TokenBucket(8, 4); // burst 8, sustained 4/sec

async function call(url: string) {
  await bucket.take();
  return fetch(url, { headers: authHeaders });
}
```

This is enough for any use case below a few thousand lookups per hour, with no special error handling required because rejections should never happen.
Strategy 2: cache aggressively
Most FCA lookups in production are repeated. The same firm is checked at onboarding, again on the next transaction, again at quarterly review. A 5-minute response cache can cut upstream calls by 90% or more in any workload with repeat structure.
```ts
async function getFirmCached(frn: string) {
  const cached = await cache.get(`firm:${frn}`);
  if (cached && Date.now() - cached.ts < 5 * 60 * 1000) {
    return cached.data; // fresh enough: serve from cache
  }
  const res = await call(`${FCA_BASE}/Firm/${frn}`);
  const fresh = await res.json(); // cache the parsed body, not the one-shot Response object
  await cache.set(`firm:${frn}`, { data: fresh, ts: Date.now() });
  return fresh;
}
```

The trade-off: if you generate compliance evidence (audit certificates, signed confirmations), the data behind those records can be up to five minutes old; it reflects when you fetched, not when you generated the evidence. Bypass the cache for evidence generation; use it freely for routine lookups.
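One way to honour that split, as a sketch; the `bypassCache` option and the `getFirm` name are hypothetical, not from any FCA client library:

```ts
// Evidence paths pass { bypassCache: true } so the record's timestamp
// matches a live fetch; routine lookups take the cached copy.
async function getFirm(frn: string, opts: { bypassCache?: boolean } = {}) {
  if (!opts.bypassCache) {
    const cached = await cache.get(`firm:${frn}`);
    if (cached && Date.now() - cached.ts < 5 * 60 * 1000) return cached.data;
  }
  const fresh = await (await call(`${FCA_BASE}/Firm/${frn}`)).json();
  await cache.set(`firm:${frn}`, { data: fresh, ts: Date.now() }); // repopulate either way
  return fresh;
}
```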
Strategy 3: handle 429 with bounded retry
If you cannot guarantee your client respects the limit (multi-process worker pools, third-party libraries you do not control), accept that 429s will happen and design for them:
```ts
async function callWithRetry(url: string, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const r = await fetch(url, { headers: authHeaders });
    if (r.status !== 429) return r;
    // Back off: 1s, 2s, 4s, 8s, 16s
    const wait = Math.min(16000, 1000 * 2 ** attempt);
    await new Promise(res => setTimeout(res, wait));
  }
  throw new Error('Rate-limited after retries');
}
```

Two notes: cap retries (the example uses 5) so a stuck loop does not pile up; and add jitter if you have multiple worker processes hitting the same limit, otherwise they all retry in lockstep and breach again.
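For the jitter note, a minimal sketch of a "full jitter" delay, assuming the same attempt counter as above:

```ts
// Full-jitter backoff: wait a random fraction of the exponential delay so
// concurrent workers spread their retries instead of breaching in lockstep.
function jitteredWait(attempt: number, capMs = 16000): number {
  const base = Math.min(capMs, 1000 * 2 ** attempt);
  return Math.random() * base;
}
```

Swap the `wait` calculation in `callWithRetry` for `jitteredWait(attempt)` and the lockstep problem goes away.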
Strategy 4: bulk pattern with deferred fan-out
For large batch jobs (verifying every counterparty on a books-to-records reconciliation, for instance), neither caching nor retry helps because the requests are unique. The pattern that works:
- Submit the batch to a queue (SQS, Cloudflare Queues, plain Redis list).
- A small pool of workers consumes the queue with the rate limit honoured globally across them, using a shared token bucket in Redis or DynamoDB (a Redis sketch follows this list).
- Workers write results to a per-batch results table.
- Caller polls or webhooks on completion.
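For step two, a minimal sketch of the shared limiter, assuming ioredis; the `fca:rate:` key prefix and the 4/sec budget are illustrative:

```ts
import Redis from 'ioredis';

const redis = new Redis(); // defaults to localhost:6379; point at your shared instance

// Approximate global limiter: one counter per wall-clock second, shared by
// all workers. INCR is atomic, so concurrent workers cannot double-spend.
async function acquire(maxPerSec = 4): Promise<void> {
  while (true) {
    const key = `fca:rate:${Math.floor(Date.now() / 1000)}`;
    const count = await redis.incr(key);
    if (count === 1) await redis.expire(key, 2); // let stale counters die off
    if (count <= maxPerSec) return;              // budget available this second
    await new Promise(r => setTimeout(r, 250));  // over budget: wait and retry
  }
}
```

This is a fixed one-second window rather than a true rolling bucket, so it is slightly conservative, but at 4/sec it stays comfortably inside the 50/10s limit.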
The maths: at 4 requests per second you complete 14,400 lookups per hour. For most batch jobs this is fine. For larger jobs, you split across multiple developer keys; the FCA does not penalise this and it is the closest thing to an officially supported pattern.
What you cannot do
Two things people try and they do not work:
- Rotate keys per request. Each key has its own 50/10s window, so two keys do give you 100/10s in aggregate, but each request still counts against the key that sent it. Rotation does not lift any single key's ceiling; you have to genuinely distribute the load across keys and respect each key's budget.
- Use unauthenticated endpoints. The web search at `register.fca.org.uk` is not API-callable in a way that survives any meaningful volume. The web layer rate-limits more aggressively, returns HTML, and changes structure without notice.
When to outsource the rate limit entirely
If you find your team writing token buckets, retry loops, and key-rotation logic instead of features, that is a signal. The FCA limit is fixed; the only way to genuinely exceed it is to share a pool of keys across many customers and amortise the cost. That is what an aggregator does, and our FCA Verification API is one. We pool requests across our customers, hold a higher effective ceiling, cache responses for 5 minutes by default, and present a single endpoint with a stable rate limit you set in your tier. If you stay below a few hundred lookups per hour, the four strategies above will serve you. Beyond that, paying someone else to hold the bucket is usually cheaper than maintaining it yourself.