# Rate limits

> Layered rate-limit stack covering API keys, dashboard sessions, agent dispatch, tool routes, webhook ingress, and per-customer-number floods. Fixed-window Redis primitive; fail-open posture.

Vorel enforces rate limits at multiple layers: per-API-key, per-tenant aggregate, per-agent-router,
per-tool, per-dashboard-user, per-IP webhook ingress, and per-customer-number. The most-restrictive
applicable layer wins. Hitting a limit returns `429 rate_limited` with the standard `Retry-After`
and `X-RateLimit-*` headers.

## The full stack

| Layer                        | Limit            | Window | Bucket key                        | Where it fires                                                  |
| ---------------------------- | ---------------- | ------ | --------------------------------- | --------------------------------------------------------------- |
| Per-API-key                  | **200 req/min**  | 60s    | `apk:<key_id>`                    | Inside `v1Handler` wrapper, after auth verify                   |
| Per-dashboard-user           | **200 req/min**  | 60s    | `dashboard:<userId>`              | Dashboard auth middleware, after auth resolve                   |
| Per-tenant aggregate         | **5000 req/min** | 60s    | `tenant:<tenantId>:total`         | Across every authenticated surface in your tenant               |
| Per-(tenant, agent-router)   | **1000 req/min** | 60s    | `jwt:<tenantId>:<agentRouterSub>` | Agent dispatch chokepoint                                       |
| Per-(tenant, tool)           | **50 req/min**   | 60s    | `jwt:<tenantId>:<toolSub>`        | Each internal tool endpoint                                     |
| Per-IP webhook               | **500 req/min**  | 60s    | `webhook:<source>:<ip>`           | `/api/webhooks/whatsapp\|voice\|auth` (before signature verify) |
| Per-(tenant, customer-phone) | **30 req/min**   | 60s    | `customer:<tenantId>:<phone>`     | WhatsApp + voice inbound, post-payload-parse                    |

Every layer uses the same Redis-backed fixed-window primitive (`checkRateLimit`). Windows roll
automatically: the bucket key embeds `floor(now / window)` so a fresh window allocates a fresh
bucket.
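
The window roll can be sketched as follows. This is a minimal in-memory illustration, assuming a `Map` in place of Redis; `windowedKey` and `FixedWindowLimiter` are hypothetical names, and the real `checkRateLimit` signature is not documented on this page.

```typescript
// Hypothetical sketch: an in-memory stand-in for the Redis-backed fixed window.
// Names and signatures are illustrative, not Vorel's actual implementation.

/** Append the window index so that each window gets its own bucket. */
function windowedKey(bucket: string, nowSec: number, windowSec: number): string {
  return `${bucket}:${Math.floor(nowSec / windowSec)}`;
}

class FixedWindowLimiter {
  // A real store would also need to expire keys from past windows.
  private counts = new Map<string, number>();

  /** Count one request; returns true while the bucket is within its limit. */
  check(bucket: string, limit: number, windowSec: number, nowSec: number): boolean {
    const key = windowedKey(bucket, nowSec, windowSec);
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    return count <= limit;
  }
}
```

Because the window index is part of the key, no explicit reset step is needed: the first request after the boundary simply lands in a key that has no count yet.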

## Which layer your traffic hits

<AccordionGroup>
  <Accordion icon="key" title="Public API calls (Bearer-authed)">
    A call to `/api/v1/*` with a valid API key passes through:

    1. **Per-API-key** (200 req/min): gated inside the `v1Handler` wrapper after auth.
    2. **Per-tenant aggregate** (5000 req/min): applied via the same path so a runaway script
       in one tenant can't accumulate quota by issuing N keys.

    Per-(tenant, tool) doesn't apply to the public API today (those buckets are for
    agent-side internal tool calls). The dashboard and per-IP-webhook limits don't apply either;
    different surfaces.
  </Accordion>

  <Accordion icon="user" title="Dashboard pages (authenticated session)">
    A page render or server action under `/(dashboard)/*` passes through:

    1. **Per-dashboard-user** (200 req/min): applied in the dashboard auth middleware after the
       session resolves.

    Tenant-aggregate doesn't gate dashboard sessions today; the per-user gate is the primary
    floodguard for human navigation.
  </Accordion>

  <Accordion icon="webhook" title="Inbound webhooks (signed)">
    WhatsApp / voice / auth webhooks pass through:

    1. **Per-IP webhook** (500 req/min): applied **before** signature verification, so a flood
       can't make us pay for the verification cost on every request.
    2. **HMAC signature verification**: reject 401 on mismatch.
    3. **Per-(tenant, customer-phone)** (30 req/min): applied **after** payload parse, so a
       single compromised customer number can't burn through tenant quota.
  </Accordion>

  <Accordion icon="bolt" title="Agent dispatch + tool routes (JWT-authed)">
    The router → sub-agent → tool flow uses short-TTL JWTs (5-min for tool calls, 2-min for
    worker calls):

    1. **Per-(tenant, agent-router)** (1000 req/min): gates the dispatch chokepoint.
    2. **Per-(tenant, tool)** (50 req/min): gates each internal tool endpoint after the agent
       router fans out.
    3. **Per-tenant aggregate** (5000 req/min): same global ceiling that public API + tool
       traffic both share.

    Public-API calls and agent-side tool calls live in different buckets even when they hit the
    same tool: the bucket key embeds the JWT sub for agent calls and the API-key id for public
    calls.
  </Accordion>
</AccordionGroup>
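
The inbound-webhook ordering above (per-IP gate, then signature check, then per-customer gate) can be sketched like this. It is an illustration only: the three checks are injected as hypothetical callbacks, the `reason` strings are made up, and the payload is assumed to be JSON carrying `tenantId` and `phone` fields.

```typescript
// Hypothetical sketch of the gate order for inbound webhooks.
// The checks are injected so that only the ordering is shown here.

type GateResult = { status: 200 | 401 | 429; reason: string };

interface WebhookGates {
  ipWithinLimit(source: string, ip: string): boolean; // per-IP, 500 req/min
  signatureValid(rawBody: string, signature: string): boolean; // HMAC check
  customerWithinLimit(tenantId: string, phone: string): boolean; // 30 req/min
}

interface InboundWebhook {
  source: string;
  ip: string;
  rawBody: string; // assumed JSON with tenantId and phone fields
  signature: string;
}

function gateWebhook(req: InboundWebhook, gates: WebhookGates): GateResult {
  // 1. Cheap per-IP check first, so a flood never reaches signature verification.
  if (!gates.ipWithinLimit(req.source, req.ip)) {
    return { status: 429, reason: "ip_limit" };
  }
  // 2. Reject unsigned or tampered payloads before parsing them.
  if (!gates.signatureValid(req.rawBody, req.signature)) {
    return { status: 401, reason: "bad_signature" };
  }
  // 3. Only a verified payload is parsed and counted per customer number.
  const { tenantId, phone } = JSON.parse(req.rawBody) as { tenantId: string; phone: string };
  if (!gates.customerWithinLimit(tenantId, phone)) {
    return { status: 429, reason: "customer_limit" };
  }
  return { status: 200, reason: "accepted" };
}
```

Each early return skips the more expensive steps that follow it, which is why the per-IP gate sits in front of the HMAC check.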

## Hitting a limit (the response)

```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 12
X-RateLimit-Limit: 200
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714903260

{
  "error": {
    "code": "rate_limited",
    "message": "per-key rate limit exceeded"
  }
}
```

* **`Retry-After`**: seconds until the current window rolls. Honour this; the limiter clamps
  to ≥1 so you don't hot-loop.
* **`X-RateLimit-Limit`**: the bucket's limit (e.g. `200`).
* **`X-RateLimit-Remaining`**: count remaining in the current window (`0` when limited).
* **`X-RateLimit-Reset`**: UNIX epoch seconds when the current window rolls.
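
For a fixed window, the header values follow from the window arithmetic. The sketch below shows one way they could be derived; `rateLimitHeaders` is a hypothetical helper, not the function Vorel uses, and it assumes times in UNIX epoch seconds.

```typescript
// Hypothetical sketch: deriving the response headers from a fixed window.

interface RateLimitHeaders {
  "Retry-After": string;
  "X-RateLimit-Limit": string;
  "X-RateLimit-Remaining": string;
  "X-RateLimit-Reset": string;
}

function rateLimitHeaders(
  limit: number,
  used: number, // requests counted in the current window, including this one
  windowSec: number,
  nowSec: number,
): RateLimitHeaders {
  // The window rolls at the next multiple of windowSec.
  const reset = (Math.floor(nowSec / windowSec) + 1) * windowSec;
  // Clamp to at least 1 second so that clients never retry immediately.
  const retryAfter = Math.max(1, Math.ceil(reset - nowSec));
  const remaining = Math.max(0, limit - used);
  return {
    "Retry-After": String(retryAfter),
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(remaining),
    "X-RateLimit-Reset": String(reset),
  };
}
```

With `limit = 200`, `windowSec = 60`, and `nowSec = 1714903248`, this yields `Retry-After: 12` and `X-RateLimit-Reset: 1714903260`, matching the sample response above.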

The body's `code` field identifies the error class; the `message` tells you **which** layer you
hit:

* `code` is `'rate_limited'` for the public-API surfaces.
* `message` carries the surface-specific detail (`per-key rate limit exceeded`,
  `webhook ingress rate limit exceeded`, etc.).

## Fail-open posture (and why)

**The limiter fails OPEN on Redis blips.** A Redis outage admits the request rather than 429-ing
every customer: better to serve traffic than to falsely block it.

The trade-off is explicit: a sustained Redis outage means rate limits stop functioning. Mitigations:

* **Alarm on the `rate_limit.redis_failed` log spike.** A blip is fine; a sustained spike pages
  the operator.
* **Defence in depth.** The application-tier rate limit is one of three layers: our edge WAF
  (per-IP global) and per-channel BSP-level limits (the WhatsApp provider's throttling, voice
  transport quotas) also apply. A Redis failure removes the application-tier gate but not the
  others.

This is the right call pre-customer; reconsider when traffic gets serious enough that "fail-open
during Redis outage" is materially worse than "429 everyone during Redis outage."
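
A fail-open wrapper could look roughly like the sketch below. `checkFailOpen` is a hypothetical name, and the store call is synchronous here for brevity; a real Redis client call would be awaited. Only the `rate_limit.redis_failed` event name comes from this page.

```typescript
// Hypothetical sketch of the fail-open posture.

type Decision = { allowed: boolean; failedOpen: boolean };

function checkFailOpen(
  check: () => boolean, // throws when the backing store is unreachable
  log: (event: string) => void,
): Decision {
  try {
    return { allowed: check(), failedOpen: false };
  } catch {
    // Store error: record it for alerting, then admit the request.
    log("rate_limit.redis_failed");
    return { allowed: true, failedOpen: true };
  }
}
```

Switching to fail-closed would mean returning `allowed: false` in the `catch` branch, which is the "429 everyone during Redis outage" behaviour described above.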

## Honouring rate limits (client side)

When you get a `429`:

1. **Read `Retry-After`.** Honour it as the minimum back-off.
2. **Don't loop tighter than `Retry-After`.** Add jitter so multiple clients don't all retry at
   the same window-roll millisecond.
3. **Track back-off per bucket, not per endpoint.** `/v1/conversations` and `/v1/leads` share the
   per-API-key bucket, so a per-key 429 on one applies to the other until the window rolls.
   Limits on other layers (per-tool, per-customer-number) are counted in their own buckets.
4. **For around-the-brain workflows** (n8n nurture loops, post-booking nudges), `Retry-After`
   handling is built into the stock n8n HTTP node.
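
For clients that roll their own HTTP calls, the steps above can be sketched as follows. `backoffMs` and `withRetry` are hypothetical helpers, the 1-second fallback and the jitter ceiling are arbitrary choices, and `Retry-After` is assumed to be a number of seconds as in the sample response.

```typescript
// Hypothetical client-side sketch: honour Retry-After, then add jitter on top.

/** Delay before the next attempt: at least Retry-After, plus random jitter. */
function backoffMs(
  retryAfterHeader: string | null,
  maxJitterMs = 1000,
  random: () => number = Math.random,
): number {
  const seconds = Number(retryAfterHeader);
  // Fall back to 1s when the header is missing or not a number of seconds.
  const baseMs = Number.isFinite(seconds) && seconds >= 1 ? seconds * 1000 : 1000;
  return baseMs + Math.floor(random() * maxJitterMs);
}

// Minimal shape that a fetch Response satisfies.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}

/** Re-send while the server answers 429, up to maxAttempts sends in total. */
async function withRetry<T extends ResponseLike>(
  send: () => Promise<T>,
  maxAttempts = 5,
): Promise<T> {
  let res = await send();
  for (let attempt = 1; res.status === 429 && attempt < maxAttempts; attempt++) {
    const delay = backoffMs(res.headers.get("Retry-After"));
    await new Promise<void>((resolve) => setTimeout(resolve, delay));
    res = await send();
  }
  return res;
}

// Usage (not executed here):
//   const res = await withRetry(() =>
//     fetch(url, { headers: { Authorization: `Bearer ${apiKey}` } }));
```

The jitter is added on top of `Retry-After` rather than subtracted from it, so the delay never drops below the minimum the server asked for.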

## Plan-aware limits (planned)

The current numbers are platform-wide. Per-plan ceilings (for example, raising the per-tenant
aggregate to 20,000 req/min on Pro plans) are implemented as a parameter on the limit helpers but
are not bound to customers today, because customer plans don't exist yet. When the billing model
launches, these ceilings flex per-plan at the call site without code changes elsewhere.
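
One way such a parameter could be resolved at the call site is sketched below. The `Plan` type, the lookup table, and `tenantAggregateLimit` are all hypothetical; only the 5000 default and the 20,000 Pro example come from this page.

```typescript
// Hypothetical sketch: resolving a plan-aware ceiling at the call site.

type Plan = "default" | "pro";

// Illustrative plan table; real plan names and values are not defined yet.
const TENANT_AGGREGATE_LIMITS: Record<Plan, number> = {
  default: 5000,
  pro: 20000,
};

/** Per-tenant aggregate ceiling in requests per minute for a given plan. */
function tenantAggregateLimit(plan: Plan = "default"): number {
  return TENANT_AGGREGATE_LIMITS[plan];
}
```

The caller would pass the resolved number into the limit helper, so the helper itself stays unaware of plans.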

## What's NOT enforced today

* **Per-resource-class quotas.** No daily/monthly cap on "how many leads can your tenant create";
  the per-key and tenant-aggregate limits are the only ceilings.
* **Per-key burst credits.** Fixed window only; no token-bucket burst allowance. A request at
  second `:00` and one at `:59` both count toward the same window's `200`.
* **Custom plans.** No way to bump a single tenant's per-API-key limit without a code change today.

## Related docs

* [API introduction](/api-reference/introduction): surface overview
* [Authentication](/api-reference/authentication): per-API-key issuance + scopes
* [Webhooks](/api-reference/webhooks): inbound rate-limit specifics
* [Security overview](/security/overview): full rate-limit table

{/* verified-against: apps/web/src/lib/rate-limit.ts rateLimitWebhookByIp (limit=500), rateLimitTenantTotal (limit=5000), rateLimitByJwtSub (caller-provided), rateLimitByCustomerNumber (limit=30) */}

{/* verified-against: apps/web/src/middleware.ts dashboard:<userId> rate limit 200 req/min */}
