429 rate_limited with standard Retry-After + X-RateLimit-* headers.
The full stack
| Layer | Limit | Window | Bucket key | Where it fires |
|---|---|---|---|---|
| Per-API-key | 200 req/min | 60s | apk:<key_id> | Inside v1Handler wrapper, after auth verify |
| Per-dashboard-user | 200 req/min | 60s | dashboard:<userId> | Dashboard auth middleware, after auth resolve |
| Per-tenant aggregate | 5000 req/min | 60s | tenant:<tenantId>:total | Across every authenticated surface in your tenant |
| Per-(tenant, agent-router) | 1000 req/min | 60s | jwt:<tenantId>:<agentRouterSub> | Agent dispatch chokepoint |
| Per-(tenant, tool) | 50 req/min | 60s | jwt:<tenantId>:<toolSub> | Each internal tool endpoint |
| Per-IP webhook | 500 req/min | 60s | webhook:<source>:<ip> | /api/webhooks/whatsapp\|voice\|auth (before signature verify) |
| Per-(tenant, customer-phone) | 30 req/min | 60s | customer:<tenantId>:<phone> | WhatsApp + voice inbound, post-payload-parse |
checkRateLimit). Windows roll
automatically: the bucket key embeds floor(now / window) so a fresh window allocates a fresh
bucket.
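The rolling behaviour can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the platform's limiter: `windowedKey` and `makeFixedWindowCounter` are hypothetical names, and a `Map` stands in for Redis.

```typescript
// Hypothetical sketch of a fixed-window limiter; a Map stands in for Redis.

/** Builds the per-window bucket key by embedding floor(now / window). */
function windowedKey(baseKey: string, nowMs: number, windowSec: number): string {
  const windowIndex = Math.floor(nowMs / 1000 / windowSec);
  return `${baseKey}:${windowIndex}`;
}

/** Returns a hit() function: true means admitted, false means 429. */
function makeFixedWindowCounter(limit: number, windowSec: number) {
  // Old buckets are never evicted here; a real store would expire them.
  const counts = new Map<string, number>();
  return function hit(baseKey: string, nowMs: number): boolean {
    const key = windowedKey(baseKey, nowMs, windowSec);
    const next = (counts.get(key) ?? 0) + 1;
    counts.set(key, next);
    return next <= limit;
  };
}

// A limit of 2 per 60s window, to keep the example short.
const counter = makeFixedWindowCounter(2, 60);
const results = [
  counter("apk:demo", 0),     // admitted (1 of 2)
  counter("apk:demo", 59000), // admitted (2 of 2, same window)
  counter("apk:demo", 59500), // rejected (window full)
  counter("apk:demo", 60000), // admitted (new window, fresh bucket)
];
console.log(results);
```

No explicit reset step is needed: once the window index changes, the key changes, and the count starts again from zero.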
Which layer your traffic hits
Public API calls (Bearer-authed)
A call to /api/v1/* with a valid API key passes through:
- Per-API-key (200 req/min): gated inside the v1Handler wrapper after auth.
- Per-tenant aggregate (5000 req/min): applied via the same path so a runaway script in one tenant can’t accumulate quota by issuing N keys.
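The interaction between the two ceilings is simple arithmetic. The sketch below uses the limits from the table; the function name is hypothetical, and it assumes the tenant has no other authenticated traffic counting toward the aggregate.

```typescript
// Limits taken from the table above; the function name is hypothetical.
const PER_KEY_LIMIT = 200;     // req/min per API key
const TENANT_AGGREGATE = 5000; // req/min across the whole tenant

/** Highest request rate a tenant can reach by spreading load over keyCount keys. */
function maxTenantThroughput(keyCount: number): number {
  return Math.min(keyCount * PER_KEY_LIMIT, TENANT_AGGREGATE);
}

console.log(maxTenantThroughput(10));  // 2000: bounded by the per-key limits
console.log(maxTenantThroughput(100)); // 5000: the tenant aggregate takes over
```

Beyond 25 keys, issuing more keys buys no additional throughput.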
Dashboard pages (authenticated session)
A page render or server action under /(dashboard)/* passes through:
- Per-dashboard-user (200 req/min): applied in the dashboard auth middleware after the session resolves.
Inbound webhooks (signed)
WhatsApp / voice / auth webhooks pass through:
- Per-IP webhook (500 req/min): applied before signature verification, so a flood can’t make us pay for the verification cost on every request.
- HMAC signature verification: reject 401 on mismatch.
- Per-(tenant, customer-phone) (30 req/min): applied after payload parse, so a single compromised customer number can’t burn through tenant quota.
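The ordering of these three steps can be sketched as below. This illustrates the sequence only: `handleWebhook`, `WebhookDeps`, and the injected functions are hypothetical, and the customer-limit message is a placeholder rather than the platform's actual wording.

```typescript
// Hypothetical sketch of the ordering only; every dependency is injected.
type Verdict = { status: 200 | 401 | 429; reason: string };

interface WebhookDeps {
  allow: (bucketKey: string, limit: number) => boolean;             // rate-limit check
  verifySignature: (rawBody: string, signature: string) => boolean; // HMAC check
  parse: (rawBody: string) => { tenantId: string; phone: string };
}

function handleWebhook(
  deps: WebhookDeps,
  source: string,
  ip: string,
  rawBody: string,
  signature: string,
): Verdict {
  // 1. Cheap per-IP gate first, so a flood never reaches the HMAC step.
  if (!deps.allow(`webhook:${source}:${ip}`, 500)) {
    return { status: 429, reason: "webhook ingress rate limit exceeded" };
  }
  // 2. Signature verification: reject with 401 on mismatch.
  if (!deps.verifySignature(rawBody, signature)) {
    return { status: 401, reason: "signature mismatch" };
  }
  // 3. Per-customer gate, only possible once the payload has been parsed.
  const { tenantId, phone } = deps.parse(rawBody);
  if (!deps.allow(`customer:${tenantId}:${phone}`, 30)) {
    return { status: 429, reason: "customer rate limit exceeded" }; // placeholder text
  }
  return { status: 200, reason: "accepted" };
}
```

Because the per-IP check returns early, a rejected request costs one counter lookup and no signature computation.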
Agent dispatch + tool routes (JWT-authed)
The router → sub-agent → tool flow uses short-TTL JWTs (5-min for tool calls, 2-min for
worker calls):
- Per-(tenant, agent-router) (1000 req/min): gates the dispatch chokepoint.
- Per-(tenant, tool) (50 req/min): gates each internal tool endpoint after the agent router fans out.
- Per-tenant aggregate (5000 req/min): same global ceiling that public API + tool traffic both share.
Hitting a limit (the response)
- Retry-After: seconds until the current window rolls. Honour this; the limiter clamps to ≥1 so you don’t hot-loop.
- X-RateLimit-Limit: the bucket’s limit (e.g. 200).
- X-RateLimit-Remaining: count remaining in the current window (0 when limited).
- X-RateLimit-Reset: UNIX epoch seconds when the current window rolls.
The code field tells you which layer you hit:
- 'rate_limited' for the public-API surfaces.
- The message carries the surface-specific detail (per-key rate limit exceeded, webhook ingress rate limit exceeded, etc.).
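On the client, these headers can be read into a typed structure. The helper below is a hypothetical sketch: it takes a lookup function so it works with any HTTP client (for example, `name => response.headers.get(name)`), and the fallback values for missing headers are an assumption.

```typescript
// Hypothetical client-side helper; header names are the ones listed above.
interface RateLimitInfo {
  retryAfterSec: number;
  limit: number;
  remaining: number;
  resetEpochSec: number;
}

/** Reads the rate-limit headers via a lookup function supplied by the caller. */
function readRateLimitHeaders(get: (name: string) => string | null): RateLimitInfo {
  const num = (name: string, fallback: number): number => {
    const parsed = Number.parseInt(get(name) ?? "", 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  return {
    // Mirror the server-side clamp: never wait less than one second.
    retryAfterSec: Math.max(1, num("Retry-After", 1)),
    limit: num("X-RateLimit-Limit", 0),
    remaining: num("X-RateLimit-Remaining", 0),
    resetEpochSec: num("X-RateLimit-Reset", 0),
  };
}
```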
Fail-open posture (and why)
The limiter fails OPEN on Redis blips. A Redis outage admits the request rather than 429-ing every customer: better to serve traffic than to falsely block it. The trade-off is explicit: a sustained Redis outage means rate limits stop functioning. Mitigations:
- Alarm on the rate_limit.redis_failed log spike. A blip is fine; a sustained spike pages the operator.
- Defence in depth. The application-tier rate limit is one of three layers: our edge WAF (per-IP global) and per-channel BSP-level limits (the WhatsApp provider’s throttling, voice transport quotas) also apply. A Redis failure removes the application-tier gate but not the others.
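A fail-open wrapper can be sketched as below. This is an illustration under assumptions, not the platform's code: `checkFailOpen` and its parameters are hypothetical, `check` stands in for the Redis-backed limiter call, and the call is synchronous here for brevity where a real one would be async.

```typescript
// Hypothetical sketch: `check` stands in for the Redis-backed limiter call.
// A real limiter call would be async; it is synchronous here for brevity.
type Decision = { allowed: boolean; failedOpen: boolean };

function checkFailOpen(check: () => boolean, log: (event: string) => void): Decision {
  try {
    // Normal path: the store answered, so its verdict stands.
    return { allowed: check(), failedOpen: false };
  } catch {
    // Store unreachable: admit the request, but emit the event operators alarm on.
    log("rate_limit.redis_failed");
    return { allowed: true, failedOpen: true };
  }
}
```

Only a thrown error triggers the open path; a limiter that answers "over limit" still produces a 429.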
Honouring rate limits (client side)
When you get a 429:
- Read Retry-After. Honour it as the minimum back-off.
- Don’t loop tighter than Retry-After. Add jitter so multiple clients don’t all retry at the same window-roll millisecond.
- Keep separate buckets per resource. If you’re hammering /v1/conversations and getting 429’d, that doesn’t mean /v1/leads is rate-limited too; they share the per-API-key bucket but only count failures against you under that bucket.
- For around-the-brain workflows (n8n nurture loops, post-booking nudges), Retry-After handling is built into the stock n8n HTTP node.
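For hand-rolled clients, the first two points reduce to one delay calculation. The helper below is a hypothetical sketch; the one-second default jitter ceiling is an assumption, not a platform recommendation.

```typescript
// Hypothetical client helper: wait at least Retry-After, plus random jitter.
function retryDelayMs(
  retryAfterSec: number,
  maxJitterMs: number = 1000,
  random: () => number = Math.random,
): number {
  // Never wait less than Retry-After (and never less than one second).
  const floorMs = Math.max(1, retryAfterSec) * 1000;
  // Jitter is only ever added, so the delay never dips below the floor.
  return floorMs + Math.floor(random() * maxJitterMs);
}

// With Retry-After: 5, the wait falls somewhere in [5000, 6000) ms.
console.log(retryDelayMs(5));
```

Adding jitter on top of the floor, rather than around it, keeps every retry at or after the window roll while still spreading clients apart.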
Plan-aware limits (planned)
The current numbers are platform-wide. Per-customer plan-based ceilings (raise the per-tenant aggregate to 20,000 req/min on Pro plans, etc.) are implemented as a parameter on the limit helpers but not customer-bound today (we are pre-customer, so there are no customer plans yet). When the billing model launches, these ceilings flex per-plan at the call site without code changes elsewhere.
What’s NOT enforced today
- Per-resource-class quotas. No daily/monthly cap on “how many leads can your tenant create”; the per-key + tenant aggregate is the only ceiling.
- Per-key burst credits. Fixed window only; no token-bucket burst allowance. A request at second :00 and one at :59 both count toward the same window’s 200.
- Custom plans. No way to bump a single tenant’s per-API-key limit without a code change today.
Related docs
- API introduction: surface overview
- Authentication: per-API-key issuance + scopes
- Webhooks: inbound rate-limit specifics
- Security overview: full rate-limit table