Vorel — Documentation

Vorel enforces rate limits at multiple layers — API key, tenant aggregate, per-tool, dashboard user, webhook ingress, per-customer-number. The most-restrictive applicable layer wins. Hitting a limit returns 429 rate_limited with standard Retry-After + X-RateLimit-* headers.

The full stack

Layer	Limit	Window	Bucket key	Where it fires
Per-API-key	200 req/min	60s	`apk:<key_id>`	Inside `v1Handler` wrapper, after auth verify
Per-Clerk-user dashboard	200 req/min	60s	`dashboard:<userId>`	Clerk middleware, after auth resolve
Per-tenant aggregate	5000 req/min	60s	`tenant:<tenantId>:total`	Public API and agent/tool traffic in your tenant (dashboard sessions are not gated today)
Per-(tenant, agent-router)	1000 req/min	60s	`jwt:<tenantId>:<agentRouterSub>`	Agent dispatch chokepoint
Per-(tenant, tool)	50 req/min	60s	`jwt:<tenantId>:<toolSub>`	Each internal tool endpoint
Per-IP webhook	500 req/min	60s	`webhook:<source>:<ip>`	`/api/webhooks/whatsapp|vapi|clerk` (before signature verify)
Per-(tenant, customer-phone)	30 req/min	60s	`customer:<tenantId>:<phone>`	WhatsApp + voice inbound, post-payload-parse

Every layer uses the same Redis-backed fixed-window primitive (checkRateLimit in apps/web/src/lib/rate-limit.ts). Windows roll automatically — the bucket key embeds floor(now / window) so a fresh window allocates a fresh bucket.
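To make the window arithmetic concrete, here is a minimal in-memory sketch of a fixed-window check. It is not the actual checkRateLimit implementation: the Map stands in for Redis, and the names windowedKey, checkRateLimitSketch, and RateLimitResult are assumptions made for illustration only.

```typescript
// In-memory sketch of a fixed-window limiter. The Map is a hypothetical
// stand-in for Redis so the example runs without any dependencies.
interface RateLimitResult {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetEpochSec: number; // UNIX seconds when the current window rolls
  retryAfterSec: number; // seconds until the roll, clamped to >= 1
}

const counters = new Map<string, number>();

// The window index is floor(now / window); embedding it in the key means a
// new window automatically reads and writes a fresh counter.
function windowedKey(bucket: string, windowSec: number, nowMs: number): string {
  return `${bucket}:${Math.floor(nowMs / 1000 / windowSec)}`;
}

function checkRateLimitSketch(
  bucket: string,
  limit: number,
  windowSec: number,
  nowMs: number = Date.now(),
): RateLimitResult {
  const key = windowedKey(bucket, windowSec, nowMs);
  const count = (counters.get(key) ?? 0) + 1; // with Redis: an increment plus an expiry
  counters.set(key, count);

  const windowIndex = Math.floor(nowMs / 1000 / windowSec);
  const resetEpochSec = (windowIndex + 1) * windowSec;
  const retryAfterSec = Math.max(1, Math.ceil(resetEpochSec - nowMs / 1000));
  return {
    allowed: count <= limit,
    limit,
    remaining: Math.max(0, limit - count),
    resetEpochSec,
    retryAfterSec,
  };
}
```

Because the window index is part of the key, nothing has to reset a counter explicitly: once the clock crosses a window boundary, the old key is simply never read again and can expire on its own.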

Which layer your traffic hits

Public API calls (Bearer-authed)

A call to /api/v1/* with a valid API key passes through:

Per-API-key (200 req/min) — gated inside the v1Handler wrapper after auth.
Per-tenant aggregate (5000 req/min) — applied via the same path so a runaway script in one tenant can’t accumulate quota by issuing N keys.

Per-(tenant, tool) doesn’t apply to the public API today (those buckets are for agent-side internal tool calls). The dashboard and per-IP-webhook limits don’t apply either — different surfaces.
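The sketch below shows one way the two public-API layers could be evaluated together so that the most restrictive one decides the outcome. The bucket keys and limits come from the table above; everything else (the Layer and LayerResult types, hit, publicApiLayers, checkLayers, and the choice to count a request against every layer even when one denies it) is an assumption for illustration, with an in-memory Map standing in for Redis.

```typescript
interface Layer { name: string; bucket: string; limit: number; windowSec: number; }
interface LayerResult { layer: string; allowed: boolean; remaining: number; }

// Hypothetical in-memory counter standing in for the Redis-backed primitive.
const hits = new Map<string, number>();

function hit(layer: Layer, nowMs: number): LayerResult {
  const key = `${layer.bucket}:${Math.floor(nowMs / 1000 / layer.windowSec)}`;
  const count = (hits.get(key) ?? 0) + 1;
  hits.set(key, count);
  return {
    layer: layer.name,
    allowed: count <= layer.limit,
    remaining: Math.max(0, layer.limit - count),
  };
}

// The two layers a Bearer-authed /api/v1/* call passes through.
function publicApiLayers(keyId: string, tenantId: string): Layer[] {
  return [
    { name: "per-api-key", bucket: `apk:${keyId}`, limit: 200, windowSec: 60 },
    { name: "per-tenant", bucket: `tenant:${tenantId}:total`, limit: 5000, windowSec: 60 },
  ];
}

// Most-restrictive wins: a denying layer beats an admitting one; otherwise
// the layer with the fewest remaining requests is reported.
// `layers` must be non-empty.
function checkLayers(layers: Layer[], nowMs: number): LayerResult {
  const results = layers.map((layer) => hit(layer, nowMs));
  return results.reduce((worst, r) => {
    if (worst.allowed !== r.allowed) return worst.allowed ? r : worst;
    return r.remaining < worst.remaining ? r : worst;
  });
}
```

With this shape, issuing a second API key gives a script a fresh per-key bucket but not a fresh tenant bucket, which is the property the tenant aggregate exists to provide.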

Dashboard pages (Clerk-authed)

A page render or server action under /(dashboard)/* passes through:

Per-Clerk-user dashboard (200 req/min) — applied in the Clerk middleware after the session resolves.

The tenant aggregate doesn’t gate dashboard sessions today; the per-user gate is the primary flood guard for human navigation.

Inbound webhooks (signed)

WhatsApp / Vapi / Clerk webhooks pass through:

Per-IP webhook (500 req/min) — applied before signature verification, so a flood can’t make us pay for the verification cost on every request.
HMAC signature verification — reject 401 on mismatch.
Per-(tenant, customer-phone) (30 req/min) — applied after payload parse, so a single compromised customer number can’t burn through tenant quota.
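The ordering of the three steps above is the important part, so the following sketch expresses only the ordering and injects everything else. The WebhookDeps interface, handleWebhook, the CUSTOMER_SOURCES list, and the 401 and per-customer reason strings are hypothetical; the limiter, the HMAC check, and the payload parser are passed in rather than implemented.

```typescript
interface WebhookRequest { source: string; ip: string; rawBody: string; signature: string; }

interface WebhookDeps {
  allow: (bucket: string, limit: number) => boolean; // e.g. a fixed-window check
  verifySignature: (rawBody: string, signature: string) => boolean; // HMAC check
  parse: (rawBody: string) => { tenantId: string; phone: string };
}

// Assumption: only these sources carry a customer phone number.
const CUSTOMER_SOURCES = ["whatsapp", "vapi"];

function handleWebhook(
  req: WebhookRequest,
  deps: WebhookDeps,
): { status: number; reason?: string } {
  // 1. Cheap per-IP gate first, so a flood never reaches the HMAC step.
  if (!deps.allow(`webhook:${req.source}:${req.ip}`, 500)) {
    return { status: 429, reason: "webhook ingress rate limit exceeded" };
  }
  // 2. Signature check; the payload is not parsed or trusted before this.
  if (!deps.verifySignature(req.rawBody, req.signature)) {
    return { status: 401, reason: "invalid signature" }; // hypothetical wording
  }
  // 3. Per-customer gate, only possible once the payload has been parsed.
  if (CUSTOMER_SOURCES.indexOf(req.source) !== -1) {
    const { tenantId, phone } = deps.parse(req.rawBody);
    if (!deps.allow(`customer:${tenantId}:${phone}`, 30)) {
      return { status: 429, reason: "per-customer rate limit exceeded" }; // hypothetical wording
    }
  }
  return { status: 200 };
}
```

Injecting the dependencies keeps the sketch free of any particular HMAC or Redis library and makes the short-circuit behaviour easy to check: a request rejected at step 1 never triggers step 2.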

Agent dispatch + tool routes (JWT-authed)

The router → sub-agent → tool flow uses short-TTL JWTs (5-min for tool calls, 2-min for worker calls):

Per-(tenant, agent-router) (1000 req/min) — gates the dispatch chokepoint.
Per-(tenant, tool) (50 req/min) — gates each internal tool endpoint after the agent router fans out.
Per-tenant aggregate (5000 req/min) — same global ceiling that public API + tool traffic both share.

Public-API calls and agent-side tool calls live in different buckets even when they hit the same tool — the bucket key embeds the JWT sub for agent calls and the API-key id for public calls.
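As a small illustration of why the two kinds of caller never collide, the sketch below derives the bucket key from the caller's credential type. The key formats follow the table above; the Caller type, the bucketFor helper, and the example sub value are hypothetical.

```typescript
// A caller is identified either by an API key or by a short-TTL agent JWT.
type Caller =
  | { kind: "publicApi"; keyId: string }
  | { kind: "agent"; tenantId: string; sub: string };

function bucketFor(caller: Caller): string {
  switch (caller.kind) {
    case "publicApi":
      return `apk:${caller.keyId}`; // per-API-key bucket
    case "agent":
      return `jwt:${caller.tenantId}:${caller.sub}`; // per-(tenant, JWT sub) bucket
  }
}

// The same tool reached two ways lands in two different buckets.
const viaApiKey = bucketFor({ kind: "publicApi", keyId: "key_123" });
const viaAgent = bucketFor({ kind: "agent", tenantId: "t1", sub: "tool:lookup-lead" });
```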

Hitting a limit (the response)

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 12
X-RateLimit-Limit: 200
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714903260

{
  "error": {
    "code": "rate_limited",
    "message": "per-key rate limit exceeded"
  }
}

Retry-After — seconds until the current window rolls. Honour this; the limiter clamps to ≥1 so you don’t hot-loop.
X-RateLimit-Limit — the bucket’s limit (e.g. 200).
X-RateLimit-Remaining — count remaining in the current window (0 when limited).
X-RateLimit-Reset — UNIX epoch seconds when the current window rolls.

The body tells you which layer you hit through its code and message fields:

'rate_limited' for the public-API surfaces.
The message carries the surface-specific detail (per-key rate limit exceeded, webhook ingress rate limit exceeded, etc.).
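A client can turn the headers and body described above into a single typed value. The sketch below is one way to do that; the RateLimitedInfo shape, the helper names, and the fallback values used when a header is missing are assumptions, not part of the API contract.

```typescript
interface RateLimitedInfo {
  retryAfterSec: number;
  limit: number;
  remaining: number;
  resetEpochSec: number;
  code: string;
  message: string;
}

// Header names are case-insensitive, so look them up without assuming casing.
function getHeader(headers: Record<string, string>, name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === wanted) return headers[key];
  }
  return undefined;
}

function toInt(value: string | undefined, fallback: number): number {
  const n = value === undefined ? NaN : parseInt(value, 10);
  return isNaN(n) ? fallback : n;
}

function parseRateLimited(headers: Record<string, string>, bodyText: string): RateLimitedInfo {
  let error: { code?: string; message?: string } = {};
  try {
    const parsed = JSON.parse(bodyText) as { error?: { code?: string; message?: string } };
    error = parsed.error ?? {};
  } catch {
    // Non-JSON or unexpected body: fall back to header-only information.
  }
  return {
    retryAfterSec: Math.max(1, toInt(getHeader(headers, "Retry-After"), 1)),
    limit: toInt(getHeader(headers, "X-RateLimit-Limit"), 0),
    remaining: toInt(getHeader(headers, "X-RateLimit-Remaining"), 0),
    resetEpochSec: toInt(getHeader(headers, "X-RateLimit-Reset"), 0),
    code: error.code ?? "unknown",
    message: error.message ?? "",
  };
}
```

Logging the message field alongside the code is usually enough to tell a per-key limit from a webhook-ingress limit when debugging.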

Fail-open posture (and why)

The limiter fails OPEN on Redis blips. A Redis outage admits the request rather than 429-ing every customer — better to serve traffic than to falsely block it. The trade-off is explicit: a sustained Redis outage means rate limits stop functioning. Mitigations:

Alarm on the rate_limit.redis_failed log spike. A blip is fine; a sustained spike pages the operator.
Defence in depth. The application-tier rate limit is one of three layers — Cloudflare WAF (per-IP global) and per-channel BSP-level limits (360dialog throttling, Vapi quotas) also apply. A Redis failure removes the application-tier gate but not the others.

This is the right call pre-customer; reconsider when traffic gets serious enough that “fail-open during Redis outage” is materially worse than “429 everyone during Redis outage.”
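The fail-open behaviour can be sketched as a thin wrapper around whatever performs the Redis call. The Decision type and checkFailOpen are hypothetical names, and the check is shown as synchronous for brevity even though a real Redis call would be asynchronous; only the rate_limit.redis_failed event name comes from the text above.

```typescript
interface Decision { allowed: boolean; failedOpen: boolean; }

// `check` performs the limiter lookup and throws if the store is unreachable.
// `log` is injected so the alarm event can be routed to any logger.
function checkFailOpen(
  check: () => boolean,
  log: (event: string, detail: string) => void,
): Decision {
  try {
    return { allowed: check(), failedOpen: false };
  } catch (err) {
    // Store failure: record it for alerting, then admit the request.
    log("rate_limit.redis_failed", err instanceof Error ? err.message : String(err));
    return { allowed: true, failedOpen: true };
  }
}
```

Switching to a fail-closed posture later would be a one-line change in the catch branch, which is part of why the trade-off is easy to revisit.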

Honouring rate limits (client side)

When you get a 429:

Read Retry-After — honour it as the minimum back-off.
Don’t loop tighter than Retry-After. Add jitter so multiple clients don’t all retry at the same window-roll millisecond.
Track back-off per bucket, not per endpoint. /v1/conversations and /v1/leads share the per-API-key bucket, so if you’re hammering /v1/conversations and getting 429’d on the per-key limit, /v1/leads is limited too until the window rolls. Check the message field to see which bucket you hit.
For around-the-brain workflows (n8n nurture loops, post-booking nudges), Retry-After handling is built into the stock n8n HTTP node.
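For clients that are not using n8n, the guidance above can be sketched as a retry loop that treats Retry-After as a floor and adds jitter on top. The function names, the one-second jitter range, the exponential fallback when the header is missing, and the attempt cap are all assumptions; the HTTP call and the sleep function are injected so the sketch does not depend on a particular runtime.

```typescript
interface HttpResult { status: number; headers: Record<string, string>; }

// Back-off in milliseconds: never shorter than Retry-After, plus up to one
// second of jitter so clients do not all retry at the same window roll.
function backoffMs(
  retryAfterHeader: string | undefined,
  attempt: number,
  random: () => number = Math.random,
): number {
  const parsed = retryAfterHeader === undefined ? NaN : parseInt(retryAfterHeader, 10);
  // Assumption: with no usable header, fall back to capped exponential back-off.
  const floorSec = isNaN(parsed) || parsed < 1 ? Math.min(60, Math.pow(2, attempt)) : parsed;
  const jitterMs = Math.floor(random() * 1000);
  return floorSec * 1000 + jitterMs;
}

// Retries only on 429. Assumes `send` returns lower-cased header names.
async function withRateLimitRetry(
  send: () => Promise<HttpResult>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts: number = 5,
): Promise<HttpResult> {
  let result = await send();
  for (let attempt = 1; result.status === 429 && attempt < maxAttempts; attempt++) {
    await sleep(backoffMs(result.headers["retry-after"], attempt));
    result = await send();
  }
  return result;
}
```

A typical sleep argument is `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`; keeping it injectable also makes the loop easy to test without real delays.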

Plan-aware limits (planned)

The current numbers are platform-wide. Per-customer plan-based ceilings (for example, raising the per-tenant aggregate to 20,000 req/min on Pro plans) are implemented as a parameter on the limit helpers but are not bound to customers today, because customer plans don’t exist yet. When the billing model launches, these ceilings flex per-plan at the call site without code changes elsewhere.
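As a rough picture of what a limit that is a parameter of the helper could look like, the sketch below resolves the tenant-aggregate ceiling from a plan name. The Plan type, the "standard" plan identifier, the table constant, and tenantAggregateSpec are hypothetical; only the 5000 and 20,000 req/min figures and the bucket key format come from this page.

```typescript
type Plan = "standard" | "pro"; // hypothetical plan identifiers

const TENANT_AGGREGATE_PER_MIN: Record<Plan, number> = {
  standard: 5000, // today's platform-wide number
  pro: 20000, // the example Pro ceiling mentioned above
};

interface LimitSpec { bucket: string; limit: number; windowSec: number; }

// The call site passes the plan; the limiter itself does not change.
function tenantAggregateSpec(tenantId: string, plan: Plan = "standard"): LimitSpec {
  return {
    bucket: `tenant:${tenantId}:total`,
    limit: TENANT_AGGREGATE_PER_MIN[plan],
    windowSec: 60,
  };
}
```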

What’s NOT enforced today

Per-resource-class quotas. No daily/monthly cap on “how many leads can your tenant create”; the per-key and tenant-aggregate limits are the only ceilings.
Per-key burst credits. Fixed window only; no token-bucket burst allowance. A request at second :00 and one at :59 both count toward the same window’s 200.
Custom plans. No way to bump a single tenant’s per-API-key limit without a code change today.

Related docs

API introduction: surface overview
Authentication: per-API-key issuance + scopes
Webhooks: inbound rate-limit specifics
Security overview: full rate-limit table
