
Vorel enforces rate limits at multiple layers: per-API-key, per-tenant aggregate, per-tool, per-dashboard-user, webhook ingress, and per-customer-number. The most restrictive applicable layer wins. Hitting a limit returns a 429 with code rate_limited plus the standard Retry-After and X-RateLimit-* headers.

The full stack

| Layer | Limit | Window | Bucket key | Where it fires |
| --- | --- | --- | --- | --- |
| Per-API-key | 200 req/min | 60s | `apk:<key_id>` | Inside v1Handler wrapper, after auth verify |
| Per-Clerk-user dashboard | 200 req/min | 60s | `dashboard:<userId>` | Clerk middleware, after auth resolve |
| Per-tenant aggregate | 5000 req/min | 60s | `tenant:<tenantId>:total` | Across every authenticated surface in your tenant |
| Per-(tenant, agent-router) | 1000 req/min | 60s | `jwt:<tenantId>:<agentRouterSub>` | Agent dispatch chokepoint |
| Per-(tenant, tool) | 50 req/min | 60s | `jwt:<tenantId>:<toolSub>` | Each internal tool endpoint |
| Per-IP webhook | 500 req/min | 60s | `webhook:<source>:<ip>` | `/api/webhooks/whatsapp\|vapi\|clerk` (before signature verify) |
| Per-(tenant, customer-phone) | 30 req/min | 60s | `customer:<tenantId>:<phone>` | WhatsApp + voice inbound, post-payload-parse |
Every layer uses the same Redis-backed fixed-window primitive (checkRateLimit in apps/web/src/lib/rate-limit.ts). Windows roll automatically — the bucket key embeds floor(now / window) so a fresh window allocates a fresh bucket.
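The window-roll mechanic can be sketched as follows. This is an illustrative in-memory stand-in for the Redis-backed checkRateLimit, not the actual implementation; everything except the name checkRateLimit and the key shapes is invented for the example.

```typescript
// Illustrative in-memory stand-in for the Redis-backed fixed-window limiter.
// The real checkRateLimit (apps/web/src/lib/rate-limit.ts) uses Redis; the
// idea shown here is that the bucket key embeds floor(now / window), so a
// fresh window automatically allocates a fresh bucket.
type LimitResult = { allowed: boolean; remaining: number; resetAt: number };

const counters = new Map<string, number>();

function checkRateLimitSketch(
  bucket: string,          // e.g. "apk:<key_id>" or "tenant:<tenantId>:total"
  limit: number,           // e.g. 200
  windowSeconds: number,   // e.g. 60
  nowMs: number = Date.now(),
): LimitResult {
  const windowIndex = Math.floor(nowMs / 1000 / windowSeconds);
  const key = `${bucket}:${windowIndex}`;       // fresh window => fresh bucket
  const count = (counters.get(key) ?? 0) + 1;   // Redis INCR equivalent
  counters.set(key, count);
  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    resetAt: (windowIndex + 1) * windowSeconds, // epoch seconds, cf. X-RateLimit-Reset
  };
}
```

Old buckets are simply abandoned when the window rolls (in Redis they would expire via TTL); nothing needs to reset counters in place.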

Which layer your traffic hits

A call to /api/v1/* with a valid API key passes through:
  1. Per-API-key (200 req/min) — gated inside the v1Handler wrapper after auth.
  2. Per-tenant aggregate (5000 req/min) — applied via the same path so a runaway script in one tenant can’t accumulate quota by issuing N keys.
Per-(tenant, tool) doesn’t apply to the public API today (those buckets are for agent-side internal tool calls). The dashboard and per-IP-webhook limits don’t apply either — different surfaces.
A page render or server action under /(dashboard)/* passes through:
  1. Per-Clerk-user dashboard (200 req/min) — applied in the Clerk middleware after the session resolves.
Tenant-aggregate doesn’t gate dashboard sessions today; the per-user gate is the primary flood guard for human navigation.
WhatsApp / Vapi / Clerk webhooks pass through:
  1. Per-IP webhook (500 req/min) — applied before signature verification, so a flood can’t make us pay for the verification cost on every request.
  2. HMAC signature verification — requests that fail it are rejected with 401.
  3. Per-(tenant, customer-phone) (30 req/min) — applied after payload parse, so a single compromised customer number can’t burn through tenant quota.
The router → sub-agent → tool flow uses short-TTL JWTs (5-min for tool calls, 2-min for worker calls):
  1. Per-(tenant, agent-router) (1000 req/min) — gates the dispatch chokepoint.
  2. Per-(tenant, tool) (50 req/min) — gates each internal tool endpoint after the agent router fans out.
  3. Per-tenant aggregate (5000 req/min) — same global ceiling that public API + tool traffic both share.
Public-API calls and agent-side tool calls live in different buckets even when they hit the same tool — the bucket key embeds the JWT sub for agent calls and the API-key id for public calls.

Hitting a limit (the response)

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 12
X-RateLimit-Limit: 200
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714903260

{
  "error": {
    "code": "rate_limited",
    "message": "per-key rate limit exceeded"
  }
}
  • Retry-After — seconds until the current window rolls. Honour this; the limiter clamps to ≥1 so you don’t hot-loop.
  • X-RateLimit-Limit — the bucket’s limit (e.g. 200).
  • X-RateLimit-Remaining — count remaining in the current window (0 when limited).
  • X-RateLimit-Reset — UNIX epoch seconds when the current window rolls.
The body tells you which surface you hit:
  • code is always 'rate_limited' for rate-limit rejections.
  • message carries the surface-specific detail (per-key rate limit exceeded, webhook ingress rate limit exceeded, etc.).

Fail-open posture (and why)

The limiter fails OPEN on Redis blips. A Redis outage admits the request rather than 429-ing every customer — better to serve traffic than to falsely block it. The trade-off is explicit: a sustained Redis outage means rate limits stop functioning. Mitigations:
  • Alarm on the rate_limit.redis_failed log spike. A blip is fine; a sustained spike pages the operator.
  • Defence in depth. The application-tier rate limit is one of three layers — Cloudflare WAF (per-IP global) and per-channel BSP-level limits (360dialog throttling, Vapi quotas) also apply. A Redis failure removes the application-tier gate but not the others.
This is the right call pre-customer; reconsider when traffic gets serious enough that “fail-open during Redis outage” is materially worse than “429 everyone during Redis outage.”
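The fail-open posture reduces to a small wrapper shape. This is a sketch, assuming invented names; the log event rate_limit.redis_failed is the one named above, the rest is illustrative:

```typescript
// Fail-open wrapper: an error from the backing store admits the request
// instead of 429-ing it. The log line is what the rate_limit.redis_failed
// alarm keys off; a sustained spike of these pages the operator.
async function checkWithFailOpen(
  check: () => Promise<boolean>,        // true = request is within limits
  log: (event: string) => void = console.warn,
): Promise<boolean> {
  try {
    return await check();
  } catch {
    log("rate_limit.redis_failed");
    return true;                        // fail OPEN: serve traffic on a Redis blip
  }
}
```

Note the healthy-path verdict passes through unchanged; only a thrown error is converted into an admit.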

Honouring rate limits (client side)

When you get a 429:
  1. Read Retry-After — honour it as the minimum back-off.
  2. Don’t loop tighter than Retry-After. Add jitter so multiple clients don’t all retry at the same window-roll millisecond.
  3. Remember the bucket is per key, not per endpoint. If you’re hammering /v1/conversations and getting 429’d, calls to /v1/leads with the same API key draw from the same per-API-key bucket; back off across the whole key, not just the endpoint that returned 429.
  4. For around-the-brain workflows (n8n nurture loops, post-booking nudges), Retry-After handling is built into the stock n8n HTTP node.
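Rules 1–2 boil down to one helper. A minimal client-side sketch (the function name and jitter bound are invented):

```typescript
// Compute a retry delay from a 429: honour Retry-After as the minimum
// back-off, and add random jitter so a fleet of clients doesn't retry in
// lockstep at the same window-roll instant.
function backoffMs(retryAfterHeader: string | null, maxJitterMs = 500): number {
  const parsed = Number(retryAfterHeader);
  // The server clamps Retry-After to >= 1; clamp defensively here too.
  const retryAfterSec = Number.isFinite(parsed) && parsed >= 1 ? parsed : 1;
  return retryAfterSec * 1000 + Math.floor(Math.random() * maxJitterMs);
}
```

In a retry loop you would sleep for backoffMs(res.headers.get("Retry-After")) before re-issuing the request.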

Plan-aware limits (planned)

The current numbers are platform-wide. Per-plan ceilings (e.g. raising the per-tenant aggregate to 20,000 req/min on Pro) are already plumbed as a parameter on the limit helpers, but nothing binds them to customers today, since there are no customer plans yet. When the billing model launches, these ceilings flex per plan at the call site without code changes elsewhere.

What’s NOT enforced today

  • Per-resource-class quotas. No daily/monthly cap on “how many leads can your tenant create”; the per-key + tenant aggregate is the only ceiling.
  • Per-key burst credits. Fixed window only; no token-bucket burst allowance. A request at second :00 and one at :59 both count toward the same window’s 200.
  • Custom plans. No way to bump a single tenant’s per-API-key limit without a code change today.