Vorel — Documentation

This page is for the technically curious: you’re evaluating Vorel against in-house builds, building integrations against the API, or trying to understand the failure modes before you deploy. If you’re a buyer who just wants the high-level pitch, the What is Vorel page is shorter.

The big picture

Every customer turn (a call connect, a WhatsApp inbound, a voice utterance) flows through the same five-stage pipeline:

ingress  →  conversation context  →  router  →  sub-agent  →  terminal

Ingress receives the customer event from a telephony / messaging vendor and verifies the signature.
Conversation context hydrates the conversation history, customer identity, vertical pack, and persona.
Router is a small LLM (Gemini 2.5 Flash-Lite) that classifies the customer’s intent into one of ~10 categories.
Sub-agent (qualification / FAQ / booking / handoff) runs the actual reply-generation loop with the tools it’s allowed to call.
Terminal persists the agent reply, runs guardrails (forbidden-phrase + hallucination), and ships the reply back to the customer.

Voice and chat share this pipeline; only the ingress and terminal stages differ per channel.
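
As a rough illustration of the five stages, the sketch below composes them as plain functions over a shared turn record. Every name in it (`Turn`, `Stage`, `runPipeline`, `handleTurn`, the stub stages) is hypothetical and not Vorel's actual code; real stages would be asynchronous and would call Postgres and Gemini rather than return canned values.

```typescript
// Hypothetical shapes for illustration; not Vorel's actual types.
type Channel = "voice" | "whatsapp";

interface Turn {
  channel: Channel;
  tenantId: string;
  customerPhone: string;
  text: string;
  history?: string[];   // filled in by the conversation-context stage
  intent?: string;      // filled in by the router stage
  reply?: string;       // filled in by the sub-agent stage
  delivered?: boolean;  // set by the terminal stage
}

type Stage = (turn: Turn) => Turn;

// Run the stages strictly in order; each one returns an enriched copy.
function runPipeline(turn: Turn, stages: Stage[]): Turn {
  return stages.reduce((acc, stage) => stage(acc), turn);
}

// Shared middle stages (stubs standing in for Postgres and Gemini calls).
const conversationContext: Stage = (t) => ({ ...t, history: [] });
const router: Stage = (t) => ({ ...t, intent: /\bbook/i.test(t.text) ? "booking" : "faq" });
const subAgent: Stage = (t) => ({ ...t, reply: `(${t.intent}) reply to "${t.text}"` });

// Per-channel edges: only these two stages vary between voice and chat.
const ingress: Record<Channel, Stage> = {
  voice: (t) => t,     // would verify the voice vendor's request
  whatsapp: (t) => t,  // would verify the messaging vendor's signature
};
const terminal: Record<Channel, Stage> = {
  voice: (t) => ({ ...t, delivered: true }),     // would hand the text to TTS
  whatsapp: (t) => ({ ...t, delivered: true }),  // would write the outbox row
};

function handleTurn(turn: Turn): Turn {
  return runPipeline(turn, [
    ingress[turn.channel],
    conversationContext,
    router,
    subAgent,
    terminal[turn.channel],
  ]);
}
```

The point of the shape is that the three middle stages never look at `channel`, which is what lets voice and chat share them.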

The components

Web app

Next.js (App Router) on Railway. Hosts the operator console, the public API (/api/v1/*), the tool routes (/api/tools/*), the Vapi custom-LLM proxy (/api/vapi/*), and the inbound webhook receivers (/api/webhooks/*).

Workers

Standalone Node service running BullMQ queues. Processes inbound WhatsApp messages, end-of-call voice reports, outbound webhook dispatch, and right-to-erasure scrubs. Same git tree as the web app; separate Railway service so a slow background job doesn’t block a request thread.

Postgres

Single multi-tenant database. Every tenant-scoped table has a Row-Level Security policy gating reads + writes by current_setting('app.current_tenant_id'). The vorel_app Postgres role is not a superuser, so a forgotten SET LOCAL app.current_tenant_id returns zero rows — fail-closed.
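
A minimal sketch of a transaction wrapper that sets the tenant before any query runs, in the spirit of the `withTenantContext` helper named in the tool layer. The `SqlClient` interface is a hypothetical stand-in loosely modelled on a node-postgres client, and the body is an assumption about how such a wrapper could look, not Vorel's implementation. It uses Postgres's `set_config(name, value, is_local)` because `SET LOCAL` itself cannot take bind parameters.

```typescript
// Hypothetical minimal client surface; a real driver client would be
// adapted to this shape.
interface SqlClient {
  query(text: string, values?: unknown[]): Promise<{ rows: unknown[] }>;
}

async function withTenantContext<T>(
  client: SqlClient,
  tenantId: string,
  fn: (client: SqlClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // Third argument `true` scopes the setting to this transaction,
    // which is the same lifetime SET LOCAL would give it.
    await client.query("SELECT set_config('app.current_tenant_id', $1, true)", [tenantId]);
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    // Any failure discards both the work and the tenant setting.
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting dies with the transaction, a pooled connection cannot carry one tenant's id into the next request.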

Redis (BullMQ + rate limit)

BullMQ queue backbone for the workers, plus a fixed-window primitive for the rate-limit stack. Disk-encrypted on Railway.
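
To make "fixed-window" concrete, here is a small sketch of the technique: every request in the same time window increments one shared counter, and the request is admitted while the counter is at or below the limit. The `CounterStore` interface, `MemoryStore`, and `checkFixedWindow` are hypothetical names; in production the store would be Redis, and the fail-open branch mirrors the behaviour described under Reliability posture.

```typescript
// Hypothetical counter store. With Redis this would be an increment plus
// an expiry on the key; here it is an interface so the sketch runs alone.
interface CounterStore {
  // Increments the counter for `key` and returns the new value.
  increment(key: string, ttlSeconds: number): number;
}

class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();

  // The TTL is ignored in memory; in Redis it would expire old windows.
  increment(key: string, _ttlSeconds: number): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

interface Decision {
  allowed: boolean;
  failedOpen: boolean; // true when the store was unreachable
}

function checkFixedWindow(
  store: CounterStore,
  scope: string,
  limit: number,
  windowSeconds: number,
  nowMs: number,
): Decision {
  // All requests that fall in the same window share one counter key.
  const windowIndex = Math.floor(nowMs / (windowSeconds * 1000));
  const key = `rl:${scope}:${windowIndex}`;
  try {
    const count = store.increment(key, windowSeconds);
    return { allowed: count <= limit, failedOpen: false };
  } catch {
    // Fail-open: a store outage admits the request instead of rejecting it.
    return { allowed: true, failedOpen: true };
  }
}
```

A fixed window is cheap (one counter per scope per window) at the cost of allowing a short burst across a window boundary.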

Vapi (voice)

Voice orchestration vendor. Owns the SIP trunk + Deepgram transcription + ElevenLabs TTS, and forwards each LLM turn to our custom-LLM proxy (/api/vapi/chat/completions).

Telnyx (telephony)

DID provider + SIP carrier. We BYO the SIP trunk into Vapi for full SIP/SDP codec control on UAE cellular calls.

Gemini (LLMs)

gemini-2.5-flash for the agent dispatch + sub-agents; gemini-2.5-flash-lite for the router. Tool calling via the Gemini SDK. The voice path proxies through /api/vapi/chat/completions so the per-call LLM cost lands in billing_events for the cost rollup.

Cloudflare

DNS + TLS termination + 5 OWASP-default security headers + WAF for the public surfaces.

Voice flow, end to end

Customer phone → Telnyx DID → Telnyx SIP trunk → Vapi
                                                  ├─ Deepgram nova-3 (multi) — STT
                                                  ├─ ElevenLabs eleven_turbo_v2_5 — TTS
                                                  └─ Vorel custom-LLM proxy
                                                       ↓
                                                  /api/vapi/chat/completions
                                                       ↓
                                                  router → sub-agent → tool calls
                                                       ↓
                                                  reply text
                                                       ↓
                                                  ElevenLabs TTS → Vapi → caller

Vapi calls into our app at three points on every call:

assistant-request — fired by Vapi on call connect. We resolve the tenant by vapi_phone_number_id, build a per-tenant assistant config (persona-interpolated system prompt, tool ids, voice + transcriber), and return it within Vapi’s 7.5s budget. This is what makes per-tenant config possible without re-publishing to the Vapi dashboard on every persona edit.
/api/vapi/chat/completions — Vapi’s OpenAI-compatible proxy hits this on every LLM turn. We translate the request to Gemini, run the agent dispatch (router + sub-agent + tools), and stream the reply back as a chat.completion chunk stream.
end-of-call-report — fired by Vapi when the call ends. The worker processes the report, persists the transcript + recording URL + Vapi cost breakdown, runs the QA scoring pipeline, and emits the cost rollup events.
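
For readers unfamiliar with the chat.completion chunk stream mentioned above, the sketch below shows the OpenAI-style server-sent-event framing: one `data:` line per JSON chunk, a final chunk carrying `finish_reason`, then a `[DONE]` sentinel. The function name `toSseFrames` and the example id are hypothetical; how Vorel actually splits the Gemini reply into pieces is not described here.

```typescript
// Only the chunk fields this sketch uses.
interface ChunkDelta {
  role?: "assistant";
  content?: string;
}

interface ChatCompletionChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number; // Unix seconds
  model: string;
  choices: Array<{
    index: number;
    delta: ChunkDelta;
    finish_reason: "stop" | null;
  }>;
}

// Turn reply pieces into SSE frames in the order a client expects them.
function toSseFrames(id: string, model: string, createdSec: number, pieces: string[]): string[] {
  const frame = (delta: ChunkDelta, finishReason: "stop" | null): string => {
    const chunk: ChatCompletionChunk = {
      id,
      object: "chat.completion.chunk",
      created: createdSec,
      model,
      choices: [{ index: 0, delta, finish_reason: finishReason }],
    };
    // Each SSE event is a "data:" line followed by a blank line.
    return `data: ${JSON.stringify(chunk)}\n\n`;
  };

  return [
    frame({ role: "assistant" }, null),               // opening chunk announces the role
    ...pieces.map((p) => frame({ content: p }, null)), // one chunk per text piece
    frame({}, "stop"),                                 // closing chunk, empty delta
    "data: [DONE]\n\n",                                // end-of-stream sentinel
  ];
}
```

Streaming matters on the voice path because text-to-speech can start on the first pieces instead of waiting for the full reply.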

Chat flow, end to end

Customer WhatsApp → 360dialog Cloud → /api/webhooks/whatsapp (signature-verified)
                                          ↓
                                       BullMQ (message-processor queue)
                                          ↓
                                       worker:
                                          ├─ persists customer turn + upserts Customer
                                          └─ POSTs /api/internal/agent/dispatch
                                                  ↓
                                              router → sub-agent → tool calls
                                                  ↓
                                              send_whatsapp_message (writes outbox)

Inbound WhatsApp requests are rate-limited per IP (500 req/min) before signature verification, verified against the 360dialog HMAC, and then rate-limited per (tenant, customer phone) (30 req/min) after verification.

WhatsApp outbound send is paused. The send_whatsapp_message tool persists the agent reply into messages + writes the WhatsApp outbox row, but the actual 360dialog network send is mocked today. Real send re-activates once Meta Business Manager verification clears for your tenant (Phase 4b). Inbound + dashboard reply remain available throughout.

The router → sub-agent shape

The router is a single classification call against gemini-2.5-flash-lite with a short prompt. It outputs one of ~10 intent slugs:

greeting · faq · new_lead_inquiry · existing_lead_update · booking · reschedule_or_cancel · human_request · complaint · spam_or_unrelated · out_of_scope

Each intent maps to a sub-agent:

Intent	Sub-agent	Tools available
`new_lead_inquiry`, `existing_lead_update`	`qualification`	`search_offerings`, `update_lead`, `crm_lookup_customer`, `crm_update_record`, `request_handoff`
`booking`	`booking`	`check_availability`, `book_appointment`, `update_lead`, `request_handoff`
`reschedule_or_cancel`, `human_request`, `complaint`	`handoff`	`request_handoff`
`greeting`, `faq`, `spam_or_unrelated`, `out_of_scope`	`faq`	`get_faq_answer`, `search_offerings`, `crm_lookup_customer`, `request_handoff`
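
The table above can be written down as two lookup maps. The intent slugs, sub-agent names, and tool names come straight from this page; the identifiers `SUB_AGENT_FOR_INTENT`, `TOOLS_FOR_SUB_AGENT`, `resolveRoute`, and `mayCall` are hypothetical, and sending an unrecognised slug to `handoff` is an assumption, not documented behaviour.

```typescript
const INTENTS = [
  "greeting", "faq", "new_lead_inquiry", "existing_lead_update", "booking",
  "reschedule_or_cancel", "human_request", "complaint", "spam_or_unrelated", "out_of_scope",
] as const;

type Intent = (typeof INTENTS)[number];
type SubAgent = "qualification" | "booking" | "handoff" | "faq";

const SUB_AGENT_FOR_INTENT: Record<Intent, SubAgent> = {
  new_lead_inquiry: "qualification",
  existing_lead_update: "qualification",
  booking: "booking",
  reschedule_or_cancel: "handoff",
  human_request: "handoff",
  complaint: "handoff",
  greeting: "faq",
  faq: "faq",
  spam_or_unrelated: "faq",
  out_of_scope: "faq",
};

const TOOLS_FOR_SUB_AGENT: Record<SubAgent, readonly string[]> = {
  qualification: ["search_offerings", "update_lead", "crm_lookup_customer", "crm_update_record", "request_handoff"],
  booking: ["check_availability", "book_appointment", "update_lead", "request_handoff"],
  handoff: ["request_handoff"],
  faq: ["get_faq_answer", "search_offerings", "crm_lookup_customer", "request_handoff"],
};

function isIntent(slug: string): slug is Intent {
  return (INTENTS as readonly string[]).indexOf(slug) !== -1;
}

function resolveRoute(slug: string): { subAgent: SubAgent; tools: readonly string[] } {
  // Assumption: a slug the router should never emit goes to a human.
  const subAgent: SubAgent = isIntent(slug) ? SUB_AGENT_FOR_INTENT[slug] : "handoff";
  return { subAgent, tools: TOOLS_FOR_SUB_AGENT[subAgent] };
}

// Allowlist check to run before executing any tool call the model asks for.
function mayCall(subAgent: SubAgent, tool: string): boolean {
  return TOOLS_FOR_SUB_AGENT[subAgent].indexOf(tool) !== -1;
}
```

Typing the first map as `Record<Intent, SubAgent>` means the compiler rejects the table if an intent is ever added without a sub-agent.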

Sub-agent prompts live in handoff/prompts/*.md and interpolate the resolved persona + vertical pack at run-time. The agent runs a tool-call loop with Gemini until either a final text reply or a max-iteration cap is hit.

The tool layer

Tool routes live at /api/tools/<name> and are JWT-authed (5-min TTL signed via TOOL_JWT_SECRET). Every call goes through the same wrapper:

JWT verification + per-(tenant, tool) rate limit (50 req/min).
withTenantContext — opens a Postgres transaction with SET LOCAL app.current_tenant_id, so RLS is set before any query runs.
Tool body — handles its specific job (DB read, vector similarity over offerings / KB, CRM proxy call, calendar check, etc.).
tool_call + tool_result log lines are emitted with W3C trace-context propagation, so an operator can follow a single conversation across web + worker + the post-call QA pipeline.
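
The first step of the wrapper can be sketched as a short-lived signed token. This example hand-rolls a compact HS256 JWT with Node's built-in `crypto` module so it runs without dependencies; the claim names (`tenantId`, `tool`), the choice of HS256, and the function names are all assumptions, since this page only states the 5-minute TTL and the `TOOL_JWT_SECRET` signing key. A production service would more likely use a maintained JWT library.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TTL_SECONDS = 5 * 60; // the 5-minute TTL stated above

// Hypothetical claim set; the real token contents are not documented here.
interface ToolClaims {
  tenantId: string;
  tool: string;
  iat: number; // issued-at, Unix seconds
  exp: number; // expiry, Unix seconds
}

const b64url = (s: string): string => Buffer.from(s, "utf8").toString("base64url");

function sign(input: string, secret: string): string {
  return createHmac("sha256", secret).update(input).digest("base64url");
}

function issueToolToken(tenantId: string, tool: string, secret: string, nowSec: number): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const claims: ToolClaims = { tenantId, tool, iat: nowSec, exp: nowSec + TTL_SECONDS };
  const payload = b64url(JSON.stringify(claims));
  return `${header}.${payload}.${sign(`${header}.${payload}`, secret)}`;
}

// Returns the claims when the signature is valid and the token is unexpired.
function verifyToolToken(token: string, secret: string, nowSec: number): ToolClaims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header = "", payload = "", signature = ""] = parts;

  // The signature is always recomputed as HMAC-SHA256, so the header's
  // "alg" field is never trusted to pick the algorithm.
  const expected = Buffer.from(sign(`${header}.${payload}`, secret));
  const actual = Buffer.from(signature);
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;

  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as ToolClaims;
  return nowSec < claims.exp ? claims : null;
}
```

Binding the tenant into the signed claims means the tool route never has to trust a tenant id supplied in the request body.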

The customer identity model

Cross-channel continuity binds on the customer’s E.164 phone number, not the channel. A customer who calls and later WhatsApps continues the same conversation thread because conversations.customer_identifier is the phone number; the next inbound surfaces the prior history regardless of channel. This also means no separate “voice account” + “WhatsApp account” for the same human. The customers table is the per-(tenant, phone) source of truth; conversations are children.
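
A small in-memory sketch of the idea: history is looked up by (tenant, phone) and the channel plays no part in the key. The `ConversationLog` class and its methods are hypothetical. Whether Vorel reuses a single `conversations` row across channels or links several is not stated on this page, so the sketch models one record per channel session and joins them at read time.

```typescript
type Channel = "voice" | "whatsapp";

interface Conversation {
  tenantId: string;
  channel: Channel;
  customerIdentifier: string; // E.164 phone number
  messages: string[];
}

// E.164: a plus sign, then 2 to 15 digits with no leading zero.
const E164 = /^\+[1-9]\d{1,14}$/;

class ConversationLog {
  private readonly conversations: Conversation[] = [];

  open(tenantId: string, channel: Channel, phone: string): Conversation {
    if (!E164.test(phone)) throw new Error(`not an E.164 number: ${phone}`);
    const conversation: Conversation = {
      tenantId,
      channel,
      customerIdentifier: phone,
      messages: [],
    };
    this.conversations.push(conversation);
    return conversation;
  }

  // Prior history is keyed on (tenant, phone); channel is deliberately ignored.
  historyFor(tenantId: string, phone: string): string[] {
    const history: string[] = [];
    for (const c of this.conversations) {
      if (c.tenantId === tenantId && c.customerIdentifier === phone) {
        history.push(...c.messages);
      }
    }
    return history;
  }
}
```

Note that the tenant is part of the key: the same phone number under two tenants is two unrelated customers.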

Storage layout (high-level)

Table	What it holds	Append-only?
`tenants`	One row per Vorel customer (tenant). Persona, vertical, working hours, handoff rules, guardrails.	No
`customers`	One row per (tenant, phone). Cross-channel identity anchor.	No
`conversations`	One row per customer thread. Channel + status + customer_identifier.	No
`messages`	Every turn of every conversation. Customer + agent + system. Append-only.	Yes — Postgres trigger
`leads`	Qualification state + attributes. Linked to a conversation.	No
`offerings`	Tenant catalog (properties / services / clinicians / menu slots). Vector-embedded.	No
`knowledge_base`	FAQ + policy entries. Vector-embedded.	No
`appointments`	Bookings. Linked to a customer + offering + assigned user.	No
`qa_evaluations`	Per-conversation QA scores from the post-call grading pipeline.	No
`audit_log`	Every operator-console read + every mutation. Append-only.	Yes — Postgres trigger
`billing_events`	Per-call cost-of-goods events (Vapi, Telnyx, Gemini) + chargeable events. Append-only.	Yes — Postgres trigger
`webhook_deliveries`	Per-attempt outbound webhook delivery records (status + response).	No
`tenant_credentials`	AES-256-GCM-encrypted CRM driver credentials. KEK from env-var.	No
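
For the `tenant_credentials` row, the sketch below shows one common way to combine AES-256-GCM with a key-encryption key: a random per-credential data key encrypts the secret, and the KEK encrypts only that data key. This envelope layout, and the names `seal`, `unseal`, `encryptCredential`, and `decryptCredential`, are assumptions for illustration; the page states only that credentials are AES-256-GCM-encrypted with a KEK taken from an environment variable.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface Sealed {
  iv: Buffer;         // 96-bit nonce, unique per encryption
  tag: Buffer;        // GCM authentication tag
  ciphertext: Buffer;
}

function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12); // 12 bytes is the recommended GCM nonce size
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

function unseal(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  // final() throws if the key is wrong or the ciphertext was modified.
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]);
}

// Hypothetical stored shape: the data key travels with the row, wrapped.
interface StoredCredential {
  wrappedDek: Sealed; // data key encrypted under the KEK
  secret: Sealed;     // credential encrypted under the data key
}

function encryptCredential(kek: Buffer, secret: string): StoredCredential {
  const dek = randomBytes(32); // fresh 256-bit data key per credential
  return {
    wrappedDek: seal(kek, dek),
    secret: seal(dek, Buffer.from(secret, "utf8")),
  };
}

function decryptCredential(kek: Buffer, stored: StoredCredential): string {
  const dek = unseal(kek, stored.wrappedDek);
  return unseal(dek, stored.secret).toString("utf8");
}
```

With this layout, rotating the KEK means re-wrapping each small data key rather than re-encrypting every credential.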

QA scoring pipeline

After every call ends, a worker runs an LLM-graded QA pass against an 11-criterion rubric (greeting, intent capture, tool-call appropriateness, language match, handoff timing, etc.). The output (qa_evaluations row) carries:

A normalised score (0–11)
Per-criterion breakdown
Hallucination flags graded by a separate v1 grader (severity: low / medium / high)

The hallucination flags feed into the guardrails layer: depending on the tenant’s hallucination.threshold + hallucination.action, a flagged reply triggers a Sentry alert (warn) or routes to handoff (handoff).
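
One way to read the threshold and action settings is sketched below. The severity levels and the two actions come from this page; the type names, the `decide` function, and the rule that a flag at or above the threshold triggers the action are assumptions.

```typescript
type Severity = "low" | "medium" | "high";
type GuardrailAction = "warn" | "handoff";

// Mirrors the tenant's hallucination.threshold and hallucination.action.
interface HallucinationPolicy {
  threshold: Severity;
  action: GuardrailAction;
}

const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2 };

// Returns "pass" when no flag reaches the threshold, else the tenant's action.
function decide(flags: Severity[], policy: HallucinationPolicy): "pass" | GuardrailAction {
  // -1 means "no flags at all", which is below every threshold.
  const worst = flags.reduce((max, s) => Math.max(max, RANK[s]), -1);
  return worst >= RANK[policy.threshold] ? policy.action : "pass";
}
```

Under this reading, a tenant with `threshold: "medium"` and `action: "handoff"` lets a low-severity flag through but hands off on anything medium or high.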

Reliability posture

Failure modes we explicitly handle:

Gemini 5xx / DEADLINE_EXCEEDED — withGeminiRetry retries with backoff; a terminal failure inside the sub-agent dispatch falls through to a short ack (“One moment, please”) plus a fell_back: true flag the inbox surfaces.
Redis blip on rate limit — fail-open: a Redis outage admits the request rather than 429-ing every customer. Logged as rate_limit.redis_failed.
CRM driver 401 — OAuth drivers refresh + retry once; refreshed tokens get re-encrypted and persisted back into tenant_credentials (rotated_at updated). Refresh failures audit-log + return auth_error to the agent, which falls back to a polite “I’ll have someone reach out” rather than hallucinating CRM data.
Vendor outage on a hot path — see SLOs. Tool-route success rate (99.9%) and voice-call success rate (99.5%) are the two we measure most aggressively; webhook delivery rides a 6-attempt retry ladder over [60s, 300s, 1800s, 7200s, 43200s].
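
The webhook retry ladder lists five delays for six attempts, which this sketch reads as one initial attempt followed by five retries, each waiting the next delay in the list. That reading, and the names `RETRY_DELAYS_SECONDS`, `MAX_ATTEMPTS`, and `nextRetryDelaySeconds`, are assumptions.

```typescript
// Delays from the ladder above, in seconds.
const RETRY_DELAYS_SECONDS = [60, 300, 1800, 7200, 43200] as const;

// One initial attempt plus one retry per delay gives 6 attempts.
const MAX_ATTEMPTS = RETRY_DELAYS_SECONDS.length + 1;

// attemptsMade: how many delivery attempts have already failed (1-based).
// Returns the wait before the next attempt, or null when the ladder is spent.
function nextRetryDelaySeconds(attemptsMade: number): number | null {
  if (!Number.isInteger(attemptsMade) || attemptsMade < 1) {
    throw new RangeError("attemptsMade must be a positive integer");
  }
  if (attemptsMade >= MAX_ATTEMPTS) return null; // give up after the 6th failure
  return RETRY_DELAYS_SECONDS[attemptsMade - 1] ?? null;
}
```

Under this reading the delays sum to 52,560 seconds, so the final attempt lands roughly 14.6 hours after the first failure.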

What’s NOT in the picture today

Per-region routing. tenants.region is captured at create time; routing on it (running EU tenants out of an EU Postgres replica) is on the roadmap.
Stripe billing. Cost rollup is captured in billing_events; invoicing is operator-driven from the rollup until the Stripe model lands.
Outbound voice (Vorel-initiated calls). Inbound only today.
Voicemail / fallback when LLM provider is down. A failed call disconnects.
Real n8n deployment. The around-the-brain templates exist on disk + the operator-side discoverability page lists them, but n8n is not yet deployed as a Railway service. Per-turn dispatch (router + sub-agent + tools) runs in TypeScript inside the web app, not in n8n.

Where to go next

Voice features

What the voice agent does, end-to-end.

Chat features

WhatsApp inbound + outbound posture.

API Reference

REST API + SDK for programmatic access.

Security overview

Multi-tenant isolation, encryption, SLOs.
