This page is for the technically curious: you’re evaluating Vorel against in-house builds, building integrations against the API, or want to understand the failure modes before you deploy. If you’re a buyer who just wants the high-level pitch, the What is Vorel page is shorter.
The big picture
Every customer turn (a call connect, an inbound WhatsApp message, a voice utterance) flows through the same five-stage pipeline:
- Ingress receives the customer event from a telephony / messaging vendor and verifies the signature.
- Conversation context hydrates the conversation history, customer identity, vertical pack, and persona.
- Router is a small LLM (Gemini 2.5 Flash-Lite) that classifies the customer’s intent into one of ~10 categories.
- Sub-agent (qualification / FAQ / booking / handoff) runs the actual reply-generation loop with the tools it’s allowed to call.
- Terminal persists the agent reply, runs guardrails (forbidden-phrase + hallucination), and ships the reply back to the customer.
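The five stages above form a straight chain: each stage's output is the next stage's input. The minimal sketch below models that shape with every stage stubbed. All type and function names (CustomerEvent, hydrate, route, and so on) are hypothetical illustrations, not Vorel's actual code. The router stub uses a keyword match where the real system makes an LLM call.

```typescript
// Hypothetical types for illustration; not Vorel's real interfaces.
type Channel = "voice" | "whatsapp";
type SubAgent = "qualification" | "faq" | "booking" | "handoff";

interface CustomerEvent {
  channel: Channel;
  text: string;
  signatureValid: boolean; // stands in for the vendor-signature check
}

interface ConversationContext {
  event: CustomerEvent;
  history: string[];
  persona: string;
}

// Placeholder text; the real blocked-reply behaviour isn't documented here.
const BLOCKED_REPLY = "Let me connect you with a colleague.";

// Stage 1: Ingress rejects events whose vendor signature fails verification.
function ingress(event: CustomerEvent): CustomerEvent {
  if (!event.signatureValid) throw new Error("invalid vendor signature");
  return event;
}

// Stage 2: hydrate history, identity, vertical pack, and persona (stubbed).
function hydrate(event: CustomerEvent): ConversationContext {
  return { event, history: [], persona: "default" };
}

// Stage 3: Router (a keyword match stands in for the classification call).
function route(ctx: ConversationContext): SubAgent {
  return /book|appointment/i.test(ctx.event.text) ? "booking" : "faq";
}

// Stage 4: the chosen sub-agent generates a reply (stubbed).
function runSubAgent(agent: SubAgent, ctx: ConversationContext): string {
  return `[${agent}] reply to: ${ctx.event.text}`;
}

// Stage 5: Terminal runs guardrails before shipping (only a forbidden-phrase check here).
function terminal(reply: string, forbidden: string[]): string {
  const blocked = forbidden.some((p) => reply.toLowerCase().includes(p.toLowerCase()));
  return blocked ? BLOCKED_REPLY : reply;
}

function handleTurn(event: CustomerEvent, forbidden: string[] = []): string {
  const ctx = hydrate(ingress(event));
  return terminal(runSubAgent(route(ctx), ctx), forbidden);
}
```

Keeping each stage a separate function is what lets the router, sub-agents, and guardrails be tested and swapped independently.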
The components
Web app
Next.js (App Router) on Railway. Hosts the operator console, the public API (/api/v1/*), the tool routes (/api/tools/*), the Vapi custom-LLM proxy (/api/vapi/*), and the inbound webhook receivers (/api/webhooks/*).
Workers
Standalone Node service running BullMQ queues. Processes inbound WhatsApp messages, end-of-call
voice reports, outbound webhook dispatch, and right-to-erasure scrubs. Same git tree as the web
app; separate Railway service so a slow background job doesn’t block a request thread.
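The web/worker split follows an enqueue-and-acknowledge pattern. The webhook receiver does only the cheap work (accept the payload, enqueue a job) and returns at once; the worker does the slow work. The sketch below is minimal and assumes a BullMQ-style queue. The JobQueue interface, the job name, the payload shape, and deduplication by vendor message id are all hypothetical. BullMQ's Queue.add(name, data, opts) with opts.jobId matches the interface's shape, but the in-memory class here is only a stand-in.

```typescript
// Minimal interface that a BullMQ Queue matches via queue.add(name, data, { jobId }).
interface JobQueue {
  add(name: string, data: unknown, opts?: { jobId?: string }): Promise<unknown>;
}

// Hypothetical inbound payload shape.
interface InboundWhatsApp {
  messageId: string;
  from: string;
  text: string;
}

// Webhook receiver: enqueue and acknowledge; no LLM work on the request thread.
async function receiveInbound(payload: InboundWhatsApp, queue: JobQueue): Promise<{ status: number }> {
  // Using the vendor message id as the job id lets the queue drop duplicate deliveries.
  await queue.add("inbound-whatsapp", payload, { jobId: payload.messageId });
  return { status: 200 };
}

// In-memory stand-in: like BullMQ, it does not re-add a job whose id already exists.
class InMemoryQueue implements JobQueue {
  jobs = new Map<string, { name: string; data: unknown }>();
  async add(name: string, data: unknown, opts?: { jobId?: string }): Promise<unknown> {
    const id = opts?.jobId ?? String(this.jobs.size + 1);
    if (!this.jobs.has(id)) this.jobs.set(id, { name, data });
    return id;
  }
}
```

Because the receiver returns before any LLM call runs, a vendor retry caused by a slow response becomes much less likely. The job id also absorbs the retries that do happen.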
Postgres
Single multi-tenant database. Every tenant-scoped table has a Row-Level Security policy gating reads + writes by current_setting('app.current_tenant_id'). The vorel_app Postgres role is not a superuser, so a forgotten SET LOCAL app.current_tenant_id returns zero rows: the policy fails closed.
Redis (BullMQ + rate limit)
BullMQ queue backbone for the workers, plus a fixed-window primitive for the rate-limit stack.
Disk-encrypted on Railway.
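A fixed-window limiter counts requests per key in discrete time buckets. The key embeds the window index, the first hit sets a TTL, and a request is admitted while the count stays at or below the limit. The sketch below is minimal and assumes a Redis-like INCR-with-expiry primitive, modelled as a hypothetical CounterStore interface. It uses the 50 req/min per-(tenant, tool) limit and the fail-open behaviour described on this page. The key format and function names are illustrative.

```typescript
// Hypothetical store abstraction: in Redis, INCR the key, then EXPIRE it on the first hit.
interface CounterStore {
  incrWithTtl(key: string, ttlSeconds: number): Promise<number>;
}

interface RateLimitResult {
  allowed: boolean;
  failedOpen: boolean;
}

async function checkRateLimit(
  store: CounterStore,
  tenantId: string,
  tool: string,
  nowMs: number,
  limit = 50,
  windowSeconds = 60,
): Promise<RateLimitResult> {
  // Every request in the same window shares one bucket key.
  const windowIndex = Math.floor(nowMs / (windowSeconds * 1000));
  const key = `rl:${tenantId}:${tool}:${windowIndex}`;
  try {
    const count = await store.incrWithTtl(key, windowSeconds);
    return { allowed: count <= limit, failedOpen: false };
  } catch {
    // Fail-open: a store outage admits the request instead of rejecting every customer.
    // A real implementation would also log the failure (this page names rate_limit.redis_failed).
    return { allowed: true, failedOpen: true };
  }
}

// In-memory store for illustration; TTLs are ignored because the key already embeds the window.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();
  async incrWithTtl(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}
```

A fixed window costs one counter increment per request. The trade-off is that it can admit up to twice the limit across a window boundary; a sliding window narrows that gap at the cost of more Redis work.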
Vapi (voice)
Voice orchestration vendor. Owns the SIP trunk + Deepgram transcription + ElevenLabs TTS, and forwards each LLM turn to our custom-LLM proxy (/api/vapi/chat/completions).
Telnyx (telephony)
DID provider + SIP carrier. We BYO the SIP trunk into Vapi for full SIP/SDP codec control on
UAE cellular calls.
Gemini (LLMs)
gemini-2.5-flash for the agent dispatch + sub-agents; gemini-2.5-flash-lite for the router. Tool calling via the Gemini SDK. The voice path proxies through /api/vapi/chat/completions so the per-call LLM cost lands in billing_events for the cost rollup.
Cloudflare
DNS + TLS termination + 5 OWASP-default security headers + WAF for the public surfaces.
Voice flow, end to end
- assistant-request: fired by Vapi on call connect. We resolve the tenant by vapi_phone_number_id, build a per-tenant assistant config (persona-interpolated system prompt, tool ids, voice + transcriber), and return it within Vapi’s 7.5s budget. This is what makes per-tenant config possible without re-publishing to the Vapi dashboard on every persona edit.
- /api/vapi/chat/completions: Vapi calls this OpenAI-compatible proxy endpoint on every LLM turn. We translate the request to Gemini, run the agent dispatch (router + sub-agent + tools), and stream the reply back as a stream of chat.completion.chunk objects.
- end-of-call-report: fired by Vapi when the call ends. The worker processes the report, persists the transcript + recording URL + Vapi cost breakdown, runs the QA scoring pipeline, and emits the cost rollup events.
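Because /api/vapi/chat/completions speaks the OpenAI chat-completions streaming format, a streamed reply is a sequence of server-sent events. Each event carries a chat.completion.chunk object, and the stream ends with a data: [DONE] line. The minimal sketch below covers only that encoding step. The function name, id, and model label are placeholders, and translating Gemini's own stream into text pieces is out of scope.

```typescript
// Only the chunk fields this sketch sets.
type Delta = { role?: "assistant"; content?: string };

interface ChatCompletionChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: { index: number; delta: Delta; finish_reason: "stop" | null }[];
}

// Encodes text pieces as SSE frames: a role delta first, one content delta per piece,
// an empty delta with finish_reason "stop", then the [DONE] sentinel.
function encodeChunkStream(pieces: string[], id: string, model: string, created: number): string {
  const frame = (delta: Delta, finish: "stop" | null): string => {
    const chunk: ChatCompletionChunk = {
      id,
      object: "chat.completion.chunk",
      created,
      model,
      choices: [{ index: 0, delta, finish_reason: finish }],
    };
    return `data: ${JSON.stringify(chunk)}\n\n`;
  };
  let out = frame({ role: "assistant" }, null);
  for (const content of pieces) out += frame({ content }, null);
  out += frame({}, "stop");
  return out + "data: [DONE]\n\n";
}
```

In a real route handler, each frame would be written to the response as soon as its piece arrives, with Content-Type: text/event-stream, rather than concatenated into one string as in this sketch.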
Chat flow, end to end
WhatsApp outbound send is paused. The send_whatsapp_message tool persists the agent reply into messages + writes the WhatsApp outbox row, but the actual 360dialog network send is mocked today. Real send re-activates once Meta Business Manager verification clears for your tenant (Phase 4b). Inbound + dashboard reply remain available throughout.
The router → sub-agent shape
The router is a single classification call against gemini-2.5-flash-lite with a short prompt. It outputs one of 10 intent slugs:
greeting·faq·new_lead_inquiry·existing_lead_update·booking·reschedule_or_cancel·human_request·complaint·spam_or_unrelated·out_of_scope
| Intent | Sub-agent | Tools available |
|---|---|---|
| new_lead_inquiry, existing_lead_update | qualification | search_offerings, update_lead, crm_lookup_customer, crm_update_record, request_handoff |
| booking | booking | check_availability, book_appointment, update_lead, request_handoff |
| reschedule_or_cancel, human_request, complaint | handoff | request_handoff |
| greeting, faq, spam_or_unrelated, out_of_scope | faq | get_faq_answer, search_offerings, crm_lookup_customer, request_handoff |
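The intent list and routing table above amount to a closed enum plus two lookups. The minimal sketch below encodes them as TypeScript data so that the compiler checks every intent is routed, with tool allowlists copied from the table. The constant and function names are illustrative. Treating an unrecognised router output as out_of_scope is an assumption, not documented behaviour.

```typescript
const INTENTS = [
  "greeting", "faq", "new_lead_inquiry", "existing_lead_update", "booking",
  "reschedule_or_cancel", "human_request", "complaint", "spam_or_unrelated", "out_of_scope",
] as const;
type Intent = (typeof INTENTS)[number];
type SubAgent = "qualification" | "booking" | "handoff" | "faq";

// Record<Intent, ...> makes a missing intent a compile error.
const ROUTES: Record<Intent, SubAgent> = {
  new_lead_inquiry: "qualification",
  existing_lead_update: "qualification",
  booking: "booking",
  reschedule_or_cancel: "handoff",
  human_request: "handoff",
  complaint: "handoff",
  greeting: "faq",
  faq: "faq",
  spam_or_unrelated: "faq",
  out_of_scope: "faq",
};

// Per-sub-agent tool allowlists, as listed in the table.
const TOOLS: Record<SubAgent, readonly string[]> = {
  qualification: ["search_offerings", "update_lead", "crm_lookup_customer", "crm_update_record", "request_handoff"],
  booking: ["check_availability", "book_appointment", "update_lead", "request_handoff"],
  handoff: ["request_handoff"],
  faq: ["get_faq_answer", "search_offerings", "crm_lookup_customer", "request_handoff"],
};

function isIntent(s: string): s is Intent {
  return (INTENTS as readonly string[]).includes(s);
}

// Assumption: any router output outside the enum is treated as out_of_scope.
function dispatch(routerOutput: string): { intent: Intent; agent: SubAgent; tools: readonly string[] } {
  const slug = routerOutput.trim().toLowerCase();
  const intent: Intent = isIntent(slug) ? slug : "out_of_scope";
  const agent = ROUTES[intent];
  return { intent, agent, tools: TOOLS[agent] };
}
```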
The sub-agents load their prompts from …handoff/prompts/*.md and interpolate the resolved persona + vertical pack at run-time. Each sub-agent runs a tool-call loop with Gemini until it produces a final text reply or hits a max-iteration cap.
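The tool-call loop alternates between the model and the tools until the model returns plain text or the iteration cap is reached. The minimal sketch below hides the model and tools behind hypothetical interfaces (the real loop runs against Gemini). The cap value is a placeholder. Returning null when the cap is hit, so the caller can choose a fallback, is an assumption.

```typescript
// A model turn is either a final text reply or a request to call one tool.
type ModelTurn =
  | { kind: "text"; text: string }
  | { kind: "tool_call"; name: string; args: Record<string, unknown> };

type Message =
  | { role: "user"; content: string }
  | { role: "model"; turn: ModelTurn }
  | { role: "tool"; name: string; content: string };

type Model = (history: Message[]) => Promise<ModelTurn>;
type Tool = (args: Record<string, unknown>) => Promise<string>;

async function runToolLoop(
  model: Model,
  tools: Record<string, Tool>, // only the sub-agent's allowlisted tools
  userText: string,
  maxIterations = 5, // placeholder cap
): Promise<string | null> {
  const history: Message[] = [{ role: "user", content: userText }];
  for (let i = 0; i < maxIterations; i++) {
    const turn = await model(history);
    if (turn.kind === "text") return turn.text;
    history.push({ role: "model", turn });
    const tool = tools[turn.name];
    // A tool outside the allowlist is reported back to the model instead of being executed.
    const result = tool ? await tool(turn.args) : `error: tool ${turn.name} is not available`;
    history.push({ role: "tool", name: turn.name, content: result });
  }
  return null; // cap hit: the caller decides the fallback reply
}
```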
The tool layer
Tool routes live at /api/tools/<name> and are JWT-authed (5-minute TTL, signed with TOOL_JWT_SECRET). Every call goes through the same wrapper:
- JWT verification + per-(tenant, tool) rate limit (50 req/min).
- withTenantContext: opens a Postgres transaction with SET LOCAL app.current_tenant_id, so RLS is set before any query runs.
- Tool body: handles its specific job (DB read, vector similarity over offerings / KB, CRM proxy call, calendar check, etc.).
- tool_call + tool_result log lines are emitted with W3C trace-context propagation, so an operator can follow a single conversation across web + worker + the post-call QA pipeline.
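withTenantContext is a wrap-in-transaction helper. SET LOCAL cannot take a bind parameter, so a common way to set a transaction-scoped value from a parameter is set_config(name, value, true), the function form of SET LOCAL. The minimal sketch below works over a hypothetical Queryable interface; node-postgres's client.query(text, values) has roughly this shape. The helper's signature is illustrative, not Vorel's actual code.

```typescript
// Minimal client shape: a node-postgres PoolClient offers query(text, values).
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

async function withTenantContext<T>(
  client: Queryable,
  tenantId: string,
  body: (tx: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // Third argument true = transaction-local (same scope as SET LOCAL); cleared at COMMIT/ROLLBACK.
    await client.query("SELECT set_config('app.current_tenant_id', $1, true)", [tenantId]);
    const result = await body(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting is transaction-local, a pooled connection handed to the next request carries no leftover tenant id. A query issued outside the helper therefore runs with no tenant set and, under the RLS policies described above, sees no rows.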
The customer identity model
Cross-channel continuity binds on the customer’s E.164 phone number, not the channel. A customer who calls and later sends a WhatsApp message continues the same conversation thread because conversations.customer_identifier is the phone number; the next inbound surfaces the prior history regardless of channel.
This also means no separate “voice account” + “WhatsApp account” for the same human. The customers table is the per-(tenant, phone) source of truth; conversations are children.
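Because identity keys on the E.164 number, every channel has to normalise to the same string before lookup. The sketch below is deliberately simplified. It strips formatting, converts an international 00 prefix to +, and checks the E.164 shape (a + followed by at most 15 digits, with no leading zero). It does not handle national-format numbers, which a production system would typically resolve with a library such as libphonenumber. The function names and key format are hypothetical.

```typescript
// E.164: "+", a country code starting 1-9, at most 15 digits in total.
const E164 = /^\+[1-9]\d{1,14}$/;

function toE164(raw: string): string | null {
  // Drop spaces, dashes, dots, and parentheses.
  let s = raw.replace(/[\s\-().]/g, "");
  // International "00" prefix becomes "+".
  if (s.startsWith("00")) s = "+" + s.slice(2);
  return E164.test(s) ? s : null;
}

// Hypothetical identity key: one customer per (tenant, phone), shared by every channel.
function customerKey(tenantId: string, rawPhone: string): string | null {
  const phone = toE164(rawPhone);
  return phone ? `${tenantId}:${phone}` : null;
}
```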
Storage layout (high-level)
| Table | What it holds | Append-only? |
|---|---|---|
| tenants | One row per Vorel customer (tenant). Persona, vertical, working hours, handoff rules, guardrails. | No |
| customers | One row per (tenant, phone). Cross-channel identity anchor. | No |
| conversations | One row per customer thread. Channel + status + customer_identifier. | No |
| messages | Every turn of every conversation. Customer + agent + system. Append-only. | Yes (Postgres trigger) |
| leads | Qualification state + attributes. Linked to a conversation. | No |
| offerings | Tenant catalog (properties / services / clinicians / menu slots). Vector-embedded. | No |
| knowledge_base | FAQ + policy entries. Vector-embedded. | No |
| appointments | Bookings. Linked to a customer + offering + assigned user. | No |
| qa_evaluations | Per-conversation QA scores from the post-call grading pipeline. | No |
| audit_log | Every operator-console read + every mutation. Append-only. | Yes (Postgres trigger) |
| billing_events | Per-call cost-of-goods events (Vapi, Telnyx, Gemini) + chargeable events. Append-only. | Yes (Postgres trigger) |
| webhook_deliveries | Per-attempt outbound webhook delivery records (status + response). | No |
| tenant_credentials | AES-256-GCM-encrypted CRM driver credentials. KEK from env-var. | No |
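AES-256-GCM is an authenticated mode: decryption fails if the ciphertext, IV, or tag has been altered. The minimal sketch below uses Node's built-in crypto module. The stored record layout is a hypothetical choice. A random 32-byte buffer stands in for the env-var KEK, and the sketch encrypts directly with that key rather than modelling a separate data-encryption key.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical stored shape: base64 fields that fit in a text or jsonb column.
interface EncryptedSecret {
  iv: string;
  tag: string;
  ciphertext: string;
}

function encryptSecret(plaintext: string, key: Buffer): EncryptedSecret {
  const iv = randomBytes(12); // 96-bit nonce, fresh per encryption; never reuse one with the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"), // 16-byte tag by default
    ciphertext: ciphertext.toString("base64"),
  };
}

function decryptSecret(record: EncryptedSecret, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(record.iv, "base64"));
  decipher.setAuthTag(Buffer.from(record.tag, "base64"));
  // final() throws if the authentication tag does not verify.
  return Buffer.concat([
    decipher.update(Buffer.from(record.ciphertext, "base64")),
    decipher.final(),
  ]).toString("utf8");
}
```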
QA scoring pipeline
After every call ends, a worker runs an LLM-graded QA pass against an 11-criterion rubric (greeting, intent capture, tool-call appropriateness, language match, handoff timing, etc.). The output (qa_evaluations row) carries:
- A normalised score (0–11)
- Per-criterion breakdown
- Hallucination flags graded by a separate v1 grader (severity: low / medium / high)
Based on the tenant’s hallucination.threshold + hallucination.action guardrail settings, a flagged reply either triggers a Sentry alert (action warn) or routes the conversation to handoff (action handoff).
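The threshold + action pair is a small decision rule over the grader's flags. The minimal sketch below assumes the threshold is expressed as a minimum severity and that any flag at or above it triggers the configured action. Both the types and these semantics are assumptions, because the config schema isn't documented here.

```typescript
type Severity = "low" | "medium" | "high";
type HallucinationAction = "warn" | "handoff";

interface HallucinationFlag {
  severity: Severity;
  claim: string;
}

interface HallucinationConfig {
  threshold: Severity; // assumed: minimum severity that counts
  action: HallucinationAction;
}

const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2 };

// Returns the configured action, or "none" when no flag reaches the threshold.
function decideHallucinationAction(
  flags: HallucinationFlag[],
  config: HallucinationConfig,
): HallucinationAction | "none" {
  const tripped = flags.some((f) => RANK[f.severity] >= RANK[config.threshold]);
  return tripped ? config.action : "none";
}
```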
Reliability posture
Failure modes we explicitly handle:
- Gemini 5xx / DEADLINE_EXCEEDED: withGeminiRetry retries with backoff; a terminal failure inside the sub-agent dispatch falls through to a short ack (“One moment, please”) plus a fell_back: true flag that the inbox surfaces.
- Redis blip on rate limit: fail-open. A Redis outage admits the request rather than 429-ing every customer, and is logged as rate_limit.redis_failed.
- CRM driver 401: OAuth drivers refresh + retry once; refreshed tokens get re-encrypted and persisted back into tenant_credentials (rotated_at updated). Refresh failures are audit-logged and return auth_error to the agent, which falls back to a polite “I’ll have someone reach out” rather than hallucinating CRM data.
- Vendor outage on a hot path: see SLOs. Tool-route success rate (99.9%) and voice-call success rate (99.5%) are the two we measure most aggressively; webhook delivery rides a 6-attempt retry ladder with delays of [60s, 300s, 1800s, 7200s, 43200s].
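Both retry paths come down to a delay schedule. Here the webhook ladder is read as one immediate attempt followed by five retries, so six attempts use the five listed delays. For the Gemini path, the sketch shows retry-then-fallback with exponential backoff and an injectable sleep. withRetryThenFallback, its parameters, and the backoff values are hypothetical; they do not describe withGeminiRetry's actual signature.

```typescript
// Webhook ladder from this page: delay before each retry, in seconds.
const WEBHOOK_RETRY_DELAYS_S = [60, 300, 1800, 7200, 43200] as const;
const WEBHOOK_MAX_ATTEMPTS = WEBHOOK_RETRY_DELAYS_S.length + 1; // 6: first try + 5 retries

// Delay in seconds before the given attempt (1-based), or null once the ladder is exhausted.
function webhookDelayBeforeAttempt(attempt: number): number | null {
  if (attempt <= 1) return 0;
  return attempt <= WEBHOOK_MAX_ATTEMPTS ? WEBHOOK_RETRY_DELAYS_S[attempt - 2] : null;
}

// Hypothetical retry-then-fallback helper in the spirit of the Gemini path.
async function withRetryThenFallback<T>(
  call: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  fallback: T,
  opts: { retries: number; baseDelayMs: number; sleep: (ms: number) => Promise<void> },
): Promise<{ value: T; fellBack: boolean }> {
  for (let attempt = 0; ; attempt++) {
    try {
      return { value: await call(), fellBack: false };
    } catch (err) {
      // Non-retryable errors and exhausted retries both return the fallback (e.g. a short ack).
      if (!isRetryable(err) || attempt >= opts.retries) {
        return { value: fallback, fellBack: true };
      }
      // Exponential backoff: base, 2x base, 4x base, ...
      await opts.sleep(opts.baseDelayMs * 2 ** attempt);
    }
  }
}
```

Injecting sleep keeps the helper testable without real timers, and the fellBack flag corresponds to the fell_back marker that the inbox surfaces.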
What’s NOT in the picture today
- Per-region routing. tenants.region is captured at create time; routing on it (running EU tenants out of an EU Postgres replica) is on the roadmap.
- Stripe billing. Cost rollup is captured in billing_events; invoicing is operator-driven from the rollup until the Stripe model lands.
- Outbound voice (Vorel-initiated calls). Inbound only today.
- Voicemail / fallback when the LLM provider is down. A failed call disconnects.
- Real n8n deployment. The around-the-brain templates exist on disk + the operator-side discoverability page lists them, but n8n is not yet deployed as a Railway service. Per-turn dispatch (router + sub-agent + tools) runs in TypeScript inside the web app, not in n8n.
Where to go next
Voice features
What the voice agent does, end-to-end.
Chat features
WhatsApp inbound + outbound posture.
API Reference
REST API + SDK for programmatic access.
Security overview
Multi-tenant isolation, encryption, SLOs.