This page is for the technically-curious — you’re evaluating Vorel against in-house builds, building integrations against the API, or want to understand the failure modes before you deploy. If you’re a buyer who just wants the high-level pitch, the What is Vorel page is shorter.

The big picture

Every customer turn (a call connect, a WhatsApp inbound, a voice utterance) flows through the same five-stage pipeline:
ingress  →  conversation context  →  router  →  sub-agent  →  terminal
  • Ingress receives the customer event from a telephony / messaging vendor and verifies the signature.
  • Conversation context hydrates the conversation history, customer identity, vertical pack, and persona.
  • Router is a small LLM (Gemini 2.5 Flash-Lite) that classifies the customer’s intent into one of ~10 categories.
  • Sub-agent (qualification / FAQ / booking / handoff) runs the actual reply-generation loop with the tools it’s allowed to call.
  • Terminal persists the agent reply, runs guardrails (forbidden-phrase + hallucination), and ships the reply back to the customer.
Voice and chat share this entire pipeline — only the ingress and the terminal differ per channel.
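The five stages above can be sketched as a typed pipeline. This is a minimal illustration, not Vorel's actual code: every name, type, and the toy routing rule are assumptions.

```typescript
// Illustrative sketch of the five-stage pipeline; all identifiers are
// assumptions, not Vorel's actual code.
type Channel = "voice" | "whatsapp";

interface TurnEvent { channel: Channel; tenantId: string; customerPhone: string; text: string; }
interface TurnContext extends TurnEvent { history: string[]; persona: string; }
interface RoutedTurn extends TurnContext { intent: string; }
interface AgentReply { text: string; toolCalls: string[]; }

const pipeline = {
  // 1. Ingress: the vendor signature is assumed verified before this point.
  ingress: (raw: TurnEvent): TurnEvent => raw,
  // 2. Conversation context: hydrate history, identity, persona.
  hydrate: (e: TurnEvent): TurnContext => ({ ...e, history: [], persona: "default" }),
  // 3. Router: a toy keyword classifier standing in for the LLM call.
  route: (c: TurnContext): RoutedTurn => ({ ...c, intent: c.text.includes("book") ? "booking" : "faq" }),
  // 4. Sub-agent: the reply-generation loop, collapsed to a stub.
  run: (r: RoutedTurn): AgentReply => ({ text: `(${r.intent}) reply`, toolCalls: [] }),
  // 5. Terminal: persistence + guardrails would run here.
  terminal: (a: AgentReply): AgentReply => a,
};

function handleTurn(e: TurnEvent): AgentReply {
  return pipeline.terminal(pipeline.run(pipeline.route(pipeline.hydrate(pipeline.ingress(e)))));
}
```

Because voice and chat share stages 2–4, only `ingress` and `terminal` would branch on `channel`.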

The components

Web app

Next.js (App Router) on Railway. Hosts the operator console, the public API (/api/v1/*), the tool routes (/api/tools/*), the Vapi custom-LLM proxy (/api/vapi/*), and the inbound webhook receivers (/api/webhooks/*).

Workers

Standalone Node service running BullMQ queues. Processes inbound WhatsApp messages, end-of-call voice reports, outbound webhook dispatch, and right-to-erasure scrubs. Same git tree as the web app; separate Railway service so a slow background job doesn’t block a request thread.

Postgres

Single multi-tenant database. Every tenant-scoped table has a Row-Level Security policy gating reads + writes by current_setting('app.current_tenant_id'). The vorel_app Postgres role is not a superuser, so a forgotten SET LOCAL app.current_tenant_id returns zero rows — fail-closed.

Redis (BullMQ + rate limit)

BullMQ queue backbone for the workers, plus a fixed-window primitive for the rate-limit stack. Disk-encrypted on Railway.
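The fixed-window primitive can be sketched as follows. This is an in-memory illustration only: an assumed `Map` stands in for Redis, where the same logic would be an `INCR` plus `EXPIRE` per window key.

```typescript
// Fixed-window rate limiter sketch. In production the counter lives in
// Redis; the Map here is an illustrative stand-in.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();
  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    // Bucket the timestamp into a fixed window.
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    const entry = this.counts.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // First request in a fresh window: reset the counter.
      this.counts.set(key, { windowStart, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

A per-(tenant, tool) limit of 50 req/min would be `new FixedWindowLimiter(50, 60_000)` keyed on `"${tenantId}:${tool}"`.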

Vapi (voice)

Voice orchestration vendor. Owns the SIP trunk + Deepgram transcription + ElevenLabs TTS, and forwards each LLM turn to our custom-LLM proxy (/api/vapi/chat/completions).

Telnyx (telephony)

DID provider + SIP carrier. We BYO the SIP trunk into Vapi for full SIP/SDP codec control on UAE cellular calls.

Gemini (LLMs)

gemini-2.5-flash for the agent dispatch + sub-agents; gemini-2.5-flash-lite for the router. Tool calling via the Gemini SDK. The voice path proxies through /api/vapi/chat/completions so the per-call LLM cost lands in billing_events for the cost rollup.

Cloudflare

DNS + TLS termination + five OWASP-recommended default security headers + WAF for the public surfaces.

Voice flow, end to end

Customer phone → Telnyx DID → Telnyx SIP trunk → Vapi
                                                  ├─ Deepgram nova-3 (multi) — STT
                                                  ├─ ElevenLabs eleven_turbo_v2_5 — TTS
                                                  └─ Vorel custom-LLM proxy
                                                       ↓
                                                  /api/vapi/chat/completions
                                                       ↓
                                                  router → sub-agent → tool calls
                                                       ↓
                                                  reply text
                                                       ↓
                                                  ElevenLabs TTS → Vapi → caller
Three Vapi-side webhooks land at our app on every call:
  1. assistant-request — fired by Vapi on call connect. We resolve the tenant by vapi_phone_number_id, build a per-tenant assistant config (persona-interpolated system prompt, tool ids, voice + transcriber), and return it within Vapi’s 7.5s budget. This is what makes per-tenant config possible without re-publishing to the Vapi dashboard on every persona edit.
  2. /api/vapi/chat/completions — Vapi’s OpenAI-compatible proxy hits this on every LLM turn. We translate the request to Gemini, run the agent dispatch (router + sub-agent + tools), and stream the reply back as a chat.completion chunk stream.
  3. end-of-call-report — fired by Vapi when the call ends. The worker processes the report, persists the transcript + recording URL + Vapi cost breakdown, runs the QA scoring pipeline, and emits the cost rollup events.
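The assistant-request step (1 above) amounts to a pure config-assembly function: resolve the tenant, interpolate the persona, and return inside the time budget. A sketch, where the tenant shape and the returned field names are assumptions rather than Vapi's exact schema:

```typescript
// Sketch of per-tenant assistant-config assembly for the assistant-request
// webhook. The Tenant shape and output field names are assumptions.
interface Tenant { id: string; name: string; persona: string; toolIds: string[]; }

function buildAssistantConfig(tenant: Tenant) {
  return {
    // Persona-interpolated system prompt, rebuilt on every call connect,
    // which is why persona edits need no re-publish to the Vapi dashboard.
    systemPrompt: `You are the assistant for ${tenant.name}. Persona: ${tenant.persona}.`,
    toolIds: tenant.toolIds,
    voice: { provider: "elevenlabs", model: "eleven_turbo_v2_5" },
    transcriber: { provider: "deepgram", model: "nova-3" },
    // Point the "LLM" at the custom-LLM proxy so every turn round-trips
    // through Vorel's own dispatch.
    model: { url: "/api/vapi/chat/completions" },
  };
}
```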

Chat flow, end to end

Customer WhatsApp → 360dialog Cloud → /api/webhooks/whatsapp (signature-verified)
                                       ↓
                                       BullMQ (message-processor queue)
                                       ↓
                                       worker:
                                          ├─ persists customer turn + upserts Customer
                                          └─ POSTs /api/internal/agent/dispatch
                                               ↓
                                               router → sub-agent → tool calls
                                               ↓
                                               send_whatsapp_message (writes outbox)
Inbound WhatsApp signatures are verified against the 360dialog HMAC; rate-limited per IP (500 req/min) before signature verification, then per (tenant, customer phone) (30 req/min) post-verification.
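The HMAC check can be sketched with Node's `crypto` module. The exact header name and hex encoding 360dialog uses are assumptions here; the constant-time comparison is the important part.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a raw webhook body against its HMAC-SHA256 signature.
// Assumes the signature arrives hex-encoded; adjust for the vendor's format.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so length-check first.
  return received.length === expected.length && timingSafeEqual(expected, received);
}
```

Note that verification must run on the raw request bytes, before any JSON parsing re-serializes the body.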
WhatsApp outbound send is paused. The send_whatsapp_message tool persists the agent reply into messages + writes the WhatsApp outbox row, but the actual 360dialog network send is mocked today. Real send re-activates once Meta Business Manager verification clears for your tenant (Phase 4b). Inbound + dashboard reply remain available throughout.

The router → sub-agent shape

The router is a single classification call against gemini-2.5-flash-lite with a short prompt. It outputs one of ten intent slugs:
  • greeting · faq · new_lead_inquiry · existing_lead_update · booking · reschedule_or_cancel · human_request · complaint · spam_or_unrelated · out_of_scope
Each intent maps to a sub-agent:
| Intent | Sub-agent | Tools available |
| --- | --- | --- |
| new_lead_inquiry, existing_lead_update | qualification | search_offerings, update_lead, crm_lookup_customer, crm_update_record, request_handoff |
| booking | booking | check_availability, book_appointment, update_lead, request_handoff |
| reschedule_or_cancel, human_request, complaint | handoff | request_handoff |
| greeting, faq, spam_or_unrelated, out_of_scope | faq | get_faq_answer, search_offerings, crm_lookup_customer, request_handoff |
Sub-agent prompts live in handoff/prompts/*.md and interpolate the resolved persona + vertical pack at run-time. The agent runs a tool-call loop with Gemini until either a final text reply or a max-iteration cap is hit.
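The routing table above is pure data and can be transcribed as a constant map. The slugs come from the table; the fallback-to-faq behaviour for unknown slugs is an assumption, not documented behaviour.

```typescript
// Intent → sub-agent routing table, transcribed from the table above.
type SubAgent = "qualification" | "booking" | "handoff" | "faq";

const SUB_AGENT_FOR_INTENT: Record<string, SubAgent> = {
  new_lead_inquiry: "qualification",
  existing_lead_update: "qualification",
  booking: "booking",
  reschedule_or_cancel: "handoff",
  human_request: "handoff",
  complaint: "handoff",
  greeting: "faq",
  faq: "faq",
  spam_or_unrelated: "faq",
  out_of_scope: "faq",
};

// Assumed fallback: an unrecognised slug routes to faq rather than failing the turn.
function subAgentFor(intent: string): SubAgent {
  return SUB_AGENT_FOR_INTENT[intent] ?? "faq";
}
```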

The tool layer

Tool routes live at /api/tools/<name> and are JWT-authed (5-min TTL signed via TOOL_JWT_SECRET). Every call goes through the same wrapper:
  1. JWT verification + per-(tenant, tool) rate limit (50 req/min).
  2. withTenantContext — opens a Postgres transaction with SET LOCAL app.current_tenant_id, so RLS is set before any query runs.
  3. Tool body — handles its specific job (DB read, vector similarity over offerings / KB, CRM proxy call, calendar check, etc.).
  4. tool_call + tool_result log lines are emitted with W3C trace-context propagation, so an operator can follow a single conversation across web + worker + the post-call QA pipeline.
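The short-TTL tool JWT in step 1 can be sketched as a hand-rolled HS256 token signed with TOOL_JWT_SECRET. Claim names other than `exp` are assumptions, and a real implementation would use a vetted JWT library rather than this illustration.

```typescript
import { createHmac } from "node:crypto";

const b64url = (b: Buffer) => b.toString("base64url");

// Mint an HS256 JWT with a 5-minute TTL. The tenant_id / tool claim names
// are illustrative assumptions.
function mintToolJwt(secret: string, tenantId: string, tool: string,
                     nowSec = Math.floor(Date.now() / 1000)): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const payload = b64url(Buffer.from(JSON.stringify({ tenant_id: tenantId, tool, exp: nowSec + 300 })));
  const sig = b64url(createHmac("sha256", secret).update(`${header}.${payload}`).digest());
  return `${header}.${payload}.${sig}`;
}

function verifyToolJwt(token: string, secret: string,
                       nowSec = Math.floor(Date.now() / 1000)): boolean {
  const [header, payload, sig] = token.split(".");
  const expected = b64url(createHmac("sha256", secret).update(`${header}.${payload}`).digest());
  if (sig !== expected) return false; // signature check before trusting claims
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString());
  return typeof claims.exp === "number" && claims.exp > nowSec; // TTL check
}
```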

The customer identity model

Cross-channel continuity binds on the customer’s E.164 phone number, not the channel. A customer who calls and later WhatsApps continues the same conversation thread because conversations.customer_identifier is the phone number; the next inbound surfaces the prior history regardless of channel. This also means no separate “voice account” + “WhatsApp account” for the same human. The customers table is the per-(tenant, phone) source of truth; conversations are children.
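The binding reduces to a single lookup key per (tenant, phone). A sketch, with a deliberately simplified normaliser: real E.164 normalisation needs a proper library (e.g. a libphonenumber port), and both function names here are illustrative.

```typescript
// Simplified illustration: a customer's identity key is (tenant, E.164 phone),
// independent of channel. Use a real phone-number library in production.
function normalizePhone(raw: string): string {
  const digits = raw.replace(/[^\d+]/g, ""); // strip spaces, dashes, parens
  return digits.startsWith("+") ? digits : `+${digits}`;
}

function customerKey(tenantId: string, rawPhone: string): string {
  return `${tenantId}:${normalizePhone(rawPhone)}`;
}
```

A voice call from `+971 50 000 0000` and a WhatsApp from `971500000000` thus resolve to the same key, hence the same customer row and conversation history.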

Storage layout (high-level)

| Table | What it holds | Append-only? |
| --- | --- | --- |
| tenants | One row per Vorel customer (tenant). Persona, vertical, working hours, handoff rules, guardrails. | No |
| customers | One row per (tenant, phone). Cross-channel identity anchor. | No |
| conversations | One row per customer thread. Channel + status + customer_identifier. | No |
| messages | Every turn of every conversation. Customer + agent + system. | Yes — Postgres trigger |
| leads | Qualification state + attributes. Linked to a conversation. | No |
| offerings | Tenant catalog (properties / services / clinicians / menu slots). Vector-embedded. | No |
| knowledge_base | FAQ + policy entries. Vector-embedded. | No |
| appointments | Bookings. Linked to a customer + offering + assigned user. | No |
| qa_evaluations | Per-conversation QA scores from the post-call grading pipeline. | No |
| audit_log | Every operator-console read + every mutation. | Yes — Postgres trigger |
| billing_events | Per-call cost-of-goods events (Vapi, Telnyx, Gemini) + chargeable events. | Yes — Postgres trigger |
| webhook_deliveries | Per-attempt outbound webhook delivery records (status + response). | No |
| tenant_credentials | AES-256-GCM-encrypted CRM driver credentials. KEK from env-var. | No |

QA scoring pipeline

After every call ends, a worker runs an LLM-graded QA pass against an 11-criterion rubric (greeting, intent capture, tool-call appropriateness, language match, handoff timing, etc.). The output (qa_evaluations row) carries:
  • An overall score (0–11)
  • Per-criterion breakdown
  • Hallucination flags graded by a separate v1 grader (severity: low / medium / high)
The hallucination flags feed into the guardrails layer: depending on the tenant’s hallucination.threshold + hallucination.action, a flagged reply triggers a Sentry alert (warn) or routes to handoff (handoff).
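The threshold + action behaviour can be expressed as a small pure decision function. The severity ordering and the config shape below are assumptions consistent with the description above, not Vorel's actual schema.

```typescript
// Guardrail decision sketch. Severity ranking and config shape are assumed.
type Severity = "low" | "medium" | "high";
type GuardrailAction = "warn" | "handoff";

const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2 };

interface HallucinationConfig { threshold: Severity; action: GuardrailAction; }

// Returns the tenant's configured action when the flag meets or exceeds the
// threshold, or null when the flag is below it.
function decide(flag: Severity, cfg: HallucinationConfig): GuardrailAction | null {
  return RANK[flag] >= RANK[cfg.threshold] ? cfg.action : null;
}
```

With `action: "warn"` the caller would emit the Sentry alert; with `action: "handoff"` it would route the conversation to a human.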

Reliability posture

Failure modes we explicitly handle:
  • Gemini 5xx / DEADLINE_EXCEEDED — withGeminiRetry retries with backoff; a terminal failure inside the sub-agent dispatch falls through to a short ack (“One moment, please”) plus a fell_back: true flag that the inbox surfaces.
  • Redis blip on rate limit — fail-open: a Redis outage admits the request rather than 429-ing every customer. Logged as rate_limit.redis_failed.
  • CRM driver 401 — OAuth drivers refresh + retry once; refreshed tokens get re-encrypted and persisted back into tenant_credentials (rotated_at updated). Refresh failures audit-log + return auth_error to the agent, which falls back to a polite “I’ll have someone reach out” rather than hallucinating CRM data.
  • Vendor outage on a hot path — see SLOs. Tool-route success rate (99.9%) and voice-call success rate (99.5%) are the two we measure most aggressively; webhook delivery rides a six-attempt retry ladder (the initial attempt, then retries after 60s, 300s, 1800s, 7200s, and 43200s).
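A withGeminiRetry-style wrapper can be sketched as follows. The attempt count, delay schedule, and the decision about which errors are retryable are all assumptions, not the actual implementation's parameters.

```typescript
// Retry-with-backoff sketch. Attempt count and delays are illustrative;
// the injectable sleep makes the wrapper testable without real waiting.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Exponential backoff between attempts: base, 2×base, 4×base, ...
      if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  // Terminal failure: the caller falls back to the short ack + fell_back flag.
  throw lastErr;
}
```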

What’s NOT in the picture today

  • Per-region routing. tenants.region is captured at create time; routing on it (running EU tenants out of an EU Postgres replica) is on the roadmap.
  • Stripe billing. Cost rollup is captured in billing_events; invoicing is operator-driven from the rollup until the Stripe model lands.
  • Outbound voice (Vorel-initiated calls). Inbound only today.
  • Voicemail / fallback when LLM provider is down. A failed call disconnects.
  • Real n8n deployment. The around-the-brain templates exist on disk + the operator-side discoverability page lists them, but n8n is not yet deployed as a Railway service. Per-turn dispatch (router + sub-agent + tools) runs in TypeScript inside the web app, not in n8n.

Where to go next

Voice features

What the voice agent does, end-to-end.

Chat features

WhatsApp inbound + outbound posture.

API Reference

REST API + SDK for programmatic access.

Security overview

Multi-tenant isolation, encryption, SLOs.