This page is for the technically curious: you’re evaluating Vorel against in-house builds, building integrations against the API, or trying to understand the failure modes before you deploy. If you’re a buyer who just wants the high-level pitch, the What is Vorel page is shorter.

The big picture

Every customer turn (a call connect, a WhatsApp inbound, a voice utterance) flows through the same five-stage pipeline:
ingress  →  conversation context  →  router  →  sub-agent  →  terminal
  • Ingress receives the customer event from a telephony / messaging provider and verifies the signature.
  • Conversation context hydrates the conversation history, customer identity, vertical pack, and persona.
  • Router is a small, fast LLM that classifies the customer’s intent into one of ~10 categories.
  • Sub-agent (qualification / FAQ / booking / handoff) runs the actual reply-generation loop with the tools it’s allowed to call.
  • Terminal persists the agent reply, runs guardrails (forbidden-phrase + hallucination), and ships the reply back to the customer.
Voice and chat share this entire pipeline; only the ingress and the terminal differ per channel.
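The shape described above can be sketched as a single function that composes the five stages, with only ingress and terminal swapped per channel. This is a minimal illustration, not Vorel's code: every type and function name below (Stages, handleTurn, voiceStages, chatStages) is hypothetical, and the toy router is a regex stand-in for the LLM classifier.

```typescript
// Hypothetical types for illustration only; none of these names come from Vorel's codebase.
type Channel = "voice" | "whatsapp";

interface InboundEvent { channel: Channel; tenantId: string; customerPhone: string; text: string; }
interface TurnContext { event: InboundEvent; history: string[]; }
interface Reply { text: string; subAgent: string; }

interface Stages {
  ingress(raw: InboundEvent): InboundEvent;              // verify signature, normalize the event
  hydrate(event: InboundEvent): TurnContext;             // conversation context
  route(ctx: TurnContext): string;                       // intent slug
  runSubAgent(intent: string, ctx: TurnContext): Reply;  // reply-generation loop
  terminal(reply: Reply, ctx: TurnContext): Reply;       // persist, guardrails, send
}

// One turn always walks the same five stages in order.
function handleTurn(stages: Stages, raw: InboundEvent): Reply {
  const event = stages.ingress(raw);
  const ctx = stages.hydrate(event);
  const intent = stages.route(ctx);
  const reply = stages.runSubAgent(intent, ctx);
  return stages.terminal(reply, ctx);
}

// The middle three stages are shared by both channels.
const shared = {
  hydrate: (event: InboundEvent): TurnContext => ({ event, history: [] }),
  // Toy stand-in for the LLM router.
  route: (ctx: TurnContext): string => (/book/i.test(ctx.event.text) ? "booking" : "faq"),
  runSubAgent: (intent: string, _ctx: TurnContext): Reply => ({ text: `handled ${intent}`, subAgent: intent }),
};

// Only ingress and terminal differ per channel.
const voiceStages: Stages = { ...shared, ingress: (e) => e, terminal: (r) => ({ ...r, text: `[tts] ${r.text}` }) };
const chatStages: Stages = { ...shared, ingress: (e) => e, terminal: (r) => ({ ...r, text: `[whatsapp] ${r.text}` }) };
```

Keeping the channel-specific parts at the two ends is what lets the router, sub-agents and guardrails behave identically for a call and a WhatsApp message.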

The components

Web app

Next.js (App Router) on our managed hosting. Hosts the operator console, the public API (/api/v1/*), the tool routes (/api/tools/*), the per-turn voice dispatch endpoint (/api/voice/dispatch-turn), and the inbound webhook receivers (/api/webhooks/*). The agent brain (router, sub-agents, tools, LLM loop) runs here, in-process.

Voice bridge

A standalone WebSocket service that terminates the live telephony media stream, runs speech-to-text and text-to-speech, and drives each call turn by calling the web app’s dispatch endpoint over HTTP/SSE. It’s process-isolated from the web app so a voice-pipeline issue can’t take down the console or the API.

Workers

Standalone Node service running a background-job queue. Processes inbound WhatsApp messages, end-of-call voice reports, outbound webhook dispatch, and right-to-erasure scrubs. Same git tree as the web app; separate service so a slow background job doesn’t block a request thread.

Postgres

Single multi-tenant database. Every tenant-scoped table has a Row-Level Security policy gating reads + writes by current_setting('app.current_tenant_id'). The vorel_app Postgres role is not a superuser, so a forgotten SET LOCAL app.current_tenant_id returns zero rows: fail-closed.

Redis (queues + rate limit)

Job-queue backbone for the workers, plus a fixed-window primitive for the rate-limit stack, plus per-call dispatch context. Disk-encrypted at rest.
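For readers unfamiliar with the term, a fixed-window limiter counts requests per key inside a clock-aligned window and rejects once the count passes the limit. The sketch below is an assumption-laden illustration: CounterStore, MemoryStore and fixedWindowAllow are hypothetical names, the in-memory store stands in for Redis, and the fail-open branch mirrors the behavior described under Reliability posture.

```typescript
// Hypothetical counter store. In production this would be backed by Redis
// (an atomic increment plus a key expiry); that mapping is an assumption.
interface CounterStore {
  // Increments the counter at `key` and returns the new value.
  incr(key: string, ttlSeconds: number): Promise<number>;
}

// In-memory stand-in so the sketch runs without Redis. It ignores the TTL.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();
  async incr(key: string, _ttlSeconds: number): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

interface Decision { allowed: boolean; failedOpen: boolean; }

async function fixedWindowAllow(
  store: CounterStore,
  scope: string,          // e.g. "tenant:tool" or an IP address
  limit: number,          // max requests per window
  windowSeconds: number,
  nowMs: number,
): Promise<Decision> {
  // All requests in the same clock-aligned window share one counter key.
  const windowIndex = Math.floor(nowMs / (windowSeconds * 1000));
  const key = `rl:${scope}:${windowIndex}`;
  try {
    const count = await store.incr(key, windowSeconds);
    return { allowed: count <= limit, failedOpen: false };
  } catch {
    // Fail-open: a store outage admits the request instead of rejecting it.
    return { allowed: true, failedOpen: true };
  }
}
```

A fixed window is cheap (one increment per request) at the cost of allowing short bursts across a window boundary.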

Our AI stack

All language-model calls (the intent router, the sub-agent reply loop, and embeddings for semantic search) go through a vendor-neutral provider abstraction. The provider is selected per tenant / vertical / sub-agent, so a model swap is a config change, not a code change. Per-call cost lands in billing_events for the cost rollup.
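One way to picture "a model swap is a config change" is a list of rules resolved against the current tenant, vertical and sub-agent. The sketch below is hypothetical throughout: the rule shape, the provider and model names, and the "most specific rule wins" precedence are assumptions for illustration, not Vorel's actual config schema.

```typescript
// Hypothetical config shape; field names and precedence are assumptions.
interface Scope { tenantId: string; vertical: string; subAgent: string; }

interface ProviderRule {
  tenantId?: string;   // omitted fields act as wildcards
  vertical?: string;
  subAgent?: string;
  provider: string;
  model: string;
}

// Number of non-wildcard fields in a rule.
function specificity(rule: ProviderRule): number {
  return [rule.tenantId, rule.vertical, rule.subAgent].filter((v) => v !== undefined).length;
}

function matchesScope(rule: ProviderRule, scope: Scope): boolean {
  return (
    (rule.tenantId === undefined || rule.tenantId === scope.tenantId) &&
    (rule.vertical === undefined || rule.vertical === scope.vertical) &&
    (rule.subAgent === undefined || rule.subAgent === scope.subAgent)
  );
}

// Assumed precedence: the matching rule with the most non-wildcard fields wins;
// ties keep declaration order because Array.prototype.sort is stable.
function resolveProvider(rules: readonly ProviderRule[], scope: Scope): ProviderRule | undefined {
  return rules
    .filter((rule) => matchesScope(rule, scope))
    .sort((a, b) => specificity(b) - specificity(a))[0];
}

// Example rule set with placeholder provider and model names.
const rules: ProviderRule[] = [
  { provider: "provider-a", model: "small" },
  { vertical: "real_estate", provider: "provider-b", model: "medium" },
  { tenantId: "t1", subAgent: "booking", provider: "provider-c", model: "large" },
];
```

Under these assumptions, changing the model for one tenant's booking sub-agent means editing one rule, with no change to the calling code.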

Edge + TLS

DNS + TLS termination + OWASP-default security headers + a WAF in front of the public surfaces.

Voice flow, end to end

The live voice pipeline terminates the call’s media stream in Vorel’s own voice bridge and drives each turn directly against the agent brain. The agent-level behavior (router, sub-agents, tool calls, guardrails) is identical to the chat path. For the deeper pipeline write-up see Voice features.
Customer phone → telephony carrier → live media stream (WebSocket)
                                          ↓
                                          voice bridge (Vorel service)
                                          ├─ speech-to-text
                                          └─ text-to-speech
                                          ↓
                                          /api/voice/dispatch-turn  (HTTP/SSE)
                                          ↓
                                          router → sub-agent → tool calls
                                          ↓
                                          reply text streams back token-by-token
                                          ↓
                                          text-to-speech → audio frames → caller
On each finalized caller utterance, the voice bridge calls the web app’s /api/voice/dispatch-turn endpoint and consumes the LLM reply as a token-by-token stream, feeding it straight into text-to-speech so the caller hears the answer as it’s generated. Caller speech interrupts playback mid-utterance (barge-in). When the call ends, a background worker processes the end-of-call report: it persists the transcript + recording + per-call cost breakdown, runs the QA scoring pipeline, and emits the cost-rollup events.
A second-generation, speech-to-speech voice path is in development for latency-sensitive intents, selected per vertical and intent. It is not the default path today, and tool-heavy turns (booking, tool calls, handoff) always stay on the streamed chained pipeline above.

Chat flow, end to end

Customer WhatsApp → WhatsApp provider → /api/webhooks/whatsapp (signature-verified)
                                       ↓
                                       message-processor queue
                                       ↓
                                       worker:
                                          ├─ persists customer turn + upserts Customer
                                          └─ POSTs /api/internal/agent/dispatch
                                              ↓
                                              router → sub-agent → tool calls
                                              ↓
                                              send_whatsapp_message (writes outbox)
Inbound WhatsApp webhooks are rate-limited per IP (500 req/min) before signature verification, HMAC-verified, and then rate-limited per (tenant, customer phone) pair (30 req/min) after verification.
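A minimal sketch of HMAC verification on an inbound webhook, using Node's built-in crypto module. The header format ("sha256=" followed by a hex digest of the raw body) and the function name verifySignature are assumptions; the exact header and secret depend on the WhatsApp provider in use.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the provider sends a signature of the form "sha256=<hex digest>",
// computed as HMAC-SHA256 over the raw (unparsed) request body.
function verifySignature(rawBody: string, signatureHeader: string, secret: string): boolean {
  const prefix = "sha256=";
  if (!signatureHeader.startsWith(prefix)) return false;

  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHeader.slice(prefix.length), "hex");

  // timingSafeEqual throws when lengths differ, so reject mismatched lengths first.
  if (received.length !== expected.length) return false;

  // Constant-time comparison avoids leaking how many leading bytes matched.
  return timingSafeEqual(received, expected);
}
```

The digest must be computed over the exact bytes received; verifying a re-serialized JSON body will fail whenever key order or whitespace differs.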
WhatsApp outbound send is paused. The send_whatsapp_message tool persists the agent reply into messages + writes the WhatsApp outbox row, but the actual network send is mocked today. Real send re-activates once WhatsApp Business verification clears for your tenant. Inbound + dashboard reply remain available throughout.

The router → sub-agent shape

The router is a single classification call against a small, fast model with a short prompt. It outputs one of ~10 intent slugs:
  • greeting · faq · new_lead_inquiry · existing_lead_update · booking · reschedule_or_cancel · human_request · complaint · spam_or_unrelated · out_of_scope
Each intent maps to a sub-agent:
| Intent | Sub-agent | Tools available |
| --- | --- | --- |
| new_lead_inquiry, existing_lead_update | qualification | search_offerings, update_lead, crm_lookup_customer, crm_update_record, request_handoff |
| booking | booking | check_availability, book_appointment, update_lead, request_handoff |
| reschedule_or_cancel, human_request, complaint | handoff | request_handoff |
| greeting, faq, spam_or_unrelated, out_of_scope | faq | get_faq_answer, search_offerings, crm_lookup_customer, request_handoff |
Sub-agent prompts interpolate the resolved persona + vertical pack at run-time. The agent runs a tool-call loop until either a final text reply or a max-iteration cap is hit.
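The mapping and the capped loop can be sketched as follows. The intent slugs, sub-agent names and tool names are taken from the table above; everything else is hypothetical, including the ModelStep type, the runToolLoop function, the handling of disallowed tools, and the fallback text returned when the cap is hit.

```typescript
type Intent =
  | "greeting" | "faq" | "new_lead_inquiry" | "existing_lead_update" | "booking"
  | "reschedule_or_cancel" | "human_request" | "complaint" | "spam_or_unrelated" | "out_of_scope";
type SubAgent = "qualification" | "booking" | "handoff" | "faq";

const INTENT_TO_SUB_AGENT: Record<Intent, SubAgent> = {
  new_lead_inquiry: "qualification",
  existing_lead_update: "qualification",
  booking: "booking",
  reschedule_or_cancel: "handoff",
  human_request: "handoff",
  complaint: "handoff",
  greeting: "faq",
  faq: "faq",
  spam_or_unrelated: "faq",
  out_of_scope: "faq",
};

const TOOLS: Record<SubAgent, readonly string[]> = {
  qualification: ["search_offerings", "update_lead", "crm_lookup_customer", "crm_update_record", "request_handoff"],
  booking: ["check_availability", "book_appointment", "update_lead", "request_handoff"],
  handoff: ["request_handoff"],
  faq: ["get_faq_answer", "search_offerings", "crm_lookup_customer", "request_handoff"],
};

// Hypothetical model output: either a final reply or a request to call a tool.
type ModelStep = { kind: "reply"; text: string } | { kind: "tool_call"; tool: string };

function runToolLoop(
  subAgent: SubAgent,
  nextStep: (toolResults: readonly string[]) => ModelStep, // stand-in for the LLM call
  maxIterations: number,
): { text: string; hitCap: boolean } {
  const results: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const step = nextStep(results);
    if (step.kind === "reply") return { text: step.text, hitCap: false };
    // Assumed behavior: a tool outside the sub-agent's allow-list is reported
    // back to the model as an error rather than executed.
    const allowed = TOOLS[subAgent].includes(step.tool);
    results.push(allowed ? `ok: ${step.tool}` : `error: ${step.tool} not allowed`);
  }
  // Assumed behavior when the cap is reached: return a short fallback reply.
  return { text: "One moment, please", hitCap: true };
}
```

The cap bounds latency and cost per turn: a model that keeps requesting tools cannot loop forever.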

The tool layer

Tool routes live at /api/tools/<name> and are JWT-authed (5-min TTL signed via TOOL_JWT_SECRET). Every call goes through the same wrapper:
  1. JWT verification + per-(tenant, tool) rate limit (50 req/min).
  2. withTenantContext: opens a Postgres transaction with SET LOCAL app.current_tenant_id, so RLS is set before any query runs.
  3. Tool body: handles its specific job (DB read, vector similarity over offerings / KB, CRM proxy call, calendar check, etc.).
  4. tool_call + tool_result log lines are emitted with W3C trace-context propagation, so an operator can follow a single conversation across web + worker + the post-call QA pipeline.
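Step 2 is the piece that makes RLS fail-closed, so it is worth sketching. The version below is an assumption about how such a wrapper could look, not Vorel's implementation: SqlClient is a hypothetical minimal interface shaped like a typical Postgres driver client, and the tenant id is set with set_config(..., true), PostgreSQL's function form of SET LOCAL, because SET statements cannot take bind parameters.

```typescript
// Hypothetical minimal client surface, modeled on the query(text, values)
// shape common to Postgres drivers.
interface SqlClient {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

async function withTenantContext<T>(
  client: SqlClient,
  tenantId: string,
  body: (client: SqlClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // The third argument (is_local = true) scopes the setting to this
    // transaction, matching SET LOCAL. The value is passed as a bind
    // parameter, so the tenant id is never interpolated into SQL text.
    await client.query("SELECT set_config('app.current_tenant_id', $1, true)", [tenantId]);
    const result = await body(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    // Any failure discards both the work and the tenant setting.
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting is transaction-local, it cannot leak to the next request that reuses the same pooled connection.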

The customer identity model

Cross-channel continuity binds on the customer’s E.164 phone number, not the channel. A customer who calls and later WhatsApps continues the same conversation thread because conversations.customer_identifier is the phone number; the next inbound surfaces the prior history regardless of channel. This also means no separate “voice account” + “WhatsApp account” for the same human. The customers table is the per-(tenant, phone) source of truth; conversations are children.
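To make the keying concrete, here is a small in-memory sketch in which history is looked up by (tenant, phone) and the channel is only an attribute of each turn. ThreadStore, isE164 and the Turn shape are hypothetical; the regular expression is a common structural check for E.164 (a leading "+", a non-zero first digit, at most 15 digits) and does not validate that a number is assigned.

```typescript
type Channel = "voice" | "whatsapp";
interface Turn { channel: Channel; text: string; }

// Structural E.164 check only: "+", non-zero first digit, 2 to 15 digits total.
function isE164(phone: string): boolean {
  return /^\+[1-9]\d{1,14}$/.test(phone);
}

// Hypothetical in-memory stand-in for the customers + conversations tables.
class ThreadStore {
  private threads = new Map<string, Turn[]>();

  // The key deliberately excludes the channel.
  private key(tenantId: string, phone: string): string {
    if (!isE164(phone)) throw new Error(`not an E.164 number: ${phone}`);
    return `${tenantId}:${phone}`;
  }

  append(tenantId: string, phone: string, turn: Turn): void {
    const k = this.key(tenantId, phone);
    const thread = this.threads.get(k) ?? [];
    thread.push(turn);
    this.threads.set(k, thread);
  }

  history(tenantId: string, phone: string): readonly Turn[] {
    return this.threads.get(this.key(tenantId, phone)) ?? [];
  }
}
```

With this keying, a call followed by a WhatsApp message from the same number lands in one thread, while the same number under a different tenant stays separate.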

Storage layout (high-level)

| Table | What it holds | Append-only? |
| --- | --- | --- |
| tenants | One row per Vorel customer (tenant). Persona, vertical, working hours, handoff rules, guardrails. | No |
| customers | One row per (tenant, phone). Cross-channel identity anchor. | No |
| conversations | One row per customer thread. Channel + status + customer_identifier. | No |
| messages | Every turn of every conversation. Customer + agent + system. Append-only. | Yes (Postgres trigger) |
| leads | Qualification state + attributes. Linked to a conversation. | No |
| offerings | Tenant catalog (properties / services / clinicians / menu slots). Vector-embedded. | No |
| knowledge_base | FAQ + policy entries. Vector-embedded. | No |
| appointments | Bookings. Linked to a customer + offering + assigned user. | No |
| qa_evaluations | Per-conversation QA scores from the post-call grading pipeline. | No |
| audit_log | Every operator-console read + every mutation. Append-only. | Yes (Postgres trigger) |
| billing_events | Per-call cost-of-goods events (telephony, speech, model) + chargeable events. Append-only. | Yes (Postgres trigger) |
| webhook_deliveries | Per-attempt outbound webhook delivery records (status + response). | No |
| tenant_credentials | AES-256-GCM-encrypted CRM driver credentials. KEK from env-var. | No |

QA scoring pipeline

After every call ends, a worker runs an LLM-graded QA pass against an 11-criterion rubric (greeting, intent capture, tool-call appropriateness, language match, handoff timing, etc.). The output (qa_evaluations row) carries:
  • A normalised score (0–11)
  • Per-criterion breakdown
  • Hallucination flags graded by a separate v1 grader (severity: low / medium / high)
The hallucination flags feed into the guardrails layer: depending on the tenant’s hallucination.threshold + hallucination.action, a flagged reply triggers a monitoring alert (warn) or routes to handoff (handoff).
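The threshold-plus-action decision can be sketched as below. This rests on an assumption the text does not confirm: that hallucination.threshold is itself a severity level and that a flag at or above it triggers the configured action. The names HallucinationConfig and decideGuardrail are hypothetical.

```typescript
type Severity = "low" | "medium" | "high";
type Action = "warn" | "handoff";

// Hypothetical shape for the tenant's hallucination.threshold / hallucination.action settings.
interface HallucinationConfig { threshold: Severity; action: Action; }

const SEVERITY_RANK: Record<Severity, number> = { low: 1, medium: 2, high: 3 };

// Assumed rule: any flag at or above the threshold triggers the configured
// action; otherwise the reply passes through unchanged.
function decideGuardrail(flags: readonly Severity[], config: HallucinationConfig): "pass" | Action {
  const triggered = flags.some((s) => SEVERITY_RANK[s] >= SEVERITY_RANK[config.threshold]);
  return triggered ? config.action : "pass";
}
```

For example, under this reading a tenant with threshold "medium" and action "handoff" would let a low-severity flag pass and route a high-severity one to a human.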

Reliability posture

Failure modes we explicitly handle:
  • LLM provider 5xx / timeout: the provider call retries with backoff; a terminal failure inside the sub-agent dispatch falls through to a short ack (“One moment, please”) plus a fell_back: true flag the inbox surfaces.
  • Redis blip on rate limit: fail-open. A Redis outage admits the request rather than 429-ing every customer. Logged as rate_limit.redis_failed.
  • CRM driver 401: OAuth drivers refresh + retry once; refreshed tokens get re-encrypted and persisted back into tenant_credentials (rotated_at updated). Refresh failures audit-log + return auth_error to the agent, which falls back to a polite “I’ll have someone reach out” rather than hallucinating CRM data.
  • Vendor outage on a hot path: see SLOs. Tool-route success rate (99.9%) and voice-call success rate (99.5%) are the two we measure most aggressively; webhook delivery rides a 6-attempt retry ladder (the initial attempt plus five retries) with delays of [60s, 300s, 1800s, 7200s, 43200s].
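The webhook retry ladder can be read as a lookup from "attempts made so far" to "seconds until the next attempt". The sketch below assumes the five listed delays sit between six attempts, with the first attempt immediate; the function names are hypothetical.

```typescript
// Delays from the ladder above, in seconds.
const RETRY_DELAYS_SECONDS: readonly number[] = [60, 300, 1800, 7200, 43200];

// Assumption: one immediate attempt plus one retry per delay gives 6 attempts.
const MAX_ATTEMPTS = RETRY_DELAYS_SECONDS.length + 1;

// Returns the wait before the next attempt, or null once the ladder is exhausted.
function nextDelaySeconds(attemptsMade: number): number | null {
  if (!Number.isInteger(attemptsMade) || attemptsMade < 1) {
    throw new RangeError("attemptsMade must be a positive integer");
  }
  if (attemptsMade >= MAX_ATTEMPTS) return null;
  return RETRY_DELAYS_SECONDS[attemptsMade - 1] ?? null;
}

// Total time from the first attempt to the last one if every attempt fails.
function totalLadderSeconds(): number {
  return RETRY_DELAYS_SECONDS.reduce((sum, d) => sum + d, 0);
}
```

Under that reading, a receiver that stays down sees the final attempt 52,560 seconds (about 14.6 hours) after the first.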

What’s NOT in the picture today

  • Per-region routing. tenants.region is captured at create time; routing on it (running EU tenants out of an EU Postgres replica) is on the roadmap.
  • Automated billing. Cost rollup is captured in billing_events; invoicing is operator-driven from the rollup until the automated billing model lands.
  • Outbound voice (Vorel-initiated calls). Inbound only today.
  • Voicemail / fallback when the LLM provider is down. A failed call disconnects.
  • Real n8n deployment. The around-the-brain templates exist on disk + the operator-side discoverability page lists them, but n8n is not yet deployed as a running service. Per-turn dispatch (router + sub-agent + tools) runs in TypeScript inside the web app, not in n8n.

Where to go next

  • Voice features: What the voice agent does, end-to-end.
  • Chat features: WhatsApp inbound + outbound posture.
  • API Reference: REST API + SDK for programmatic access.
  • Security overview: Multi-tenant isolation, encryption, SLOs.