The big picture
Every customer turn (a call connect, a WhatsApp inbound, a voice utterance) flows through the same five-stage pipeline:
- Ingress receives the customer event from a telephony / messaging provider and verifies the signature.
- Conversation context hydrates the conversation history, customer identity, vertical pack, and persona.
- Router is a small, fast LLM that classifies the customer’s intent into one of ~10 categories.
- Sub-agent (qualification / FAQ / booking / handoff) runs the actual reply-generation loop with the tools it’s allowed to call.
- Terminal persists the agent reply, runs guardrails (forbidden-phrase + hallucination), and ships the reply back to the customer.
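As a rough illustration of how the five stages chain together, the sketch below composes them as plain functions. Every name in it (`TurnEvent`, `TurnContext`, `Stages`, `runTurn`) is hypothetical and chosen for this example; the stages are injected stubs, and the real ones would be asynchronous calls into the provider, database, and model layers.

```typescript
// Hypothetical shapes for illustration; the real pipeline types are not shown here.
type SubAgent = "qualification" | "faq" | "booking" | "handoff";

interface TurnEvent {
  tenantId: string;
  customerPhone: string;
  text: string;
  signatureValid: boolean; // outcome of the provider signature check at ingress
}

interface TurnContext {
  event: TurnEvent;
  history: string[];
}

// Stages 2-5, injected so the sketch stays independent of any real backend.
interface Stages {
  hydrate(event: TurnEvent): TurnContext;
  route(ctx: TurnContext): SubAgent;
  generate(agent: SubAgent, ctx: TurnContext): string;
  terminal(ctx: TurnContext, draft: string): string;
}

function runTurn(event: TurnEvent, stages: Stages): string {
  // 1. Ingress: reject events whose provider signature did not verify.
  if (!event.signatureValid) {
    throw new Error("ingress: invalid provider signature");
  }
  const ctx = stages.hydrate(event);         // 2. Conversation context
  const agent = stages.route(ctx);           // 3. Router
  const draft = stages.generate(agent, ctx); // 4. Sub-agent
  return stages.terminal(ctx, draft);        // 5. Terminal (persist, guardrails, ship)
}
```

Keeping the stages behind one interface is only a convenience for the sketch; it makes the ordering explicit and lets each stage be stubbed in isolation.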
The components
Web app
Next.js (App Router) on our managed hosting. Hosts the operator console, the public API
(/api/v1/*), the tool routes (/api/tools/*), the per-turn voice dispatch endpoint
(/api/voice/dispatch-turn), and the inbound webhook receivers (/api/webhooks/*). The agent
brain (router, sub-agents, tools, LLM loop) runs here, in-process.
Voice bridge
A standalone WebSocket service that terminates the live telephony media stream, runs speech-to-text
and text-to-speech, and drives each call turn by calling the web app’s dispatch endpoint over
HTTP/SSE. It’s process-isolated from the web app so a voice-pipeline issue can’t take down the
console or the API.
Workers
Standalone Node service running a background-job queue. Processes inbound WhatsApp messages,
end-of-call voice reports, outbound webhook dispatch, and right-to-erasure scrubs. Same git tree
as the web app; separate service so a slow background job doesn’t block a request thread.
Postgres
Single multi-tenant database. Every tenant-scoped table has a Row-Level Security policy gating
reads + writes by current_setting('app.current_tenant_id'). The vorel_app Postgres role is
not a superuser, so a forgotten SET LOCAL app.current_tenant_id returns zero rows:
fail-closed.
Redis (queues + rate limit)
Job-queue backbone for the workers, plus a fixed-window primitive for the rate-limit stack, plus
per-call dispatch context. Disk-encrypted at rest.
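To make the fixed-window primitive concrete, here is a minimal sketch of a per-(tenant, tool) counter. It is an illustration under stated assumptions, not the production code: `CounterStore`, `MemoryStore`, `windowKey`, and `allow` are hypothetical names, an in-memory map stands in for Redis, and the key format is invented. The 50 req/min default and the fail-open behavior on a store error mirror what this page describes for the tool routes and for Redis outages.

```typescript
// Hypothetical counter store; in production this role is played by Redis.
interface CounterStore {
  // Increments the counter at `key` and returns the new value. May throw on outage.
  incr(key: string): number;
}

// In-memory stand-in used only for this sketch (no expiry handling).
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();
  incr(key: string): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// All requests inside the same window share one key (assumed key format).
function windowKey(tenantId: string, tool: string, nowMs: number, windowMs: number): string {
  const windowStart = Math.floor(nowMs / windowMs) * windowMs;
  return `rl:${tenantId}:${tool}:${windowStart}`;
}

function allow(
  store: CounterStore,
  tenantId: string,
  tool: string,
  nowMs: number,
  limit = 50,
  windowMs = 60_000,
): boolean {
  try {
    return store.incr(windowKey(tenantId, tool, nowMs, windowMs)) <= limit;
  } catch {
    // Fail-open: a store outage admits the request instead of rejecting it.
    return true;
  }
}
```

A fixed window is cheap (one increment per request) at the cost of allowing a short burst across a window boundary.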
Our AI stack
All language-model calls (the intent router, the sub-agent reply loop, embeddings for semantic
search) go through a vendor-neutral provider abstraction. The provider is selected per tenant /
vertical / sub-agent, so a model swap is a config change, not a code change. Per-call cost lands
in billing_events for the cost rollup.
Edge + TLS
DNS + TLS termination + OWASP-default security headers + a WAF in front of the public surfaces.
Voice flow, end to end
The live voice pipeline terminates the call’s media stream in Vorel’s own voice bridge and drives each turn directly against the agent brain. The agent-level behavior (router, sub-agents, tool calls, guardrails) is identical to the chat path. For the deeper pipeline write-up see Voice features.
For each turn, the voice bridge calls the /api/voice/dispatch-turn endpoint and consumes the LLM reply as a token-by-token stream, feeding it straight into text-to-speech so the caller hears the answer as it’s generated. Caller speech interrupts playback mid-utterance (barge-in).
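The streaming-plus-barge-in behavior can be sketched as a loop that forwards tokens to a speech sink until the caller interrupts. This is a simplified illustration: `SpeechSink`, `BargeInSignal`, and `playReply` are hypothetical names, the token source is any async iterable rather than a real SSE client, and the interruption flag stands in for whatever the speech-to-text side actually signals.

```typescript
// Hypothetical stand-in for the text-to-speech engine.
interface SpeechSink {
  speak(text: string): void;
  stop(): void;
}

// Shared flag the speech-to-text side would set when the caller starts talking.
interface BargeInSignal {
  interrupted: boolean;
}

async function playReply(
  tokens: AsyncIterable<string>,
  sink: SpeechSink,
  bargeIn: BargeInSignal,
): Promise<{ spoken: string; interrupted: boolean }> {
  let spoken = "";
  for await (const token of tokens) {
    if (bargeIn.interrupted) {
      sink.stop(); // cut playback mid-utterance
      return { spoken, interrupted: true };
    }
    sink.speak(token); // forward each token as soon as it arrives
    spoken += token;
  }
  return { spoken, interrupted: false };
}
```

Returning what was actually spoken lets the caller of `playReply` record the truncated utterance rather than the full generated reply; whether the real bridge does this is not stated here.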
When the call ends, a background worker processes the end-of-call report: it persists the transcript + recording + per-call cost breakdown, runs the QA scoring pipeline, and emits the cost-rollup events.
A second-generation, speech-to-speech voice path is in development for latency-sensitive intents,
selected per vertical and intent. It is not the default path today, and tool-heavy turns (booking,
tool calls, handoff) always stay on the streamed chained pipeline above.
Chat flow, end to end
WhatsApp outbound send is paused. The send_whatsapp_message tool persists the agent reply
into messages + writes the WhatsApp outbox row, but the actual network send is mocked today.
Real send re-activates once WhatsApp Business verification clears for your tenant. Inbound +
dashboard reply remain available throughout.
The router → sub-agent shape
The router is a single classification call against a small, fast model with a short prompt. It outputs one of ~10 intent slugs: greeting · faq · new_lead_inquiry · existing_lead_update · booking · reschedule_or_cancel · human_request · complaint · spam_or_unrelated · out_of_scope
| Intent | Sub-agent | Tools available |
|---|---|---|
| new_lead_inquiry, existing_lead_update | qualification | search_offerings, update_lead, crm_lookup_customer, crm_update_record, request_handoff |
| booking | booking | check_availability, book_appointment, update_lead, request_handoff |
| reschedule_or_cancel, human_request, complaint | handoff | request_handoff |
| greeting, faq, spam_or_unrelated, out_of_scope | faq | get_faq_answer, search_offerings, crm_lookup_customer, request_handoff |
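The table above can be read as two lookup maps: intent to sub-agent, then sub-agent to its allowed tools. The sketch below encodes exactly the rows of the table; the constant and function names (`SUB_AGENT_FOR_INTENT`, `TOOLS_FOR_SUB_AGENT`, `dispatch`, `canCall`) are hypothetical and only illustrate the shape.

```typescript
type Intent =
  | "greeting" | "faq" | "new_lead_inquiry" | "existing_lead_update" | "booking"
  | "reschedule_or_cancel" | "human_request" | "complaint"
  | "spam_or_unrelated" | "out_of_scope";

type SubAgent = "qualification" | "booking" | "handoff" | "faq";

// Intent -> sub-agent, transcribed from the table.
const SUB_AGENT_FOR_INTENT: Record<Intent, SubAgent> = {
  new_lead_inquiry: "qualification",
  existing_lead_update: "qualification",
  booking: "booking",
  reschedule_or_cancel: "handoff",
  human_request: "handoff",
  complaint: "handoff",
  greeting: "faq",
  faq: "faq",
  spam_or_unrelated: "faq",
  out_of_scope: "faq",
};

// Sub-agent -> tools it is allowed to call, transcribed from the table.
const TOOLS_FOR_SUB_AGENT: Record<SubAgent, readonly string[]> = {
  qualification: ["search_offerings", "update_lead", "crm_lookup_customer", "crm_update_record", "request_handoff"],
  booking: ["check_availability", "book_appointment", "update_lead", "request_handoff"],
  handoff: ["request_handoff"],
  faq: ["get_faq_answer", "search_offerings", "crm_lookup_customer", "request_handoff"],
};

function dispatch(intent: Intent): { subAgent: SubAgent; tools: readonly string[] } {
  const subAgent = SUB_AGENT_FOR_INTENT[intent];
  return { subAgent, tools: TOOLS_FOR_SUB_AGENT[subAgent] };
}

// True when the sub-agent handling `intent` may call `tool`.
function canCall(intent: Intent, tool: string): boolean {
  return dispatch(intent).tools.includes(tool);
}
```

Typing the first map as `Record<Intent, SubAgent>` means the compiler flags any intent slug that is left without a sub-agent.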
The tool layer
Tool routes live at /api/tools/<name> and are JWT-authed (5-min TTL signed via TOOL_JWT_SECRET). Every call goes through the same wrapper:
- JWT verification + per-(tenant, tool) rate limit (50 req/min).
- withTenantContext: opens a Postgres transaction with SET LOCAL app.current_tenant_id, so RLS is set before any query runs.
- Tool body: handles its specific job (DB read, vector similarity over offerings / KB, CRM proxy call, calendar check, etc.).
- tool_call + tool_result log lines are emitted with W3C trace-context propagation, so an operator can follow a single conversation across web + worker + the post-call QA pipeline.
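A helper in the spirit of withTenantContext might look like the sketch below. It is an assumption-laden illustration rather than the actual implementation: the `SqlClient` interface is a hypothetical minimal stand-in for a database client, and the signature is invented. One real detail worth noting is that Postgres does not accept bind parameters in `SET LOCAL`, so the sketch uses `set_config(name, value, true)`, which has the same transaction-local effect and can be parameterized.

```typescript
// Hypothetical minimal client interface; a real driver would provide the equivalent.
interface SqlClient {
  query(sql: string, params?: readonly unknown[]): Promise<unknown>;
}

async function withTenantContext<T>(
  client: SqlClient,
  tenantId: string,
  body: (client: SqlClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // Transaction-local setting (third argument `true`), equivalent to SET LOCAL,
    // so the RLS policies see the tenant before any tool query runs.
    await client.query("SELECT set_config('app.current_tenant_id', $1, true)", [tenantId]);
    const result = await body(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    // Any failure discards the transaction and, with it, the tenant setting.
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting is scoped to the transaction, it cannot leak to the next request that reuses the same pooled connection.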
The customer identity model
Cross-channel continuity binds on the customer’s E.164 phone number, not the channel. A customer who calls and later WhatsApps continues the same conversation thread because conversations.customer_identifier is the phone number; the next inbound surfaces the prior history regardless of channel.
This also means no separate “voice account” + “WhatsApp account” for the same human. The customers table is the per-(tenant, phone) source of truth; conversations are children.
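The per-(tenant, phone) anchoring can be illustrated with a small in-memory directory. This is only a sketch: `CustomerDirectory`, `Customer`, and the `channelsSeen` field are hypothetical, a map stands in for the customers table, and the E.164 check is a common shape test (a plus sign, then up to 15 digits with a non-zero first digit), not a full numbering-plan validation.

```typescript
type Channel = "voice" | "whatsapp";

interface Customer {
  id: number;
  tenantId: string;
  phone: string; // E.164
  channelsSeen: Channel[];
}

// "+" followed by 2 to 15 digits, the first of which is non-zero.
const E164 = /^\+[1-9]\d{1,14}$/;

class CustomerDirectory {
  private byKey = new Map<string, Customer>();
  private nextId = 1;

  // Returns the same customer for the same (tenant, phone), whatever the channel.
  resolve(tenantId: string, phone: string, channel: Channel): Customer {
    if (!E164.test(phone)) {
      throw new Error(`not an E.164 number: ${phone}`);
    }
    const key = `${tenantId}:${phone}`;
    let customer = this.byKey.get(key);
    if (!customer) {
      customer = { id: this.nextId++, tenantId, phone, channelsSeen: [] };
      this.byKey.set(key, customer);
    }
    if (!customer.channelsSeen.includes(channel)) {
      customer.channelsSeen.push(channel);
    }
    return customer;
  }
}
```

Note that the key includes the tenant, so the same phone number under two tenants resolves to two unrelated customers.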
Storage layout (high-level)
| Table | What it holds | Append-only? |
|---|---|---|
| tenants | One row per Vorel customer (tenant). Persona, vertical, working hours, handoff rules, guardrails. | No |
| customers | One row per (tenant, phone). Cross-channel identity anchor. | No |
| conversations | One row per customer thread. Channel + status + customer_identifier. | No |
| messages | Every turn of every conversation. Customer + agent + system. Append-only. | Yes (Postgres trigger) |
| leads | Qualification state + attributes. Linked to a conversation. | No |
| offerings | Tenant catalog (properties / services / clinicians / menu slots). Vector-embedded. | No |
| knowledge_base | FAQ + policy entries. Vector-embedded. | No |
| appointments | Bookings. Linked to a customer + offering + assigned user. | No |
| qa_evaluations | Per-conversation QA scores from the post-call grading pipeline. | No |
| audit_log | Every operator-console read + every mutation. Append-only. | Yes (Postgres trigger) |
| billing_events | Per-call cost-of-goods events (telephony, speech, model) + chargeable events. Append-only. | Yes (Postgres trigger) |
| webhook_deliveries | Per-attempt outbound webhook delivery records (status + response). | No |
| tenant_credentials | AES-256-GCM-encrypted CRM driver credentials. KEK from env-var. | No |
QA scoring pipeline
After every call ends, a worker runs an LLM-graded QA pass against an 11-criterion rubric (greeting, intent capture, tool-call appropriateness, language match, handoff timing, etc.). The output (qa_evaluations row) carries:
- A normalised score (0–11)
- Per-criterion breakdown
- Hallucination flags graded by a separate v1 grader (severity: low / medium / high)
Depending on the configured hallucination.threshold + hallucination.action, a flagged reply triggers a monitoring alert (warn) or routes to handoff (handoff).
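One plausible reading of the threshold and action settings is sketched below. It rests on an assumption this page does not confirm: that hallucination.threshold is a minimum severity (low / medium / high) at or above which hallucination.action applies. The names `HallucinationConfig` and `onFlag` are hypothetical.

```typescript
type Severity = "low" | "medium" | "high";
type HallucinationAction = "warn" | "handoff";

interface HallucinationConfig {
  threshold: Severity;         // ASSUMED: minimum severity that triggers the action
  action: HallucinationAction; // "warn" = monitoring alert, "handoff" = route to a human
}

const RANK: Record<Severity, number> = { low: 1, medium: 2, high: 3 };

// Returns the configured action when the flag meets the threshold, otherwise "none".
function onFlag(severity: Severity, config: HallucinationConfig): HallucinationAction | "none" {
  return RANK[severity] >= RANK[config.threshold] ? config.action : "none";
}
```

For example, with a threshold of medium and an action of handoff, a low-severity flag is ignored while medium and high flags route the conversation to handoff.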
Reliability posture
Failure modes we explicitly handle:
- LLM provider 5xx / timeout: the provider call retries with backoff; a terminal failure inside the sub-agent dispatch falls through to a short ack (“One moment, please”) plus a fell_back: true flag the inbox surfaces.
- Redis blip on rate limit: fail-open. A Redis outage admits the request rather than 429-ing every customer. Logged as rate_limit.redis_failed.
- CRM driver 401: OAuth drivers refresh + retry once; refreshed tokens get re-encrypted and persisted back into tenant_credentials (rotated_at updated). Refresh failures audit-log + return auth_error to the agent, which falls back to a polite “I’ll have someone reach out” rather than hallucinating CRM data.
- Vendor outage on a hot path: see SLOs. Tool-route success rate (99.9%) and voice-call success rate (99.5%) are the two we measure most aggressively; webhook delivery rides a 6-attempt retry ladder over [60s, 300s, 1800s, 7200s, 43200s].
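The webhook retry ladder can be worked through in a few lines. The sketch assumes the six attempts are one initial delivery plus five retries, each waiting the next delay in the list; that interpretation, and the names `nextRetryDelay` and `attemptOffsets`, are assumptions made for this example.

```typescript
// Delays from the ladder above, in seconds.
const RETRY_DELAYS_SECONDS = [60, 300, 1800, 7200, 43200] as const;

// ASSUMED: one initial attempt plus one retry per delay, giving 6 attempts.
const MAX_ATTEMPTS = RETRY_DELAYS_SECONDS.length + 1;

// Seconds to wait after the given failed attempt (1-based), or null to give up.
function nextRetryDelay(failedAttempt: number): number | null {
  if (!Number.isInteger(failedAttempt) || failedAttempt < 1) {
    throw new RangeError("failedAttempt must be a positive integer");
  }
  if (failedAttempt >= MAX_ATTEMPTS) {
    return null; // the sixth failure is terminal
  }
  return RETRY_DELAYS_SECONDS[failedAttempt - 1] ?? null;
}

// Seconds after the first attempt at which each of the six attempts fires.
function attemptOffsets(): number[] {
  const offsets = [0];
  let total = 0;
  for (const delay of RETRY_DELAYS_SECONDS) {
    total += delay;
    offsets.push(total);
  }
  return offsets;
}
```

Under that reading, the last attempt fires 52,560 seconds (about 14.6 hours) after the first, which is the longest a receiver outage can last before a delivery is abandoned.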
What’s NOT in the picture today
- Per-region routing. tenants.region is captured at create time; routing on it (running EU tenants out of an EU Postgres replica) is on the roadmap.
- Automated billing. Cost rollup is captured in billing_events; invoicing is operator-driven from the rollup until the automated billing model lands.
- Outbound voice (Vorel-initiated calls). Inbound only today.
- Voicemail / fallback when the LLM provider is down. A failed call disconnects.
- Real n8n deployment. The around-the-brain templates exist on disk + the operator-side discoverability page lists them, but n8n is not yet deployed as a running service. Per-turn dispatch (router + sub-agent + tools) runs in TypeScript inside the web app, not in n8n.
Where to go next
Voice features
What the voice agent does, end-to-end.
Chat features
WhatsApp inbound + outbound posture.
API Reference
REST API + SDK for programmatic access.
Security overview
Multi-tenant isolation, encryption, SLOs.