# How it works

> Vorel's architecture, end-to-end. Telephony + WhatsApp ingress, the router → sub-agent dispatch, the tool layer, the QA pipeline, and how data lands in your CRM.

This page is for the technically curious: you're evaluating Vorel against an in-house build, building integrations against the API, or want to understand the failure modes before you deploy. If you're a buyer who just wants the high-level pitch, the [What is Vorel](/getting-started/what-is-vorel) page is shorter.

## The big picture

Every customer turn (a call connect, a WhatsApp inbound, a voice utterance) flows through the same five-stage pipeline:

```
ingress  →  conversation context  →  router  →  sub-agent  →  terminal
```

* **Ingress** receives the customer event from a telephony / messaging provider and verifies the signature.
* **Conversation context** hydrates the conversation history, customer identity, vertical pack, and persona.
* **Router** is a small, fast LLM that classifies the customer's intent into one of \~10 categories.
* **Sub-agent** (qualification / FAQ / booking / handoff) runs the actual reply-generation loop with the tools it's allowed to call.
* **Terminal** persists the agent reply, runs guardrails (forbidden-phrase + hallucination), and ships the reply back to the customer.

Voice and chat share the conversation context, router, and sub-agent stages; only the ingress and the terminal differ per channel.
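As a rough illustration of that shape, the sketch below runs one turn through shared stages with a per-channel adapter for ingress and terminal. Every name here (`ChannelAdapter`, `runTurn`, the stubbed stages) is a hypothetical stand-in, and the regex "router" only substitutes for the real LLM classifier.

```typescript
// Hypothetical sketch: names and shapes are illustrative assumptions,
// not Vorel's actual code.
type Channel = "voice" | "whatsapp";

interface InboundEvent {
  channel: Channel;
  customerPhone: string; // E.164
  text: string;
}

interface TurnContext {
  event: InboundEvent;
  history: string[];
}

// Only ingress and terminal vary per channel.
interface ChannelAdapter {
  ingress(raw: string): InboundEvent;
  terminal(reply: string): string;
}

// Shared stages, stubbed. The real router is an LLM call, not a regex.
const hydrate = (event: InboundEvent): TurnContext => ({ event, history: [] });
const route = (ctx: TurnContext): string =>
  /book/i.test(ctx.event.text) ? "booking" : "faq";
const subAgent = (intent: string, ctx: TurnContext): string =>
  `[${intent}] reply to: ${ctx.event.text}`;

function runTurn(adapter: ChannelAdapter, raw: string): string {
  const event = adapter.ingress(raw); // channel-specific
  const ctx = hydrate(event); // shared
  const intent = route(ctx); // shared
  const reply = subAgent(intent, ctx); // shared
  return adapter.terminal(reply); // channel-specific
}

const whatsapp: ChannelAdapter = {
  ingress: (raw) => ({ channel: "whatsapp", customerPhone: "+15550100", text: raw }),
  terminal: (reply) => `whatsapp:${reply}`,
};

const voice: ChannelAdapter = {
  ingress: (raw) => ({ channel: "voice", customerPhone: "+15550100", text: raw }),
  terminal: (reply) => `tts:${reply}`,
};
```

Calling `runTurn(whatsapp, "I want to book a viewing")` and `runTurn(voice, "I want to book a viewing")` exercises the same middle stages and differs only in how the reply is delivered.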

## The components

<CardGroup cols={2}>
  <Card title="Web app" icon="server">
    Next.js (App Router) on our managed hosting. Hosts the operator console, the public API
    (`/api/v1/*`), the tool routes (`/api/tools/*`), the per-turn voice dispatch endpoint
    (`/api/voice/dispatch-turn`), and the inbound webhook receivers (`/api/webhooks/*`). The agent
    brain (router, sub-agents, tools, LLM loop) runs here, in-process.
  </Card>

  <Card title="Voice bridge" icon="phone">
    A standalone WebSocket service that terminates the live telephony media stream, runs speech-to-text
    and text-to-speech, and drives each call turn by calling the web app's dispatch endpoint over
    HTTP/SSE. It's process-isolated from the web app so a voice-pipeline issue can't take down the
    console or the API.
  </Card>

  <Card title="Workers" icon="gears">
    Standalone Node service running a background-job queue. Processes inbound WhatsApp messages,
    end-of-call voice reports, outbound webhook dispatch, and right-to-erasure scrubs. Same git tree
    as the web app; separate service so a slow background job doesn't block a request thread.
  </Card>

  <Card title="Postgres" icon="database">
    Single multi-tenant database. Every tenant-scoped table has a Row-Level Security policy gating
    reads + writes by `current_setting('app.current_tenant_id')`. The `vorel_app` Postgres role is
    not a superuser, so a forgotten `SET LOCAL app.current_tenant_id` returns zero rows:
    fail-closed.
  </Card>

  <Card title="Redis (queues + rate limit)" icon="bolt">
    Job-queue backbone for the workers, a fixed-window primitive for the rate-limit stack, and
    per-call dispatch context. Disk-encrypted at rest.
  </Card>

  <Card title="Our AI stack" icon="microchip">
    All language-model calls (the intent router, the sub-agent reply loop, embeddings for semantic
    search) go through a vendor-neutral provider abstraction. The provider is selected per tenant /
    vertical / sub-agent, so a model swap is a config change, not a code change. Per-call cost lands
    in `billing_events` for the cost rollup.
  </Card>

  <Card title="Edge + TLS" icon="cloud">
    DNS + TLS termination + OWASP-default security headers + a WAF in front of the public surfaces.
  </Card>
</CardGroup>

## Voice flow, end to end

The live voice pipeline terminates the call's media stream in Vorel's own voice bridge and drives each turn directly against the agent brain. The agent-level behavior (router, sub-agents, tool calls, guardrails) is identical to the chat path. For the deeper pipeline write-up see [Voice features](/product/voice).

```
Customer phone → telephony carrier → live media stream (WebSocket)
                                                  ↓
                                          voice bridge (Vorel service)
                                          ├─ speech-to-text
                                          └─ text-to-speech
                                                  ↓
                                          /api/voice/dispatch-turn  (HTTP/SSE)
                                                  ↓
                                          router → sub-agent → tool calls
                                                  ↓
                                          reply text streams back token-by-token
                                                  ↓
                                          text-to-speech → audio frames → caller
```

On each finalized caller utterance, the voice bridge calls the web app's `/api/voice/dispatch-turn` endpoint and consumes the LLM reply as a token-by-token stream, feeding it straight into text-to-speech so the caller hears the answer as it's generated. Caller speech interrupts playback mid-utterance (barge-in).
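The streaming and barge-in behavior can be pictured with a small sketch. It assumes the reply arrives as an async stream of text tokens and that barge-in is surfaced as an `AbortSignal`; `playReply`, `TtsSink`, and `fakeTokenStream` are hypothetical names, not the voice bridge's real interfaces.

```typescript
// Hypothetical sketch: forwards streamed reply tokens to a TTS sink and
// stops as soon as a barge-in signal fires.
interface TtsSink {
  speak(token: string): void;
}

// Stand-in for the token stream read from the dispatch endpoint.
async function* fakeTokenStream(tokens: string[]): AsyncGenerator<string> {
  for (const t of tokens) yield t;
}

async function playReply(
  tokens: AsyncIterable<string>,
  tts: TtsSink,
  bargeIn: AbortSignal,
): Promise<{ spoken: number; interrupted: boolean }> {
  let spoken = 0;
  for await (const token of tokens) {
    // Caller started talking: drop the rest of the reply.
    if (bargeIn.aborted) return { spoken, interrupted: true };
    tts.speak(token); // hand each token to TTS as it arrives
    spoken += 1;
  }
  return { spoken, interrupted: false };
}
```

The point of the design is that nothing waits for the full reply: each token is handed to text-to-speech as soon as it arrives, and an interruption is checked between tokens.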

When the call ends, a background worker processes the end-of-call report: it persists the transcript + recording + per-call cost breakdown, runs the QA scoring pipeline, and emits the cost-rollup events.

<Note>
  A second-generation, speech-to-speech voice path is in development for latency-sensitive intents,
  selected per vertical and intent. It is not the default path today, and tool-heavy turns (booking,
  handoff, anything that calls tools) always stay on the streamed, chained pipeline above.
</Note>

## Chat flow, end to end

```
Customer WhatsApp → WhatsApp provider → /api/webhooks/whatsapp (signature-verified)
                                          ↓
                                       message-processor queue
                                          ↓
                                       worker:
                                          ├─ persists customer turn + upserts Customer
                                          └─ POSTs /api/internal/agent/dispatch
                                                  ↓
                                              router → sub-agent → tool calls
                                                  ↓
                                              send_whatsapp_message (writes outbox)
```

Inbound WhatsApp webhooks are HMAC signature-verified. Requests are rate-limited per IP (500 req/min) before signature verification, then per (tenant, customer phone) (30 req/min) after it.
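A minimal sketch of that admission order follows. It assumes a hex-encoded HMAC-SHA256 over the raw request body and uses an in-memory fixed-window counter in place of Redis; `FixedWindowLimiter`, `verifySignature`, and `admit` are hypothetical names, and the real signature scheme depends on the WhatsApp provider.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Fixed-window counter: at most `limit` hits per key per `windowMs`.
// In-memory stand-in for a Redis-backed primitive (assumption).
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; hits: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number): boolean {
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { windowStart: now, hits: 1 });
      return true;
    }
    entry.hits += 1;
    return entry.hits <= this.limit;
  }
}

// Constant-time comparison against HMAC-SHA256 of the raw body.
// Hex encoding and SHA-256 are assumptions for illustration.
function verifySignature(body: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(body).digest();
  const given = Buffer.from(signatureHex, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}

const perIp = new FixedWindowLimiter(500, 60_000);
const perCustomer = new FixedWindowLimiter(30, 60_000);

type Verdict = "accepted" | "rate_limited" | "bad_signature";

interface WebhookRequest {
  ip: string;
  tenantId: string;
  phone: string;
  body: string;
  signature: string;
}

function admit(req: WebhookRequest, secret: string, now: number): Verdict {
  // Cheap per-IP check first, so unauthenticated floods never reach the HMAC.
  if (!perIp.allow(req.ip, now)) return "rate_limited";
  if (!verifySignature(req.body, req.signature, secret)) return "bad_signature";
  // Only verified traffic counts against the per-customer budget.
  if (!perCustomer.allow(`${req.tenantId}:${req.phone}`, now)) return "rate_limited";
  return "accepted";
}
```

Ordering the per-IP limit ahead of verification keeps the signature check from becoming a denial-of-service lever, while the tighter per-customer limit only ever counts requests that are known to be genuine.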

<Note>
  **WhatsApp outbound send is paused.** The `send_whatsapp_message` tool persists the agent reply
  into `messages` + writes the WhatsApp outbox row, but the actual network send is mocked today.
  Real send re-activates once WhatsApp Business verification clears for your tenant. Inbound +
  dashboard reply remain available throughout.
</Note>

## The router → sub-agent shape

The router is a single classification call against a small, fast model with a short prompt. It outputs one of \~10 intent slugs:

* `greeting` · `faq` · `new_lead_inquiry` · `existing_lead_update` · `booking` · `reschedule_or_cancel` · `human_request` · `complaint` · `spam_or_unrelated` · `out_of_scope`

Each intent maps to a sub-agent:

| Intent                                                 | Sub-agent       | Tools available                                                                                  |
| ------------------------------------------------------ | --------------- | ------------------------------------------------------------------------------------------------ |
| `new_lead_inquiry`, `existing_lead_update`             | `qualification` | `search_offerings`, `update_lead`, `crm_lookup_customer`, `crm_update_record`, `request_handoff` |
| `booking`                                              | `booking`       | `check_availability`, `book_appointment`, `update_lead`, `request_handoff`                       |
| `reschedule_or_cancel`, `human_request`, `complaint`   | `handoff`       | `request_handoff`                                                                                |
| `greeting`, `faq`, `spam_or_unrelated`, `out_of_scope` | `faq`           | `get_faq_answer`, `search_offerings`, `crm_lookup_customer`, `request_handoff`                   |

Sub-agent prompts interpolate the resolved persona + vertical pack at run-time. The agent runs a tool-call loop until it either produces a final text reply or hits the max-iteration cap.
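The mapping and the loop can be sketched as plain data plus a bounded loop. The intent, sub-agent, and tool names come from the table above; `runLoop`, the `Step` type, the default cap of 5, and returning `null` when the cap is reached are assumptions made for illustration.

```typescript
type Intent =
  | "greeting" | "faq" | "new_lead_inquiry" | "existing_lead_update" | "booking"
  | "reschedule_or_cancel" | "human_request" | "complaint"
  | "spam_or_unrelated" | "out_of_scope";
type SubAgent = "qualification" | "booking" | "handoff" | "faq";

const SUB_AGENT_FOR: Record<Intent, SubAgent> = {
  new_lead_inquiry: "qualification",
  existing_lead_update: "qualification",
  booking: "booking",
  reschedule_or_cancel: "handoff",
  human_request: "handoff",
  complaint: "handoff",
  greeting: "faq",
  faq: "faq",
  spam_or_unrelated: "faq",
  out_of_scope: "faq",
};

const TOOLS_FOR: Record<SubAgent, readonly string[]> = {
  qualification: ["search_offerings", "update_lead", "crm_lookup_customer", "crm_update_record", "request_handoff"],
  booking: ["check_availability", "book_appointment", "update_lead", "request_handoff"],
  handoff: ["request_handoff"],
  faq: ["get_faq_answer", "search_offerings", "crm_lookup_customer", "request_handoff"],
};

// A model step either asks for a tool or returns the final text.
type Step = { kind: "tool"; name: string } | { kind: "final"; text: string };

// Hypothetical loop: returns the final reply, or null if the cap is hit.
function runLoop(
  agent: SubAgent,
  nextStep: (toolResults: string[]) => Step,
  runTool: (name: string) => string,
  maxIterations = 5, // assumed value; the real cap is not documented here
): string | null {
  const results: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const step = nextStep(results);
    if (step.kind === "final") return step.text;
    if (!TOOLS_FOR[agent].includes(step.name)) {
      // Tools outside the sub-agent's allowlist are refused, not executed.
      results.push(`error: ${step.name} not allowed for ${agent}`);
      continue;
    }
    results.push(runTool(step.name));
  }
  return null;
}
```

Keeping the allowlist next to the routing table makes it easy to see that, for example, the `handoff` sub-agent can never book an appointment, whatever the model asks for.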

## The tool layer

Tool routes live at `/api/tools/<name>` and are JWT-authed (5-min TTL signed via `TOOL_JWT_SECRET`). Every call goes through the same wrapper:

1. **JWT verification** + per-(tenant, tool) rate limit (50 req/min).
2. **`withTenantContext`**: opens a Postgres transaction with `SET LOCAL app.current_tenant_id`, so RLS is set before any query runs.
3. **Tool body**: handles its specific job (DB read, vector similarity over offerings / KB, CRM proxy call, calendar check, etc.).
4. **`tool_call` + `tool_result` log lines** are emitted with W3C trace-context propagation, so an operator can follow a single conversation across web + worker + the post-call QA pipeline.
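Step 2 is the part that makes RLS fail-closed, so here is a minimal sketch of what such a wrapper might look like. The `SqlClient` interface is a hypothetical stand-in for a real driver, and this is not Vorel's actual `withTenantContext`; the one Postgres-specific detail, `set_config(name, value, true)`, is the parameterizable equivalent of `SET LOCAL`.

```typescript
// Minimal client shape (assumption); real Postgres drivers expose a
// comparable query(text, params) method.
interface SqlClient {
  query(sql: string, params?: unknown[]): Promise<{ rows: unknown[] }>;
}

async function withTenantContext<T>(
  client: SqlClient,
  tenantId: string,
  body: (client: SqlClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // is_local = true scopes the setting to this transaction only, so it
    // cannot leak to the next request that reuses the connection.
    await client.query("SELECT set_config('app.current_tenant_id', $1, true)", [tenantId]);
    const result = await body(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

A tool body would then run as `withTenantContext(client, tenantId, (c) => c.query("SELECT ..."))`, with the tenant setting guaranteed to be in place before its first query.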

## The customer identity model

Cross-channel continuity binds on the customer's E.164 phone number, not the channel. A customer who calls and later WhatsApps continues the same conversation thread because `conversations.customer_identifier` is the phone number; the next inbound surfaces the prior history regardless of channel.

This also means **no separate "voice account" + "WhatsApp account"** for the same human. The `customers` table is the per-(tenant, phone) source of truth; conversations are children.
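To make the keying concrete, here is a small in-memory sketch in which the thread lookup key is (tenant, phone) and the channel is only an attribute of each turn. `ThreadStore` and its shape are hypothetical; the real model lives in the `customers` and `conversations` tables.

```typescript
// Hypothetical sketch: an in-memory stand-in for the identity model.
type Channel = "voice" | "whatsapp";

interface Turn {
  channel: Channel;
  text: string;
}

// E.164: a leading "+", then up to 15 digits, the first one non-zero.
const E164 = /^\+[1-9]\d{1,14}$/;

class ThreadStore {
  private threads = new Map<string, Turn[]>();

  // Appends a turn and returns the full thread. The key is (tenant, phone);
  // the channel is stored on the turn but never used for lookup.
  record(tenantId: string, phone: string, turn: Turn): Turn[] {
    if (!E164.test(phone)) throw new Error(`not an E.164 number: ${phone}`);
    const key = `${tenantId}:${phone}`;
    const thread = this.threads.get(key) ?? [];
    thread.push(turn);
    this.threads.set(key, thread);
    return thread;
  }
}
```

A voice turn followed by a WhatsApp turn from the same number lands in one thread, while the same number under a different tenant starts a separate one.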

## Storage layout (high-level)

| Table                | What it holds                                                                                     | Append-only?           |
| -------------------- | ------------------------------------------------------------------------------------------------- | ---------------------- |
| `tenants`            | One row per Vorel customer (tenant). Persona, vertical, working hours, handoff rules, guardrails. | No                     |
| `customers`          | One row per (tenant, phone). Cross-channel identity anchor.                                       | No                     |
| `conversations`      | One row per customer thread. Channel + status + `customer_identifier`.                            | No                     |
| `messages`           | Every turn of every conversation. Customer + agent + system. **Append-only.**                     | Yes (Postgres trigger) |
| `leads`              | Qualification state + attributes. Linked to a conversation.                                       | No                     |
| `offerings`          | Tenant catalog (properties / services / clinicians / menu slots). Vector-embedded.                | No                     |
| `knowledge_base`     | FAQ + policy entries. Vector-embedded.                                                            | No                     |
| `appointments`       | Bookings. Linked to a customer + offering + assigned user.                                        | No                     |
| `qa_evaluations`     | Per-conversation QA scores from the post-call grading pipeline.                                   | No                     |
| `audit_log`          | Every operator-console read + every mutation. **Append-only.**                                    | Yes (Postgres trigger) |
| `billing_events`     | Per-call cost-of-goods events (telephony, speech, model) + chargeable events. **Append-only.**    | Yes (Postgres trigger) |
| `webhook_deliveries` | Per-attempt outbound webhook delivery records (status + response).                                | No                     |
| `tenant_credentials` | AES-256-GCM-encrypted CRM driver credentials. KEK from env-var.                                   | No                     |

## QA scoring pipeline

After every call ends, a worker runs an LLM-graded QA pass against an 11-criterion rubric (greeting, intent capture, tool-call appropriateness, language match, handoff timing, etc.). The output (`qa_evaluations` row) carries:

* A normalised score (0–11)
* Per-criterion breakdown
* Hallucination flags graded by a separate v1 grader (severity: low / medium / high)

The hallucination flags feed into the [guardrails](/product/guardrails) layer: depending on the tenant's `hallucination.threshold` + `hallucination.action`, a flagged reply triggers a monitoring alert (`warn`) or routes to handoff (`handoff`).
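One plausible reading of that configuration, sketched below, treats `hallucination.threshold` as a minimum severity and applies `hallucination.action` when any flag meets it. That interpretation, along with the `decide` function and the `"none"` result, is an assumption; the guardrails page is the authority on the real semantics.

```typescript
type Severity = "low" | "medium" | "high";
type Action = "warn" | "handoff";

interface HallucinationConfig {
  threshold: Severity; // assumed meaning: minimum severity that triggers
  action: Action;
}

const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2 };

// Hypothetical decision rule: act if any flag is at or above the threshold.
function decide(flags: Severity[], cfg: HallucinationConfig): Action | "none" {
  const triggered = flags.some((s) => RANK[s] >= RANK[cfg.threshold]);
  return triggered ? cfg.action : "none";
}
```

Under this reading, a tenant with a threshold of `medium` and action `handoff` would ignore a lone `low` flag but hand off on any `medium` or `high` one.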

## Reliability posture

Failure modes we explicitly handle:

* **LLM provider 5xx / timeout**: the provider call retries with backoff; a terminal failure inside the sub-agent dispatch falls through to a short ack ("One moment, please") plus a `fell_back: true` flag the inbox surfaces.
* **Redis blip on rate limit**: fail-open. A Redis outage admits the request rather than 429-ing every customer. Logged as `rate_limit.redis_failed`.
* **CRM driver 401**: OAuth drivers refresh + retry once; refreshed tokens get re-encrypted and persisted back into `tenant_credentials` (`rotated_at` updated). Refresh failures audit-log + return `auth_error` to the agent, which falls back to a polite "I'll have someone reach out" rather than hallucinating CRM data.
* **Vendor outage on a hot path**: see [SLOs](/security/overview#slos). Tool-route success rate (99.9%) and voice-call success rate (99.5%) are the two we measure most aggressively. Webhook delivery uses a 6-attempt retry ladder: the initial attempt plus up to five retries, spaced by `[60s, 300s, 1800s, 7200s, 43200s]`.
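Two of these behaviors are easy to show in code: the retry ladder's timing and the fail-open rate-limit check. The sketch assumes each delay is measured from the previous attempt; `attemptOffsets` and `allowOrFailOpen` are hypothetical helper names.

```typescript
// Delays between webhook delivery attempts, in seconds (from the list above).
const RETRY_DELAYS_S: readonly number[] = [60, 300, 1800, 7200, 43200];

// Seconds after the first attempt at which each attempt fires, assuming
// each delay is counted from the previous attempt (assumption).
function attemptOffsets(delays: readonly number[]): number[] {
  const offsets = [0]; // attempt 1 fires immediately
  let elapsed = 0;
  for (const d of delays) {
    elapsed += d;
    offsets.push(elapsed);
  }
  return offsets;
}

// Fail-open wrapper: if the limiter backend throws (for example, Redis is
// unreachable), log the event and admit the request.
function allowOrFailOpen(check: () => boolean, log: (event: string) => void): boolean {
  try {
    return check();
  } catch {
    log("rate_limit.redis_failed");
    return true;
  }
}
```

With these delays, `attemptOffsets(RETRY_DELAYS_S)` gives six attempts, the last of which lands 52,560 seconds (14.6 hours) after the first, so a receiver that recovers within that window still gets the event.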

## What's NOT in the picture today

* **Per-region routing.** `tenants.region` is captured at create time; routing on it (running EU tenants out of an EU Postgres replica) is on the roadmap.
* **Automated billing.** Cost rollup is captured in `billing_events`; invoicing is operator-driven from the rollup until the automated billing model lands.
* **Outbound voice (Vorel-initiated calls).** Inbound only today.
* **Voicemail / fallback when the LLM provider is down.** A failed call disconnects.
* **Real n8n deployment.** The around-the-brain templates exist on disk + the operator-side discoverability page lists them, but n8n is not yet deployed as a running service. Per-turn dispatch (router + sub-agent + tools) runs in TypeScript inside the web app, not in n8n.

## Where to go next

<CardGroup cols={2}>
  <Card title="Voice features" icon="phone" href="/product/voice">
    What the voice agent does, end-to-end.
  </Card>

  <Card title="Chat features" icon="comments" href="/product/chat">
    WhatsApp inbound + outbound posture.
  </Card>

  <Card title="API Reference" icon="code" href="/api-reference/introduction">
    REST API + SDK for programmatic access.
  </Card>

  <Card title="Security overview" icon="shield-check" href="/security/overview">
    Multi-tenant isolation, encryption, SLOs.
  </Card>
</CardGroup>

{/* verified-against: apps/web/src/lib/agents/voice-dispatch.ts + chat-dispatch.ts (router → sub-agent shape) */}

{/* verified-against: apps/voice-ws/ (live voice bridge; Twilio Media Streams → /api/voice/dispatch-turn over HTTP/SSE) */}

{/* verified-against: apps/web/src/app/api/webhooks/whatsapp/route.ts + apps/workers/src/queues/message-processor.ts (chat ingress) */}

{/* verified-against: apps/web/src/app/admin/automation/page.tsx (n8n service status: not yet deployed) */}

{/* verified-against: handoff/docs/SLOs.md (5 SLOs) */}