Vorel — Documentation

A Vorel voice call flows through a chained, streaming pipeline: speech recognition, the agent’s reasoning loop, and text-to-speech, each a discrete step. The call can ride one of two transport paths that own the carrier, codec negotiation, and audio media layer. Per-tenant configuration determines which transport handles each call. This page explains the transport layer, the per-tenant configuration, the eval-gate discipline that protects tenants against quality regressions on a pipeline change, and the shadow-mode protocol that runs before any tenant cuts over.

The two transport paths

The transport path is the layer that owns the carrier trunk, the telephony codec negotiation, and the audio media path. Two transports are in production:

Vendor-orchestrated transport

The original transport. A telephony-orchestration vendor owns the carrier trunk and the speech-vendor integrations, and forwards each turn to Vorel’s agent over an LLM-proxy endpoint. This is the transport Vorel shipped first; it remains in use for tenants that have not migrated to the direct transport.

Direct transport (twilio-direct)

The direct-control transport. The carrier provides the media stream; Vorel’s voice-ws service receives the raw audio frames, runs speech recognition and text-to-speech inside Vorel infrastructure, and ships the synthesized audio back to the carrier. Vorel owns more of the latency budget and more of the failure modes. This is the transport Skyline and other production tenants run on today.

Direct transport architecture

Customer phone
    │
    ▼
Carrier  ──►  Media stream (WebSocket)  ──►  voice-ws (Vorel service)
                                                  │
                                                  ├─► Speech recognition (streaming)
                                                  │
                                                  ├─► /api/voice/dispatch-turn
                                                  │       │
                                                  │       ▼
                                                  │   router → sub-agent → tool calls
                                                  │       │
                                                  │       ▼
                                                  │   reply text (streamed)
                                                  │
                                                  └─► Text-to-speech (streaming)
                                                          │
                                                          ▼
                                                  Audio frames → carrier → caller

On the direct transport, every turn runs the chained pipeline: audio → speech recognition → text → router → sub-agent → tool calls → reply text → text-to-speech → audio. The dispatch path is the same router → sub-agent → tools loop the chat side uses. The reply text streams token-by-token straight into text-to-speech, so the caller hears the answer forming in real time, and the agent yields cleanly when the caller barges in.
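As a rough illustration of the chained, streaming shape described above, the sketch below wires three stand-in stages together with Python generators. The names `recognize`, `dispatch_turn`, `synthesize`, `run_turn`, and the `interrupted` callback are hypothetical and are not Vorel APIs; real stages would call speech and agent services rather than manipulate strings.

```python
from typing import Callable, Iterable, Iterator, List


def recognize(frames: Iterable[bytes]) -> str:
    """Stand-in for speech recognition. For brevity it waits for the whole
    utterance; a real recognizer would emit partial transcripts."""
    return b"".join(frames).decode("utf-8")


def dispatch_turn(transcript: str) -> Iterator[str]:
    """Stand-in for the router -> sub-agent -> tools loop; yields reply tokens."""
    for token in f"You said: {transcript}".split():
        yield token


def synthesize(tokens: Iterable[str]) -> Iterator[bytes]:
    """Stand-in for streaming text-to-speech: one audio chunk per reply token."""
    for token in tokens:
        yield token.encode("utf-8")


def run_turn(
    frames: Iterable[bytes],
    interrupted: Callable[[], bool] = lambda: False,
) -> List[bytes]:
    """Run one turn and return the chunks that were actually played."""
    played: List[bytes] = []
    # The generators are lazy, so each reply token reaches synthesis as soon
    # as it is produced instead of waiting for the full reply.
    for chunk in synthesize(dispatch_turn(recognize(frames))):
        if interrupted():
            break  # caller barged in: stop sending audio for this turn
        played.append(chunk)
    return played
```

Because every stage is a separate callable, any one of them can be swapped without touching the others, which is the property the chained design relies on.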

Vendor-orchestrated transport architecture

Customer phone → Carrier → Telephony-orchestration vendor
                              ├─ Speech recognition
                              ├─ Text-to-speech
                              └─ Vorel agent (LLM-proxy endpoint)
                                   │
                                   ▼
                              router → sub-agent → tool calls

The orchestration vendor owns the audio media layer; Vorel’s role is the agent that returns the reply. Either transport runs the same chained reasoning loop.

Why chained, not speech-to-speech

Vorel deliberately runs a chained pipeline rather than a single speech-to-speech model. Chaining keeps each stage discrete and swappable, which is what lets the agent reliably call tools, ground its answers against your catalog and knowledge base, and enforce per-tenant guardrails on every turn. A single end-to-end speech model would couple recognition, reasoning, and synthesis to one vendor on the most caller-facing surface, a lock-in tradeoff Vorel has chosen not to take. The chained path is the load-bearing default and the architecture the latency work targets (the per-turn sub-2-second mandate is met on the chained stack).

Per-tenant `voice_provider` setting

Each tenant carries a voice_provider setting that selects the transport path:

`voice_provider` value	Transport path
`vendor-orchestrated`	Vendor-orchestrated transport (legacy)
`twilio-direct`	Direct transport (chained)

New production cutovers move to the direct transport only after completing the cutover protocol described in the sections below. The setting lives at tenants.voice_provider and is operator-flippable, but operators do not flip it without first running the tenant-scoped protocol: the shadow-mode period followed by the eval-gate.
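A minimal sketch of how a service might resolve the transport from this setting is shown here. The `VoiceProvider` enum, the `resolve_transport` function, and the dictionary shape of the tenant record are assumptions for illustration; only the two setting values and their transport paths come from the table above.

```python
from enum import Enum


class VoiceProvider(str, Enum):
    """The two documented values of tenants.voice_provider."""
    VENDOR_ORCHESTRATED = "vendor-orchestrated"
    TWILIO_DIRECT = "twilio-direct"


# Mapping taken from the table above; the labels are descriptive only.
TRANSPORT_BY_PROVIDER = {
    VoiceProvider.VENDOR_ORCHESTRATED: "Vendor-orchestrated transport (legacy)",
    VoiceProvider.TWILIO_DIRECT: "Direct transport (chained)",
}


def resolve_transport(tenant: dict) -> str:
    """Return the transport path for a tenant record (hypothetical shape).

    Unknown values raise ValueError rather than falling back silently,
    so a typo in the setting cannot route calls to the wrong transport.
    """
    provider = VoiceProvider(tenant["voice_provider"])
    return TRANSPORT_BY_PROVIDER[provider]
```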

Eval-gate at swap

Any transport swap on a production tenant runs the eval-gate before commit. The eval-gate asserts the new configuration does not regress against the tenant’s existing quality bar. The gate runs three checks against a 30-conversation eval set drawn from the tenant’s historical traffic:

Outcome correctness regression. The classifier outcomes on the eval set must match the existing pipeline’s outcomes within a tolerance threshold.
First-token-to-speech p95 regression. The new pipeline’s p95 first-token-to-speech must be within 200ms of the existing pipeline’s p95.
Barge-in success rate regression. The new pipeline’s barge-in handling must not regress more than 5% from the existing pipeline.

A failed eval-gate blocks the cutover. The operator either tunes the configuration and re-runs the gate, or escalates to the engineering owner before proceeding. The eval-gate output is recorded in audit_log so the procurement signal “pipeline cutovers run quality regression checks” is auditable.
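The three checks can be sketched as a single function, under stated assumptions. `PipelineStats` and `run_eval_gate` are hypothetical names, not the gate's real interface. The outcome tolerance is left as a required argument because this page does not state its value. The 200 ms limit is read as "no more than 200 ms slower", and the 5% barge-in limit is read as an absolute drop in success rate; both readings are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PipelineStats:
    """Results of one pipeline over the eval set (hypothetical shape)."""
    outcomes: List[str]           # classifier outcome per eval conversation
    p95_first_token_ms: float     # p95 first-token-to-speech, in milliseconds
    barge_in_success_rate: float  # fraction between 0 and 1


def run_eval_gate(
    existing: PipelineStats,
    candidate: PipelineStats,
    outcome_tolerance: float,
    max_p95_delta_ms: float = 200.0,
    max_barge_in_drop: float = 0.05,
) -> Dict[str, object]:
    """Return the per-check results and an overall pass/fail flag."""
    if not existing.outcomes or len(existing.outcomes) != len(candidate.outcomes):
        raise ValueError("eval sets must be non-empty and the same length")

    mismatches = sum(a != b for a, b in zip(existing.outcomes, candidate.outcomes))
    checks = {
        "outcome_correctness":
            mismatches / len(existing.outcomes) <= outcome_tolerance,
        "first_token_to_speech_p95":
            candidate.p95_first_token_ms - existing.p95_first_token_ms
            <= max_p95_delta_ms,
        "barge_in_success_rate":
            existing.barge_in_success_rate - candidate.barge_in_success_rate
            <= max_barge_in_drop,
    }
    # A single failed check blocks the cutover.
    return {"passed": all(checks.values()), "checks": checks}
```

Returning the individual check results alongside the overall flag keeps the record useful for an audit entry: a reviewer can see which check blocked a cutover, not only that one did.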

Shadow-mode-before-cutover

The voice pipeline cutover runbook requires every tenant moving to the direct transport to run the candidate in shadow against the existing transport for 7 days before live traffic is flipped. Shadow mode means:

Live traffic continues on the existing transport

The tenant’s customers continue to receive responses from the existing transport. Customer-facing behavior is unchanged.

Each turn is also dispatched to the candidate transport

For every live turn, voice-ws also dispatches the same input through the candidate transport (twilio-direct). The candidate transport’s reply is captured but not returned to the customer.

Shadow comparisons land in `shadow_dispatch_comparisons`

The two replies are recorded side-by-side in a class-(b) telemetry table (retention 30 days per ADR 0019). The candidate’s reply text, latency, and tool-call sequence are compared against the live transport’s.

Operator reviews after 7 days

The operator inspects the shadow comparison data on the operator-side surface. If the candidate transport matches the live transport within the acceptance threshold, the operator runs the eval-gate for final confirmation, and the cutover proceeds once the gate passes.

Flip the `voice_provider` setting

On a successful eval-gate pass, the operator flips tenants.voice_provider to twilio-direct. The next call routes to the new transport. The change is audit-logged.

The shadow-mode period is a hard prerequisite for any cutover on production traffic. The protocol exists because transport cutovers have historically regressed in ways that are not visible until customer-facing behavior diverges; shadow mode catches divergences before customers see them.
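The ordering of the protocol (7 days of shadow, then a passing eval-gate, then the flip, then an audit entry) can be expressed as a guard around the setting change. This is only a sketch: `flip_to_direct`, the dictionary-shaped tenant record, the list-shaped audit log, and the audit field names are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

SHADOW_PERIOD = timedelta(days=7)  # shadow duration stated in the runbook


def flip_to_direct(
    tenant: Dict[str, str],
    audit_log: List[dict],
    shadow_started_at: datetime,
    eval_gate_passed: bool,
    now: datetime,
) -> None:
    """Flip voice_provider to twilio-direct only when both prerequisites hold."""
    if now - shadow_started_at < SHADOW_PERIOD:
        raise RuntimeError("shadow period of 7 days is not complete")
    if not eval_gate_passed:
        raise RuntimeError("eval-gate has not passed")

    previous = tenant["voice_provider"]
    tenant["voice_provider"] = "twilio-direct"
    # Record the change so the cutover is auditable (field names are illustrative).
    audit_log.append({
        "event": "voice_provider_cutover",
        "tenant_id": tenant["id"],
        "from": previous,
        "to": "twilio-direct",
        "at": now.isoformat(),
    })
```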

The shadow-comparison and migration scaffolding runs offline against recorded fixtures today. It is the discipline that gates a cutover, not a claim that live shadow traffic is being compared in production right now.
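An offline comparison over recorded fixtures might look like the sketch below. It compares the three fields named above (reply text, latency, tool-call sequence) for each pair of live and candidate turns. `TurnResult`, `compare_turn`, and `summarize` are hypothetical names, and exact string equality for reply text is a simplifying assumption; the real comparison and its acceptance threshold are not specified on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TurnResult:
    """One transport's recorded result for a turn (hypothetical shape)."""
    reply_text: str
    latency_ms: float
    tool_calls: Tuple[str, ...]  # tool names in call order


def compare_turn(live: TurnResult, candidate: TurnResult) -> Dict[str, object]:
    """Compare one live turn against the candidate's shadow result."""
    return {
        "reply_match": live.reply_text.strip() == candidate.reply_text.strip(),
        "tool_calls_match": live.tool_calls == candidate.tool_calls,
        "latency_delta_ms": candidate.latency_ms - live.latency_ms,
    }


def summarize(pairs: List[Tuple[TurnResult, TurnResult]]) -> Dict[str, float]:
    """Aggregate per-turn comparisons into review-friendly rates."""
    if not pairs:
        raise ValueError("no fixture pairs to compare")
    rows = [compare_turn(live, cand) for live, cand in pairs]
    n = len(rows)
    return {
        "turns": n,
        "reply_match_rate": sum(r["reply_match"] for r in rows) / n,
        "tool_call_match_rate": sum(r["tool_calls_match"] for r in rows) / n,
        "max_latency_delta_ms": max(r["latency_delta_ms"] for r in rows),
    }
```

An operator reviewing the 7-day window would then compare these rates against whatever acceptance threshold the runbook defines.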

Tenant-side visibility

Tenants do not see the underlying transport directly; the dashboard surfaces conversation outcomes, not transport-layer detail. However, the eval-gate audit rows are visible on the tenant-side audit surface, so a tenant admin can see that on a given date their voice transport was cut over and the cutover passed the configured quality threshold.

What does NOT change across transports

The reasoning loop. Both transports run the same chained router → sub-agent → tool loop. The agent-level behavior is transport-independent.
The tool layer. The same JWT-authed /api/tools/* routes serve both transports. A tenant’s tool calls, CRM writes, and guardrails are transport-independent.
The CRM-as-SoR architectural commitment. Class-(c) writes go through the same CRM-write wrapper regardless of transport; retention windows are the same.
Audit logging. Every turn, every tool call, every cutover, every override lands in audit_log regardless of transport.
Per-tenant guardrails. Forbidden phrases, hallucination thresholds, handoff rules apply identically.
Per-vertical packs. The qualification slots, the persona, the FAQ retrieval, and the booking handlers are all transport-agnostic.

The transport is a media-path choice; the agent-level behavior is the same.

Related docs

Product → Voice: voice agent capabilities, per-vertical qualification rules, billing model.
Getting started → How it works: full architecture, including the dispatch pipeline and tool layer.
Security → Data retention: shadow_dispatch_comparisons 30-day window, voice_turn_latency 90-day window.
