
# Voice pipelines

> Vorel runs a chained, streaming voice pipeline (speech recognition, the agent's reasoning loop, text-to-speech) over one of two transport paths. The transport is selected per tenant, every swap passes an eval-gate, and the candidate runs in shadow mode before cutover.

A Vorel voice call flows through a **chained, streaming pipeline**: speech recognition, the agent's reasoning loop, and text-to-speech, each a discrete step. The call can ride one of two **transport paths** that own the carrier, codec negotiation, and audio media layer. Per-tenant configuration determines which transport handles each call. This page explains the transport layer, the per-tenant configuration, the eval-gate discipline that protects tenants against quality regressions on a pipeline change, and the shadow-mode protocol that runs before any tenant cuts over.

## The two transport paths

The transport path is the layer that owns the carrier trunk, the telephony codec negotiation, and the audio media path. Two transports are in production:

<CardGroup cols={2}>
  <Card title="Vendor-orchestrated transport" icon="phone">
    The original transport. A telephony-orchestration vendor owns the carrier trunk and the
    speech-vendor integrations, and forwards each turn to Vorel's agent over an LLM-proxy endpoint.
    This is the transport Vorel shipped first; it remains in use for tenants that have not migrated
    to the direct transport.
  </Card>

  <Card title="Direct transport (twilio-direct)" icon="bolt">
    The direct-control transport. The carrier provides the media stream; Vorel's voice-ws service
    receives the raw audio frames, runs speech recognition and text-to-speech inside Vorel
    infrastructure, and ships the synthesized audio back to the carrier. Vorel owns more of the
    latency budget and more of the failure modes. This is the transport Skyline and other production
    tenants run on today.
  </Card>
</CardGroup>

### Direct transport architecture

```
Customer phone
    │
    ▼
Carrier  ──►  Media stream (WebSocket)  ──►  voice-ws (Vorel service)
                                                  │
                                                  ├─► Speech recognition (streaming)
                                                  │
                                                  ├─► /api/voice/dispatch-turn
                                                  │       │
                                                  │       ▼
                                                  │   router → sub-agent → tool calls
                                                  │       │
                                                  │       ▼
                                                  │   reply text (streamed)
                                                  │
                                                  └─► Text-to-speech (streaming)
                                                          │
                                                          ▼
                                                  Audio frames → carrier → caller
```

On the direct transport, every turn runs the **chained** pipeline: audio → speech recognition → text → router → sub-agent → tool calls → reply text → text-to-speech → audio. The dispatch path is the same router → sub-agent → tools loop the chat side uses. The reply text streams token-by-token straight into text-to-speech, so the caller hears the answer forming in real time, and the agent yields cleanly when the caller barges in.
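A minimal sketch of the streaming hand-off described above, with synchronous generators standing in for the real async streams. The names `replyTokens`, `synthesize`, and `BargeInSignal` are hypothetical and are not taken from the voice-ws codebase. The sketch only illustrates tokens flowing into synthesis as they arrive, and playback stopping once the caller interrupts.

```typescript
// Illustrative only: these names are assumptions, not Vorel's actual API.

/** Set to true when the caller starts speaking over the agent. */
interface BargeInSignal {
  interrupted: boolean;
}

/** Stand-in for the reply streamed out of the router → sub-agent → tools loop. */
function* replyTokens(text: string): Generator<string> {
  for (const token of text.split(" ")) {
    yield token;
  }
}

/**
 * Stand-in for streaming text-to-speech: converts each token to an audio
 * frame as soon as it arrives, and stops once the barge-in signal is set.
 */
function* synthesize(
  tokens: Iterable<string>,
  bargeIn: BargeInSignal,
): Generator<string> {
  for (const token of tokens) {
    if (bargeIn.interrupted) {
      return; // caller interrupted: drop the rest of the reply
    }
    yield `audio(${token})`;
  }
}
```

Because both stages are lazy, no frame is synthesized before its token exists, and nothing after the interruption point is synthesized at all.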

### Vendor-orchestrated transport architecture

```
Customer phone → Carrier → Telephony-orchestration vendor
                              ├─ Speech recognition
                              ├─ Text-to-speech
                              └─ Vorel agent (LLM-proxy endpoint)
                                   │
                                   ▼
                              router → sub-agent → tool calls
```

The orchestration vendor owns the audio media layer; Vorel's role is the agent that returns the reply. Either transport runs the same chained reasoning loop.

## Why chained, not speech-to-speech

Vorel deliberately runs a **chained** pipeline rather than a single speech-to-speech model. Chaining keeps each stage discrete and swappable, which is what lets the agent reliably call tools, ground its answers in your catalog and knowledge base, and enforce per-tenant guardrails on **every** turn. A single end-to-end speech model would couple recognition, reasoning, and synthesis to one vendor on the most caller-facing surface, a lock-in tradeoff Vorel has chosen not to take. The chained path is the load-bearing default and the architecture the latency work targets (the per-turn sub-2-second mandate is met on the chained stack).

## Per-tenant `voice_provider` setting

Each tenant carries a `voice_provider` setting that selects the transport path:

| `voice_provider` value | Transport path                         |
| ---------------------- | -------------------------------------- |
| vendor-orchestrated    | Vendor-orchestrated transport (legacy) |
| `twilio-direct`        | Direct transport (chained)             |

The setting lives on `tenants.voice_provider` and is operator-flippable, but operators do not flip it ad hoc: every change on a production tenant goes through the tenant-scoped cutover protocol described in the eval-gate and shadow-mode sections below. New production cutovers move to the direct transport once that protocol has completed.
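As a rough illustration, the mapping in the table can be modelled as a string union with an exhaustive switch. The type and function names here are hypothetical, and the literal `vendor-orchestrated` is assumed from the table rather than confirmed against the schema.

```typescript
// Illustrative only: names are assumptions, not Vorel's schema or API.
type VoiceProvider = "vendor-orchestrated" | "twilio-direct";

interface TenantVoiceConfig {
  tenantId: string;
  voiceProvider: VoiceProvider; // mirrors tenants.voice_provider
}

type TransportPath = "vendor-orchestrated-transport" | "direct-transport";

/** Resolves the transport path for a call from the tenant's setting. */
function resolveTransport(tenant: TenantVoiceConfig): TransportPath {
  switch (tenant.voiceProvider) {
    case "vendor-orchestrated":
      return "vendor-orchestrated-transport";
    case "twilio-direct":
      return "direct-transport";
    default: {
      // Compile-time exhaustiveness check: adding a new provider value
      // without handling it here becomes a type error.
      const unreachable: never = tenant.voiceProvider;
      throw new Error(`Unknown voice_provider: ${String(unreachable)}`);
    }
  }
}
```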

## Eval-gate at swap

Any transport swap on a production tenant **runs the eval-gate before commit**. The eval-gate asserts the new configuration does not regress against the tenant's existing quality bar.

The gate runs three checks against a 30-conversation eval set drawn from the tenant's historical traffic:

1. **Outcome correctness regression.** The classifier outcomes on the eval set must match the existing pipeline's outcomes within a tolerance threshold.
2. **First-token-to-speech p95 regression.** The new pipeline's p95 first-token-to-speech must be within 200 ms of the existing pipeline's p95.
3. **Barge-in success rate regression.** The new pipeline's barge-in handling must not regress more than 5% from the existing pipeline.

A failed eval-gate blocks the cutover. The operator either tunes the configuration and re-runs the gate, or escalates to the engineering owner before proceeding. The eval-gate output is recorded in `audit_log` so the procurement signal "pipeline cutovers run quality regression checks" is auditable.
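The three checks can be sketched as a single pure function. This is a sketch under stated assumptions, not the gate's actual implementation: all names are hypothetical, the outcome tolerance of 0.9 is a placeholder (the text only says "within a tolerance threshold"), the 5% barge-in limit is read as an absolute drop in success rate, and p95 uses the nearest-rank method.

```typescript
// Illustrative only: names and the outcome tolerance are assumptions.
interface PipelineEvalResult {
  outcomes: string[];             // classifier outcome per eval conversation
  firstTokenToSpeechMs: number[]; // latency samples across the eval set
  bargeInSuccessRate: number;     // fraction in [0, 1]
}

interface EvalGateResult {
  passed: boolean;
  failures: string[];
}

const THRESHOLDS = {
  minOutcomeAgreement: 0.9, // placeholder: not stated in the source
  maxP95RegressionMs: 200,  // from check 2
  maxBargeInDrop: 0.05,     // from check 3, read as an absolute drop
};

/** Nearest-rank 95th percentile. */
function p95(samples: number[]): number {
  if (samples.length === 0) throw new Error("p95 needs at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.ceil(0.95 * sorted.length) - 1];
}

/** Fraction of eval conversations where both pipelines produced the same outcome. */
function outcomeAgreement(existing: string[], candidate: string[]): number {
  if (existing.length === 0 || existing.length !== candidate.length) {
    throw new Error("Outcome lists must be non-empty and the same length");
  }
  const matches = existing.filter((o, i) => o === candidate[i]).length;
  return matches / existing.length;
}

function runEvalGate(
  existing: PipelineEvalResult,
  candidate: PipelineEvalResult,
): EvalGateResult {
  const failures: string[] = [];
  if (outcomeAgreement(existing.outcomes, candidate.outcomes) < THRESHOLDS.minOutcomeAgreement) {
    failures.push("outcome_correctness");
  }
  const p95Delta = p95(candidate.firstTokenToSpeechMs) - p95(existing.firstTokenToSpeechMs);
  if (p95Delta > THRESHOLDS.maxP95RegressionMs) {
    failures.push("first_token_to_speech_p95");
  }
  if (existing.bargeInSuccessRate - candidate.bargeInSuccessRate > THRESHOLDS.maxBargeInDrop) {
    failures.push("barge_in_success_rate");
  }
  // Any failure blocks the cutover.
  return { passed: failures.length === 0, failures };
}
```

Returning the list of failed checks, rather than a bare boolean, gives the operator something concrete to tune against before re-running the gate.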

## Shadow-mode-before-cutover

The voice pipeline cutover runbook requires every tenant moving to the direct transport to **run the candidate in shadow mode against the existing transport for 7 days before flipping live traffic**. Shadow mode means:

<Steps>
  <Step title="Live traffic continues on the existing transport">
    The tenant's customers continue to receive responses from the existing transport. Customer-facing
    behavior is unchanged.
  </Step>

  <Step title="Each turn is also dispatched to the candidate transport">
    For every live turn, voice-ws also dispatches the same input through the candidate transport
    (twilio-direct). The candidate transport's reply is captured but not returned to the customer.
  </Step>

  <Step title="Shadow comparisons land in `shadow_dispatch_comparisons`">
    The two replies are recorded side-by-side in a class-(b) telemetry table (retention 30 days per
    ADR 0019). The candidate's reply text, latency, and tool-call sequence are compared against the
    live transport's.
  </Step>

  <Step title="Operator reviews after 7 days">
    The operator inspects the shadow comparison data on the operator-side surface. If the candidate
    transport matches the live transport within the acceptance threshold, the eval-gate is run for
    final confirmation, and the cutover proceeds.
  </Step>

  <Step title="Flip the `voice_provider` setting">
    On a successful eval-gate pass, the operator flips `tenants.voice_provider` to `twilio-direct`.
    The next call routes to the new transport. The change is audit-logged.
  </Step>
</Steps>

The shadow-mode period is a hard prerequisite for any cutover on production traffic. The protocol exists because transport cutovers historically regress in ways that are not visible until customer-facing behavior diverges; shadow mode catches divergences before customers see them.

<Note>
  The shadow-comparison and migration scaffolding runs offline against recorded fixtures today. It
  is the discipline that gates a cutover, not a claim that live shadow traffic is being compared in
  production right now.
</Note>
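A sketch of the comparison step, written as pure functions over recorded turn results, in line with the offline scaffolding the note describes. Field names are hypothetical and do not reflect the actual `shadow_dispatch_comparisons` columns. The match rate passed to `readyForEvalGate` is a placeholder, since the acceptance threshold is not stated here, and exact text equality is a simplification of whatever reply comparison the real tooling uses.

```typescript
// Illustrative only: field names and the match-rate threshold are assumptions.
interface TurnResult {
  replyText: string;
  latencyMs: number;
  toolCalls: string[]; // tool names in call order
}

interface ShadowComparison {
  replyMatches: boolean;
  toolCallsMatch: boolean;
  latencyDeltaMs: number; // candidate minus live; positive means slower
}

const REQUIRED_SHADOW_DAYS = 7; // from the runbook requirement above

/** Compares the candidate transport's turn against the live transport's. */
function compareTurn(live: TurnResult, candidate: TurnResult): ShadowComparison {
  const toolCallsMatch =
    live.toolCalls.length === candidate.toolCalls.length &&
    live.toolCalls.every((name, i) => name === candidate.toolCalls[i]);
  return {
    replyMatches: live.replyText.trim() === candidate.replyText.trim(),
    toolCallsMatch,
    latencyDeltaMs: candidate.latencyMs - live.latencyMs,
  };
}

/** True once the shadow window has elapsed and enough turns matched. */
function readyForEvalGate(
  rows: ShadowComparison[],
  shadowDays: number,
  minMatchRate: number,
): boolean {
  if (shadowDays < REQUIRED_SHADOW_DAYS || rows.length === 0) {
    return false;
  }
  const matching = rows.filter((r) => r.replyMatches && r.toolCallsMatch).length;
  return matching / rows.length >= minMatchRate;
}
```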

## Tenant-side visibility

Tenants do not see the underlying transport directly; the dashboard surfaces conversation outcomes, not transport-layer detail. However, the eval-gate audit rows are visible on the tenant-side audit surface, so a tenant admin can see that on a given date their voice transport was cut over and the cutover passed the configured quality threshold.

## What does NOT change across transports

* **The reasoning loop.** Both transports run the same chained router → sub-agent → tool loop. The agent-level behavior is transport-independent.
* **The tool layer.** The same JWT-authed `/api/tools/*` routes serve both transports. A tenant's tool calls, CRM writes, and guardrails are transport-independent.
* **The CRM-as-SoR architectural commitment.** Class-(c) writes go through the same CRM-write wrapper regardless of transport; retention windows are the same.
* **Audit logging.** Every turn, every tool call, every cutover, every override lands in `audit_log` regardless of transport.
* **Per-tenant guardrails.** Forbidden phrases, hallucination thresholds, handoff rules apply identically.
* **Per-vertical packs.** The qualification slots, the persona, the FAQ retrieval, and the booking handlers are all transport-agnostic.

The transport is a media-path choice; the agent-level behavior is the same.

## Related docs

* [Product → Voice](/product/voice): voice agent capabilities, per-vertical qualification rules, billing model.
* [Getting started → How it works](/getting-started/how-it-works): full architecture, including the dispatch pipeline and tool layer.
* [Security → Data retention](/security/data-retention): `shadow_dispatch_comparisons` 30-day window, `voice_turn_latency` 90-day window.

{/* verified-against: apps/voice-ws/src/handlers/media-stream.ts + engine-v2-path.ts (media-stream handler; direct transport ingress; chained pipeline) */}

{/* verified-against: handoff/codebase/web-voice.md (S2S family is an operator/scaffold surface, deferred; shadow/migration/eval are offline scaffolds) */}
