How voice works
Vorel’s voice agent runs a chained, streaming pipeline: speech recognition transcribes the caller, our voice AI reasons over the turn (router, sub-agents, tool calls, guardrails), and lifelike text-to-speech speaks the reply. Audio streams both ways in real time, the agent handles barge-in cleanly when the caller interrupts, and we target a sub-2-second response on each turn.
What the voice agent can do
Bilingual Arabic + English
Speech recognition runs in multilingual mode, so the same call can carry both languages,
including Levantine and Jordanian Arabic dialects. The system prompt instructs the agent to
switch when the caller switches and stay consistent within each turn. No mid-turn code-mixing.
Qualify a new lead
Asks the questions you’d want a human receptionist to ask: name, intent, timeline, contact
preference. Captures structured data + writes the lead into your CRM.
Answer questions from your knowledge base
Vector-indexed retrieval against the offerings + FAQ + knowledge entries you’ve populated for
your tenant. Answers cite the source.
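As a rough illustration of vector retrieval with a cited source, here is a minimal sketch. The entry shape, the `retrieve` helper, and the toy three-dimensional vectors are all assumptions made for the example; they are not Vorel's actual index or schema.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy embeddings (hypothetical); a real index would store model-generated vectors.
ENTRIES = [
    {"source": "faq/opening-hours",
     "text": "We are open nine to six, Sunday to Thursday.",
     "vec": [0.9, 0.1, 0.0]},
    {"source": "offerings/haircut",
     "text": "A standard haircut takes forty-five minutes.",
     "vec": [0.1, 0.9, 0.2]},
]

def retrieve(query_vec, entries, top_k=1):
    """Return (text, source) pairs for the entries most similar to the query."""
    ranked = sorted(entries, key=lambda e: cosine(query_vec, e["vec"]), reverse=True)
    return [(e["text"], e["source"]) for e in ranked[:top_k]]

# The query vector stands in for an embedded caller question about opening hours.
answer, source = retrieve([0.8, 0.2, 0.1], ENTRIES)[0]
print(f"{answer} (source: {source})")
```

Returning the source alongside the text is what lets the answer cite where it came from.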
Book appointments
Finds available slots within your working hours + handoff rules. Books against the local DB
today; Google Calendar real-sync ships in Phase 7.
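Slot finding within working hours can be sketched as below. This assumes fixed-length slots and an in-memory list of existing bookings; `free_slots` and its parameters are hypothetical names, and handoff rules are not modelled.

```python
from datetime import datetime, timedelta

def free_slots(day_start, day_end, booked, slot_minutes=30):
    """Return start times of free slots between day_start and day_end.

    booked is a list of (start, end) datetime pairs that are already taken.
    """
    step = timedelta(minutes=slot_minutes)
    slots = []
    cursor = day_start
    while cursor + step <= day_end:
        slot_end = cursor + step
        # A slot is free when it overlaps no existing booking.
        if all(slot_end <= b_start or cursor >= b_end for b_start, b_end in booked):
            slots.append(cursor)
        cursor += step
    return slots

# Example: working hours 09:00 to 11:00 with one booking from 09:30 to 10:30.
working_start = datetime(2025, 1, 6, 9, 0)
working_end = datetime(2025, 1, 6, 11, 0)
existing = [(datetime(2025, 1, 6, 9, 30), datetime(2025, 1, 6, 10, 30))]
available = free_slots(working_start, working_end, existing)
print([s.strftime("%H:%M") for s in available])  # ['09:00', '10:30']
```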
Escalate to your team
On configurable triggers: explicit request, complaint, negotiation, stuck conversation,
compliance question. Routes via Slack webhook or email; the conversation lands in your
/inbox.
TTS-safe formatting
The agent’s voice replies are formatted for natural speech: no markdown, no URLs, no bullet
lists, no parenthetical asides. A channel-rules block is appended to every sub-agent prompt to
enforce 2-sentence-per-turn, no-markdown, spell-out-numbers behaviour.
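A minimal sketch of how such a channel-rules block and a post-generation check might look. The wording of `CHANNEL_RULES`, the helper names, and the regex heuristics are assumptions for illustration; the real block and any validator are not shown in this document.

```python
import re

# Hypothetical wording of the channel-rules block.
CHANNEL_RULES = (
    "Channel rules (voice): reply in at most 2 sentences. "
    "Do not use markdown, URLs, bullet lists, or parenthetical asides. "
    "Spell out numbers."
)

def build_prompt(sub_agent_prompt: str) -> str:
    """Append the channel-rules block to a sub-agent prompt."""
    return sub_agent_prompt.rstrip() + "\n\n" + CHANNEL_RULES

def tts_violations(reply: str, max_sentences: int = 2) -> list[str]:
    """Return the names of the voice-formatting rules that a reply breaks."""
    problems = []
    if re.search(r"https?://|www\.", reply):
        problems.append("url")
    if re.search(r"[*_#`]|^\s*[-•]\s", reply, flags=re.MULTILINE):
        problems.append("markdown")
    if re.search(r"\([^)]*\)", reply):
        problems.append("parenthetical")
    if re.search(r"\d", reply):
        problems.append("digits")
    # Naive sentence split on Latin and Arabic terminators; good enough for a sketch.
    sentences = [s for s in re.split(r"[.!?؟]+", reply) if s.strip()]
    if len(sentences) > max_sentences:
        problems.append("too_many_sentences")
    return problems

print(tts_violations("We open at nine. Would you like to book?"))
print(tts_violations("See **our site** (https://example.com) for 3 options."))
```

The first reply passes with an empty list; the second is flagged for its URL, markdown, parenthetical, and digit.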
Voice quality
Voice quality depends on:
- Network conditions between the customer’s carrier and our telephony layer. UAE-domestic calls work cleanly; unusual international carrier pairs can show audio artefacts.
- Speech-recognition confidence: recognition runs in multilingual mode (English + Arabic in the same call); strong on English, decent on Arabic, with accented Arabic varying. Per-tenant recognition confidence thresholds are not yet exposed.
- Text-to-speech: we use a low-latency, lifelike multilingual voice as the platform default. The voice is tuned to stay natural while keeping per-turn latency low, since slower, higher-fidelity models hurt live conversation. Per-tenant voice selection is configurable.
Voice billing model
Voice cost per call decomposes into the underlying transport, speech, and reasoning components. We capture each component into billing_events (vendor cost-of-goods) and per-call cost tables so your operator can see the full breakdown.
Your operator reviews the per-tenant cost breakdown at /admin/cost-rollup. Vorel’s customer-facing billing is outcome-based (you pay per resolved outcome, not per minute or per token); see Pricing. Invoicing is manual today: your operator generates the monthly invoice from the resolution-event and ledger rows plus your agreed rate card. The internal cost rollup is operator-only and never surfaces the vendor stack on tenant-facing pages.
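The two views described above, internal cost-of-goods per call and outcome-based tenant billing, can be sketched as follows. The row shape, outcome names, rates, and all figures are hypothetical and chosen only to make the arithmetic visible.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical billing_events rows: (call_id, component, vendor_cost).
billing_events = [
    ("call-1", "transport", Decimal("0.020")),
    ("call-1", "speech", Decimal("0.035")),
    ("call-1", "reasoning", Decimal("0.050")),
    ("call-2", "transport", Decimal("0.010")),
    ("call-2", "speech", Decimal("0.015")),
]

def cost_per_call(events):
    """Operator-side view: vendor cost of each call, broken down by component."""
    rollup = defaultdict(lambda: defaultdict(Decimal))
    for call_id, component, cost in events:
        rollup[call_id][component] += cost
    return {call: dict(parts) for call, parts in rollup.items()}

def invoice_total(resolved_outcomes, rate_card):
    """Tenant-side view: pay per resolved outcome, independent of vendor cost."""
    return sum(
        (rate_card[kind] * count for kind, count in resolved_outcomes.items()),
        Decimal("0"),
    )

# Made-up monthly counts and rate card.
resolved = {"booking": 12, "qualified_lead": 30}
rate_card = {"booking": Decimal("4.00"), "qualified_lead": Decimal("1.50")}

print(cost_per_call(billing_events)["call-1"])
print(invoice_total(resolved, rate_card))  # 93.00
```

Keeping the two functions separate mirrors the split in the text: vendor costs feed the operator-only rollup, while the invoice depends only on resolved outcomes and the agreed rate card.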
Voice quality assurance
Every call is scored after the fact by Vorel’s QA pipeline:
- An LLM scores the call against an 11-criterion rubric (language matching, tone, brevity, factual grounding, tool-usage correctness, qualification completeness, booking-flow accuracy, handoff judgment, safety/compliance, conversion progress, customer-sentiment trajectory).
- Output: a normalized score, per-criterion breakdown, and derived flags.
- Operator-side: stored in qa_evaluations with the conversation transcript; surfaced on the operator’s analytics + quality surfaces, both per-tenant and cross-tenant.
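To make the output shape concrete, here is a minimal sketch of turning per-criterion scores into a normalized score, a breakdown, and derived flags. The 0-to-5 scale, the flag threshold, and the key names are assumptions; only the eleven criteria come from the rubric above.

```python
RUBRIC = [
    "language_matching", "tone", "brevity", "factual_grounding",
    "tool_usage_correctness", "qualification_completeness",
    "booking_flow_accuracy", "handoff_judgment", "safety_compliance",
    "conversion_progress", "customer_sentiment_trajectory",
]

MAX_POINTS = 5  # assumed per-criterion scale; the real scale is not documented here

def summarize(scores: dict[str, int], flag_below: int = 3) -> dict:
    """Build a normalized score, per-criterion breakdown, and derived flags."""
    missing = [c for c in RUBRIC if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    total = sum(scores[c] for c in RUBRIC)
    return {
        "normalized_score": round(total / (MAX_POINTS * len(RUBRIC)), 3),
        "breakdown": {c: scores[c] for c in RUBRIC},
        # Flag every criterion that scored under the (assumed) threshold.
        "flags": [f"low_{c}" for c in RUBRIC if scores[c] < flag_below],
    }

# Example: a call that scores 4 everywhere except brevity.
example = {c: 4 for c in RUBRIC}
example["brevity"] = 2
result = summarize(example)
print(result["normalized_score"], result["flags"])  # 0.764 ['low_brevity']
```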
What’s currently NOT supported on voice
- Voicemail / call-back when busy: if the agent fails (LLM-provider outage, network issue), the call disconnects. A voicemail-style fallback is a future addition.
- Outbound calls: operator-initiated outbound dialing is built but ships dark behind a flag; inbound is the live surface today.
- Call recording archive: recordings are hosted by the telephony layer and we store the URL reference. Long-term archiving + PII-redacted recordings are deferred features.
- Conference / multi-party: single-customer-to-agent only.
Per-vertical specifics
Each vertical pack tunes the voice agent’s qualification + handoff behaviour. The summary below mirrors the prompt_overrides.qualification_extra_rules field of each pack:
- Real estate: captures intent (buy vs rent), property type, bedrooms, budget range with currency, preferred areas, timeline, and financing_needed. Books viewings against your offerings.
- Salon: captures whether the caller is returning, occasion (regular vs wedding vs trial), and (for color services) whether they’re matching an existing tone or changing. Sensitive topics like allergies are captured once and never re-asked.
- Clinic: confirms patient_status (new vs returning), insurance provider + member number, then captures symptoms in the patient’s own words and routes to the right specialty. Will not diagnose (forbidden_phrases includes "diagnose", "you have", "it's nothing serious"). Red-flag symptoms (chest pain, severe bleeding, suicidal ideation, possible stroke) trigger an immediate human handoff with an explicit instruction to call emergency services.
- Restaurant: captures party size, target service period (lunch / dinner / etc.), dietary restrictions and allergies, and seating preference. Large parties (8+) and private-room bookings: confirms minimum spend, set-menu requirement, and deposit policy at booking time.
- Auto service: captures make / model / year first (everything else depends on it), captures symptoms in the caller’s own words without diagnosing, and never quotes a final repair price over the phone for symptom-driven work. Routine services (oil, brakes, tires) book straight; symptom-driven calls book a diagnostic first.
- Generic SMB: captures name, contact, and what the caller is trying to accomplish. Operator tailors the qualification questions per-tenant in the dashboard.
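For orientation, a vertical pack might be represented roughly as below, using the clinic pack as the example. The field names prompt_overrides.qualification_extra_rules and forbidden_phrases come from the text above; where forbidden_phrases sits in the pack, the `red_flag_symptoms` key, and both helper functions are assumptions made for this sketch.

```python
CLINIC_PACK = {
    "prompt_overrides": {
        "qualification_extra_rules": [
            "Confirm patient_status (new vs returning).",
            "Capture insurance provider and member number.",
            "Capture symptoms in the patient's own words.",
        ],
    },
    # Placement at the top level is an assumption.
    "forbidden_phrases": ["diagnose", "you have", "it's nothing serious"],
    # Hypothetical key; the symptom list itself is taken from the clinic summary.
    "red_flag_symptoms": ["chest pain", "severe bleeding", "suicidal ideation", "stroke"],
}

def forbidden_hits(reply: str, pack: dict) -> list[str]:
    """Return the forbidden phrases that appear in a drafted agent reply."""
    text = reply.lower()
    return [p for p in pack["forbidden_phrases"] if p in text]

def needs_emergency_handoff(caller_text: str, pack: dict) -> bool:
    """True when the caller mentions any red-flag symptom."""
    text = caller_text.lower()
    return any(s in text for s in pack["red_flag_symptoms"])

print(forbidden_hits("It sounds like you have a sprain.", CLINIC_PACK))
print(needs_emergency_handoff("I've had chest pain since this morning", CLINIC_PACK))
```

Plain substring matching is deliberately simple here: it would also flag a harmless "Do you have insurance?", so a production check would need more context than this sketch uses.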