Vorel’s per-tenant guardrails are its safety differentiator. Most AI receptionist platforms ship a single prompt for every customer. Vorel lets your operator tune the safety policy per tenant (strict for clinics and regulated verticals, looser for low-stakes tenants) without a code deploy. Two guardrails ship today: the hallucination guardrail has threshold and action settings, and the forbidden-phrase guardrail has an action setting.

What ships today

Hallucination guardrail

A v1 LLM-based grader scores every agent reply for factual fidelity against the conversation history + tool results + retrieved knowledge base. Configurable threshold and action.

Forbidden-phrase guardrail

Every agent reply is substring-matched against the merged forbidden-phrase list (vertical pack defaults plus tenant-specific additions). The action taken on a hit is configurable.

Hallucination guardrail

Every reply gets graded by a hallucination scorer that flags issues by severity (low / medium / high) and kind (e.g. fabricated_offering, wrong_price, missing_handoff, stale_state). The flags land in messages.hallucination_flags for analytics. The guardrail’s job is to decide what to do when flags fire.

Threshold (which severities trip the guardrail)

| Threshold | Trips on |
| --- | --- |
| low | Any flagged reply (low / medium / high) |
| medium | Medium or high flags |
| high (default) | Only high flags |
| never | Never trips. Graders still record flags for the dashboard, but no runtime action is taken. |
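
The threshold table amounts to a comparison on an ordered severity scale. A minimal sketch in Python, assuming each flag carries a severity string; the function name `guardrail_trips` and the rank table are illustrative and are not taken from Vorel’s code:

```python
# Hypothetical sketch: names and data shapes are assumptions, not Vorel's API.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


def guardrail_trips(threshold: str, flag_severities: list) -> bool:
    """Return True when any flag is at or above the configured threshold."""
    if threshold == "never":
        # Flags are still recorded for analytics; there is just no runtime action.
        return False
    floor = SEVERITY_RANK[threshold]
    return any(SEVERITY_RANK[s] >= floor for s in flag_severities)
```

With the default threshold of high, a reply flagged only low and medium passes through, while a single high flag trips the guardrail.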

Action (what to do when it trips)

| Action | Behaviour |
| --- | --- |
| warn (default) | Log only. Sentry alert fires for high-severity flags either way. |
| handoff | Drop the bot reply. Route the conversation to a human via the existing handoff queue. |

Default for new tenants: threshold='high' + action='warn'. This matches the pre-Phase-O behaviour, so tenants who haven’t tuned the policy yet see no change.

Suggested starting points by tenant type:
  • Clinics, regulated verticals, financial services: threshold='medium' + action='handoff'. A medium-severity hallucination should never reach the customer. Cost: more handoffs; benefit: no AI-generated misinformation in a regulated context.
  • High-volume retail, hospitality: threshold='high' + action='warn'. The default; high flags get escalated via Sentry, but mid / low flags ride through. Cost: occasional surface of imperfect replies; benefit: no extra handoff load.
  • Pilot / staging tenants: threshold='low' + action='warn'. Maximally noisy — surfaces every flag in the analytics dashboard so you can calibrate the threshold based on real data before promoting the tenant to production policy.
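
To see how the three presets treat the same graded reply, here is a small sketch. The preset keys (`regulated`, `high_volume`, `pilot`) and the function `hallucination_outcome` are hypothetical names chosen for illustration; only the threshold and action values come from the list above.

```python
from typing import Optional

# Hypothetical preset names; the threshold/action pairs mirror the list above.
PRESETS = {
    "regulated": {"threshold": "medium", "action": "handoff"},
    "high_volume": {"threshold": "high", "action": "warn"},
    "pilot": {"threshold": "low", "action": "warn"},
}
RANK = {"low": 1, "medium": 2, "high": 3}


def hallucination_outcome(policy: dict, worst_severity: Optional[str]) -> str:
    """Return 'deliver', 'warn', or 'handoff' for one graded reply.

    worst_severity is the highest severity among the reply's flags,
    or None when the grader raised no flags.
    """
    threshold = policy["threshold"]
    if worst_severity is None or threshold == "never":
        return "deliver"
    if RANK[worst_severity] < RANK[threshold]:
        return "deliver"
    # 'warn' logs and still delivers the reply; 'handoff' drops it.
    return policy["action"]


# One reply with a medium-severity flag, evaluated under each preset.
results = {name: hallucination_outcome(p, "medium") for name, p in PRESETS.items()}
```

Under these assumptions the medium-severity reply is handed off for the regulated preset, delivered untouched for the high-volume preset, and delivered with a warning logged for the pilot preset.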

Forbidden-phrase guardrail

If the agent’s reply contains any phrase from the merged forbidden-phrase list, this guardrail’s action kicks in. The phrase list comes from two sources, concatenated + de-duplicated:
  1. Vertical pack defaults: every pack ships its own list (e.g. clinic ships "diagnose", "you have", "definitely", "it's nothing serious").
  2. Tenant-specific additions: your operator adds phrases via the dashboard pack-overrides UI.
Detection is substring match, case-insensitive, on the trimmed phrase. So diagnose matches “I diagnose…”, “let me diagnose…”, “I can’t diagnose…”. This is intentional: at runtime, near-misses are mostly the agent trying to talk around a forbidden term — exact-word matching would let too much through.
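
The merge and match steps described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and de-duplicating case-insensitively is a guess at what "de-duplicated" means here.

```python
def merge_phrases(pack_defaults, tenant_additions):
    """Concatenate both lists, trim whitespace, and drop duplicates.

    Assumption: duplicates are compared case-insensitively.
    """
    merged, seen = [], set()
    for phrase in [*pack_defaults, *tenant_additions]:
        trimmed = phrase.strip()
        key = trimmed.casefold()
        if trimmed and key not in seen:
            seen.add(key)
            merged.append(trimmed)
    return merged


def find_forbidden(reply, phrases):
    """Return every phrase that appears in the reply as a case-insensitive substring."""
    haystack = reply.casefold()
    return [p for p in phrases if p.casefold() in haystack]


phrases = merge_phrases(["diagnose", " definitely "], ["Diagnose", "cure"])
hits = find_forbidden("I can't DIAGNOSE that over chat.", phrases)
```

Because the match is on substrings, a reply containing "misdiagnosed" also counts as a hit for "diagnose", which is the deliberately strict behaviour the paragraph above describes.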

Action

| Action | Behaviour |
| --- | --- |
| warn (default) | Log only. The prompt already tells the model not to use these phrases; the guardrail logs the slip without overriding. |
| block | Replace the reply with a generic fallback string ("Let me get a colleague to help with that — I'll connect you now." / "دعني أحول طلبك لزميل من الفريق ليساعدك."). The dispatch logs the override. |
| handoff | Drop the bot reply, route to a human. |

Default for new tenants: action='warn'. The rationale is the same as for the hallucination guardrail: it matches the existing behaviour.

Why three actions instead of two

block and handoff differ in customer experience: block keeps the bot in the conversation (the customer reads the fallback string and can keep talking); handoff drops the bot and routes to a human. Use block when you want to give the bot a graceful exit; use handoff when a forbidden-phrase hit means a human absolutely must take the conversation from here.
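
The difference between the three actions can be made concrete with a small sketch. The `Outcome` type and `apply_forbidden_phrase_action` are hypothetical names, and treating an unrecognised action as warn is an assumption based on the tolerant-defaults behaviour described elsewhere on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    reply_to_customer: Optional[str]  # None means the bot sends nothing
    route_to_human: bool
    logged: bool = True


def apply_forbidden_phrase_action(action: str, reply: str, hit: bool, fallback: str) -> Outcome:
    """Decide what the customer sees after the forbidden-phrase check."""
    if not hit:
        return Outcome(reply, route_to_human=False, logged=False)
    if action == "block":
        # The bot stays in the conversation; only this reply is replaced.
        return Outcome(fallback, route_to_human=False)
    if action == "handoff":
        # The bot reply is dropped and a human takes over.
        return Outcome(None, route_to_human=True)
    # 'warn' (and, by assumption, any unrecognised value): log and deliver as-is.
    return Outcome(reply, route_to_human=False)
```

The fallback text is passed in as a parameter here so the sketch stays independent of the exact English and Arabic strings.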

Where the guardrail policy lives

Per-tenant guardrails live in tenants.guardrails (JSONB column). The schema:
```json
{
  "hallucination": {
    "threshold": "high",
    "action": "warn"
  },
  "forbidden_phrase": {
    "action": "warn"
  }
}
```
The parser is tolerant: bad / missing fields fall through to defaults. A stale operator save or a malformed value never breaks dispatch — the agent runs on defaults until the policy is fixed.
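
A tolerant parser of this kind might look like the sketch below. It is not Vorel’s parser: the function name and the exact validation rules are assumptions, and only the field names, allowed values, and defaults come from this page.

```python
# Allowed values and defaults as documented on this page.
THRESHOLDS = {"low", "medium", "high", "never"}
HALLUCINATION_ACTIONS = {"warn", "handoff"}
PHRASE_ACTIONS = {"warn", "block", "handoff"}


def _pick(value, allowed, default):
    """Keep the value only if it is a recognised string; otherwise use the default."""
    return value if isinstance(value, str) and value in allowed else default


def parse_guardrails(raw) -> dict:
    """Turn whatever is stored in the JSONB column into a complete, valid policy."""
    raw = raw if isinstance(raw, dict) else {}
    hallucination = raw.get("hallucination")
    hallucination = hallucination if isinstance(hallucination, dict) else {}
    phrase = raw.get("forbidden_phrase")
    phrase = phrase if isinstance(phrase, dict) else {}
    return {
        "hallucination": {
            "threshold": _pick(hallucination.get("threshold"), THRESHOLDS, "high"),
            "action": _pick(hallucination.get("action"), HALLUCINATION_ACTIONS, "warn"),
        },
        "forbidden_phrase": {
            "action": _pick(phrase.get("action"), PHRASE_ACTIONS, "warn"),
        },
    }
```

Each field is validated independently, so one malformed value falls back to its default without discarding the valid settings saved alongside it.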

Operator UI

Configure per-tenant from app.vorel.ai/admin/tenants/[id]/guardrails. The form writes the JSONB column directly; changes take effect on the next agent turn (no code deploy, no service restart). The audit log records every change with the actor, the previous value, and the new value, so “who turned off the hallucination guardrail and when?” is always answerable.

Pack-level forbidden phrases (read-only floor)

The vertical pack’s forbidden phrases are a floor, not an override target. You can add to the list per-tenant; you cannot remove pack-shipped phrases via the standard pack-override UI. This protects against a clinic operator accidentally turning off the diagnose block. Removing a pack-level phrase requires a code-side change to the vertical pack JSON (and an explicit comment justifying the removal). Don’t.
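
The floor semantics can be illustrated with a short sketch. It assumes a hypothetical override shape in which a tenant supplies both added and removed phrases; the page only confirms that additions are possible and that pack phrases cannot be removed, so the `tenant_removed` parameter is purely illustrative.

```python
def effective_phrases(pack_phrases, tenant_added, tenant_removed=()):
    """Build the runtime list; pack-shipped phrases always survive.

    Hypothetical helper: removals are honoured only for tenant-added phrases.
    """
    def norm(phrase):
        return phrase.strip().casefold()

    floor = {norm(p) for p in pack_phrases}
    # Any removal that targets a pack phrase is ignored.
    removable = {norm(p) for p in tenant_removed} - floor
    result = [p.strip() for p in pack_phrases]
    seen = set(floor)
    for phrase in tenant_added:
        key = norm(phrase)
        if key and key not in seen and key not in removable:
            seen.add(key)
            result.append(phrase.strip())
    return result


final = effective_phrases(
    ["diagnose", "definitely"],
    tenant_added=["cure", "guarantee"],
    tenant_removed=["diagnose", "guarantee"],
)
```

Here the attempt to remove "diagnose" has no effect, while the tenant’s own "guarantee" entry is dropped.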

Hallucination scoring details

The grader is a Gemini 2.5 Flash call run after the sub-agent returns its reply, with the conversation history + tool results + retrieved knowledge base in context. The grader’s prompt asks it to identify factual claims in the reply and judge them against the available evidence. Output kinds we’ve seen so far in the wild:
  • fabricated_offering — agent named a property / service / clinician that isn’t in the catalog.
  • wrong_price — agent quoted a price that doesn’t match the offering’s price field.
  • stale_state — agent claimed something was booked when no book_appointment tool call fired.
  • missing_handoff — agent didn’t escalate when a configured handoff trigger condition was met.
These flags also feed the Analytics weekly-rollup so you can track hallucination rate over time per tenant.
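
As a rough illustration of a weekly rollup, the sketch below computes the share of flagged replies per ISO week. The message shape (a `created_at` date plus a `hallucination_flags` list of severity/kind entries) and the definition of "hallucination rate" as flagged replies divided by total replies are assumptions, not the documented Analytics implementation.

```python
from collections import defaultdict
from datetime import date


def weekly_flag_rate(messages):
    """Map (ISO year, ISO week) to the fraction of replies with at least one flag."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for message in messages:
        iso = message["created_at"].isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if message["hallucination_flags"]:
            flagged[week] += 1
    return {week: flagged[week] / totals[week] for week in totals}


# Hypothetical sample rows for a single tenant.
sample = [
    {"created_at": date(2025, 1, 6), "hallucination_flags": [{"severity": "high", "kind": "wrong_price"}]},
    {"created_at": date(2025, 1, 7), "hallucination_flags": []},
    {"created_at": date(2025, 1, 13), "hallucination_flags": []},
]
rates = weekly_flag_rate(sample)
```

A per-kind or per-severity breakdown would follow the same pattern with a more specific grouping key.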

What’s NOT a guardrail today

Things you might expect that aren’t on this surface:
  • Profanity filter. The forbidden-phrase guardrail handles tenant-specific terms; we don’t ship a generic profanity list. Add brand-restricted vocabulary via pack overrides.
  • PII redaction in agent replies. Row-level security (RLS) keeps other customers’ data out of the agent’s reach, so there’s nothing to redact at the reply layer. PII redaction happens at the data-export + audit-log layer instead.
  • Topic restriction. “The agent must only talk about real estate, not weather” is enforced via the faq_redirect_message_* strings in vertical packs, not as a separate guardrail.

Related

  • Verticals: pack-level forbidden phrases (clinic is the load-bearing example)
  • Analytics: hallucination flag rates over time
  • Security overview: broader safety posture, RLS, PII handling
  • How it works: where guardrails sit in the dispatch pipeline