Vorel’s per-tenant guardrails are the safety differentiator. Most AI receptionist platforms ship a single prompt for every customer. Vorel lets your operator tune the safety policy per tenant (strict for clinics and regulated verticals, looser for low-stakes tenants) without a code deploy. Two guardrails ship today: the hallucination guardrail has threshold and action knobs, and the forbidden-phrase guardrail has an action knob.
What ships today
Hallucination guardrail
A v1 LLM-based grader scores every agent reply for factual fidelity against the conversation
history + tool results + retrieved knowledge base. Configurable threshold and action.
Forbidden-phrase guardrail
Substring-match every agent reply against the merged forbidden-phrase list (vertical pack
defaults + tenant-specific additions). Configurable action when a hit fires.
Hallucination guardrail
Every reply gets graded by a hallucination scorer that flags issues by severity (low /
medium / high) and kind (e.g. fabricated_offering, wrong_price, missing_handoff,
stale_state). The flags land in messages.hallucination_flags for analytics.
The guardrail’s job is to decide what to do when flags fire.
Threshold (which severities trip the guardrail)
| Threshold | Trips on |
|---|---|
| low | Any flagged reply (low / medium / high) |
| medium | Medium or high flags |
| high (default) | Only high flags |
| never | Never trips. Graders still record flags for the dashboard, but no runtime action is taken. |
Action (what to do when it trips)
| Action | Behaviour |
|---|---|
| warn (default) | Log only. A Sentry alert fires for high-severity flags either way. |
| handoff | Drop the bot reply. Route the conversation to a human via the existing handoff queue. |
Default: threshold='high' + action='warn'. This matches the pre-Phase-O behaviour, so tenants who haven’t tuned the policy yet see no change.
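The threshold and action tables above can be read as a small decision function. The following is a minimal sketch, not Vorel's implementation: the names `HallucinationPolicy`, `trips`, and `decide` are hypothetical, and the flag input is assumed to be a plain list of severity strings. Only the threshold values, action values, and defaults come from this page.

```python
from dataclasses import dataclass

# Severity ordering used to compare a flag against the tenant's threshold.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class HallucinationPolicy:
    # Hypothetical container; the defaults mirror the documented default policy.
    threshold: str = "high"  # "low" | "medium" | "high" | "never"
    action: str = "warn"     # "warn" | "handoff"


def trips(policy: HallucinationPolicy, flag_severities: list[str]) -> bool:
    """Return True if any flag is at or above the policy threshold."""
    if policy.threshold == "never":
        return False  # flags are still recorded elsewhere, but nothing trips
    floor = SEVERITY_RANK[policy.threshold]
    return any(SEVERITY_RANK[s] >= floor for s in flag_severities)


def decide(policy: HallucinationPolicy, flag_severities: list[str]) -> str:
    """Return "send" (no trip), "warn" (send and log), or "handoff"."""
    if not trips(policy, flag_severities):
        return "send"
    return policy.action
```

Under the default policy a reply carrying only a medium flag is sent untouched, while the same reply under threshold='medium' + action='handoff' is routed to a human.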
Recommended configurations
- Clinics, regulated verticals, financial services: threshold='medium' + action='handoff'. A medium-severity hallucination should never reach the customer. Cost: more handoffs; benefit: no AI-generated misinformation in a regulated context.
- High-volume retail, hospitality: threshold='high' + action='warn'. The default; high flags get escalated via Sentry, but medium and low flags ride through. Cost: imperfect replies occasionally reach the customer; benefit: no extra handoff load.
- Pilot / staging tenants: threshold='low' + action='warn'. Maximally noisy: surfaces every flag in the analytics dashboard so you can calibrate the threshold on real data before promoting the tenant to a production policy.
Forbidden-phrase guardrail
If the agent’s reply contains any phrase from the merged forbidden-phrase list, this guardrail’s action kicks in. The phrase list comes from two sources, concatenated and de-duplicated:
- Vertical pack defaults: every pack ships its own list (e.g. clinic ships diagnose, you have, definitely, it's nothing serious).
- Tenant-specific additions: your operator adds phrases via the dashboard pack-overrides UI.
Matching is by substring, so diagnose matches “I diagnose…”, “let me diagnose…”, “I can’t diagnose…”. This is intentional: at runtime, near-misses are mostly the agent trying to talk around a forbidden term, and exact-word matching would let too much through.
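As an illustration of the merge and the substring check, here is a minimal sketch. The function names `merge_phrases` and `find_hits` are hypothetical, and case-insensitive comparison is an assumption: this page does not say how case is handled.

```python
def merge_phrases(pack_defaults: list[str], tenant_additions: list[str]) -> list[str]:
    """Concatenate both lists and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(pack_defaults + tenant_additions))


def find_hits(reply: str, phrases: list[str]) -> list[str]:
    """Return every phrase that occurs as a substring of the reply."""
    lowered = reply.lower()  # assumption: matching ignores case
    return [p for p in phrases if p.lower() in lowered]


# Clinic pack defaults as listed above, plus two example tenant additions.
clinic_pack = ["diagnose", "you have", "definitely", "it's nothing serious"]
merged = merge_phrases(clinic_pack, ["guaranteed", "diagnose"])
hits = find_hits("I can't diagnose that, but you have options.", merged)
```

Because the check is a substring test, a reply containing "diagnosed" also hits the diagnose entry, which is the behaviour the paragraph above describes as intentional.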
Action
| Action | Behaviour |
|---|---|
| warn (default) | Log only. The prompt already tells the model not to use these phrases; the guardrail logs the slip without overriding. |
| block | Replace the reply with a generic fallback string ("Let me get a colleague to help with that — I'll connect you now." / "دعني أحول طلبك لزميل من الفريق ليساعدك."). The dispatch logs the override. |
| handoff | Drop the bot reply, route to a human. |
Default: action='warn', for the same reason as the hallucination guardrail: it matches the existing behaviour.
Why three actions instead of two
block and handoff differ in customer experience: block keeps the bot in the conversation
(the customer reads the fallback string and can keep talking); handoff drops the bot and routes
to a human. Use block when you want to give the bot a graceful exit; use handoff when a
forbidden-phrase hit means a human absolutely must take the conversation from here.
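The difference between the three actions can be sketched as a function from a reply and its hits to an outcome. This is a hypothetical model rather than Vorel's dispatch code: `Outcome`, `apply_action`, and the `fallback` parameter are illustrative names, and the fallback text is passed in by the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    reply: Optional[str]  # text the customer sees, or None if the bot reply is dropped
    handoff: bool         # True when the conversation is routed to a human
    logged: bool          # True when the hit is recorded


def apply_action(action: str, reply: str, hits: list[str], fallback: str) -> Outcome:
    """Apply the tenant's forbidden-phrase action to one agent reply."""
    if not hits:
        return Outcome(reply=reply, handoff=False, logged=False)
    if action == "warn":
        # Log only: the original reply still goes out.
        return Outcome(reply=reply, handoff=False, logged=True)
    if action == "block":
        # The bot stays in the conversation but says the fallback instead.
        return Outcome(reply=fallback, handoff=False, logged=True)
    if action == "handoff":
        # The bot reply is dropped and a human takes over.
        return Outcome(reply=None, handoff=True, logged=True)
    raise ValueError(f"unknown action: {action!r}")
```

In this model only handoff sets the `handoff` field, which mirrors the distinction above: block changes what the customer reads, handoff changes who is talking to them.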
Where the guardrail policy lives
Per-tenant guardrails live in tenants.guardrails (JSONB column). The schema:
Operator UI
Configure per-tenant from app.vorel.ai/admin/tenants/[id]/guardrails. The form writes the JSONB
column directly; changes take effect on the next agent turn (no code deploy, no service restart).
The audit log records every change with the actor, the previous value, and the new value, so
“who turned off the hallucination guardrail and when?” is always answerable.
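To make the audit trail concrete, here is a minimal sketch of recording a policy change. Everything in it is hypothetical: the `AuditEntry` fields, the `update_guardrails` helper, the in-memory `log` list, and the policy key names are illustrative stand-ins, not Vorel's storage model. The only facts taken from this page are that each entry captures the actor, the previous value, and the new value; the timestamp is an assumption.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEntry:
    actor: str
    changed_at: datetime       # assumption: entries are timestamped
    previous: dict[str, Any]
    new: dict[str, Any]


def update_guardrails(current: dict, changes: dict, actor: str, log: list) -> dict:
    """Return the updated policy and append an audit entry to `log`."""
    updated = {**current, **changes}  # shallow merge of the changed keys
    log.append(
        AuditEntry(
            actor=actor,
            changed_at=datetime.now(timezone.utc),
            # Deep copies so later mutation cannot rewrite history.
            previous=copy.deepcopy(current),
            new=copy.deepcopy(updated),
        )
    )
    return updated


audit_log: list[AuditEntry] = []
policy = {"threshold": "high", "action": "warn"}  # hypothetical key names
policy = update_guardrails(policy, {"threshold": "never"}, "ops@example.com", audit_log)
```

Storing both the previous and the new value in each entry means a single row answers the question without replaying earlier changes.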
Pack-level forbidden phrases (read-only floor)
The vertical pack’s forbidden phrases are a floor, not an override target. You can add to the list per-tenant; you cannot remove pack-shipped phrases via the standard pack-override UI. This protects against a clinic operator accidentally turning off the diagnose block.
Removing a pack-level phrase requires a code-side change to the vertical pack JSON (and an
explicit comment justifying the removal). Don’t.
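The floor rule can be sketched as a guard on removals. This is an illustrative model only: `remove_tenant_phrase` is a hypothetical helper, and raising `PermissionError` is an assumption about how a rejection might be signalled, not a description of the override UI.

```python
def remove_tenant_phrase(
    pack_defaults: list[str], tenant_additions: list[str], phrase: str
) -> list[str]:
    """Remove a tenant-added phrase; refuse to remove a pack-shipped one."""
    if phrase in pack_defaults:
        raise PermissionError(
            f"{phrase!r} is pack-shipped and cannot be removed per tenant"
        )
    return [p for p in tenant_additions if p != phrase]


pack = ["diagnose", "definitely"]
additions = ["guaranteed", "risk-free"]
remaining = remove_tenant_phrase(pack, additions, "guaranteed")
```

Tenant additions stay removable, while any attempt to remove a pack-shipped phrase such as diagnose is rejected before the list changes.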
Hallucination scoring details
The grader is a Gemini 2.5 Flash call that runs after the sub-agent returns its reply, with the conversation history, tool results, and retrieved knowledge base in context. The grader’s prompt asks it to identify factual claims in the reply and judge them against the available evidence. Output kinds we’ve seen so far in the wild:
- fabricated_offering: agent named a property / service / clinician that isn’t in the catalog.
- wrong_price: agent quoted a price that doesn’t match the offering’s price field.
- stale_state: agent claimed something was booked when no book_appointment tool call fired.
- missing_handoff: agent didn’t escalate when a configured handoff trigger condition was met.
What’s NOT a guardrail today
Things you might expect that aren’t on this surface:
- Profanity filter. The forbidden-phrase guardrail handles tenant-specific terms; we don’t ship a generic profanity list. Add brand-restricted vocabulary via pack overrides.
- PII redaction in agent replies. RLS keeps the agent from reading other customers’ data, so there’s nothing to redact at the reply layer. PII redaction happens at the data-export and audit-log layers instead.
- Topic restriction. “The agent must only talk about real estate, not weather” is enforced via the faq_redirect_message_* strings in vertical packs, not as a separate guardrail.
Related docs
- Verticals — pack-level forbidden phrases (clinic is the load-bearing example)
- Analytics — hallucination flag rates over time
- Security overview — broader safety posture, RLS, PII handling
- How it works — where guardrails sit in the dispatch pipeline