What Vorel collects
End-customer data (the actual lead / patient / guest / vehicle owner) sits in a small, intentional set of tables. The full inventory is maintained internally; this is the summary.
High-sensitivity (Direct Identifier + Communication Content)
- conversations.customer_identifier: WhatsApp wa_id (E.164 phone) or voice DID. The primary cross-channel identity anchor.
- conversations.customer_name: set from WhatsApp profile name on first inbound; nullable.
- messages.content: every inbound + outbound message text. The transcript source-of-truth, append-only, never edited.
- messages.content_translated: translation of the same content (when language differs from tenant default).
- messages.media_url: inbound media (images / voice clips). May contain customer-uploaded selfies / IDs.
- leads.name, leads.email, leads.phone: collected by the qualification sub-agent.
- leads.notes: free-form; may quote message content.
- appointments.customer_name, customer_phone, customer_email: required at booking time for confirmation copy.
- appointments.location_text: sometimes the customer’s address.
- leads.attributes (JSONB): vertical-specific slots; may embed strings the customer mentioned (employer, neighborhood, allergies, etc.).
Medium-sensitivity (Behavioral / Metadata)
- conversations.customer_language: 'en' | 'ar', detected from first message.
- conversations.tags: operator-curated tags.
- messages.tool_payload: JSON payload from tool calls; may include customer-derived strings (search queries, slot values from qualification).
- qa_evaluations.score + criteria: per-conversation grading; doesn’t carry raw PII but links to it via FKs.
Operator data (not end-customer PII)
- users.email, users.full_name: your dashboard team’s profile, mirrored from our authentication provider.
- audit_log.ip_address, audit_log.user_agent: operator dashboard activity.
- audit_log.metadata: free-form JSONB per action; references customer ids by FK only. Raw customer PII is never stored here directly.
Where data lives
Primary store
- Primary database (managed by our hosting provider): every tenant-scoped table. Disk-encrypted at rest. Row-Level Security gates every read by tenant_id.
- In-memory data store (managed by our hosting provider): job queues + rate-limit counters. No direct PII (job payloads are short-lived; rate-limit keys are tenant id + IP / phone, not message content).
Caches + working storage
- In-process: request-scoped tenant context, conversation history during a single dispatch. Not persisted.
- Error monitoring: exception capture. Our error-monitoring system redacts vapk_* API keys and known PII fields before sending. Customer PII can leak into stack traces if a downstream vendor returns it in an error message; the redactor catches the high-risk patterns.
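To make the redaction step above concrete, here is a minimal sketch of a pattern-based redactor applied to an error string before it leaves the process. The exact key suffix format, the placeholder strings, and the choice of patterns (API keys, E.164 phones, email addresses) are assumptions for illustration, not Vorel’s actual redactor.

```python
import re

# Assumption: API keys are "vapk_" followed by URL-safe characters.
API_KEY_RE = re.compile(r"\bvapk_[A-Za-z0-9_-]+")
# E.164 shape: "+", a non-zero leading digit, 7 to 15 digits in total.
E164_RE = re.compile(r"\+[1-9]\d{6,14}\b")
# Deliberately loose email shape; good enough for scrubbing, not validation.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact(text: str) -> str:
    """Replace high-risk patterns with fixed placeholders."""
    text = API_KEY_RE.sub("[redacted-api-key]", text)
    text = E164_RE.sub("[redacted-phone]", text)
    text = EMAIL_RE.sub("[redacted-email]", text)
    return text


if __name__ == "__main__":
    message = "vendor 502 for +971501234567 (ana@example.com), key=vapk_live_abc123"
    print(redact(message))
```

A pattern-based pass like this only catches shapes it knows about, which is why the text above describes it as covering the high-risk patterns rather than all PII.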
Ephemeral vendor stores
- Voice-orchestration provider: voice call orchestration; holds transcripts + recordings until our worker picks up the end-of-call report and persists what we want.
- WhatsApp messaging provider: business-messaging ingest. Holds inbound messages until our webhook receiver acknowledges.
- Speech-to-text + text-to-speech providers: process audio in-flight; no long-term storage on their side per their commercial terms.
- LLM inference providers: per their commercial-tier terms, requests are not used for model training.
Object storage
The PII inventory references an object-storage bucket for voice clips. This bucket is dormant today: the infrastructure provisions it, but Vorel’s current deployment doesn’t use it. Recordings live on the voice-orchestration provider’s CDN, with the URL referenced in messages.media_url. The bucket reactivates only with the alternate cloud deployment path.
Append-only tables
Three tables block UPDATE + DELETE at the DB level via Postgres triggers:
- messages: transcript integrity.
- audit_log: audit integrity.
- billing_events: financial-record integrity.
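The trigger-based append-only pattern can be illustrated with a small runnable sketch. It uses Python’s built-in sqlite3 module as a stand-in for Postgres, so the trigger syntax differs from production; the trigger names and the two-column messages table are hypothetical, and the operator-side bypass role is not modeled here.

```python
import sqlite3


def make_append_only(conn: sqlite3.Connection, table: str) -> None:
    """Install triggers that abort any UPDATE or DELETE on `table`.

    The table name is interpolated into SQL, so only pass trusted names.
    """
    for op in ("UPDATE", "DELETE"):
        conn.execute(
            f"CREATE TRIGGER {table}_no_{op.lower()} "
            f"BEFORE {op} ON {table} "
            f"BEGIN SELECT RAISE(ABORT, '{table} is append-only'); END"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT)")
make_append_only(conn, "messages")
conn.execute("INSERT INTO messages (content) VALUES ('hello')")


def attempt(sql: str) -> str:
    """Run a statement and report whether the database allowed it."""
    try:
        conn.execute(sql)
        return "allowed"
    except sqlite3.DatabaseError as exc:
        return f"blocked: {exc}"


if __name__ == "__main__":
    print(attempt("UPDATE messages SET content = 'edited'"))
    print(attempt("DELETE FROM messages"))
    print(attempt("INSERT INTO messages (content) VALUES ('second')"))
```

Enforcing the rule in the database rather than in application code means a buggy or compromised code path cannot silently rewrite history; it has to go through whatever privileged role the triggers exempt.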
Right to access (PDPL Art. 15 / GDPR Art. 15)
POST /api/tenant/export: operator-gated. Returns a ZIP containing:
- conversations.csv
- messages.csv
- leads.csv
- appointments.csv
- offerings.csv
- knowledge_base.csv
- audit_log.csv
- README.md: chain-of-custody (timestamp, operator, scope, redaction settings)
The default export is redacted; pass include_full_pii=true for the unredacted version (audit-logged with the operator + a justification field).
When a customer requests their data: your operator routes the request through this endpoint,
attaches the chain-of-custody README, and provides the resulting ZIP per your DSAR workflow.
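Before forwarding an export to the requester, it can be worth checking that the archive is complete. The sketch below compares a ZIP’s member list against the file names listed above; it assumes the files sit at the archive root, and the helper names are hypothetical rather than part of any Vorel tooling.

```python
import io
import zipfile

# File names taken from the export description above.
EXPECTED_MEMBERS = {
    "conversations.csv", "messages.csv", "leads.csv", "appointments.csv",
    "offerings.csv", "knowledge_base.csv", "audit_log.csv", "README.md",
}


def missing_members(zip_bytes: bytes) -> set[str]:
    """Return the expected files that are absent from the export archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return EXPECTED_MEMBERS - set(archive.namelist())


def sample_archive(names) -> bytes:
    """Build an in-memory ZIP with empty members, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "")
    return buffer.getvalue()


if __name__ == "__main__":
    incomplete = sample_archive(EXPECTED_MEMBERS - {"README.md"})
    print(missing_members(incomplete))
```

An empty result means every expected file is present, including the chain-of-custody README that should accompany the data.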
Right to erasure (PDPL Art. 17 / GDPR Art. 17)
POST /api/tenant/forget: operator-gated, dry-run-by-default.
The 7-step scrub for a single customer phone within a single tenant:
Tombstone conversations
customer_identifier replaced with redacted-<tenant_id>-<uuid8> so the conversation row survives but doesn’t link to the real customer.
Redact message content
messages.content and content_translated overwritten with [redacted]. Append-only triggers permit this via the dedicated operator-side role’s bypass; the standard tenant role still cannot.
Null leads PII
leads.name / email / phone / notes / attributes (the high-PII keys) nulled or redacted.
Null appointments PII
appointments.customer_name / customer_phone / customer_email / location_text / notes redacted.
Scrub audit-log JSONB references
Regex-replace E.164 phones + wa_id-shaped numbers in audit_log.metadata JSONB.
Salted-hash audit reference
A new audit row records the scrub, tagged with sha256(salt + phone) of the original customer identifier; this row IS the chain-of-custody artefact proving deletion happened. The salt is supplied by a required environment variable; without a salt the hash would be trivially reversible by any phone-number dictionary.
What right-to-erasure does NOT touch
- messages row deletion: we redact content, we don’t delete the rows. The append-only trigger is intentional; the audit-log record needs the conversation row to survive for the chain of custody to make sense.
- audit_log rows: append-only. The deletion event is recorded as a new audit row, not by removing prior rows.
- billing_events: financial integrity supersedes per-customer scrub.
- Webhook deliveries already-fired: past deliveries to your tenant’s outbound webhook URL carried the message content. Vorel can’t reach those receivers to retroactively scrub. Customers should expect that data may also live in your downstream systems and address it there per your own retention policy.
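Three of the scrub building blocks described in this section (the tombstone identifier, the audit-metadata scrub, and the salted-hash reference) can be sketched as follows. The environment variable name, the UTF-8 string concatenation used for salt + phone, the [redacted] placeholder in metadata, and the phone-shaped regex are all assumptions for illustration; the production implementation may differ on each.

```python
import hashlib
import os
import re
import uuid

SALT_ENV_VAR = "FORGET_HASH_SALT"  # hypothetical variable name

# Optional "+", then 7 to 15 digits that are not part of a longer digit run.
# This also matches other long numeric strings, which errs toward over-scrubbing.
PHONE_SHAPED = re.compile(r"(?<!\d)\+?[1-9]\d{6,14}(?!\d)")


def load_salt() -> str:
    """Read the salt and refuse to continue if it is missing or empty."""
    salt = os.environ.get(SALT_ENV_VAR, "")
    if not salt:
        raise RuntimeError(f"{SALT_ENV_VAR} must be set before running a scrub")
    return salt


def salted_phone_hash(phone: str, salt: str) -> str:
    """sha256(salt + phone) as a hex digest (assumed UTF-8 concatenation)."""
    return hashlib.sha256((salt + phone).encode("utf-8")).hexdigest()


def tombstone_identifier(tenant_id: str) -> str:
    """Build redacted-<tenant_id>-<uuid8> from the first 8 hex chars of a UUID4."""
    return f"redacted-{tenant_id}-{uuid.uuid4().hex[:8]}"


def scrub_metadata(value):
    """Recursively replace phone-shaped substrings inside string values."""
    if isinstance(value, str):
        return PHONE_SHAPED.sub("[redacted]", value)
    if isinstance(value, list):
        return [scrub_metadata(item) for item in value]
    if isinstance(value, dict):
        return {key: scrub_metadata(item) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged
```

The salt matters because phone numbers come from a small, enumerable space: an unsalted SHA-256 of a phone number can be reversed by hashing every plausible number and comparing, whereas a secret salt makes that dictionary useless to anyone who does not hold it.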
Per-class retention windows (ADR-locked)
Retention windows are policy-locked: changing them requires a code change + a git-history-visible
policy amendment, not a UI toggle. Per-tenant overrides exist for narrowing retention below the
platform default; widening past the platform default requires the same policy-amendment path. See
Data retention for the full four-class taxonomy and the four
architectural invariants.
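The narrowing-only rule for per-tenant overrides can be expressed as a small validation function. This is a sketch under stated assumptions: the function name and the dictionary are hypothetical, and only a few of the platform defaults from the retention tables in this section are included.

```python
from typing import Optional

# A subset of the platform defaults, in days, taken from this section's tables.
PLATFORM_DEFAULT_DAYS = {
    "messages": 7,
    "conversations": 30,
    "leads": 30,
    "appointments": 60,
}


def effective_retention_days(table: str, override_days: Optional[int] = None) -> int:
    """Return the window a purge should use for `table`.

    An override may only narrow the window. Widening is rejected here
    because it has to go through the policy-amendment path instead.
    """
    default = PLATFORM_DEFAULT_DAYS[table]
    if override_days is None:
        return default
    if override_days > default:
        raise ValueError(
            f"{table}: {override_days} days exceeds the platform default of "
            f"{default}; widening requires a policy amendment"
        )
    return override_days
```

For example, effective_retention_days("appointments", 30) returns 30, while effective_retention_days("messages", 14) raises because 14 days is wider than the 7-day default.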
Class (c): transcripts + PII (cached until CRM-write success)
| Vorel-side row | Platform-default retention |
|---|---|
| messages (per-turn transcript) | 7 days post-insert AND post CRM-write success |
| conversations | 30 days post-close AND post CRM-write success |
| customers | 30 days post-last-touch AND post CRM-write success |
| customer_profiles | 30 days post-update AND post CRM-write success |
| leads | 30 days post-update AND post CRM-write success |
| appointments | 60 days post-scheduled-end AND post CRM-write success |
| cases | 90 days post-close AND post CRM-write success |
| case_transitions | 90 days post-insert AND post CRM-write success |
| case_transition_proposals | 90 days post-decision AND post CRM-write success |
| case_messages | 90 days post-insert AND post CRM-write success |
| case_runtime_events | 90 days post-insert AND post CRM-write success |
| qa_evaluations (aggregate) | 365 days post-insert AND post CRM-write success |
| resolution_events | 365 days post-classification AND post CRM-write success |
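Each row in the table above combines two conditions joined by AND. One plausible reading, sketched below, is that a row becomes purgeable only once the window has elapsed since its anchor event and the CRM write has succeeded. The function and parameter names are hypothetical, and the actual purge job may evaluate the conditions differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_purgeable(
    anchor_at: datetime,
    crm_written_at: Optional[datetime],
    window_days: int,
    now: datetime,
) -> bool:
    """True when both retention conditions hold.

    anchor_at is the event the window counts from (insert, close, update...).
    crm_written_at is None until the CRM write has succeeded.
    """
    if crm_written_at is None:
        return False  # never purge a row the CRM has not received
    return now >= anchor_at + timedelta(days=window_days)


if __name__ == "__main__":
    inserted = datetime(2025, 1, 1, tzinfo=timezone.utc)
    now = datetime(2025, 1, 10, tzinfo=timezone.utc)
    # A message row, 7-day window, CRM write still pending: kept.
    print(is_purgeable(inserted, None, 7, now))
    # Same row after the CRM write succeeded: eligible.
    print(is_purgeable(inserted, inserted + timedelta(hours=1), 7, now))
```

Gating on the CRM write means a failed or delayed sync extends retention rather than losing the only copy of the data.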
Class (b): operational telemetry
| Vorel-side row | Platform-default retention | Purpose |
|---|---|---|
| voice_call_cost | 365 days | Billing reconciliation |
| voice_call_cost.payment_resolution | 365 days | PCI reconciliation |
| voice_turn_latency | 90 days | First-token-to-speech regression window |
| llm_calls | 90 days | LLM usage analytics + cost reconciliation |
| webhook_deliveries | 30 days | Forensic-only after delivery |
| prompt_experiment_assignments | 90 days | Aligned with conversation purge cadence |
| prompt_variant_assignments | 90 days | Aligned with message purge |
| prompt_revision_proposals | 365 days | Quarterly operator-triage cadence |
| quality_signals + quality_failures | 365 days | Trend rollups for quality dashboards |
| calibration_observations | 90 days | Aligned with message purge |
| shadow_dispatch_comparisons | 30 days | Pipeline-cutover research artifact |
| hallucination_flag_reviews | 365 days | Quality-trend analysis window |
Class (d): audit-only (long-lived)
audit_log, billing_events, tenants, users, voice_cutover_event, lora_adapters, adapter_promotions, offerings, knowledge_base_entries, webhooks, voice_numbers, prompt_experiments, incidents, api_keys, prompt_variants, tenant_credentials, tenant_crm_field_mappings, tenant_prompt_overrides: long-lived by design. Each table’s “permitted long-lived” rationale is documented in the schema-audit matrix. audit_log retention is owned by the existing N-3 retention regime; the rest are configuration / structured-audit surfaces.
Per-tenant overrides
Tenants who require shorter retention than the platform default can configure narrower windows during onboarding or via the operator console. The override is recorded as a per-tenant, per-table retention setting, emits an audit-log row visible to the tenant admin, and the next scheduled purge respects the narrower window. Widening past the platform default is not exposed as a tenant-side setting; that path goes through an audited policy amendment.
Customer requests workflow
When a customer of yours (your end customer) requests a Data Subject Rights action:
You receive the request
Customer emails / messages your team requesting their data or asking to be forgotten.
Notify your Vorel operator
Email your operator (or use your support channel) with: customer’s phone number + the nature of
the request (export / erasure) + any required documentation per your DSAR process.
Operator runs the endpoint
Vorel’s operator runs
/api/tenant/export or /api/tenant/forget against your tenant scoped to
the specific customer phone. Default settings; no include_full_pii unless you explicitly
request it.
Receive the artefact
Operator sends you the ZIP (for export) or the audit-log reference (for erasure). The audit-log
entry includes the salted-hash of the customer phone: sufficient proof of deletion without
re-introducing the PII.
What about analytics on customer data?
Per-tenant analytics (the /(dashboard)/analytics surface + the weekly-rollup API) reads
the same RLS-scoped tables. The operator-side analytics surface
(/admin/tenants/[id]/analytics) is also per-tenant only: there’s no cross-tenant aggregation
on the operator surface today.
We do not run platform-wide analytics over customer content for any purpose (model training,
benchmarking, marketing, etc.).
Related docs
- Security overview: RLS, encryption, audit, rate limiting, SLOs
- Data retention: full four-class taxonomy, four architectural invariants, “your transcripts live in your CRM”
- Payments + PCI: SAQ-A scope, vault-redirect payment flow, hard-forbidden builds
- Compliance: DPA + region model + sub-processor disclosure
- Product → Guardrails: runtime safety policy