This page is the practical companion to Compliance. What is the data, where does it live, who can read it, and how does a Data Subject Rights request flow?

What Vorel collects

End-customer data (the actual lead / patient / guest / vehicle owner) sits in a small, intentional set of tables. The full inventory is maintained internally; this is the summary.

High-sensitivity (Direct Identifier + Communication Content)

  • conversations.customer_identifier: WhatsApp wa_id (E.164 phone) or voice DID. The primary cross-channel identity anchor.
  • conversations.customer_name: set from WhatsApp profile name on first inbound; nullable.
  • messages.content: every inbound + outbound message text. The transcript source of truth: append-only, and never edited outside the retention and right-to-erasure paths described below.
  • messages.content_translated: translation of the same content (when language differs from tenant default).
  • messages.media_url: inbound media (images / voice clips). May contain customer-uploaded selfies / IDs.
  • leads.name, leads.email, leads.phone: collected by the qualification sub-agent.
  • leads.notes: free-form; may quote message content.
  • appointments.customer_name, customer_phone, customer_email: required at booking time for confirmation copy.
  • appointments.location_text: sometimes the customer’s address.
  • leads.attributes (JSONB): vertical-specific slots; may embed strings the customer mentioned (employer, neighborhood, allergies, etc.).

Medium-sensitivity (Behavioral / Metadata)

  • conversations.customer_language: 'en' | 'ar' detected from first message.
  • conversations.tags: operator-curated tags.
  • messages.tool_payload: JSON payload from tool calls; may include customer-derived strings (search queries, slot values from qualification).
  • qa_evaluations.score + criteria: per-conversation grading; doesn’t carry raw PII but links to it via FKs.

Operator data (not end-customer PII)

  • users.email, users.full_name: your dashboard team’s profile, mirrored from our authentication provider.
  • audit_log.ip_address, audit_log.user_agent: operator dashboard activity.
  • audit_log.metadata: free-form JSONB per action; references customer ids by FK only. Raw customer PII is never stored here directly.

Where data lives

Primary store

  • Primary database (managed by our hosting provider): every tenant-scoped table. Disk-encrypted at rest. Row-Level Security gates every read by tenant_id.
  • In-memory data store (managed by our hosting provider): job queues + rate-limit counters. No direct PII (job payloads are short-lived; rate-limit keys are tenant id + IP / phone, not message content).

Caches + working storage

  • In-process: request-scoped tenant context, conversation history during a single dispatch. Not persisted.
  • Error monitoring: exception capture. Our error-monitoring system redacts vapk_* API keys and known PII fields before sending. Customer PII can leak into stack traces if a downstream vendor returns it in an error message; the redactor catches the high-risk patterns.
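
As an illustration of the redaction step described above, here is a minimal sketch of a pattern-based redactor. It is not Vorel's actual redactor: only the vapk_ prefix comes from this page, while the character set after the prefix, the phone and email patterns, and the function name redact are assumptions made for the example.

```python
import re

# Assumed patterns: only the "vapk_" prefix is taken from the text above.
API_KEY_RE = re.compile(r"vapk_[A-Za-z0-9_\-]+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# E.164-shaped numbers: optional "+", 8 to 15 digits, not embedded in a word.
PHONE_RE = re.compile(r"(?<!\w)\+?[1-9]\d{7,14}(?!\d)")


def redact(text: str) -> str:
    """Mask API keys, emails and phone-shaped numbers in an error string."""
    text = API_KEY_RE.sub("vapk_[redacted]", text)
    text = EMAIL_RE.sub("[email redacted]", text)
    text = PHONE_RE.sub("[phone redacted]", text)
    return text
```

A pattern-based pass like this only catches the shapes it knows about, which is why the text above speaks of high-risk patterns rather than a guarantee.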

Ephemeral vendor stores

  • Voice-orchestration provider: voice call orchestration; holds transcripts + recordings until our worker picks up the end-of-call report and persists what we want.
  • WhatsApp messaging provider: business-messaging ingest. Holds inbound messages until our webhook receiver acknowledges.
  • Speech-to-text + text-to-speech providers: process audio in-flight; no long-term storage on their side per their commercial terms.
  • LLM inference providers: per their commercial-tier terms, requests are not used for model training.

Object storage

The PII inventory references an object-storage bucket for voice clips. That bucket is dormant today: it is provisioned as part of the dormant infrastructure, but Vorel’s current deployment doesn’t use it. Recordings live on the voice-orchestration provider’s CDN, with the URL referenced in messages.media_url. The bucket is reactivated only on the alternate cloud deployment path.

Append-only tables

Three tables block UPDATE + DELETE at the DB level via Postgres triggers:
  • messages: transcript integrity.
  • audit_log: audit integrity.
  • billing_events: financial-record integrity.
Mutations against these tables fail at the database level regardless of the caller: even superuser writes are rejected. The only exception is a dedicated operator-side database role, which explicitly bypasses the triggers for retention sweeps + the right-to-erasure path.

Right to access (PDPL Art. 15 / GDPR Art. 15)

POST /api/tenant/export: operator-gated. Returns a ZIP containing:
  • conversations.csv
  • messages.csv
  • leads.csv
  • appointments.csv
  • offerings.csv
  • knowledge_base.csv
  • audit_log.csv
  • README.md: chain-of-custody (timestamp, operator, scope, redaction settings)
Default redaction: customer email + phone are redacted. Pass include_full_pii=true for the unredacted version; that call is audit-logged with the operator + a justification field.
When a customer requests their data, your operator routes the request through this endpoint, attaches the chain-of-custody README, and provides the resulting ZIP per your DSAR workflow.
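
To make the redaction default concrete, the sketch below builds a JSON body for POST /api/tenant/export. The field names customer_phone and justification, the helper name build_export_body, and the rule that a justification is mandatory for unredacted exports are assumptions for illustration; only include_full_pii and the existence of a justification field come from this page.

```python
import json
from typing import Optional


def build_export_body(customer_phone: str,
                      include_full_pii: bool = False,
                      justification: Optional[str] = None) -> str:
    """Return a JSON request body; redacted export is the default."""
    # Assumption: an unredacted export must carry a non-empty justification,
    # since that field ends up in the audit log.
    if include_full_pii and not (justification and justification.strip()):
        raise ValueError("include_full_pii=true requires a justification")
    body = {
        "customer_phone": customer_phone,      # hypothetical field name
        "include_full_pii": include_full_pii,  # flag named in the text above
    }
    if include_full_pii:
        body["justification"] = justification.strip()
    return json.dumps(body, sort_keys=True)
```

Keeping the unredacted path behind an explicit flag plus a written reason means the default call can never return raw email or phone by accident.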

Right to erasure (PDPL Art. 17 / GDPR Art. 17)

POST /api/tenant/forget: operator-gated, dry-run-by-default. The 7-step scrub for a single customer phone within a single tenant:
  1. Tombstone conversations: customer_identifier is replaced with redacted-<tenant_id>-<uuid8>, so the conversation row survives but no longer links to the real customer.
  2. Redact message content: messages.content and content_translated are overwritten with [redacted]. Append-only triggers permit this via the dedicated operator-side role’s bypass; the standard tenant role still cannot.
  3. Null leads PII: leads.name / email / phone / notes / attributes (the high-PII keys) are nulled or redacted.
  4. Null appointments PII: appointments.customer_name / customer_phone / customer_email / location_text / notes are redacted.
  5. Scrub audit-log JSONB references: E.164 phones + wa_id-shaped numbers in audit_log.metadata JSONB are regex-replaced.
  6. Salted-hash audit reference: a new audit row records the scrub, tagged with sha256(salt + phone) of the original customer identifier; this row IS the chain-of-custody artefact proving deletion happened. The salt is supplied by a required environment variable; without a salt the hash would be trivially reversible by any phone-number dictionary.
  7. CRM-side scrub: the configured CRM driver’s deleteRecord is called. Drivers without a delete API (Mindbody, Tekmetric) throw delete_unsupported; the audit row notes this so the operator knows manual deletion in the provider’s dashboard is required.
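
Steps 1, 5 and 6 above are small pure transformations, sketched below under stated assumptions. The function names are hypothetical, uuid8 is read here as "the first 8 hex characters of a random UUID", and the phone pattern is an illustrative E.164 shape rather than the production regex.

```python
import hashlib
import re
import uuid

# Illustrative E.164 shape: optional "+", then 8 to 15 digits.
PHONE_RE = re.compile(r"(?<!\d)\+?[1-9]\d{7,14}(?!\d)")


def tombstone_identifier(tenant_id: str) -> str:
    """Step 1: build redacted-<tenant_id>-<uuid8> (8 hex chars assumed)."""
    return f"redacted-{tenant_id}-{uuid.uuid4().hex[:8]}"


def scrub_metadata(value):
    """Step 5: replace phone-shaped strings anywhere in decoded JSONB.

    Only string values are scrubbed; numbers stored as JSON integers
    are left alone in this sketch.
    """
    if isinstance(value, str):
        return PHONE_RE.sub("[redacted]", value)
    if isinstance(value, list):
        return [scrub_metadata(v) for v in value]
    if isinstance(value, dict):
        return {k: scrub_metadata(v) for k, v in value.items()}
    return value


def salted_phone_hash(salt: str, phone: str) -> str:
    """Step 6: hex sha256(salt + phone); an empty salt is refused."""
    if not salt:
        raise ValueError("a non-empty salt is required")
    return hashlib.sha256((salt + phone).encode("utf-8")).hexdigest()
```

Refusing an empty salt mirrors the reasoning in step 6: the space of valid phone numbers is small enough that an unsalted hash can be reversed by enumeration.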
The whole flow is wrapped in a Postgres transaction for atomicity. The dry-run mode runs the same shape but reports what would change without committing.
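
The dry-run behaviour can be sketched as "run everything in one transaction, then roll back instead of committing". The example below uses Python's built-in sqlite3 as a stand-in for Postgres, covers only two of the seven steps, and assumes a simplified schema; the function name forget and the shape of the report are hypothetical.

```python
import sqlite3


def forget(conn: sqlite3.Connection, tenant_id: str, phone: str,
           dry_run: bool = True) -> dict:
    """Apply the scrub in one transaction; roll back when dry_run is set."""
    report = {}
    try:
        # sqlite3 opens a transaction implicitly before the first UPDATE.
        cur = conn.execute(
            "UPDATE leads SET name = NULL, email = NULL, phone = NULL "
            "WHERE tenant_id = ? AND phone = ?", (tenant_id, phone))
        report["leads"] = cur.rowcount
        cur = conn.execute(
            "UPDATE appointments SET customer_name = NULL, "
            "customer_phone = NULL "
            "WHERE tenant_id = ? AND customer_phone = ?", (tenant_id, phone))
        report["appointments"] = cur.rowcount
    except Exception:
        conn.rollback()  # any failure undoes every step
        raise
    if dry_run:
        conn.rollback()  # report what would change, keep nothing
    else:
        conn.commit()
    return report
```

Because the dry run executes the same statements as the real run, the row counts it reports are the counts the committed run would produce against the same data.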

What right-to-erasure does NOT touch

  • messages row deletion: we redact content, we don’t delete the rows. The append-only trigger is intentional; the audit-log record needs the conversation row to survive for the chain of custody to make sense.
  • audit_log rows: append-only. The deletion event is recorded as a new audit row, not by removing prior rows.
  • billing_events: financial integrity supersedes per-customer scrub.
  • Webhook deliveries already fired: past deliveries to your tenant’s outbound webhook URL carried the message content. Vorel can’t reach those receivers to retroactively scrub. Expect that this data may also live in your downstream systems, and address it there per your own retention policy.

Per-class retention windows (ADR-locked)

Retention windows are policy-locked: changing them requires a code change + a git-history-visible policy amendment, not a UI toggle. Per-tenant overrides exist for narrowing retention below the platform default; widening past the platform default requires the same policy-amendment path. See Data retention for the full four-class taxonomy and the four architectural invariants.
The platform default windows below are sealed by an internal, audited retention policy. Class-(c) tables hold conversation transcripts + PII and are cached only until a successful CRM mirror; class-(b) tables hold operational telemetry; class-(d) tables are long-lived by design with documented per-table rationale.

Class (c): transcripts + PII (cached until CRM-write success)

| Vorel-side row                 | Platform-default retention                               |
|--------------------------------|----------------------------------------------------------|
| messages (per-turn transcript) | 7 days post-insert AND post CRM-write success            |
| conversations                  | 30 days post-close AND post CRM-write success            |
| customers                      | 30 days post-last-touch AND post CRM-write success       |
| customer_profiles              | 30 days post-update AND post CRM-write success           |
| leads                          | 30 days post-update AND post CRM-write success           |
| appointments                   | 60 days post-scheduled-end AND post CRM-write success    |
| cases                          | 90 days post-close AND post CRM-write success            |
| case_transitions               | 90 days post-insert AND post CRM-write success           |
| case_transition_proposals      | 90 days post-decision AND post CRM-write success         |
| case_messages                  | 90 days post-insert AND post CRM-write success           |
| case_runtime_events            | 90 days post-insert AND post CRM-write success           |
| qa_evaluations (aggregate)     | 365 days post-insert AND post CRM-write success          |
| resolution_events              | 365 days post-classification AND post CRM-write success  |
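
The class (c) rule combines two conditions: the window has elapsed since the row's anchor event, and the CRM write has succeeded. A minimal sketch of that check follows. It assumes the reading "window elapsed AND a CRM write is on record" (the rows above could also be read as measuring the window from whichever event is later), and the function and dictionary names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# A few windows copied from the class (c) rows above, in days.
CLASS_C_WINDOW_DAYS = {
    "messages": 7,
    "conversations": 30,
    "leads": 30,
    "appointments": 60,
    "cases": 90,
}


def is_purgeable(table: str, anchor_at: datetime,
                 crm_written_at: Optional[datetime], now: datetime) -> bool:
    """True when the window has elapsed and the CRM write succeeded.

    anchor_at is the per-table event (insert, close, update, ...).
    """
    if crm_written_at is None:
        return False  # never purge a row the CRM has not mirrored
    return now - anchor_at >= timedelta(days=CLASS_C_WINDOW_DAYS[table])
```

Under this reading, a row whose CRM write keeps failing stays cached past its window rather than being purged before the CRM has a copy.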

Class (b): operational telemetry

| Vorel-side row                     | Platform-default retention | Purpose                                 |
|------------------------------------|----------------------------|-----------------------------------------|
| voice_call_cost                    | 365 days                   | Billing reconciliation                  |
| voice_call_cost.payment_resolution | 365 days                   | PCI reconciliation                      |
| voice_turn_latency                 | 90 days                    | First-token-to-speech regression window |
| llm_calls                          | 90 days                    | LLM usage analytics + cost reconciliation |
| webhook_deliveries                 | 30 days                    | Forensic-only after delivery            |
| prompt_experiment_assignments      | 90 days                    | Aligned with conversation purge cadence |
| prompt_variant_assignments         | 90 days                    | Aligned with message purge              |
| prompt_revision_proposals          | 365 days                   | Quarterly operator-triage cadence       |
| quality_signals + quality_failures | 365 days                   | Trend rollups for quality dashboards    |
| calibration_observations           | 90 days                    | Aligned with message purge              |
| shadow_dispatch_comparisons        | 30 days                    | Pipeline-cutover research artifact      |
| hallucination_flag_reviews         | 365 days                   | Quality-trend analysis window           |

Class (d): audit-only (long-lived)

audit_log, billing_events, tenants, users, voice_cutover_event, lora_adapters, adapter_promotions, offerings, knowledge_base_entries, webhooks, voice_numbers, prompt_experiments, incidents, api_keys, prompt_variants, tenant_credentials, tenant_crm_field_mappings, tenant_prompt_overrides: long-lived by design. Each table’s “permitted long-lived” rationale is documented in the schema-audit matrix. audit_log retention is owned by the existing N-3 retention regime; the rest are configuration / structured-audit surfaces.

Per-tenant overrides

Tenants who require shorter retention than the platform default can configure narrower windows during onboarding or via the operator console. The override is recorded as a per-tenant, per-table retention setting, emits an audit-log row visible to the tenant admin, and the next scheduled purge respects the narrower window. Widening past the platform default is not exposed as a tenant-side setting; that path goes through an audited policy amendment.
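
The narrowing-only rule can be expressed as a small validation function. This is a sketch under assumptions: the function name and the choice to raise an error on a widening request are illustrative, not a description of the operator console.

```python
from typing import Optional


def effective_window_days(platform_default: int,
                          tenant_override: Optional[int]) -> int:
    """Return the retention window a purge should apply for one table."""
    if tenant_override is None:
        return platform_default
    if tenant_override <= 0:
        raise ValueError("retention window must be a positive number of days")
    if tenant_override > platform_default:
        # Widening is not a tenant-side setting; it needs a policy amendment.
        raise ValueError("override may only narrow the platform default")
    return tenant_override
```

For example, a tenant asking for 7 days on a table with a 30-day default gets 7, while a request for 45 days is rejected rather than silently capped.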

Customer requests workflow

When a customer of yours (your end customer) requests a Data Subject Rights action:
  1. You receive the request: the customer emails / messages your team requesting their data or asking to be forgotten.
  2. Notify your Vorel operator: email your operator (or use your support channel) with the customer’s phone number + the nature of the request (export / erasure) + any required documentation per your DSAR process.
  3. Operator runs the endpoint: Vorel’s operator runs /api/tenant/export or /api/tenant/forget against your tenant, scoped to the specific customer phone. Default settings; no include_full_pii unless you explicitly request it.
  4. Receive the artefact: the operator sends you the ZIP (for export) or the audit-log reference (for erasure). The audit-log entry includes the salted hash of the customer phone: sufficient proof of deletion without re-introducing the PII.
  5. Forward to customer: per your DSAR workflow, deliver the export ZIP to the customer or confirm their erasure against the audit-log reference.
PDPL + GDPR both set roughly 30-day windows for DSAR fulfilment (GDPR Art. 12(3) phrases it as one month, extendable in some cases). Vorel’s per-request flow runs in minutes, so nearly all of the elapsed time is your own operations cycle time.

What about analytics on customer data?

Per-tenant analytics (the /(dashboard)/analytics surface + the weekly-rollup API) reads the same RLS-scoped tables. The operator-side analytics surface (/admin/tenants/[id]/analytics) is also per-tenant only: there’s no cross-tenant aggregation on the operator surface today. We do not run platform-wide analytics over customer content for any purpose (model training, benchmarking, marketing, etc.).