# Data handling

> What Vorel collects, where it lives, how long it's retained, and how a customer / operator triggers right-to-access or right-to-erasure.

This page is the practical companion to [Compliance](/security/compliance). It covers what the
data is, where it lives, who can read it, and how a Data Subject Rights request flows.

## What Vorel collects

End-customer data (the actual lead / patient / guest / vehicle owner) sits in a small, intentional
set of tables. The full inventory is maintained internally; this is the
summary.

### High-sensitivity (Direct Identifier + Communication Content)

* **`conversations.customer_identifier`**: WhatsApp `wa_id` (E.164 phone) or voice DID. The
  primary cross-channel identity anchor.
* **`conversations.customer_name`**: set from WhatsApp profile name on first inbound; nullable.
* **`messages.content`**: every inbound + outbound message text. The transcript
  source-of-truth, append-only, never edited.
* **`messages.content_translated`**: translation of the same content (when language differs
  from tenant default).
* **`messages.media_url`**: inbound media (images / voice clips). May contain customer-uploaded
  selfies / IDs.
* **`leads.name`, `leads.email`, `leads.phone`**: collected by the qualification sub-agent.
* **`leads.notes`**: free-form; may quote message content.
* **`appointments.customer_name`, `customer_phone`, `customer_email`**: required at booking
  time for confirmation copy.
* **`appointments.location_text`**: sometimes the customer's address.
* **`leads.attributes`** (JSONB): vertical-specific slots; may embed strings the customer
  mentioned (employer, neighborhood, allergies, etc.).

### Medium-sensitivity (Behavioral / Metadata)

* **`conversations.customer_language`**: `'en' | 'ar'` detected from first message.
* **`conversations.tags`**: operator-curated tags.
* **`messages.tool_payload`**: JSON payload from tool calls; may include customer-derived
  strings (search queries, slot values from qualification).
* **`qa_evaluations.score` + `criteria`**: per-conversation grading; doesn't carry raw PII
  but links to it via FKs.

### Operator data (not end-customer PII)

* **`users.email`, `users.full_name`**: your dashboard team's profile, mirrored from our
  authentication provider.
* **`audit_log.ip_address`, `audit_log.user_agent`**: operator dashboard activity.
* **`audit_log.metadata`**: free-form JSONB per action; references customer ids by FK only.
  Raw customer PII is **never** stored here directly.

## Where data lives

### Primary store

* **Primary database** (managed by our hosting provider): every tenant-scoped table.
  Disk-encrypted at rest. Row-Level Security gates every read by `tenant_id`.
* **In-memory data store** (managed by our hosting provider): job queues + rate-limit counters.
  No direct PII (job payloads are short-lived; rate-limit keys are tenant id + IP / phone, not
  message content).

### Caches + working storage

* **In-process**: request-scoped tenant context, conversation history during a single dispatch.
  Not persisted.
* **Error monitoring**: exception capture. Our error-monitoring system redacts `vapk_*`
  API keys and known PII fields before sending. Customer PII can leak into stack traces if a
  downstream vendor returns it in an error message; the redactor catches the high-risk patterns.
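
A minimal sketch of the kind of pre-send redaction described above, assuming a regex-based scrubber applied to the event payload. The function names, the key pattern, and the PII field list are illustrative assumptions, not Vorel's actual redactor.

```typescript
// Hypothetical pre-send redactor. Names, patterns, and the field list are assumptions.
const API_KEY_PATTERN = /vapk_[A-Za-z0-9_-]+/g; // assumes keys are "vapk_" + URL-safe characters
const E164_PATTERN = /\+[1-9]\d{7,14}/g;        // "+" followed by 8 to 15 digits
const PII_FIELDS = new Set(["email", "phone", "customer_name", "customer_identifier"]);

// Scrubs API keys and phone-shaped substrings out of free text such as error messages.
function redactString(value: string): string {
  return value
    .replace(API_KEY_PATTERN, "[redacted-api-key]")
    .replace(E164_PATTERN, "[redacted-phone]");
}

// Recursively walks a payload: known PII keys are blanked, all other strings are scrubbed.
function redactEvent(value: unknown, key?: string): unknown {
  if (key !== undefined && PII_FIELDS.has(key)) return "[redacted]";
  if (typeof value === "string") return redactString(value);
  if (Array.isArray(value)) return value.map((item) => redactEvent(item));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = redactEvent(v, k);
    }
    return out;
  }
  return value;
}
```

Pattern-based scrubbing of this kind only catches shapes it knows about, which is why vendor error messages remain the residual risk noted above.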

### Ephemeral vendor stores

* **Voice-orchestration provider**: voice call orchestration; holds transcripts + recordings
  until our worker picks up the end-of-call report and persists what we want.
* **WhatsApp messaging provider**: business-messaging ingest. Holds inbound messages until our
  webhook receiver acknowledges.
* **Speech-to-text + text-to-speech providers**: process audio in-flight; no long-term storage
  on their side per their commercial terms.
* **LLM inference providers**: per their commercial-tier terms, requests are not used for model
  training.

### Object storage

The PII inventory references an object-storage bucket for voice clips. **This bucket is dormant
today**: it is provisioned by infrastructure that is itself dormant, and Vorel's current
deployment doesn't use it. Recordings live on the voice-orchestration provider's CDN, with the URL
referenced in `messages.media_url`. The bucket reactivates only with the alternate cloud
deployment path.

## Append-only tables

Three tables block UPDATE + DELETE at the DB level via Postgres triggers:

* **`messages`**: transcript integrity.
* **`audit_log`**: audit integrity.
* **`billing_events`**: financial-record integrity.

Mutations against these tables are rejected for every database role, superuser connections
included. The one exception is a dedicated operator-side database role that the triggers
explicitly exempt; it is used only for retention sweeps and the right-to-erasure path.

## Right to access (PDPL Art. 15 / GDPR Art. 15)

`POST /api/tenant/export`: operator-gated. Returns a ZIP containing:

* `conversations.csv`
* `messages.csv`
* `leads.csv`
* `appointments.csv`
* `offerings.csv`
* `knowledge_base.csv`
* `audit_log.csv`
* `README.md`: chain-of-custody (timestamp, operator, scope, redaction settings)

**Default redaction:** customer email + phone are redacted. Pass `include_full_pii=true` for
the unredacted version (audit-logged with the operator + a justification field).
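
As a rough illustration of the default-redaction behaviour, the sketch below blanks email and phone columns in an export row unless full PII is requested. The function name, the column list, and the `[redacted]` placeholder are assumptions; the audit logging that accompanies `include_full_pii=true` is not shown.

```typescript
// Hypothetical export-row redaction. Column names follow the inventory above;
// the function name and placeholder string are assumptions.
type Row = Record<string, string | null>;

const REDACTED_COLUMNS = ["email", "phone", "customer_email", "customer_phone"];

// Returns a copy of the row; the input is never mutated.
function redactExportRow(row: Row, includeFullPii = false): Row {
  const out: Row = { ...row };
  if (includeFullPii) return out; // caller is expected to audit-log this path
  for (const column of REDACTED_COLUMNS) {
    if (out[column] != null) out[column] = "[redacted]";
  }
  return out;
}
```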

When a customer requests their data, your operator routes the request through this endpoint,
attaches the chain-of-custody README, and provides the resulting ZIP per your DSAR workflow.

## Right to erasure (PDPL Art. 17 / GDPR Art. 17)

`POST /api/tenant/forget`: operator-gated, **dry-run-by-default**.

The 7-step scrub for a single customer phone within a single tenant:

<Steps>
  <Step title="Tombstone conversations">
    `customer_identifier` replaced with `redacted-<tenant_id>-<uuid8>` so the conversation row
    survives but doesn't link to the real customer.
  </Step>

  <Step title="Redact message content">
    `messages.content` and `content_translated` overwritten with `[redacted]`. Append-only
    triggers permit this via the dedicated operator-side role's bypass; the standard tenant role still cannot.
  </Step>

  <Step title="Null leads PII">
    `leads.name` / `email` / `phone` / `notes` / `attributes` (the high-PII keys) nulled or
    redacted.
  </Step>

  <Step title="Null appointments PII">
    `appointments.customer_name` / `customer_phone` / `customer_email` / `location_text` /
    `notes` redacted.
  </Step>

  <Step title="Scrub audit-log JSONB references">
    Regex-replace E.164 phones + `wa_id`-shaped numbers in `audit_log.metadata` JSONB.
  </Step>

  <Step title="Salted-hash audit reference">
    A new audit row records the scrub, tagged with `sha256(salt + phone)` of the original
    customer identifier; this row IS the chain-of-custody artefact proving deletion happened.
    The salt is supplied by a required environment variable; without a salt the hash would be
    trivially reversible by any phone-number dictionary.
  </Step>

  <Step title="CRM-side scrub">
    The configured CRM driver's `deleteRecord` is called. Drivers without a delete API
    (Mindbody, Tekmetric) throw `delete_unsupported`; the audit row notes this so the operator
    knows manual deletion at the provider's dashboard is required.
  </Step>
</Steps>
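
Steps 1, 5, and 6 can be sketched in a few lines, assuming Node's built-in `crypto` module. The function names, the way `<uuid8>` is derived, and the phone-matching pattern are assumptions made for illustration; only the `redacted-<tenant_id>-<uuid8>` shape and the `sha256(salt + phone)` construction come from the steps above.

```typescript
import { createHash, randomUUID } from "node:crypto";

// Step 1: tombstone in the documented `redacted-<tenant_id>-<uuid8>` shape.
// Using the first 8 hex characters of a v4 UUID for <uuid8> is an assumption.
function tombstoneIdentifier(tenantId: string): string {
  return `redacted-${tenantId}-${randomUUID().slice(0, 8)}`;
}

// Step 6: sha256(salt + phone), hex-encoded. A missing salt is a hard error,
// because an unsalted hash can be reversed with a phone-number dictionary.
function erasureReferenceHash(phone: string, salt: string | undefined): string {
  if (!salt) throw new Error("erasure salt is required");
  return createHash("sha256").update(salt + phone, "utf8").digest("hex");
}

// Step 5: replace E.164 phones and bare wa_id-shaped digit runs inside string
// values of a JSON document. The pattern (optional "+", 8 to 15 digits, not
// embedded in a longer digit run) is an assumption.
const PHONE_LIKE = /(?<!\d)\+?[1-9]\d{7,14}(?!\d)/g;

function scrubMetadata(value: unknown): unknown {
  if (typeof value === "string") return value.replace(PHONE_LIKE, "[redacted]");
  if (Array.isArray(value)) return value.map((item) => scrubMetadata(item));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = scrubMetadata(v);
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

Because the hash is deterministic for a given salt, re-hashing a phone number later confirms whether an erasure was recorded for it without the audit row ever holding the number itself.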

The whole flow is **wrapped in a Postgres transaction** for atomicity. Dry-run mode walks
the same steps but reports what would change without committing.
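
One way to get both the atomicity and the dry-run behaviour is to run every step inside a transaction and decide only at the end whether to commit. The sketch below assumes that approach; the `SqlClient` interface, `ScrubStep` type, and `runScrub` function are hypothetical, and the real endpoint may compute its dry-run report differently.

```typescript
// Minimal client shape for this sketch; a real Postgres driver's API will differ.
interface SqlClient {
  query(sql: string): Promise<unknown>;
}

// Each step runs its statements and resolves to the number of rows it touched.
type ScrubStep = { name: string; run: (db: SqlClient) => Promise<number> };

interface ScrubReport {
  dryRun: boolean;
  committed: boolean;
  affected: Record<string, number>;
}

// Dry run is the default, mirroring the endpoint's dry-run-by-default behaviour.
async function runScrub(db: SqlClient, steps: ScrubStep[], dryRun = true): Promise<ScrubReport> {
  const affected: Record<string, number> = {};
  await db.query("BEGIN");
  try {
    for (const step of steps) {
      affected[step.name] = await step.run(db);
    }
    // Same statements either way; a dry run simply discards them.
    await db.query(dryRun ? "ROLLBACK" : "COMMIT");
    return { dryRun, committed: !dryRun, affected };
  } catch (error) {
    await db.query("ROLLBACK"); // any failing step undoes the whole scrub
    throw error;
  }
}
```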

## What right-to-erasure does NOT touch

* **`messages` row deletion**: we redact content, we don't delete the rows. The append-only
  trigger is intentional; the audit-log record needs the conversation row to survive for the
  chain of custody to make sense.
* **`audit_log` rows**: append-only, so prior rows are never removed. Step 5 scrubs phone-shaped
  strings inside `metadata` in place, and the deletion event itself is recorded as a new audit row.
* **`billing_events`**: financial integrity supersedes per-customer scrub.
* **Webhook deliveries already fired**: past deliveries to your tenant's outbound webhook URL
  carried the message content. Vorel can't reach those receivers to retroactively scrub.
  Expect that the data may also live in your downstream systems, and address it there per
  your own retention policy.

## Per-class retention windows (ADR-locked)

<Note>
  **Retention windows are policy-locked: changing them requires a code change + a git-history-visible
  policy amendment, not a UI toggle.** Per-tenant overrides exist for narrowing retention below the
  platform default; widening past the platform default requires the same policy-amendment path. See
  [Data retention](/security/data-retention) for the full four-class taxonomy and the four
  architectural invariants.
</Note>

The platform default windows below are sealed by an internal, audited retention policy. Class-(c) tables hold conversation transcripts + PII and are cached only until a successful CRM mirror; class-(b) tables hold operational telemetry; class-(d) tables are long-lived by design with documented per-table rationale.

### Class (c): transcripts + PII (cached until CRM-write success)

| Vorel-side row                   | Platform-default retention                                  |
| -------------------------------- | ----------------------------------------------------------- |
| `messages` (per-turn transcript) | **7 days** post-insert AND post CRM-write success           |
| `conversations`                  | **30 days** post-close AND post CRM-write success           |
| `customers`                      | **30 days** post-last-touch AND post CRM-write success      |
| `customer_profiles`              | **30 days** post-update AND post CRM-write success          |
| `leads`                          | **30 days** post-update AND post CRM-write success          |
| `appointments`                   | **60 days** post-scheduled-end AND post CRM-write success   |
| `cases`                          | **90 days** post-close AND post CRM-write success           |
| `case_transitions`               | **90 days** post-insert AND post CRM-write success          |
| `case_transition_proposals`      | **90 days** post-decision AND post CRM-write success        |
| `case_messages`                  | **90 days** post-insert AND post CRM-write success          |
| `case_runtime_events`            | **90 days** post-insert AND post CRM-write success          |
| `qa_evaluations` (aggregate)     | **365 days** post-insert AND post CRM-write success         |
| `resolution_events`              | **365 days** post-classification AND post CRM-write success |
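
The "N days AND post CRM-write success" rule in the table above can be read as a two-part gate. The sketch below assumes the window is measured from the table's anchor event (insert, close, last touch, and so on) and that CRM-write success is a separate precondition; the function and parameter names are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical eligibility check for a class (c) row.
// anchorAt:     the table's retention anchor (insert, close, last touch, ...).
// crmWrittenAt: when the CRM mirror succeeded, or null if it has not succeeded yet.
function isPurgeEligible(
  anchorAt: Date,
  crmWrittenAt: Date | null,
  retentionDays: number,
  now: Date,
): boolean {
  // Never purge a row the CRM does not hold yet.
  if (crmWrittenAt === null || crmWrittenAt.getTime() > now.getTime()) return false;
  const windowEnd = anchorAt.getTime() + retentionDays * DAY_MS;
  return now.getTime() >= windowEnd;
}
```

For a `messages` row, `retentionDays` would be 7 and `anchorAt` the insert time; a row whose CRM write never succeeds stays put regardless of age.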

### Class (b): operational telemetry

| Vorel-side row                         | Platform-default retention | Purpose                                   |
| -------------------------------------- | -------------------------- | ----------------------------------------- |
| `voice_call_cost`                      | **365 days**               | Billing reconciliation                    |
| `voice_call_cost.payment_resolution`   | **365 days**               | PCI reconciliation                        |
| `voice_turn_latency`                   | **90 days**                | First-token-to-speech regression window   |
| `llm_calls`                            | **90 days**                | LLM usage analytics + cost reconciliation |
| `webhook_deliveries`                   | **30 days**                | Forensic-only after delivery              |
| `prompt_experiment_assignments`        | **90 days**                | Aligned with conversation purge cadence   |
| `prompt_variant_assignments`           | **90 days**                | Aligned with message purge                |
| `prompt_revision_proposals`            | **365 days**               | Quarterly operator-triage cadence         |
| `quality_signals` + `quality_failures` | **365 days**               | Trend rollups for quality dashboards      |
| `calibration_observations`             | **90 days**                | Aligned with message purge                |
| `shadow_dispatch_comparisons`          | **30 days**                | Pipeline-cutover research artifact        |
| `hallucination_flag_reviews`           | **365 days**               | Quality-trend analysis window             |

### Class (d): audit-only (long-lived)

`audit_log`, `billing_events`, `tenants`, `users`, `voice_cutover_event`, `lora_adapters`, `adapter_promotions`, `offerings`, `knowledge_base_entries`, `webhooks`, `voice_numbers`, `prompt_experiments`, `incidents`, `api_keys`, `prompt_variants`, `tenant_credentials`, `tenant_crm_field_mappings`, `tenant_prompt_overrides`: long-lived by design. Each table's "permitted long-lived" rationale is documented in the schema-audit matrix. `audit_log` retention is owned by the existing N-3 retention regime; the rest are configuration / structured-audit surfaces.

### Per-tenant overrides

Tenants who require shorter retention than the platform default can configure narrower windows during onboarding or via the operator console. The override is recorded as a per-tenant, per-table retention setting, emits an audit-log row visible to the tenant admin, and the next scheduled purge respects the narrower window. Widening past the platform default is not exposed as a tenant-side setting; that path goes through an audited policy amendment.
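
The narrowing-only rule can be expressed as a small guard. This is a sketch under the assumption that windows are whole days; `effectiveRetentionDays` is a hypothetical helper, not a documented API.

```typescript
// Hypothetical resolver for a table's effective retention window, in days.
// A tenant override may only narrow the platform default, never widen it.
function effectiveRetentionDays(platformDefaultDays: number, tenantOverrideDays?: number): number {
  if (tenantOverrideDays === undefined) return platformDefaultDays;
  if (!Number.isInteger(tenantOverrideDays) || tenantOverrideDays < 1) {
    throw new RangeError("override must be a positive whole number of days");
  }
  if (tenantOverrideDays > platformDefaultDays) {
    // Widening goes through the audited policy-amendment path instead.
    throw new RangeError("override cannot exceed the platform default");
  }
  return tenantOverrideDays;
}
```

With the 30-day `conversations` default, an override of 14 resolves to 14, while an override of 45 is rejected.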

## Customer requests workflow

When a customer of yours (your end customer) requests a Data Subject Rights action:

<Steps>
  <Step title="You receive the request">
    Customer emails / messages your team requesting their data or asking to be forgotten.
  </Step>

  <Step title="Notify your Vorel operator">
    Email your operator (or use your support channel) with: customer's phone number + the nature of
    the request (export / erasure) + any required documentation per your DSAR process.
  </Step>

  <Step title="Operator runs the endpoint">
    Vorel's operator runs `/api/tenant/export` or `/api/tenant/forget` against your tenant scoped to
    the specific customer phone. Default settings; no `include_full_pii` unless you explicitly
    request it.
  </Step>

  <Step title="Receive the artefact">
    Operator sends you the ZIP (for export) or the audit-log reference (for erasure). The audit-log
    entry includes the salted-hash of the customer phone: sufficient proof of deletion without
    re-introducing the PII.
  </Step>

  <Step title="Forward to customer">
    Per your DSAR workflow, deliver the export ZIP to the customer or confirm their erasure against
    the audit-log reference.
  </Step>
</Steps>

PDPL + GDPR both set 30-day windows for DSAR fulfilment. Vorel's per-request flow runs in
minutes, so most of the elapsed time is your own operations cycle.

## What about analytics on customer data?

Per-tenant analytics (the `/(dashboard)/analytics` surface + the `weekly-rollup` API) reads
the same RLS-scoped tables. The operator-side analytics surface
(`/admin/tenants/[id]/analytics`) is also **per-tenant only**: there's no cross-tenant
aggregation on the operator surface today.

We do not run platform-wide analytics over customer content for any purpose (model training,
benchmarking, marketing, etc.).

## Related docs

* [Security overview](/security/overview): RLS, encryption, audit, rate limiting, SLOs
* [Data retention](/security/data-retention): full four-class taxonomy, four architectural invariants, "your transcripts live in your CRM"
* [Payments + PCI](/security/payments): SAQ-A scope, vault-redirect payment flow, hard-forbidden builds
* [Compliance](/security/compliance): DPA + region model + sub-processor disclosure
* [Product → Guardrails](/product/guardrails): runtime safety policy

{/* verified-against: apps/web/prisma/schema.prisma (RLS policies, append-only triggers on messages + audit_log + billing_events) */}

{/* verified-against: apps/web/src/app/api/tenant/export/route.ts (operator-gated; default redaction; chain-of-custody README) */}

{/* verified-against: apps/web/src/lib/crm/index.ts deleteRecord throws delete_unsupported for Mindbody / Tekmetric */}
