Privacy-Safe Analytics: What to Keep, What to Drop

Thesis: You don’t need to choose between growth and privacy. You need a smaller, better-designed analytics stack: keep the signals that drive decisions, transform what you can’t keep raw, and drop the rest. Treat privacy as a product requirement—measured, versioned, and audited.

1) Principles (use these to decide every time)

Decision-first: If a data point doesn’t change a decision or a model, don’t collect it.
Data minimization: Collect the least granularity, for the shortest time, with the fewest people having access.
Context integrity: Don’t repurpose data outside the user’s context (product usage ≠ ad targeting without consent).
Consent & control: Make the user’s consent state drive the instrumentation (essential → analytics → ads).
Defense in depth: Pseudonymize, aggregate, limit retention, and log access—even when you have consent.

2) The Keep / Transform / Drop table

Web & Product Analytics

Keep (as is)
- Page views, sessions, and referrers, collected without third-party cookies
- Event counts and funnels at session or day grain
- Core Web Vitals and error/latency metrics
- A/B test assignment and outcomes (user-level only with consent)
Transform (before storage)
- IP → coarse geo (country/region) or drop entirely
- User identifiers → pseudonymous IDs rotated daily or weekly; salted hashes of login emails (with consent)
- Free-text fields → strip PII; accept only allow-listed values
- Timestamps → bucket into 1–6h windows when user-level precision isn’t needed
Drop
- Device fingerprinting (canvas, fonts, battery, etc.)
- Cross-site tracking IDs and 3P cookies
- Precise geo (GPS), unless the product function requires it
- Sensitive attributes (health, religion, sexuality, political opinion) unless explicitly necessary and lawfully consented
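
Two of the Transform rows above (rotating pseudonymous IDs and timestamp bucketing) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names `rotatingId` and `bucketTimestamp`, the 7-day rotation period, and the server-side `secret` are all assumptions made for the example.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: derive a pseudonymous ID that changes every rotation period.
// `secret` is assumed to be a server-side key that never reaches the client.
function rotatingId(userId, secret, now = new Date(), periodDays = 7) {
  const periodMs = periodDays * 24 * 60 * 60 * 1000;
  const period = Math.floor(now.getTime() / periodMs); // constant within one window
  return crypto
    .createHmac('sha256', secret)
    .update(`${period}:${userId}`)
    .digest('hex')
    .slice(0, 32);
}

// Round a timestamp down to the start of its N-hour bucket (UTC, epoch-aligned).
function bucketTimestamp(date, hours = 6) {
  const bucketMs = hours * 60 * 60 * 1000;
  return new Date(Math.floor(date.getTime() / bucketMs) * bucketMs);
}
```

Because the period index is part of the hashed input, the same user gets an unrelated ID in the next window, so activity cannot be joined across windows without the secret and the original identifier.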

Ads Measurement

Keep
- Channel/creative spend and impressions/clicks at campaign or ad-set level
- Modeled conversions and incremental lift from platform studies
- Geo-level outcomes for holdouts / switchbacks
Transform
- Use clean rooms or server-side conversion APIs with scoped, consented IDs
- Store cohorts (e.g., by DMA/week) instead of individuals for attribution
Drop
- Cross-platform identity stitching without explicit consent
- Long-lived user graphs for prospecting

CRM & Lifecycle

Keep
- Email/SMS send, open, click, unsubscribe, purchase events tied to consented profiles
Transform
- Hash emails when joining with ad platforms (with consent & TTL)
- Suppress raw message bodies in analytics; keep metadata only
Drop
- Uploading whole customer lists to platforms without documented consent & purpose

3) Consent-gated instrumentation

Define tiers: essential, analytics, ads. Instrument only what the user allowed.

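// Consent-gated tracker. Assumes scrub, sendTo, isConversion, and capiShape
// are defined elsewhere; sendTo(sink, eventName, payload) is the assumed shape.
function track(eventName, props, consent) {
  if (!consent?.essential) return; // nothing fires without the essential tier
  const allowAnalytics = consent.analytics === true;
  const allowAds = consent.ads === true;

  const safeProps = scrub(props); // remove PII, enforce types

  // Errors and performance events count as essential.
  if (eventName.endsWith('_error') || eventName.includes('perf')) {
    sendTo('observability', eventName, safeProps);
  }

  if (allowAnalytics) sendTo('product_analytics', eventName, safeProps);

  if (allowAds && isConversion(eventName)) {
    // Forward scoped fields only to the server-side conversion API.
    sendTo('server_side_capi', eventName, capiShape(safeProps));
  }
}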

Scrub rules: no emails/phones in events; IDs are opaque; IP is dropped or geo-coarsened at the edge; timestamps bucketed where possible.
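
One possible shape for a scrubber that enforces these rules is sketched below. It is an illustration only: the blocked key names and the two regular expressions are assumptions, and the patterns are deliberately loose, so they will over-redact (for example, long order numbers embedded in text) rather than let PII through.

```javascript
// Keys this sketch always drops from event payloads (assumed names).
const BLOCKED_KEYS = new Set(['email', 'phone', 'ip', 'ip_address', 'name']);

// Loose patterns: they favor over-redaction over leaking PII.
const EMAIL_RE = /[^\s@]+@[^\s@]+\.[^\s@]+/g;
const PHONE_RE = /\+?\d[\d\s().-]{7,}\d/g;

function scrub(props) {
  const safe = {};
  for (const [key, value] of Object.entries(props ?? {})) {
    if (BLOCKED_KEYS.has(key.toLowerCase())) continue; // drop the field entirely
    if (typeof value === 'string') {
      safe[key] = value
        .replace(EMAIL_RE, '[redacted]')
        .replace(PHONE_RE, '[redacted]');
    } else if (typeof value === 'number' || typeof value === 'boolean') {
      safe[key] = value; // primitives pass through unchanged
    }
    // Objects, arrays, and functions are dropped: flat, typed payloads only.
  }
  return safe;
}
```

Dropping nested values outright is a design choice for the sketch: a flat payload is easier to audit against an event spec than a recursively scrubbed one.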

4) Aggregation patterns that still answer the business question

Funnel health: store daily counts per step per channel/market; evaluate conversion with confidence intervals—no user table required.
Attribution: move from click-path multi-touch attribution (MTA) to geo holdouts plus marketing mix modeling (MMM) on weekly spend vs. outcomes.
Retention: cohort by month and plan tier; report logos/revenue retained and expansion at cohort level.
Experimentation: user-level only under analytics consent; otherwise run geo/switchback designs.

SQL sketch (cohort retention without user PII):

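SELECT signup_month,
       month_since_signup,
       COUNT(*) AS accounts_active   -- assumes one row per opaque account per active month
FROM cohort_monthly_activity         -- built upstream; carries no PII columns
GROUP BY signup_month, month_since_signup
ORDER BY signup_month, month_since_signup;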
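
For the funnel-health pattern, a confidence interval can be computed from aggregate counts alone. The sketch below uses the Wilson score interval, which is one reasonable choice rather than the only one; the function name and the sample row are made up for illustration.

```javascript
// Wilson score interval for a conversion rate, computed from counts only.
// z = 1.96 corresponds to roughly a 95% confidence level.
function wilsonInterval(conversions, entrants, z = 1.96) {
  if (entrants <= 0) return null; // no traffic, no estimate
  const p = conversions / entrants;
  const z2 = z * z;
  const denom = 1 + z2 / entrants;
  const center = (p + z2 / (2 * entrants)) / denom;
  const half =
    (z * Math.sqrt((p * (1 - p)) / entrants + z2 / (4 * entrants * entrants))) / denom;
  return { rate: p, low: center - half, high: center + half };
}

// Illustrative daily aggregate row (invented numbers): no user IDs involved.
const dailyRows = [
  { day: '2024-05-01', channel: 'search', step: 'checkout', entrants: 1200, conversions: 84 },
];
for (const row of dailyRows) {
  console.log(row.day, row.channel, row.step, wilsonInterval(row.conversions, row.entrants));
}
```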

5) Retention schedules (default to shorter)

Raw edge logs: ≤ 7 days
User-level analytics (consented): 3–6 months, then aggregate & delete raw
Aggregates (daily/weekly): 24 months (revisit yearly)
Join keys for paid media: TTL 30–90 days; rotate salts monthly
Access logs & consent records: as long as required for compliance
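
The schedule above can be encoded as data so that a retention job enforces it mechanically. This is a sketch under assumptions: the data-class names are invented, each range is collapsed to its upper bound, and "6 months" and "24 months" are approximated as 180 and 730 days.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Maximum age in days per data class (upper bounds of the ranges in the schedule).
const RETENTION_DAYS = {
  raw_edge_logs: 7,
  user_level_analytics: 180, // upper end of 3–6 months, approximated
  aggregates: 730,           // 24 months, approximated
  paid_media_join_keys: 90,
};

// True when a record of the given class is older than its policy allows.
function isExpired(dataClass, createdAt, now = new Date()) {
  const maxDays = RETENTION_DAYS[dataClass];
  if (maxDays === undefined) {
    // Fail loudly: unclassified data should not silently live forever.
    throw new Error(`No retention policy for data class: ${dataClass}`);
  }
  return now.getTime() - createdAt.getTime() > maxDays * DAY_MS;
}
```

Access logs and consent records are left out on purpose, since their retention is set by compliance requirements rather than a fixed default.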

6) Access & governance (make it boring and safe)

Least privilege: analysts read aggregates by default; user-level behind a break-glass process.
Row-level security: partition by region (EEA vs rest) and data class.
Audit trails: log who queried user-level tables and why.
Change management: event specs + schema registry; privacy review in PR template.
DSR readiness: locate and delete data by user key within the SLA for data subject requests (DSRs); test monthly.
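
The least-privilege and audit-trail rules can be expressed as a small gate in front of the query layer. The sketch below is hypothetical: the `authorizeQuery` name, the two data classes, and the in-memory log are assumptions, and a real system would write to an append-only store and verify the justification through an approval workflow.

```javascript
const auditLog = []; // stand-in for an append-only audit store

// Aggregates are readable by default; user-level reads need a stated reason
// and are always logged, whether they are allowed or denied.
function authorizeQuery({ user, table, dataClass, justification }, now = new Date()) {
  if (dataClass === 'aggregate') return { allowed: true };
  const reason = (justification ?? '').trim();
  const allowed = dataClass === 'user_level' && reason.length > 0;
  auditLog.push({ at: now.toISOString(), user, table, dataClass, reason, allowed });
  return { allowed };
}
```

Logging denied attempts as well as granted ones keeps the trail useful for spotting probing, not just for reconstructing approved access.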

7) What to do when you really need granularity

On-device/edge analytics: compute metrics locally; send only aggregates.
Private joins / clean rooms: scoped queries, sandboxed outputs, no raw export.
Noise & k-anonymity: add calibrated noise to released counts, or suppress any group with fewer than k members (e.g., k ≥ 50).
Short windows: analyze user-level data within a 24–72h window, then roll it up to aggregates.
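
The threshold and noise options might look like the sketch below. The function names are invented, and the noise scale is a free parameter here; calibrating it to a formal differential-privacy budget is outside the scope of this example.

```javascript
// Suppress any group smaller than k before release (minimum-group-size rule).
function suppressSmallGroups(rows, k = 50) {
  return rows.filter((row) => row.count >= k);
}

// Draw Laplace(0, scale) noise by inverse-CDF sampling.
// `rng` returns a uniform value in [0, 1); it is injectable so tests can be deterministic.
function laplaceNoise(scale, rng = Math.random) {
  const u = rng() - 0.5; // uniform in [-0.5, 0.5)
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

// Noisy count, rounded and clamped at zero so released values stay plausible.
function noisyCount(count, scale, rng = Math.random) {
  return Math.max(0, Math.round(count + laplaceNoise(scale, rng)));
}
```

A typical release step would apply `suppressSmallGroups` first and then `noisyCount` to each surviving row, so small groups never appear even in noisy form.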

If these constraints break the use case, don’t ship the use case.

8) Cookbook: replace creep with craft

Instead of: 3P cookie trails + fingerprinting
Do: server-side events + geo holdouts + MMM (optionally Bayesian) for budget allocation

Instead of: raw session replays for everyone
Do: sample 1–5% with consent; mask inputs; keep 7 days

Instead of: email uploads for lookalikes
Do: modeled conversions + creative testing + publisher clean rooms with consented lists

Instead of: storing IPs for fraud checks forever
Do: risk score at the edge + discard IP; retain score & reason codes

9) 30-60-90 day plan

Days 1–30 — Inventory & kill list

Map all data flows; label fields (PII/sensitive/essential/derived)
Turn on a consent management platform (CMP); wire tiered consent gating
Drop obvious creep (fingerprinting, 3P cookies, precise geo)

Days 31–60 — Rebuild the essentials

Ship event specs; enforce types and scrub rules
Move conversions server-side; set TTLs; rotate salts
Stand up aggregated fact tables; add retention jobs

Days 61–90 — Prove value

Replace click-path attribution with one geo holdout + MMM-lite
Run a consented A/B; report lift with CIs
Publish the privacy posture: what we keep, transform, drop—and why

10) Board-level policy (one slide)

We collect only what changes decisions.
User choice gates our instrumentation.
Identifiers are short-lived, salted, and minimized.
Aggregates are the default; experiments provide causality.
We can fulfill delete/export requests on time, every time.

Bottom line

Shrink the surface area, not your ambition. Keep decision-making signals, transform what’s sensitive, and drop what’s creepy. When privacy is designed in, analytics becomes simpler, clearer, and more defensible—and your growth story gets stronger, not weaker.
