Thesis: You don’t need to choose between growth and privacy. You need a smaller, better-designed analytics stack: keep the signals that drive decisions, transform what you can’t keep raw, and drop the rest. Treat privacy as a product requirement—measured, versioned, and audited.
1) Principles (use these to decide every time)
- Decision-first: If a data point doesn’t change a decision or a model, don’t collect it.
- Data minimization: Collect the least granularity, for the shortest time, with the fewest people having access.
- Context integrity: Don’t repurpose data outside the user’s context (product usage ≠ ad targeting without consent).
- Consent & control: Let the user’s consent state drive the instrumentation (essential → analytics → ads).
- Defense in depth: Pseudonymize, aggregate, limit retention, and log access—even when you have consent.
2) The Keep / Transform / Drop table
Web & Product Analytics
- Keep (as is)
- Page views, sessions, referrer without third-party cookies
- Event counts and funnels at session or day grain
- Core Web Vitals and error/latency metrics
- A/B test assignment and outcomes (user-level only with consent)
- Transform (before storage)
- IP → coarse geo (country/region) or drop entirely
- User identifiers → rotate daily/weekly pseudonymous IDs; salt+hash login emails (consented)
- Free-text fields → strip PII; keep only allow-listed fields and values
- Timestamps → bucket to 1–6h windows when user-level isn’t needed
- Drop
- Device fingerprinting (canvas, fonts, battery, etc.)
- Cross-site tracking IDs and 3P cookies
- Precise geo (GPS), unless the product function requires it
- Sensitive attributes (health, religion, sexuality, political opinion) unless explicitly necessary and lawfully consented
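The Transform rules above can be sketched in a few small helpers. This is a minimal illustration for Node.js, not a reference implementation: the function names, the /24 truncation, and the 7-day rotation default are all assumptions made for the example.

```javascript
// Sketch of the "Transform" rules: rotating pseudonymous IDs, timestamp
// buckets, and IP coarsening. All names and defaults here are hypothetical.
const crypto = require('node:crypto');

const DAY_MS = 24 * 60 * 60 * 1000;

// Derive a pseudonymous ID that changes every `rotationDays`.
// `secret` is a server-side key that should never reach the client.
function pseudonymousId(userId, secret, nowMs, rotationDays = 7) {
  const period = Math.floor(nowMs / (rotationDays * DAY_MS));
  return crypto
    .createHmac('sha256', secret)
    .update(`${period}:${userId}`)
    .digest('hex')
    .slice(0, 32);
}

// Round a timestamp down to the start of its bucket (default 6h, UTC).
function bucketTimestamp(tsMs, bucketHours = 6) {
  const bucketMs = bucketHours * 60 * 60 * 1000;
  return new Date(Math.floor(tsMs / bucketMs) * bucketMs).toISOString();
}

// Zero the last octet of an IPv4 address before any geo lookup.
// IPv6 and malformed input return null, meaning "drop entirely".
function coarsenIpv4(ip) {
  const parts = String(ip).split('.');
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  return valid ? `${parts[0]}.${parts[1]}.${parts[2]}.0` : null;
}
```

Because the rotation period is mixed into the HMAC input, the same user gets a new ID each period and old IDs cannot be linked to new ones without the server-side secret.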
Ads Measurement
- Keep
- Channel/creative spend and impressions/clicks at campaign or ad-set level
- Modeled conversions and incremental lift from platform studies
- Geo-level outcomes for holdouts / switchbacks
- Transform
- Use clean rooms or server-side conversion APIs with scoped, consented IDs
- Store cohorts (e.g., by designated market area (DMA) and week) instead of individuals for attribution
- Drop
- Cross-platform identity stitching without explicit consent
- Long-lived user graphs for prospecting
CRM & Lifecycle
- Keep
- Email/SMS send, open, click, unsubscribe, purchase events tied to consented profiles
- Transform
- Hash emails when joining with ad platforms (with consent & TTL)
- Suppress raw message bodies in analytics; keep metadata only
- Drop
- Uploading whole customer lists to platforms without documented consent & purpose
3) Consent-gated instrumentation (practical)
Define tiers: essential, analytics, ads. Instrument only what the user allowed.
function track(eventName, props, consent) {
  if (!consent.essential) return; // nothing fires

  const allowAnalytics = consent.analytics === true;
  const allowAds = consent.ads === true;
  const safeProps = scrub(props); // remove PII, enforce types

  if (eventName.endsWith('_error') || eventName.includes('perf')) {
    sendTo('observability', safeProps); // essential
  }
  if (allowAnalytics) sendTo('product_analytics', safeProps);
  if (allowAds && isConversion(eventName)) {
    sendTo('server_side_capi', capiShape(safeProps)); // scoped fields only
  }
}
Scrub rules: no emails/phones in events; IDs are opaque; IP is dropped or geo-coarsened at the edge; timestamps bucketed where possible.
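One way the scrub step could look, as a rough sketch: an allow-list of property names with expected types, plus pattern-based redaction for strings that slip through. The property names in ALLOWED_PROPS and both regular expressions are illustrative assumptions; the patterns are deliberately broad and will not catch every format.

```javascript
// Hypothetical scrub(): allow-list keys, enforce types, redact PII patterns.
const ALLOWED_PROPS = {
  plan: 'string',
  step: 'string',
  value: 'number',
  duration_ms: 'number',
};

// Broad, illustrative patterns; tune them against real data before relying on them.
const EMAIL_RE = /[^\s@]+@[^\s@]+\.[^\s@]+/g;
const PHONE_RE = /\+?\d[\d\s().-]{7,}\d/g;

function scrub(props) {
  const safe = {};
  for (const [key, expectedType] of Object.entries(ALLOWED_PROPS)) {
    const value = props[key];
    if (typeof value !== expectedType) continue; // missing or wrong type
    if (expectedType === 'number' && !Number.isFinite(value)) continue;
    safe[key] =
      expectedType === 'string'
        ? value.replace(EMAIL_RE, '[redacted]').replace(PHONE_RE, '[redacted]')
        : value;
  }
  return safe;
}
```

An allow-list fails closed: a new property is dropped until someone adds it to the spec, which is the behavior you want when the event schema goes through privacy review.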
4) Aggregation patterns that still answer the business question
- Funnel health: store daily counts per step per channel/market; evaluate conversion with confidence intervals—no user table required.
- Attribution: move from click-path multi-touch attribution (MTA) to geo holdouts plus marketing mix modeling (MMM) on weekly spend vs outcomes.
- Retention: cohort by month and plan tier; report logos/revenue retained and expansion at cohort level.
- Experimentation: user-level only under analytics consent; otherwise run geo/switchback designs.
SQL sketch (cohort retention without user PII):
SELECT signup_month,
       month_since_signup,
       COUNT(*) AS accounts_active
FROM cohort_monthly_activity -- aggregated upstream
GROUP BY 1, 2;
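For the funnel-health pattern, a confidence interval needs only two counts per step. The sketch below uses the Wilson score interval, which is one reasonable choice among several; the function name and the 95% default (z = 1.96) are assumptions for illustration.

```javascript
// Wilson score interval for a conversion rate computed from aggregate counts.
// Returns null when there is no traffic to evaluate.
function wilsonInterval(conversions, visitors, z = 1.96) {
  if (visitors <= 0) return null;
  const p = conversions / visitors;
  const z2 = z * z;
  const denom = 1 + z2 / visitors;
  const center = (p + z2 / (2 * visitors)) / denom;
  const margin =
    (z / denom) *
    Math.sqrt((p * (1 - p)) / visitors + z2 / (4 * visitors * visitors));
  return { rate: p, low: center - margin, high: center + margin };
}
```

For example, 120 conversions out of 1,000 visitors gives a 12% rate with an interval of roughly 10.1% to 14.2%, computed without touching a single user row.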
5) Retention schedules (default to shorter)
- Raw edge logs: ≤ 7 days
- User-level analytics (consented): 3–6 months, then aggregate & delete raw
- Aggregates (daily/weekly): 24 months (revisit yearly)
- Join keys for paid media: TTL 30–90 days; rotate salts monthly
- Access logs & consent records: as long as required for compliance
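A schedule like this is easier to enforce when it lives in code next to the deletion job. The sketch below is one possible shape: the data-class names are hypothetical, and the day counts take the upper bound of each range above.

```javascript
// Hypothetical retention config mirroring the schedule above (upper bounds).
const RETENTION_DAYS = new Map([
  ['raw_edge_logs', 7],
  ['user_level_analytics', 180], // upper end of 3 to 6 months
  ['aggregates', 730],           // 24 months
  ['paid_media_join_keys', 90],  // upper end of the 30 to 90 day TTL
]);

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when a record of the given class has outlived its retention window.
// Unknown classes throw, so new data cannot silently escape the policy.
function isExpired(dataClass, createdAtMs, nowMs) {
  if (!RETENTION_DAYS.has(dataClass)) {
    throw new Error(`No retention policy for "${dataClass}"`);
  }
  return nowMs - createdAtMs > RETENTION_DAYS.get(dataClass) * MS_PER_DAY;
}

// Pick the records a scheduled retention job should delete.
function selectExpired(records, nowMs) {
  return records.filter((r) => isExpired(r.dataClass, r.createdAtMs, nowMs));
}
```

Access logs and consent records are left out on purpose, since their retention is set by compliance requirements and not by a fixed default.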
6) Access & governance (make it boring and safe)
- Least privilege: analysts read aggregates by default; user-level behind a break-glass process.
- Row-level security: partition by region (EEA vs rest) and data class.
- Audit trails: log who queried user-level tables and why.
- Change management: event specs + schema registry; privacy review in PR template.
- DSR (data subject request) readiness: locate/delete by user key within SLA; test monthly.
7) What to do when you really need granularity
- On-device/edge analytics: compute metrics locally; send only aggregates.
- Private joins / clean rooms: scoped queries, sandboxed outputs, no raw export.
- Noise & k-anonymity: add noise or require k ≥ 50 per group before release.
- Short windows: analyze user-level data within 24–72h windows, then roll up to aggregates.
If these constraints break the use case, don’t ship the use case.
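The noise and minimum-group-size rule can be applied as a release gate on any grouped count. This is a sketch under stated assumptions: the function names are hypothetical, and the Laplace noise scale is an arbitrary parameter, not one calibrated to a formal privacy budget.

```javascript
// Sample Laplace noise by inverse transform. `rng` is injectable so the
// function can be tested deterministically; it must return values in [0, 1).
function laplaceNoise(scale, rng = Math.random) {
  const u = Math.max(rng(), Number.EPSILON) - 0.5; // avoid log(0) at the edge
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

// Suppress groups smaller than k, then optionally perturb what remains.
function releaseCounts(groups, { k = 50, noiseScale = 0, rng = Math.random } = {}) {
  return groups
    .filter((g) => g.count >= k)
    .map((g) => ({
      ...g,
      count:
        noiseScale > 0
          ? Math.max(0, Math.round(g.count + laplaceNoise(noiseScale, rng)))
          : g.count,
    }));
}
```

Suppression runs before noise here so that small groups never appear in the output at all; a formal differential-privacy setup would need the scale derived from query sensitivity and a chosen epsilon.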
8) Cookbook: replace creep with craft
Instead of: 3P cookie trails + fingerprinting
Do: server-side events + geo holdouts + MMM (optionally Bayesian) for budget allocation
Instead of: raw session replays for everyone
Do: sample 1–5% with consent; mask inputs; keep 7 days
Instead of: email uploads for lookalikes
Do: modeled conversions + creative testing + publisher clean rooms with consented lists
Instead of: storing IPs for fraud checks forever
Do: risk score at the edge + discard IP; retain score & reason codes
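For the session-replay swap above, sampling can be made deterministic so a session is either fully recorded or not at all. A minimal Node.js sketch, assuming a hypothetical shouldRecordReplay helper, a consent object shaped like the one in section 3, and a 2% default rate:

```javascript
const { createHash } = require('node:crypto');

// Decide whether to record a replay: analytics consent is required, then a
// stable hash of the session ID is compared against the sample rate.
function shouldRecordReplay(sessionId, consent, sampleRate = 0.02) {
  if (!consent || consent.analytics !== true) return false;
  const digest = createHash('sha256').update(String(sessionId)).digest();
  const bucket = digest.readUInt32BE(0) / 0x100000000; // uniform in [0, 1)
  return bucket < sampleRate;
}
```

Hashing the session ID keeps the decision stable across page loads without storing a "sampled" flag, and the consent check comes first so unconsented sessions are never hashed into the sample.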
9) 30-60-90 day plan
Days 1–30 — Inventory & kill list
- Map all data flows; label fields (PII/sensitive/essential/derived)
- Turn on the consent management platform (CMP); wire tiered consent gating
- Drop obvious creep (fingerprinting, 3P cookies, precise geo)
Days 31–60 — Rebuild the essentials
- Ship event specs; enforce types and scrub rules
- Move conversions server-side; set TTLs; rotate salts
- Stand up aggregated fact tables; add retention jobs
Days 61–90 — Prove value
- Replace click-path attribution with one geo holdout + MMM-lite
- Run a consented A/B; report lift with CIs
- Publish the privacy posture: what we keep, transform, drop—and why
10) Board-level policy (one slide)
- We collect only what changes decisions.
- User choice gates our instrumentation.
- Identifiers are short-lived, salted, and minimized.
- Aggregates are the default; experiments provide causality.
- We can fulfill delete/export requests on time, every time.
Bottom line
Shrink the surface area, not your ambition. Keep decision-making signals, transform what’s sensitive, and drop what’s creepy. When privacy is designed in, analytics becomes simpler, clearer, and more defensible—and your growth story gets stronger, not weaker.