Thesis: Analytics breaks not from lack of data but from inconsistent names. A scalable event taxonomy is a contract across product, data, and marketing: consistent verbs, predictable nouns, typed properties, and governance to prevent drift.
1) Goals & Non‑Goals
Goals:
- Make events guessable (developers) and joinable (analysts).
- Keep a stable OEC/metrics layer intact across releases.
- Enable experiments and attribution with clean context.
- Protect privacy and unit economics (no PII in names; typed props).
Non‑Goals: BI dashboard themes, ad‑hoc one‑off event names, synonyms.
2) Event Model (3 layers)
- Name —
object_actioninsnake_case(noun first, verb second). - Properties — typed fields describing the event (
snake_case). - Context — shared envelopes (identity, device, marketing, experiment, geo, app version, consent).
Why object_action: Groups by object (e.g., checkout_*), keeps verbs finite (see §3), and mirrors domain models.
3) Naming Rules (style guide)
- Case & charset:
snake_case, lowercase a–z, digits,_. Regex:^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$. - Order:
object_action(singular noun + present‑tense verb):signup_submitted,cart_abandoned,invoice_paid. - Verb whitelist (finite):
viewed,opened,started,completed,clicked,submitted,created,updated,deleted,added,removed,shared,purchased,paid,refunded. - No synonyms: choose one (
signupnotsign_up/register). - No PII in names or property keys (no emails/IDs in the name).
- Length: ≤ 36 chars for event names; ≤ 30 chars for property keys.
- Units & suffixes:
_id(opaque string),_count(int),_amount(int cents),_currency(ISO‑4217),_rate(float 0–1),_at(ISO‑8601 UTC),_ms(milliseconds). - Collections: plural for lists (
items), singular for scalars (item_id). - Versions: breaking changes → new event with
_v2; never recycle names. - Deprecation: freeze old name; emit both for one release with
deprecated=true; remove after 2 cycles.
4) Canonical Context (attach to every event)
Identity
user_id(stable 1P),anonymous_id(pre‑auth),account_id(B2B),session_id.
Device/App/Webplatform(ios|android|web|backend),app_version,os,browser,screen.
Marketingutm_source,utm_medium,utm_campaign,utm_content,utm_term,referrer,gclid|fbclid(if present).
Experimentexp_name,exp_variant(per active experiment; array OK).
Geo/Localecountry_iso,locale,timezone.
Consent/Privacyconsent_advertising(bool),consent_analytics(bool),gdpr_region(bool).
Note: Keep context in a reserved namespace (e.g., context.*) to avoid collisions with event properties.
5) Required Properties by Pattern
- Form submit (
*_submitted) →form_id,field_error_count,validation_ms. - Checkout (
checkout_*) →order_id,items(array of{sku, qty, unit_amount, currency}),subtotal_amount,discount_amount,tax_amount,shipping_amount,total_amount. - Content (
article_*,video_*) →content_id,content_type,topic,duration_ms,position_ms. - B2B actions →
account_id,seat_count,plan_tier,arr_amount,billing_period. - Auth →
method(password|oauth|sso),mfa_used(bool). - Error →
error_code,error_message(truncated),severity(info|warn|error).
6) Domain Blueprints (starter libraries)
A) B2B SaaS
workspace_created,invite_sent,invite_accepted,data_import_started|completed|failed,dashboard_created,report_shared,billing_updated,invoice_paid,plan_changed.
B) E‑commerce
product_viewed,product_added,cart_viewed,checkout_started,address_submitted,payment_submitted,order_purchased,order_refunded,order_cancelled.
C) Content/Media
article_viewed,cta_clicked,subscription_started,subscription_cancelled,video_started|completed,ad_skipped.
7) Examples
Event JSON (B2B purchase)
{
"event": "invoice_paid",
"user_id": "u_9f3a",
"account_id": "acc_42",
"context": {
"platform": "web", "app_version": "2.18.0",
"utm_source": "linkedin", "utm_medium": "paid_social",
"exp_name": ["pricing_table_v2"], "exp_variant": ["B"]
},
"properties": {
"invoice_id": "in_8127",
"total_amount": 125000,
"currency": "USD",
"plan_tier": "pro",
"billing_period": "monthly",
"payment_method": "card"
},
"timestamp": "2025-09-15T14:23:11Z"
}
SQL (funnel: checkout → purchase)
WITH steps AS (
SELECT user_id,
MIN(CASE WHEN event='checkout_started' THEN timestamp END) AS t_checkout,
MIN(CASE WHEN event='payment_submitted' THEN timestamp END) AS t_pay,
MIN(CASE WHEN event='order_purchased' THEN timestamp END) AS t_order
FROM events
WHERE timestamp >= DATE('2025-09-01')
GROUP BY 1
)
SELECT
COUNTIF(t_checkout IS NOT NULL) AS started,
COUNTIF(t_pay IS NOT NULL) AS payment,
COUNTIF(t_order IS NOT NULL) AS purchased,
SAFE_DIVIDE(COUNTIF(t_order IS NOT NULL), COUNTIF(t_checkout IS NOT NULL)) AS checkout_to_purchase_cvr
FROM steps;
8) UTM & Campaign Hygiene (copy‑paste)
utm_source=platform
utm_medium={paid_social|paid_search|email|referral|organic_social}
utm_campaign=yyyymm_topic_or_offer
utm_content=creativeAngle_adFormat_v1
utm_term={keyword or audience}
Rules: all lowercase; hyphens allowed in values; never put PII in UTMs; map to context.* on ingest.
9) Typing & Units (do this, not that)
- Amounts: store in minor units (cents) as integers; pair with
currency(USD/EUR). - Time: ISO‑8601 UTC strings (
*_at) and/or epoch ms (*_ms). - Booleans: true/false, never
"yes"/"no". - Enums: documented allowed values; reject on ingest if not in list.
- IDs: opaque strings; never emails/phone as IDs. For emails, store hashes where needed.
- PII: keep in restricted tables; don’t mirror into event streams unless necessary and consented.
10) Governance & Change Management
Artifacts:
- Metric dictionary: NSM, driver metrics, guardrails; queries; owners.
- Event spec (one MD file per event): purpose, triggering code, required properties, sample payload, owner, downstream tables, deprecation plan.
- Schema registry: JSON Schema per event; semver version; linter.
Process:
- Propose in PR → review by Data/Analytics.
- Emit behind a flag to staging; run AA test on volume and invariants.
- Release; keep old + new names for 1–2 cycles if breaking.
- Update dictionary, BI, and dbt models; note in changelog.
Linter (regex) examples:
- Event names:
^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$ - Property keys: same as above; forbid reserved keys (
event,timestamp,context).
11) QA & Monitoring
- Volume & SRM: alert on unexpected dips/spikes; sample‑ratio mismatch on A/B splits.
- Nulls/missingness: <1% for required fields; drop events that fail schema.
- Latency: event ingest p95 < 300ms (client→collector).
- Join keys:
user_id,account_id,session_idcoverage monitored. - Privacy: consent flags present for ad/analytics where required.
12) 14‑Day Implementation Plan
Days 1–3: Inventory current events; cluster by object_*; map to metrics.
Days 4–5: Approve verb whitelist; freeze synonyms; publish style guide.
Days 6–8: Draft top‑50 event specs (MD templates); add JSON Schemas.
Days 9–10: Build linter in CI (regex + Schema validation); set reject rules.
Days 11–12: Stage rollout behind flags; AA test volume & invariants.
Days 13–14: Release; backfill dbt models; publish metric dictionary and changelog.
13) Templates
Event spec (Markdown)
# <event_name>
**Purpose:** …
**Trigger:** …
**Required properties:** …
**Optional properties:** …
**Sample payload:** JSON
**Owner:** Team / person
**Downstream:** tables, dashboards
**Deprecation:** replaces <old_event> at version/date
JSON Schema (stub)
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "invoice_paid",
"type": "object",
"required": ["event", "user_id", "timestamp", "properties"],
"properties": {
"event": {"const": "invoice_paid"},
"user_id": {"type": "string"},
"timestamp": {"type": "string", "format": "date-time"},
"properties": {
"type": "object",
"required": ["invoice_id", "total_amount", "currency"],
"properties": {
"invoice_id": {"type": "string"},
"total_amount": {"type": "integer", "minimum": 0},
"currency": {"type": "string", "pattern": "^[A-Z]{3}$"}
}
}
}
}
14) SEO Kit
- Title (≤60): Event Taxonomy: Naming Conventions That Scale
- Meta (≤160): A practical naming system for analytics events—object_action names, typed properties, context, governance, and templates that prevent drift.
- Slug:
/event-taxonomy-naming-conventions - Keywords: event taxonomy, analytics naming conventions, object action events, event properties schema, json schema events, utm hygiene, event governance, event dictionary template, analytics linter, schema registry
Bottom line: Pick one naming pattern, lock a verb whitelist, type your properties, and govern change. That’s how event data stays trustworthy as you scale.
Add comment