Thesis: Dashboards don’t fail because they’re ugly—they fail because the data contract is weak. Data quality debt accrues interest as broken trust, bad decisions, and rework. The cure isn’t “more charts,” it’s a Decision‑First Data OS: contracts → tests → observability → runbooks → governance tied to the business.
1) Definitions
- Data Quality Debt: The compounding cost of missing contracts, ambiguous metrics, and untested pipelines. It grows every time a new dataset is shipped without owners, tests, or SLAs.
- Dashboard Theater: Attractive but untrusted charts that optimize for presentation, not decision‑making. Symptoms: weekly copy‑paste rituals, conflicting numbers across teams, “can you export this to CSV?” requests.
2) The Decision‑First Data OS (overview)
- Decision & Metric Contracts — Start from the decision: what will change if the number moves? Create a metric one‑pager (definition, query, owner, thresholds, caveats).
- Data Contracts — Formal schemas at source boundaries; typed fields; PII policy; breaking‑change rules.
- Quality Tests — At ingest, transform, and publish layers; promote only if tests pass.
- Observability & SLAs — Freshness, volume, nulls, distribution drift, lineage, error budgets.
- Runbooks — Incident severities, rollback/backfill steps, communication templates.
- Governance — RACI, change logs, dashboard lifecycle (birth → usage → retirement).
3) Quality Dimensions → SLOs (set targets you can defend)
| Dimension | What it means | Example SLO | Guardrail |
|---|---|---|---|
| Freshness | Data arrives on time | orders ≤ 20 min delay p95 | Freeze downstream if > 1h |
| Completeness | Expected rows/fields present | Day‑1 coverage ≥ 99.5% | Alert at –1σ vs 4‑week mean |
| Validity | Values match type/range | price_cents ≥ 0, ISO dates | Drop/flag invalid rows |
| Uniqueness | No duplicates where keys should be unique | order_id unique/day | Hard fail build |
| Consistency | Same business rule across tables | revenue = qty×price | Diff check < 0.5% |
| Accuracy | Matches ground truth/source of record | Payment totals within ±0.2% | Reconcile daily |
| Lineage | Trace from metric → sources | Auto‑updated graph | Required for sign‑off |
Error budget: percent of time an SLO can be breached before a change freeze (e.g., 1% monthly for critical tables).
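As a quick sketch of the arithmetic (assuming a 30‑day month and the 1% monthly budget from the example):

```python
def error_budget_minutes(budget_pct: float, days: int = 30) -> float:
    """Minutes per month an SLO may be breached before a change freeze."""
    return days * 24 * 60 * (budget_pct / 100.0)

def budget_burn(breach_minutes: float, budget_pct: float, days: int = 30) -> float:
    """Fraction of the monthly error budget consumed (1.0 = freeze)."""
    return breach_minutes / error_budget_minutes(budget_pct, days)

# A 1% budget on a critical table allows 432 minutes of breach per 30-day month.
print(error_budget_minutes(1.0))   # 432.0
print(round(budget_burn(300, 1.0), 2))  # 0.69: two thirds of the budget burned
```

Tracking burn as a fraction makes the freeze trigger unambiguous: freeze when burn ≥ 1.0.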
4) Test Matrix (attach tests where they matter)
Ingest (raw → staged)
- Schema checksum, column count, types, required fields non‑null.
- Volume vs 4‑week rolling mean; outlier guardrails.
- PII scan against allowlist.
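The volume guardrail above can be sketched in a few lines of Python; the window length and ±3σ threshold are illustrative, and a real pipeline would feed in the 4‑week (28‑day) daily counts:

```python
import statistics

def volume_guardrail(daily_counts: list[int], today: int, sigmas: float = 3.0) -> bool:
    """Flag today's row count if it falls outside mean ± sigmas * stdev
    of the trailing window (e.g., 28 days for a 4-week baseline)."""
    mean = statistics.mean(daily_counts)
    stdev = statistics.stdev(daily_counts)
    return abs(today - mean) > sigmas * stdev

history = [10_000, 10_250, 9_900, 10_100, 10_050, 9_950, 10_200]
print(volume_guardrail(history, 10_100))  # False: within band
print(volume_guardrail(history, 6_000))   # True: alert
```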
Transform (staged → modeled)
- Primary/foreign‑key constraints (surrogate keys OK).
- Valid ranges & enums; referential integrity.
- Business rules: `gross = net + tax + shipping` (within tolerance).
- Slowly changing dims: no retroactive key churn.
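A business‑rule check like the gross reconciliation is a short predicate over modeled rows; field names and the one‑cent tolerance here are illustrative:

```python
def check_gross(rows: list[dict], tol_cents: int = 1) -> list[dict]:
    """Return rows violating gross = net + tax + shipping.
    Integer cents; the tolerance absorbs rounding."""
    return [r for r in rows
            if abs(r["gross"] - (r["net"] + r["tax"] + r["shipping"])) > tol_cents]

rows = [
    {"order_id": "a1", "gross": 1210, "net": 1000, "tax": 110, "shipping": 100},
    {"order_id": "a2", "gross": 1300, "net": 1000, "tax": 110, "shipping": 100},  # off by 90
]
print([r["order_id"] for r in check_gross(rows)])  # ['a2']
```

Emitting the violating rows (not just a pass/fail flag) makes the failure actionable in the runbook.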
Publish (modeled → marts/BI)
- Metric recon: day‑over‑day diffs within tolerance.
- Freshness SLO checks; row‑level sample audits.
- Experiment invariants (SRM) where applicable.
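An SRM (sample ratio mismatch) invariant reduces to a one‑degree‑of‑freedom chi‑square test; this sketch assumes a planned 50/50 split and uses the standard p = 0.05 critical value:

```python
def srm_check(observed_a: int, observed_b: int, expected_ratio: float = 0.5,
              critical: float = 3.841) -> bool:
    """Chi-square test for sample ratio mismatch (df = 1).
    Returns True when the observed split deviates significantly from
    the planned ratio; 3.841 is the p = 0.05 critical value."""
    n = observed_a + observed_b
    exp_a = n * expected_ratio
    exp_b = n * (1 - expected_ratio)
    chi2 = (observed_a - exp_a) ** 2 / exp_a + (observed_b - exp_b) ** 2 / exp_b
    return chi2 > critical

print(srm_check(5_000, 5_050))  # False: within noise
print(srm_check(5_000, 5_600))  # True: likely assignment bug
```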
Test naming: `<table>::<layer>::<dimension>` → `orders::transform::uniqueness_order_id`
5) Metric One‑Pager (template)
- Name: Monthly Active Accounts (MAA)
- Definition: Distinct `account_id` with ≥1 `core_action` in calendar month
- SQL: link to versioned query
- Owner: Analytics → Jane D.
- Consumers: Exec weekly, Growth monthly
- Guardrails: event latency p95 < 300ms; SRM alarms in experiments
- Caveats: excludes sandbox/test tenants; backfills marked
- Change log: YYYY‑MM‑DD reason, PR link
6) Data Contract (YAML stub)
```yaml
name: orders
owner: data-platform
schema:
  order_id: {type: string, constraints: [primary_key]}
  account_id: {type: string, constraints: [not_null]}
  price_cents: {type: integer, constraints: [min: 0]}
  currency: {type: string, constraints: [enum: [USD, EUR, CAD]]}
  created_at: {type: timestamp, constraints: [not_null]}
slas:
  freshness_p95_minutes: 20
  completeness_min_pct: 99.5
breaking_change_protocol:
  policy: require-new-column
  deprecate_after: 30d
pii_policy: forbid
```
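A minimal enforcement sketch in Python, covering a subset of the contract's fields (a real implementation would load the YAML itself and run per batch):

```python
# Illustrative subset of the orders contract; constraint names mirror the YAML.
CONTRACT = {
    "order_id":    {"type": str, "not_null": True},
    "account_id":  {"type": str, "not_null": True},
    "price_cents": {"type": int, "min": 0},
    "currency":    {"type": str, "enum": {"USD", "EUR", "CAD"}},
}

def violations(record: dict) -> list[str]:
    """Return a human-readable list of contract violations for one record."""
    errs = []
    for field, rules in CONTRACT.items():
        val = record.get(field)
        if val is None:
            if rules.get("not_null"):
                errs.append(f"{field}: null")
            continue
        if not isinstance(val, rules["type"]):
            errs.append(f"{field}: wrong type")
        if "min" in rules and val < rules["min"]:
            errs.append(f"{field}: below min")
        if "enum" in rules and val not in rules["enum"]:
            errs.append(f"{field}: not in enum")
    return errs

print(violations({"order_id": "o1", "account_id": "a1",
                  "price_cents": -5, "currency": "GBP"}))
# ['price_cents: below min', 'currency: not in enum']
```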
7) Observability Signals (what to alert on)
- Freshness lag vs SLO
- Row count anomaly (±3σ or robust z‑score)
- Null ratio drift per critical column
- Dimension cardinality spikes (e.g., unexpected `status` values)
- Join key coverage (`user_id`, `account_id`, `session_id`)
- Metric diffs vs prior day/week beyond tolerance
- Lineage break (upstream failure)
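For the robust z‑score mentioned above, a median/MAD version resists the very outliers it is meant to catch; the null‑ratio history here is illustrative:

```python
import statistics

def robust_z(history: list[float], today: float) -> float:
    """Median/MAD z-score: resistant to outliers that distort a plain
    mean/stdev baseline. 0.6745 rescales MAD to ~1 stdev under normality."""
    med = statistics.median(history)
    mad = statistics.median(abs(x - med) for x in history)
    if mad == 0:
        return 0.0  # flat history: fall back to an exact-match rule instead
    return 0.6745 * (today - med) / mad

null_ratio_history = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011, 0.013]
print(robust_z(null_ratio_history, 0.011))  # 0.0: normal
print(robust_z(null_ratio_history, 0.080) > 10)  # True: page someone
```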
Alert routing: Sev‑1 (payments, revenue) → Pager on‑call; Sev‑2 (product analytics) → Slack + ticket; Sev‑3 (ad‑hoc) → daily digest.
8) Runbooks (when—not if—things break)
Sev‑1 checklist
- Announce incident with scope & affected metrics.
- Stop the bleeding: halt downstream builds; set banner on BI with timestamp.
- Identify regression PR; roll back or hotfix; re‑run backfill with checkpoints.
- Reconcile vs source of record; attach evidence.
- Close with RCA doc and prevention action (test added, contract updated).
Communication template
Status: Data incident Sev‑1. Impact: revenue dashboards stale since 09:40 UTC. ETA: 30–45m. Workaround: export from billing app if urgent. Next update: 10:15 UTC.
9) Dashboard Lifecycle (end the theater)
- Birth: requires metric one‑pager + owner + SLOs.
- Usage SLO: viewed by ≥2 teams or ≥N users/month.
- Review: quarterly redundancy check; consolidate overlapping boards.
- Retirement: archive if usage below SLO for 2 cycles or metric deprecated.
- Badge trust: green (all SLOs), yellow (minor breach), red (do not use).
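Badge assignment can be mechanical; the SLO names and the choice of which SLOs count as critical are illustrative here, only the green/yellow/red semantics come from the text:

```python
def trust_badge(slo_results: dict[str, bool], critical: set[str]) -> str:
    """green: all SLOs pass; red: a critical SLO fails; yellow: minor breach."""
    failing = {name for name, ok in slo_results.items() if not ok}
    if not failing:
        return "green"
    if failing & critical:
        return "red"
    return "yellow"

results = {"freshness": True, "completeness": False, "uniqueness": True}
print(trust_badge(results, critical={"freshness", "uniqueness"}))  # yellow
print(trust_badge(results, critical={"completeness"}))             # red
```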
10) Program KPIs (measure the Data OS)
- Trust score: share of dashboards green
- MTTD/MTTR: mean time to detect/resolve incidents
- Test coverage: % critical tables with ≥1 test per dimension
- SLO adherence: error budget burn/month
- Dashboard count: net reduction of redundant boards
- Decision velocity: time from question → approved metric → decision
11) 30‑60‑90 Day Plan
Days 1–30 — Inventory critical tables/metrics; write metric one‑pagers; add basic tests (schema, nulls, uniqueness); set SLOs; add freshness/volume alerts.
Days 31–60 — Implement data contracts on top 5 sources; expand tests to validity/consistency; wire lineage; publish runbooks; badge dashboards.
Days 61–90 — Error budgets + change freeze policy; quarterly dashboard review; add distribution‑drift detection; tie availability/latency SLOs to BI performance.
12) SQL & dbt Snippets
Freshness check (BigQuery)
```sql
SELECT
  TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(created_at), MINUTE) AS minutes_since_last
FROM raw.orders;
```
dbt tests (schema.yml fragment)
```yaml
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
      - name: account_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_accounts')
              field: account_id
      - name: currency
        tests:
          - accepted_values:
              values: ['USD', 'EUR', 'CAD']
```
13) Anti‑Theater Checklist (print this)
- Metric has one‑pager, owner, versioned SQL
- Table has contract, tests, SLOs
- Observability alerts wired (freshness, volume, nulls, diffs)
- Dashboard shows data freshness badge
- Runbook link visible on dashboard
- Retirement criteria scheduled
14) ROI Model (quick math)
Let I be incidents/month, H hours/incident, C blended cost/hour, R revenue at risk/day, p probability a Sev‑1 misguides a decision.
Monthly cost of data debt ≈ I × H × C + p × R.
Even halving I or p via SLOs/tests often pays for the Data OS in < 1 quarter.
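Plugging illustrative numbers into the formula (all figures hypothetical, units as defined above):

```python
def monthly_debt_cost(incidents: float, hours_per_incident: float,
                      cost_per_hour: float, p_misguided: float,
                      revenue_at_risk: float) -> float:
    """I x H x C + p x R, per the ROI model above."""
    return incidents * hours_per_incident * cost_per_hour + p_misguided * revenue_at_risk

before = monthly_debt_cost(8, 12, 150, 0.10, 500_000)  # 14_400 + 50_000 = 64_400
after  = monthly_debt_cost(4, 12, 150, 0.05, 500_000)  # 7_200 + 25_000 = 32_200
print(before - after)  # 32_200 per month recovered by halving I and p
```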
15) Comms Kit
Exec one‑liner:
“We don’t ship dashboards without contracts, tests, and owners. If the trust badge is red, we don’t use it to decide.”
LinkedIn post (short):
Dashboard theater dies when your data has a contract. Define the decision, write the metric one‑pager, enforce SLOs (freshness, completeness, validity), and add runbooks. Trust goes up, rework goes down.
16) SEO Kit
- Title (≤60): Data Quality Debt: Avoiding Dashboard Theater
- Meta (≤160): A decision‑first playbook to kill dashboard theater—data contracts, quality tests, observability, runbooks, and governance linked to business outcomes.
- Slug: /data-quality-debt-dashboard-theater
- Keywords: data quality debt, dashboard theater, data contracts, data quality tests, freshness completeness validity, data observability, lineage, data SLA, error budgets, dbt tests
17) Image Briefs
- Cover: Editorial 3D—two scenes split diagonally: left shows glossy dashboards with a red trust badge; right shows a contract, checklist, and green shield over a lineage graph. Cool blue palette, subtle grid.
- Diagram: Data OS flow: Decisions → Metric Contracts → Data Contracts → Tests → Observability → Runbooks → Governance.
Bottom line: Treat dashboards as outputs of a robust data system, not the system itself. Contracts, tests, and SLAs turn data from theater into decisions you can defend.