Monitoring dbt Models in Production: Beyond Build-Time Tests

What dbt tests actually do

dbt tests, both built-in (not_null, unique, accepted_values, relationships) and custom, run during dbt build or dbt test. They fail the build if a condition isn't met. That's the entire scope.

Useful: yes. Sufficient: no. Here's what they don't catch.

Failure modes dbt tests miss

1. Issues that emerge between runs

dbt runs on a schedule, usually hourly or daily. Anything that drifts between runs (an upstream source goes stale at 11 AM, a downstream report runs at noon) is invisible to dbt. The next dbt run might pass cleanly while the dashboard is showing broken numbers.

2. Quantitative anomalies

not_null catches NULL. It doesn't catch "your average order value just dropped from $87 to $12." That's not a NULL, it's a number-that's-wrong, and dbt has no concept for it. You can write a custom test that fails if AOV is below some floor, but you'll write 200 of these by hand to cover a typical warehouse.

3. Schema drift from upstream

When the source system renames customer_email to email, your dbt model that selects customer_email will either fail loudly (good) or silently fill NULL via a default join (bad). dbt's relationships test catches some of this. Most of it slips through.

4. Source freshness gaps in production

dbt has sources.yml with freshness blocks. Useful for the source itself. Doesn't cover the gap between source and your transformed model, and doesn't cover sources that aren't declared (which, in practice, is most of them in a young warehouse).

The layered approach: dbt tests + observability

The pattern that works in production:

Keep dbt tests for build-time invariants. Schema-level rules that must be true for the model to be valid. Primary keys are unique. Required columns are non-null. Foreign keys resolve. Run on every dbt build.
Add continuous monitors for production behavior. Row counts, freshness, column statistics, business-metric SQL, with ML or threshold detection, segmented by region/product/tenant when an aggregate would hide the problem. Run on a schedule independent of dbt. Alert on drift, not just hard failures.
Use lineage to correlate. When a downstream model breaks, you want to know whether it was caused by a recent dbt change or a source-system change. Lineage between the two layers is the connective tissue.

What to monitor: a starter list

For every mart-layer model (the ones that feed dashboards):

Row count delta: alert if today's row count differs from the 30-day baseline by > 3σ.
Freshness: alert if MAX(updated_at) is older than the model's expected SLA.
NULL rate on key columns: alert if the rate exceeds historical p99.
Distinct count on join keys: alert if cardinality drops more than 10% (signals broken upstream join).
Sum of any revenue/quantity column: alert on day-over-day drift > ±25%.

For staging and intermediate models, less is more. Monitor the layers stakeholders see; trust dbt tests for the layers nobody renders.

How this looks with Sparvi

Sparvi monitors auto-profile your warehouse tables (dbt models included) and generate row-count, freshness, and null-rate monitors with statistical evaluation. You don't write them by hand. The auto-generated monitors run on Sparvi's schedule, independent of when dbt last ran.

For business-metric monitors, you write the SQL once, usually 3-5 per dashboard, and they run continuously. Segment by region or channel so a regional collapse never gets averaged away by a healthy global total. When a dbt test passes but the EMEA revenue number drifts 60%, you find out from Sparvi at 6 AM, not from sales at 11.

What to keep in dbt

Don't move everything. dbt tests are cheap to maintain when they're schema-level invariants. Keep:

Primary key uniqueness (unique)
Required column nullness (not_null)
Referential integrity for joins you control (relationships)
Accepted-values for low-cardinality enums (accepted_values)
Custom tests for invariants the warehouse must satisfy to be deployable

Move out to observability:

Row count thresholds (they drift with business; observability handles baselines)
Freshness against an SLA (continuous, not just at build time)
Statistical anomaly detection on monitors (dbt tests don't do z-score / IQR / moving averages)
Business-metric drift via custom SQL monitors (not a data-shape invariant)
Segmented metrics, per region, product, tenant, that an aggregate test would smooth over

The 30-minute setup

If you're running dbt against Snowflake or BigQuery today, you can layer continuous monitoring on top in about half an hour:

Sign up for a Sparvi trial (14 days, no credit card)
Connect your warehouse with read-only credentials
Pick the dbt models you want monitored (start with mart layer)
Sparvi auto-generates the rules; you accept or adjust
Wire alerts into your existing Slack channel

That's it. dbt keeps doing what dbt does well. The gap dbt leaves, continuous behavior in production, is covered.

Layer Sparvi on top of dbt

Keep your dbt tests. Add Sparvi monitors on top, row counts, freshness, business metrics, with ML or threshold detection, segmented when it matters. Connect in 30 minutes. 14-day free trial, no credit card.

Start Free Trial