dbt Data Quality Testing: Complete Guide with Examples
dbt's built-in testing framework is powerful for catching data quality issues during transformation. Learn how to use it effectively—and when you need additional monitoring.
What is dbt Data Testing?
dbt (data build tool) includes a testing framework that lets you validate data quality as part of your transformation pipeline. When you run dbt test, it executes SQL queries that check whether your data meets defined expectations.
Think of dbt tests as unit tests for your data. Just like software engineers write tests to catch bugs before deployment, data engineers write dbt tests to catch data quality issues before they reach production dashboards.
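Under the hood, every dbt test compiles to a SQL query that selects *failing* rows; the test passes when the query returns zero rows. A minimal sketch of that convention using Python's built-in sqlite3 in place of a warehouse (the `orders` table and its data are hypothetical):

```python
import sqlite3

# In-memory database standing in for the warehouse (hypothetical orders table)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10), (2, 11), (2, 12)])  # order_id 2 is duplicated

# A uniqueness check, like dbt's built-in `unique` test, selects the failing rows:
failing_rows = conn.execute("""
    SELECT order_id, COUNT(*) AS n
    FROM orders
    GROUP BY order_id
    HAVING COUNT(*) > 1
""").fetchall()

# dbt's convention: zero rows returned means the test passes
print("PASS" if not failing_rows else f"FAIL: {failing_rows}")
```

Everything that follows, built-in or custom, is a variation on this one pattern: write a query that finds bad rows, and fail the build if any come back.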
dbt Test Types
- Schema tests: Defined in YAML, test column properties (unique, not_null, etc.)
- Data tests: Custom SQL queries in the tests/ directory
- Unit tests: (dbt 1.8+) Test model logic with mock inputs
dbt Built-in Tests
dbt provides four built-in tests that cover the most common data quality checks:
1. unique
Ensures all values in a column are unique (no duplicates).
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
```

Use for: Primary keys, natural keys, any column that should have no duplicates.
2. not_null
Ensures no null values exist in a column.
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - not_null
```

Use for: Required fields, foreign keys, any column where nulls indicate a problem.
3. accepted_values
Ensures all values are from a predefined list.
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['pending', 'shipped', 'delivered', 'cancelled']
```

Use for: Status fields, category columns, enum-like values.
4. relationships
Ensures referential integrity—all values exist in another table.
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id
```

Use for: Foreign key validation, ensuring no orphaned records.
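The relationships test compiles to an orphaned-record check: select child rows whose key has no match in the parent table. A sketch of the equivalent query in sqlite (the `orders` and `customers` tables are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers (id) VALUES (?)", [(10,), (11,)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10), (2, 11), (3, 99)])  # customer 99 does not exist

# Equivalent of `relationships: to: ref('customers'), field: id` —
# orders.customer_id values with no matching customers.id are failures
orphans = conn.execute("""
    SELECT o.customer_id
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    WHERE c.id IS NULL
""").fetchall()

print(orphans)  # orphaned foreign-key values, if any
```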
Writing Custom dbt Tests
When built-in tests aren't enough, you can write custom tests as SQL files in the tests/ directory.
Generic Custom Tests
Reusable tests that can be applied to any model/column. Place in macros/:
```sql
-- macros/test_positive_values.sql
{% test positive_values(model, column_name) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} < 0
{% endtest %}
```

Use it in schema.yml:
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: amount
        tests:
          - positive_values
```

Singular Data Tests
One-off tests for specific business rules. Place in tests/:
```sql
-- tests/assert_total_revenue_matches.sql
-- This test fails if any rows are returned
SELECT
    date,
    SUM(order_amount) AS calculated_revenue,
    daily_revenue_total AS reported_revenue
FROM {{ ref('orders') }}
JOIN {{ ref('daily_summary') }} USING (date)
GROUP BY 1, daily_revenue_total
HAVING ABS(SUM(order_amount) - daily_revenue_total) > 0.01
```

Common Custom Test Patterns
```sql
-- Test: Column value within range
{% test in_range(model, column_name, min_val, max_val) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} < {{ min_val }}
   OR {{ column_name }} > {{ max_val }}
{% endtest %}

-- Test: Row count within threshold
{% test row_count_threshold(model, min_rows) %}
SELECT 1
FROM {{ model }}
HAVING COUNT(*) < {{ min_rows }}
{% endtest %}

-- Test: No future dates
{% test no_future_dates(model, column_name) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} > CURRENT_DATE
{% endtest %}
```

Test Configuration and Severity
dbt allows you to configure test behavior including severity levels and error thresholds.
Severity Levels
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: email
        tests:
          - not_null:
              severity: warn   # warn instead of error
          - unique:
              severity: error  # fail the build (default)
```

Error Thresholds
Allow some failures before the test fails:
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: email
        tests:
          - not_null:
              config:
                error_if: ">100"  # fail if more than 100 nulls
                warn_if: ">10"    # warn if more than 10 nulls
```

Store Test Failures
Save failing rows for debugging:
```yaml
# dbt_project.yml
tests:
  +store_failures: true
  +schema: dbt_test_failures
```

dbt Test Limitations
While dbt tests are valuable, they have important limitations for comprehensive data quality:
1. Build-time only: Tests run only when you run dbt test, not continuously. Data can go bad between builds.
2. No anomaly detection: Tests can't automatically detect unusual patterns; you must set thresholds manually.
3. No historical tracking: No visibility into how metrics change over time, and no trend analysis.
4. Manual threshold maintenance: You must update thresholds as data grows and patterns change.
5. No freshness monitoring: dbt source freshness helps, but it doesn't monitor transformed tables.
6. Limited alerting: Native alerting requires additional tooling or CI/CD integration.
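The freshness gap in particular is easy to picture: a transformed table can silently go stale between builds, and the check that catches it is just a max-timestamp query run on a schedule by an external monitor. A hedged sketch, with a hypothetical `fct_orders` table and a 24-hour SLA chosen for illustration:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_orders (order_id INTEGER, updated_at TEXT)")
# Simulate a table last updated 30 hours ago
stale = (datetime.now(timezone.utc) - timedelta(hours=30)).isoformat()
conn.execute("INSERT INTO fct_orders VALUES (1, ?)", (stale,))

# The kind of query a monitor runs continuously — dbt only runs tests at
# build time, and `dbt source freshness` only covers sources
(last_update,) = conn.execute(
    "SELECT MAX(updated_at) FROM fct_orders").fetchone()
age = datetime.now(timezone.utc) - datetime.fromisoformat(last_update)
MAX_AGE = timedelta(hours=24)  # hypothetical freshness SLA

is_stale = age > MAX_AGE
print("STALE" if is_stale else "FRESH")
```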
dbt + Data Observability: Better Together
The best data quality strategy combines dbt tests with a data observability platform. Here's how they complement each other:
| Capability | dbt Tests | Data Observability |
|---|---|---|
| When it runs | Build time | Continuously |
| Anomaly detection | Manual thresholds | ML-powered, automatic |
| Historical trends | No | Yes |
| Schema monitoring | Via tests | Automatic |
| Freshness monitoring | Sources only | All tables |
| Best for | Known rules, build validation | Unknown unknowns, production monitoring |
Recommended Approach
- Use dbt tests for known rules: Primary key uniqueness, not-null constraints, valid enum values, referential integrity
- Use observability for the rest: Anomaly detection, freshness monitoring, volume tracking, distribution shifts
- Integrate alerts: Route both dbt test failures and observability alerts to the same channel
Complete Your dbt Stack with Sparvi
Sparvi complements dbt by providing continuous monitoring, ML-powered anomaly detection, and automatic freshness tracking. Catch what dbt tests miss—without writing more YAML.
Frequently Asked Questions
What is dbt data testing?
dbt data testing is the practice of validating data quality within your dbt (data build tool) pipelines. dbt provides built-in tests like unique, not_null, accepted_values, and relationships, plus support for custom SQL tests that run automatically when you build your models.
How do I add data quality tests in dbt?
Add tests in your schema.yml file under each column or model. Use built-in tests like "unique" and "not_null", or write custom tests in the tests/ directory. Run them with the "dbt test" command. Tests fail the build if data doesn't meet expectations.
What are dbt test limitations?
dbt tests only run at build time, not continuously. They can't detect anomalies or establish baselines, don't track historical trends, and require manual threshold configuration. For production monitoring, teams often combine dbt tests with data observability tools.
Should I use dbt tests or a data observability tool?
Use both. dbt tests catch issues during transformation (build-time validation), while data observability tools monitor data continuously in production. dbt tests are like unit tests; observability tools are like production monitoring—you need both for complete coverage.