dbt Data Quality Testing: Complete Guide with Examples
dbt's built-in testing framework is powerful for catching data quality issues during transformation. Learn how to use it effectively—and when you need additional monitoring.
What is dbt Data Testing?
dbt (data build tool) includes a testing framework that lets you validate data quality as part of your transformation pipeline. When you run dbt test, it executes SQL queries that check whether your data meets defined expectations.
Think of dbt tests as unit tests for your data. Just like software engineers write tests to catch bugs before deployment, data engineers write dbt tests to catch data quality issues before they reach production dashboards.
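Under the hood, every dbt test compiles to a SQL query that selects *failing* rows; the test passes when the query returns zero rows. A minimal sketch of that convention using Python's built-in sqlite3 in place of a warehouse (the `orders` table and its data are hypothetical):

```python
import sqlite3

# In-memory database standing in for the warehouse (hypothetical orders table)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10), (2, 11), (2, 12)])  # order_id 2 is duplicated

# A uniqueness check, like dbt's built-in `unique` test, selects the failing rows:
failing_rows = conn.execute("""
    SELECT order_id, COUNT(*) AS n
    FROM orders
    GROUP BY order_id
    HAVING COUNT(*) > 1
""").fetchall()

# dbt's convention: zero rows returned means the test passes
print("PASS" if not failing_rows else f"FAIL: {failing_rows}")
```

Everything that follows, built-in or custom, is a variation on this one pattern: write a query that finds bad rows, and fail the build if any come back.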
dbt Test Types
- Schema tests: Defined in YAML, test column properties (unique, not_null, etc.)
- Data tests: Custom SQL queries in the tests/ directory
- Unit tests: (dbt 1.8+) Test model logic with mock inputs
dbt Built-in Tests
dbt provides four built-in tests that cover the most common data quality checks:
1. unique
Ensures all values in a column are unique (no duplicates).
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
```

Use for: Primary keys, natural keys, any column that should have no duplicates.
2. not_null
Ensures no null values exist in a column.
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - not_null
```

Use for: Required fields, foreign keys, any column where nulls indicate a problem.
3. accepted_values
Ensures all values are from a predefined list.
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['pending', 'shipped', 'delivered', 'cancelled']
```

Use for: Status fields, category columns, enum-like values.
4. relationships
Ensures referential integrity—all values exist in another table.
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id
```

Use for: Foreign key validation, ensuring no orphaned records.
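The relationships test compiles to an orphaned-record check: select child rows whose key has no match in the parent table. A sketch of the equivalent query in sqlite (the `orders` and `customers` tables are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers (id) VALUES (?)", [(10,), (11,)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10), (2, 11), (3, 99)])  # customer 99 does not exist

# Equivalent of `relationships: to: ref('customers'), field: id` —
# orders.customer_id values with no matching customers.id are failures
orphans = conn.execute("""
    SELECT o.customer_id
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    WHERE c.id IS NULL
""").fetchall()

print(orphans)  # orphaned foreign-key values, if any
```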
Writing Custom dbt Tests
When built-in tests aren't enough, you can write custom tests as SQL files in the tests/ directory.
Generic Custom Tests
Reusable tests that can be applied to any model/column. Place in macros/:
```sql
-- macros/test_positive_values.sql
{% test positive_values(model, column_name) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} < 0
{% endtest %}
```

Use it in schema.yml:
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: amount
        tests:
          - positive_values
```

Singular Data Tests
One-off tests for specific business rules. Place in tests/:
```sql
-- tests/assert_total_revenue_matches.sql
-- This test fails if any rows are returned
SELECT
    date,
    SUM(order_amount) AS calculated_revenue,
    daily_revenue_total AS reported_revenue
FROM {{ ref('orders') }}
JOIN {{ ref('daily_summary') }} USING (date)
GROUP BY 1, daily_revenue_total
HAVING ABS(SUM(order_amount) - daily_revenue_total) > 0.01
```

Common Custom Test Patterns
```sql
-- Test: Column value within range
{% test in_range(model, column_name, min_val, max_val) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} < {{ min_val }}
   OR {{ column_name }} > {{ max_val }}
{% endtest %}

-- Test: Row count within threshold
{% test row_count_threshold(model, min_rows) %}
SELECT 1
FROM {{ model }}
HAVING COUNT(*) < {{ min_rows }}
{% endtest %}

-- Test: No future dates
{% test no_future_dates(model, column_name) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} > CURRENT_DATE
{% endtest %}
```

Test Configuration and Severity
dbt allows you to configure test behavior including severity levels and error thresholds.
Severity Levels
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: email
        tests:
          - not_null:
              severity: warn   # warn instead of error
          - unique:
              severity: error  # fail the build (default)
```

Error Thresholds
Allow some failures before the test fails:
```yaml
# schema.yml
models:
  - name: orders
    columns:
      - name: email
        tests:
          - not_null:
              config:
                error_if: ">100"  # fail if more than 100 nulls
                warn_if: ">10"    # warn if more than 10 nulls
```

Store Test Failures
Save failing rows for debugging:
```yaml
# dbt_project.yml
tests:
  +store_failures: true
  +schema: dbt_test_failures
```

dbt Test Limitations
While dbt tests are valuable, they have important limitations for comprehensive data quality:
1. Build-time only: Tests run only when you run dbt test, not continuously. Data can go bad between builds.
2. No anomaly detection: Tests can't automatically detect unusual patterns; you must set thresholds manually.
3. No historical tracking: No visibility into how metrics change over time, and no trend analysis.
4. Manual threshold maintenance: You must update thresholds as data grows and patterns change.
5. No freshness monitoring: dbt source freshness helps, but it doesn't monitor transformed tables.
6. Limited alerting: Native alerting requires additional tooling or CI/CD integration.
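The freshness gap in particular is easy to picture: a transformed table can silently go stale between builds, and the check that catches it is just a max-timestamp query run on a schedule by an external monitor. A hedged sketch, with a hypothetical `fct_orders` table and a 24-hour SLA chosen for illustration:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_orders (order_id INTEGER, updated_at TEXT)")
# Simulate a table last updated 30 hours ago
stale = (datetime.now(timezone.utc) - timedelta(hours=30)).isoformat()
conn.execute("INSERT INTO fct_orders VALUES (1, ?)", (stale,))

# The kind of query a monitor runs continuously — dbt only runs tests at
# build time, and `dbt source freshness` only covers sources
(last_update,) = conn.execute(
    "SELECT MAX(updated_at) FROM fct_orders").fetchone()
age = datetime.now(timezone.utc) - datetime.fromisoformat(last_update)
MAX_AGE = timedelta(hours=24)  # hypothetical freshness SLA

is_stale = age > MAX_AGE
print("STALE" if is_stale else "FRESH")
```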
dbt + Data Observability: Better Together
The best data quality strategy combines dbt tests with a data observability platform. Here's how they complement each other:
| Capability | dbt Tests | Data Observability |
|---|---|---|
| When it runs | Build time | Continuously |
| Anomaly detection | Manual thresholds | ML-powered, automatic |
| Historical trends | No | Yes |
| Schema monitoring | Via tests | Automatic |
| Freshness monitoring | Sources only | All tables |
| Best for | Known rules, build validation | Unknown unknowns, production monitoring |
Recommended Approach
- Use dbt tests for known rules: Primary key uniqueness, not-null constraints, valid enum values, referential integrity
- Use observability for the rest: Anomaly detection, freshness monitoring, volume tracking, distribution shifts
- Integrate alerts: Route both dbt test failures and observability alerts to the same channel
Complete Your dbt Stack with Sparvi
Sparvi complements dbt by providing continuous monitoring, ML-powered anomaly detection, and automatic freshness tracking. Catch what dbt tests miss—without writing more YAML.
Frequently Asked Questions
What is dbt data testing?
dbt data testing is the practice of validating data quality within your dbt (data build tool) pipelines. dbt provides built-in tests like unique, not_null, accepted_values, and relationships, plus support for custom SQL tests that run automatically when you build your models.
How do I add data quality tests in dbt?
Add tests in your schema.yml file under each column or model. Use built-in tests like "unique" and "not_null", or write custom tests in the tests/ directory. Run them with the "dbt test" command. Tests fail the build if data doesn't meet expectations.
What are dbt test limitations?
dbt tests only run at build time, not continuously. They can't detect anomalies or establish baselines, don't track historical trends, and require manual threshold configuration. For production monitoring, teams often combine dbt tests with data observability tools.
Should I use dbt tests or a data observability tool?
Use both. dbt tests catch issues during transformation (build-time validation), while data observability tools monitor data continuously in production. dbt tests are like unit tests; observability tools are like production monitoring—you need both for complete coverage.