Great Expectations vs dbt Tests: Which Should You Use?
Both Great Expectations and dbt tests validate data quality, but they work differently. Here's how to choose—and why many teams use both.
If you're building a data pipeline, you need data quality validation. The two most popular open-source options are Great Expectations (a Python library) and dbt tests (built into dbt). Both catch data problems, but they work in fundamentally different ways.
This guide compares the two approaches to help you decide which to use—or whether you should use both.
Quick Summary
Use dbt tests when:
- You're already using dbt
- You want simple, integrated testing
- Your testing needs are straightforward
- You prefer YAML over Python
Use Great Expectations when:
- You need advanced validation logic
- You test data outside dbt (sources, APIs)
- You want auto-generated documentation
- You need 300+ built-in expectations
Understanding the Two Approaches
dbt Tests: Built-In and Simple
dbt tests are data quality checks that run as part of your dbt workflow. They're defined in YAML and execute as SQL queries against your data warehouse.
dbt includes four built-in test types:
- unique: No duplicate values in a column
- not_null: No NULL values in a column
- accepted_values: Values match a defined list
- relationships: Foreign key integrity (values exist in another table)
```yaml
# schema.yml
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['pending', 'shipped', 'delivered', 'cancelled']
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: customer_id
```

You can also write custom tests as SQL queries (singular tests) or reusable macros (generic tests), and packages like dbt-utils and dbt-expectations add more test types.
Great Expectations: Powerful and Flexible
Great Expectations (GE) is a standalone Python library for data validation. It's not tied to dbt or any specific tool—you can use it with Pandas DataFrames, Spark, SQL databases, or any data source accessible from Python.
GE provides 300+ built-in "Expectations" (their term for tests), from simple null checks to complex statistical validations:
```python
import great_expectations as gx

# Create a Data Context
context = gx.get_context()

# Define an expectation suite
expectation_suite = context.add_expectation_suite("orders_suite")

# Column-level expectations
expectation_suite.add_expectation(
    gx.expectations.ExpectColumnValuesToNotBeNull(column="order_id")
)
expectation_suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeUnique(column="order_id")
)
expectation_suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(
        column="order_total",
        min_value=0,
        max_value=100000,
    )
)

# Statistical expectations
expectation_suite.add_expectation(
    gx.expectations.ExpectColumnMeanToBeBetween(
        column="order_total",
        min_value=50,
        max_value=500,
    )
)

# Run validation (assumes a checkpoint named "orders_checkpoint" has been
# configured against this suite; exact API details vary by GE version)
results = context.run_checkpoint(checkpoint_name="orders_checkpoint")
```

GE also generates "Data Docs"—HTML documentation of your data quality with validation history and statistics.
Feature Comparison
| Feature | dbt Tests | Great Expectations |
|---|---|---|
| Configuration | YAML | Python or YAML |
| Built-in test types | 4 (+ packages) | 300+ |
| Execution | In-warehouse SQL | Python runtime (SQL, Pandas, Spark) |
| dbt integration | Native | Possible but separate |
| Documentation | dbt Docs | Data Docs (auto-generated) |
| Learning curve | Low (if you know dbt) | Medium-high |
| Non-dbt data sources | No | Yes |
| Statistical tests | Limited | Comprehensive |
| Custom tests | SQL macros | Python classes |
When to Use dbt Tests
Your stack centers on dbt
If dbt is your transformation layer and most of your data quality concerns are about transformed data, dbt tests are the natural choice. They run as part of dbt test or dbt build, require no additional infrastructure, and fit seamlessly into your existing workflow.
You need simple validation
For common checks—uniqueness, not-null, referential integrity, accepted values—dbt's built-in tests are sufficient. With packages like dbt-utils and dbt-expectations, you can cover most scenarios without writing Python.
You prefer YAML over Python
dbt tests are configured in YAML, which many data teams find more accessible than Python. Your analytics engineers can add tests without learning a new programming language.
You want everything in one place
dbt tests live in the same repository as your models, documented in the same schema files. This keeps your data definitions and quality rules together, making it easier to maintain consistency.
When to Use Great Expectations
You need advanced validation
Great Expectations offers expectations that dbt doesn't, especially for statistical validation (a few are sketched in code after this list):
- Column mean/median/std within expected ranges
- Distribution matching (KL divergence, chi-square)
- Regex pattern matching on text columns
- Cross-column comparisons
- Conditional expectations (column A should be X when column B is Y)
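As a rough illustration, here is how a few of these might look, written in the same suite-and-expectation style as the earlier snippet. The column names, the regex pattern, and the row_condition value are made up, and exact class names and parameters vary between Great Expectations versions, so treat this as a sketch rather than copy-paste code:

```python
import great_expectations as gx

context = gx.get_context()
suite = context.add_expectation_suite("orders_advanced_suite")

# Regex pattern matching on a text column (pattern is illustrative)
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToMatchRegex(
        column="order_id", regex=r"^ORD-\d{6}$"
    )
)

# Cross-column comparison: shipped_at should not precede created_at
suite.add_expectation(
    gx.expectations.ExpectColumnPairValuesAToBeGreaterThanB(
        column_A="shipped_at", column_B="created_at", or_equal=True
    )
)

# Conditional expectation: discount must be zero unless the order is a promo
# (row_condition/condition_parser support depends on your GE version and backend)
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(
        column="discount",
        min_value=0,
        max_value=0,
        row_condition='order_type!="promo"',
        condition_parser="pandas",
    )
)
```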
You test data outside dbt
GE works with any data source accessible from Python—not just your data warehouse (a short sketch follows this list). Use it to validate:
- Source data before it enters your warehouse
- API responses
- Files (CSV, Parquet, JSON)
- Pandas DataFrames in Python pipelines
- Spark DataFrames
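For example, here is a minimal sketch of validating a raw CSV with pandas before it is loaded anywhere. The file path and column name are illustrative, and the fluent datasource calls follow newer GE releases, so the exact method names may differ in your version:

```python
import pandas as pd
import great_expectations as gx

# A file landed by an upstream system (path is illustrative)
df = pd.read_csv("exports/raw_orders.csv")

context = gx.get_context()

# Register the DataFrame as a batch via the pandas datasource
# (names like "local_pandas" and "raw_orders" are arbitrary)
batch = (
    context.data_sources.add_pandas("local_pandas")
    .add_dataframe_asset("raw_orders")
    .add_batch_definition_whole_dataframe("all_rows")
    .get_batch(batch_parameters={"dataframe": df})
)

# Validate the raw file before loading it to the warehouse
result = batch.validate(
    gx.expectations.ExpectColumnValuesToNotBeNull(column="order_id")
)
print(result.success)
```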
You want automated documentation
GE's Data Docs automatically generates HTML documentation showing:
- All defined expectations
- Validation history and results
- Data profiling statistics
- Trend analysis over time
This is valuable for compliance, auditing, and sharing data quality status with stakeholders.
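Building the docs is typically a couple of calls on the Data Context. A minimal sketch, assuming a local file-backed project; the availability and behavior of these helpers varies a little between GE versions:

```python
import great_expectations as gx

context = gx.get_context()

# Rebuild the static HTML site from stored validation results
context.build_data_docs()

# Open the generated site in a browser
context.open_data_docs()
```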
You need profiling-driven expectations
GE can automatically profile your data and suggest expectations based on what it finds. This helps discover implicit assumptions about your data and codify them as tests.
Using Both Together
Many teams use both tools, each for what it does best:
Pattern 1: GE for Sources, dbt for Transformations
Run Great Expectations on source data as it enters your warehouse (or before). Validate that source systems are sending data that meets your expectations. Then use dbt tests to validate your transformation logic.
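As a rough sketch of the hand-off, assuming a plain Python orchestration script, a hypothetical GE checkpoint named raw_orders_checkpoint configured elsewhere, and the dbt CLI installed, it might look like this; the pipeline-flow summary below lists the same steps:

```python
import subprocess
import great_expectations as gx


def run_pipeline() -> None:
    # 1-2. Extract raw data, then validate it with Great Expectations.
    # "raw_orders_checkpoint" is a hypothetical checkpoint defined elsewhere.
    context = gx.get_context()
    result = context.run_checkpoint(checkpoint_name="raw_orders_checkpoint")
    if not result.success:
        raise RuntimeError("Source validation failed; stopping before load")

    # 3. Load validated data to staging (your ingestion tool goes here).

    # 4. Run dbt models and their tests together; a non-zero exit fails the run.
    subprocess.run(["dbt", "build"], check=True)


if __name__ == "__main__":
    run_pipeline()
```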
```python
# Pipeline flow:
# 1. Extract from source
# 2. Run GE validation on raw data
# 3. Load to staging
# 4. dbt transformations with dbt tests
# 5. (Optional) GE validation on final outputs
```

Pattern 2: dbt Tests Daily, GE for Deep Profiling
Run dbt tests on every pipeline run for fast, integrated validation. Run Great Expectations periodically (weekly, monthly) for deeper statistical profiling and trend analysis.
Pattern 3: dbt-expectations Package
The dbt-expectations package brings Great Expectations-style tests to dbt. It's a middle ground: stay in dbt's YAML world but get more advanced test types.
```yaml
# Using the dbt-expectations package
models:
  - name: orders
    columns:
      - name: order_total
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100000
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 50
              max_value: 500
```

The Limitations of Both
Both dbt tests and Great Expectations are testing frameworks—they validate data when you run them. They don't provide:
- Continuous monitoring: Tests only run when triggered
- Anomaly detection: You define the rules; they don't learn patterns
- Alerting: You need to build this on top
- Dashboards: No built-in observability UI (GE has Data Docs, but it's static)
- Data lineage: Neither tracks data flow automatically
For these capabilities, you need a data observability platform on top of—or instead of—these testing tools.
Beyond Testing: Continuous Data Observability
dbt tests and Great Expectations are great for validating known expectations. But what about detecting unknown issues? Sparvi provides automated monitoring that catches anomalies, freshness issues, and schema changes—without writing tests for every possible failure mode.
Decision Framework
Ask yourself these questions:
- Is your data stack dbt-centric? If yes, start with dbt tests.
- Do you need to test data outside dbt? If yes, add Great Expectations.
- Are your testing needs simple (unique, not null, referential)? dbt tests are sufficient.
- Do you need statistical validation or advanced expectations? Use Great Expectations or dbt-expectations package.
- Do you need auto-generated documentation? Great Expectations' Data Docs are excellent.
- Is your team more comfortable with YAML or Python? This often tips the decision.
Frequently Asked Questions
Should I use Great Expectations or dbt tests?
Use dbt tests if you're already using dbt and want simple, integrated testing. Use Great Expectations if you need comprehensive data validation across multiple systems, advanced expectations, or testing outside the dbt ecosystem. Many teams use both: dbt tests for transformation validation and Great Expectations for source data quality.
Can I use Great Expectations with dbt?
Yes, Great Expectations and dbt work well together. Common patterns include: running GE on source data before dbt transformations, using dbt tests for transformation logic, and running GE on final outputs. The dbt-expectations package also brings GE-style expectations to dbt's YAML config.
What are the main differences between Great Expectations and dbt tests?
dbt tests are built into dbt, YAML-based, and run as part of dbt workflows. Great Expectations is standalone Python, offers 300+ built-in expectations, generates data documentation, and works with any Python environment. dbt tests are simpler; GE is more powerful but requires more setup.
Conclusion
dbt tests and Great Expectations aren't competing tools—they're complementary. dbt tests excel for integrated, simple validation in dbt workflows. Great Expectations shines for advanced validation, non-dbt data sources, and auto-generated documentation.
Many mature data teams use both. Start with dbt tests if you're already using dbt—they're the path of least resistance. Add Great Expectations when you hit the limits of what dbt tests can express, or when you need to validate data that dbt doesn't touch.
About Sparvi: We help small data teams (3-15 people) prevent data quality issues before they impact the business. Learn more at sparvi.io.