Snowflake Data Quality Monitoring: Complete Guide
Snowflake makes storing and querying data easy. But without proper monitoring, bad data can silently corrupt your dashboards and reports. Here's how to catch issues early.
Snowflake has become the go-to data warehouse for modern data teams. It's fast, scalable, and requires minimal maintenance. But there's a catch: Snowflake doesn't tell you when your data is wrong.
Your queries will execute perfectly on bad data. Your dashboards will render confidently with incorrect numbers. And by the time someone notices, the damage is done—wrong decisions made, reports recalled, trust eroded.
This guide covers everything you need to know about monitoring data quality in Snowflake: common issues, monitoring approaches, and how to set up automated detection.
Why Snowflake Needs Data Quality Monitoring
Snowflake excels at compute and storage, but it doesn't validate your data. It will happily store:
- Duplicate records that inflate your metrics
- Null values that break downstream calculations
- Stale data from failed ingestion jobs
- Schema changes that silently break transformations
- Outliers that skew averages and totals
These issues don't cause errors—they cause wrong answers. That's far more dangerous because problems go undetected longer.
Common Snowflake Data Quality Issues
1. Stale Data (Freshness Issues)
Your Fivetran sync failed silently. Your Airflow DAG didn't run. Your API rate-limited and dropped records. Whatever the cause, the result is the same: your data hasn't updated.
Signs to watch for:
- Tables that normally update hourly showing yesterday's timestamp
- MAX(updated_at) not advancing as expected
- Row counts flat when they should be growing
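If you want a quick way to spot these signs across an entire schema, Snowflake's INFORMATION_SCHEMA exposes a LAST_ALTERED timestamp per table. A minimal sketch, where the database name, schema name, and 24-hour threshold are placeholders you would adjust per table:

```sql
-- Flag tables that haven't been modified in the last 24 hours.
SELECT
    table_name,
    last_altered,
    DATEDIFF('hour', last_altered, CURRENT_TIMESTAMP()) AS hours_since_update
FROM my_database.information_schema.tables
WHERE table_schema = 'ANALYTICS'
  AND DATEDIFF('hour', last_altered, CURRENT_TIMESTAMP()) > 24
ORDER BY hours_since_update DESC;
```

Note that LAST_ALTERED reflects any DDL or DML, so a per-table check on an updated_at column (shown later in this guide) is more precise when you have one.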
2. Schema Drift
An upstream system adds a column. Renames a field. Changes a data type. Your transformations break—or worse, silently produce wrong results.
Common causes:
- Third-party API changes (Salesforce, Stripe, etc.)
- Application database migrations
- Changes to CSV file formats
- ELT tool schema evolution
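One lightweight way to catch drift before it breaks transformations is to snapshot column metadata and diff it on each run. A rough sketch, assuming you maintain a schema_snapshot table (a hypothetical name) populated from a previous run:

```sql
-- Columns added, removed, or retyped since the last snapshot.
-- schema_snapshot is a table you'd refresh yourself, e.g. on a daily schedule.
WITH current_columns AS (
    SELECT table_name, column_name, data_type
    FROM my_database.information_schema.columns
    WHERE table_schema = 'ANALYTICS'
)
SELECT
    COALESCE(c.table_name, s.table_name)   AS table_name,
    COALESCE(c.column_name, s.column_name) AS column_name,
    s.data_type                            AS previous_type,
    c.data_type                            AS current_type,
    CASE
        WHEN s.column_name IS NULL THEN 'added'
        WHEN c.column_name IS NULL THEN 'removed'
        ELSE 'type changed'
    END AS change
FROM current_columns c
FULL OUTER JOIN schema_snapshot s
  ON c.table_name = s.table_name AND c.column_name = s.column_name
WHERE s.column_name IS NULL
   OR c.column_name IS NULL
   OR s.data_type <> c.data_type;
```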
3. Row Count Anomalies
Your daily orders table usually gets 10,000 rows. Today it got 500. Is that a real business trend, or did something break?
Volume anomalies often indicate:
- Partial data loads (job failed mid-way)
- Filter logic changes upstream
- Source system outages
- Duplicate record injection
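A simple way to separate "real business trend" from "something broke" is to compare each day's volume against a trailing average. A sketch against a hypothetical orders table with a created_at timestamp, using the ±30% threshold suggested later in this guide:

```sql
-- Flag days whose row count deviates more than 30% from the prior 7-day average.
WITH daily AS (
    SELECT DATE(created_at) AS load_date, COUNT(*) AS row_count
    FROM orders
    WHERE created_at >= DATEADD('day', -30, CURRENT_DATE())
    GROUP BY 1
),
with_baseline AS (
    SELECT
        load_date,
        row_count,
        AVG(row_count) OVER (
            ORDER BY load_date
            ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
        ) AS trailing_avg
    FROM daily
)
SELECT
    load_date,
    row_count,
    trailing_avg,
    CASE
        WHEN row_count < 0.7 * trailing_avg OR row_count > 1.3 * trailing_avg THEN 'anomaly'
        ELSE 'ok'
    END AS status
FROM with_baseline
ORDER BY load_date DESC;
```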
4. Null Value Spikes
A column that's 99% populated suddenly has 40% nulls. This can happen when:
- Source systems change required fields to optional
- ETL logic changes how missing data is handled
- New data sources are merged with different schemas
- API responses change structure
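To catch a spike like this early, track the null rate per load day rather than as a single point-in-time number. A sketch assuming a hypothetical orders table and an important_column you care about:

```sql
-- Null rate per day for the last 14 days; a sudden jump stands out immediately.
SELECT
    DATE(created_at) AS load_date,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN important_column IS NULL THEN 1 ELSE 0 END) AS null_count,
    ROUND(100.0 * SUM(CASE WHEN important_column IS NULL THEN 1 ELSE 0 END) / COUNT(*), 2) AS null_pct
FROM orders
WHERE created_at >= DATEADD('day', -14, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESC;
```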
5. Distribution Shifts
Your order_amount column averages $50. Suddenly it's averaging $5,000. Did your business 100x overnight, or is there a decimal point issue?
Distribution changes can indicate:
- Currency conversion issues
- Unit changes (cents vs. dollars)
- Incorrect joins creating duplicates
- Test data leaking into production
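A quick sanity check is to compare each day's average against the mean and standard deviation of recent history, the same ±3 standard deviation rule used in the metrics table later in this guide. A sketch for a hypothetical orders.order_amount column:

```sql
-- Flag days whose average order_amount is more than 3 standard deviations
-- away from the 30-day mean of daily averages.
WITH daily AS (
    SELECT DATE(created_at) AS load_date, AVG(order_amount) AS daily_avg
    FROM orders
    WHERE created_at >= DATEADD('day', -30, CURRENT_DATE())
    GROUP BY 1
),
stats AS (
    SELECT AVG(daily_avg) AS overall_mean, STDDEV(daily_avg) AS overall_stddev
    FROM daily
)
SELECT d.load_date, d.daily_avg, s.overall_mean, s.overall_stddev
FROM daily d
CROSS JOIN stats s
WHERE ABS(d.daily_avg - s.overall_mean) > 3 * s.overall_stddev
ORDER BY d.load_date DESC;
```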
How to Monitor Snowflake Data Quality
There are three main approaches to monitoring Snowflake data quality, each with trade-offs.
Approach 1: Native Snowflake Queries
You can write SQL queries to check data quality directly in Snowflake:
```sql
-- Check freshness
SELECT
    MAX(updated_at) AS last_update,
    DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP()) AS hours_stale
FROM my_table;

-- Check null rates
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN important_column IS NULL THEN 1 ELSE 0 END) AS null_count,
    ROUND(null_count / total_rows * 100, 2) AS null_percentage
FROM my_table;

-- Check daily row counts over the last 7 days
SELECT
    DATE(created_at) AS date,
    COUNT(*) AS row_count
FROM my_table
WHERE created_at >= DATEADD('day', -7, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESC;
```

Pros: No additional tools, full control, free
Cons: Manual effort, no alerting built-in, doesn't scale
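If you go this route, you can at least schedule the checks so they run without you. A rough sketch using a Snowflake task that logs the freshness check into a results table (the dq_check_results table, warehouse name, and schedule are assumptions, and you would still need to wire up notifications yourself):

```sql
-- Results table for check output (one row per run).
CREATE TABLE IF NOT EXISTS dq_check_results (
    check_name  STRING,
    checked_at  TIMESTAMP_NTZ,
    hours_stale NUMBER
);

-- Run the freshness check every hour and record the result.
CREATE OR REPLACE TASK freshness_check_task
    WAREHOUSE = my_warehouse
    SCHEDULE = '60 MINUTE'
AS
    INSERT INTO dq_check_results
    SELECT
        'my_table_freshness',
        CURRENT_TIMESTAMP(),
        DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP())
    FROM my_table;

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK freshness_check_task RESUME;
```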
Approach 2: dbt Tests
If you're using dbt for transformations, you can add data tests:
```yaml
# schema.yml
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_amount
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 100000
```

Pros: Integrated with your transformations, version controlled
Cons: Only runs when dbt runs, limited to test-time detection, requires dbt expertise
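For checks that don't fit the built-in tests, dbt also supports singular tests: a SQL file in your tests/ directory that fails whenever it returns rows. A small sketch of a freshness-style check (the file name, 24-hour threshold, and updated_at column are assumptions):

```sql
-- tests/assert_orders_are_fresh.sql
-- Fails if the orders model hasn't received new rows in the last 24 hours.
SELECT MAX(updated_at) AS last_update
FROM {{ ref('orders') }}
HAVING DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP()) > 24
```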
Approach 3: Dedicated Data Observability Tools
Tools like Sparvi, Monte Carlo, and Bigeye provide automated monitoring:
- Automatic profiling: Understand your data without writing queries
- ML-based anomaly detection: Learn normal patterns and alert on deviations
- Schema change tracking: Get notified when structures change
- Integrated alerting: Slack, email, PagerDuty notifications
- Historical trends: See how data quality changes over time
Pros: Comprehensive, automated, proactive
Cons: Additional tool to manage, cost (though affordable options exist)
Key Metrics to Track in Snowflake
Regardless of which approach you choose, these are the essential metrics to monitor:
Essential Snowflake Data Quality Metrics
| Metric | What It Catches | Alert Threshold |
|---|---|---|
| Freshness | Stale data, failed syncs | 2x normal update interval |
| Row Count | Partial loads, duplicates | ±30% from 7-day average |
| Null Rate | Missing data, schema issues | +10% increase from baseline |
| Unique Count | Duplicate keys, cardinality changes | Any unexpected duplicates |
| Value Distribution | Outliers, unit changes | Mean ±3 standard deviations |
| Schema | Column adds/removes/renames | Any change |
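Most of these metrics can be captured for a table in a single pass, which is handy if you're rolling your own checks. A sketch for a hypothetical orders table (column names are placeholders):

```sql
-- One-pass snapshot of freshness, volume, null rate, uniqueness, and distribution.
SELECT
    CURRENT_TIMESTAMP()                                    AS measured_at,
    MAX(updated_at)                                        AS last_update,
    DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP()) AS hours_stale,
    COUNT(*)                                               AS row_count,
    ROUND(100.0 * SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END)
        / NULLIF(COUNT(*), 0), 2)                          AS customer_id_null_pct,
    COUNT(DISTINCT order_id)                               AS unique_order_ids,
    COUNT(*) - COUNT(DISTINCT order_id)                    AS duplicate_order_ids,
    AVG(order_amount)                                      AS avg_order_amount,
    STDDEV(order_amount)                                   AS stddev_order_amount
FROM orders;
```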
Setting Up Snowflake Monitoring with Sparvi
Here's how to get comprehensive data quality monitoring running on your Snowflake warehouse in under 15 minutes:
Step 1: Connect Your Snowflake Account
Sparvi connects to Snowflake using a read-only service account. You'll need:
- Account identifier (e.g., xy12345.us-east-1)
- Warehouse name
- Database and schema to monitor
- Service account credentials (username/password or key pair)
We recommend creating a dedicated SPARVI_READER role with SELECT permissions on the schemas you want to monitor. See our Snowflake connection guide for detailed setup instructions.
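If you'd like a starting point for that role, the grants look roughly like this. Database, schema, warehouse, and user names are placeholders; refer to the connection guide for the exact permissions Sparvi requires:

```sql
-- Read-only role for monitoring. Names are placeholders.
CREATE ROLE IF NOT EXISTS SPARVI_READER;

GRANT USAGE ON WAREHOUSE my_warehouse TO ROLE SPARVI_READER;
GRANT USAGE ON DATABASE analytics TO ROLE SPARVI_READER;
GRANT USAGE ON SCHEMA analytics.public TO ROLE SPARVI_READER;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.public TO ROLE SPARVI_READER;
-- Cover tables created after the grant as well.
GRANT SELECT ON FUTURE TABLES IN SCHEMA analytics.public TO ROLE SPARVI_READER;

-- Service account user for the connection; add a password or key pair
-- according to your credential policy.
CREATE USER IF NOT EXISTS sparvi_service_account
    DEFAULT_ROLE = SPARVI_READER
    DEFAULT_WAREHOUSE = my_warehouse;
GRANT ROLE SPARVI_READER TO USER sparvi_service_account;
```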
Step 2: Auto-Profile Your Tables
Once connected, Sparvi automatically profiles your tables:
- Row counts and growth patterns
- Column statistics (nulls, uniques, distributions)
- Data types and schema structure
- Freshness patterns based on timestamp columns
This creates a baseline understanding of "normal" for your data.
Step 3: Enable Anomaly Detection
Sparvi uses your historical data to automatically detect anomalies:
- Freshness alerts: Data hasn't updated when expected
- Volume alerts: Row counts significantly higher or lower
- Distribution alerts: Values outside normal ranges
- Schema alerts: Columns added, removed, or modified
No manual threshold configuration required—though you can customize if needed.
Step 4: Add Custom Validation Rules
For business-specific checks, add custom validation rules:
- order_amount > 0 (no negative orders)
- status IN ('pending', 'shipped', 'delivered')
- customer_id EXISTS IN customers.id (referential integrity)
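Expressed as plain SQL, each of these rules is just a query that should return zero rows. A sketch using the hypothetical orders and customers tables from the examples above:

```sql
-- Violations of order_amount > 0
SELECT * FROM orders WHERE order_amount <= 0;

-- Violations of the allowed status list
SELECT * FROM orders WHERE status NOT IN ('pending', 'shipped', 'delivered');

-- Referential integrity: orders that don't point at a real customer
SELECT o.*
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
WHERE c.id IS NULL;
```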
Step 5: Configure Alerts
Route alerts to the right people:
- Critical issues (freshness, schema changes) → Slack #data-alerts + PagerDuty
- Warnings (volume anomalies) → Slack #data-team
- Info (distribution drift) → Daily digest email
Best Practices for Snowflake Data Quality
1. Monitor Your Most Critical Tables First
Don't try to monitor everything at once. Start with tables that:
- Feed executive dashboards
- Power financial reports
- Drive customer-facing features
- Are frequently queried by other teams
You can expand coverage over time, but catching issues in critical data delivers immediate value.
2. Set Up Monitoring Before Problems Occur
Don't wait for a data incident to start monitoring. By then, you're in reactive mode. Set up monitoring proactively so you catch the next issue before it impacts the business.
3. Include Stakeholders in Alerts
Data quality isn't just a data engineering problem. When issues happen:
- Analytics needs to know their dashboards might be wrong
- Product needs to know customer data might be affected
- Finance needs to know reports might need correction
Use tools that make data quality everyone's responsibility.
4. Document Known Issues and Quirks
Every data warehouse has quirks. Document them:
- "The
orderstable has duplicates on the 1st of each month due to retry logic" - "
customer_emailcan be null for guests" - "Historical data before 2023 has different schema"
This context helps new team members and prevents false positive alerts.
Conclusion
Snowflake is a powerful data warehouse, but it won't protect you from bad data. Without monitoring, issues will slip through and erode trust in your analytics.
The good news: setting up data quality monitoring doesn't have to be complex or expensive. Start with your most critical tables, automate the basics (freshness, volume, nulls), and expand from there.
Your future self—and your stakeholders—will thank you.
Ready to Monitor Your Snowflake Data?
Sparvi connects to Snowflake in minutes and automatically monitors your most important tables. No complex setup, no enterprise pricing—just reliable data quality monitoring built for growing teams.
Get Started with Sparvi

About Sparvi: We help small data teams (3-15 people) monitor Snowflake data quality without enterprise complexity. Learn more at sparvi.io.