Guide · 10 min read

Snowflake Data Quality Monitoring: Complete Guide

Snowflake makes storing and querying data easy. But without proper monitoring, bad data can silently corrupt your dashboards and reports. Here's how to catch issues early.

By the Sparvi Team

Snowflake has become the go-to data warehouse for modern data teams. It's fast, scalable, and requires minimal maintenance. But there's a catch: Snowflake doesn't tell you when your data is wrong.

Your queries will execute perfectly on bad data. Your dashboards will render confidently with incorrect numbers. And by the time someone notices, the damage is done—wrong decisions made, reports recalled, trust eroded.

This guide covers everything you need to know about monitoring data quality in Snowflake: common issues, monitoring approaches, and how to set up automated detection.

Why Snowflake Needs Data Quality Monitoring

Snowflake excels at compute and storage, but it doesn't validate your data. It will happily store:

  • Duplicate records that inflate your metrics
  • Null values that break downstream calculations
  • Stale data from failed ingestion jobs
  • Schema changes that silently break transformations
  • Outliers that skew averages and totals

These issues don't cause errors—they cause wrong answers. That's far more dangerous because problems go undetected longer.

Common Snowflake Data Quality Issues

1. Stale Data (Freshness Issues)

Your Fivetran sync failed silently. Your Airflow DAG didn't run. Your API integration hit a rate limit and dropped records. Whatever the cause, the result is the same: your data hasn't updated.

Signs to watch for:

  • Tables that normally update hourly showing yesterday's timestamp
  • MAX(updated_at) not advancing as expected
  • Row counts flat when they should be growing
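
The Approach 1 section later in this guide shows a basic freshness query; if you want it to behave more like a check, you can have it return rows only when the table is actually stale. A minimal sketch, assuming an updated_at column and a four-hour expectation (both placeholders for your own column and threshold):

-- Returns a row only when my_table has gone more than 4 hours without an update.
-- my_table, updated_at, and the 4-hour threshold are placeholders.
SELECT
  MAX(updated_at) AS last_update,
  DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP()) AS hours_stale
FROM my_table
HAVING DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP()) > 4;

Schedule a query like this on a cron or Snowflake task and alert whenever it returns a row.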

2. Schema Drift

An upstream system adds a column. Renames a field. Changes a data type. Your transformations break—or worse, silently produce wrong results.

Common causes:

  • Third-party API changes (Salesforce, Stripe, etc.)
  • Application database migrations
  • Changes to CSV file formats
  • ELT tool schema evolution
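
Snowflake won't warn you about any of these, but you can detect them yourself by snapshotting column metadata from INFORMATION_SCHEMA and diffing snapshots between runs. A minimal sketch, assuming a snapshot table you create yourself and a monitored table called PUBLIC.ORDERS (all names are placeholders):

-- One-time setup: a table to hold daily schema snapshots (name is a placeholder).
CREATE TABLE IF NOT EXISTS schema_snapshots (
  snapshot_date DATE,
  table_name    STRING,
  column_name   STRING,
  data_type     STRING
);

-- Capture today's schema for the monitored table.
INSERT INTO schema_snapshots
SELECT CURRENT_DATE(), table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'PUBLIC' AND table_name = 'ORDERS';

-- Columns present today that were missing (or had a different type) yesterday.
SELECT today.column_name, today.data_type
FROM schema_snapshots today
LEFT JOIN schema_snapshots yesterday
  ON  yesterday.snapshot_date = today.snapshot_date - 1
  AND yesterday.table_name    = today.table_name
  AND yesterday.column_name   = today.column_name
  AND yesterday.data_type     = today.data_type
WHERE today.snapshot_date = CURRENT_DATE()
  AND yesterday.column_name IS NULL;

Swapping the today and yesterday aliases in the final query catches dropped columns as well.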

3. Row Count Anomalies

Your daily orders table usually gets 10,000 rows. Today it got 500. Is that a real business trend, or did something break?

Volume anomalies often indicate:

  • Partial data loads (job failed mid-way)
  • Filter logic changes upstream
  • Source system outages
  • Duplicate record injection
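
A minimal sketch of a volume check that compares today's load against the trailing seven-day average, using the ±30% threshold suggested later in this guide (my_table, created_at, and the threshold are placeholders):

-- Returns a row only when today's volume deviates more than ±30% from the 7-day average.
WITH daily AS (
  SELECT DATE(created_at) AS load_date, COUNT(*) AS row_count
  FROM my_table
  WHERE created_at >= DATEADD('day', -7, CURRENT_DATE())
  GROUP BY 1
),
baseline AS (
  SELECT AVG(row_count) AS avg_count
  FROM daily
  WHERE load_date < CURRENT_DATE()
)
SELECT d.row_count AS todays_rows,
       b.avg_count AS seven_day_avg,
       ROUND((d.row_count - b.avg_count) / b.avg_count * 100, 1) AS pct_change
FROM daily d
CROSS JOIN baseline b
WHERE d.load_date = CURRENT_DATE()
  AND ABS(d.row_count - b.avg_count) / b.avg_count > 0.30;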

4. Null Value Spikes

A column that's 99% populated suddenly has 40% nulls. This can happen when:

  • Source systems change required fields to optional
  • ETL logic changes how missing data is handled
  • New data sources are merged with different schemas
  • API responses change structure
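
The overall null-rate query in Approach 1 below tells you where you stand today; to spot a spike, trend the rate by day and watch the most recent value. A minimal sketch, with my_table, important_column, and created_at as placeholders:

-- Daily null rate for the last 7 days; a sudden jump on the latest day is the signal.
SELECT
  DATE(created_at) AS load_date,
  COUNT(*) AS total_rows,
  SUM(CASE WHEN important_column IS NULL THEN 1 ELSE 0 END) AS null_count,
  ROUND(SUM(CASE WHEN important_column IS NULL THEN 1 ELSE 0 END) / COUNT(*) * 100, 2) AS null_percentage
FROM my_table
WHERE created_at >= DATEADD('day', -7, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESC;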

5. Distribution Shifts

Your order_amount column averages $50. Suddenly it's averaging $5,000. Did your business 100x overnight, or is there a decimal point issue?

Distribution changes can indicate:

  • Currency conversion issues
  • Unit changes (cents vs. dollars)
  • Incorrect joins creating duplicates
  • Test data leaking into production
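
A minimal sketch of a distribution check that flags today's average when it falls outside three standard deviations of a 30-day baseline, matching the threshold suggested later in this guide (my_table, order_amount, and created_at are placeholders):

-- Returns a row only when today's average order_amount is more than 3 standard
-- deviations away from the trailing 30-day baseline.
WITH baseline AS (
  SELECT AVG(order_amount)    AS mean_amount,
         STDDEV(order_amount) AS stddev_amount
  FROM my_table
  WHERE created_at >= DATEADD('day', -30, CURRENT_DATE())
    AND created_at <  CURRENT_DATE()
),
latest_day AS (
  SELECT AVG(order_amount) AS todays_mean
  FROM my_table
  WHERE DATE(created_at) = CURRENT_DATE()
)
SELECT l.todays_mean, b.mean_amount, b.stddev_amount
FROM latest_day l
CROSS JOIN baseline b
WHERE ABS(l.todays_mean - b.mean_amount) > 3 * b.stddev_amount;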

How to Monitor Snowflake Data Quality

There are three main approaches to monitoring Snowflake data quality, each with trade-offs.

Approach 1: Native Snowflake Queries

You can write SQL queries to check data quality directly in Snowflake:

-- Check freshness
SELECT MAX(updated_at) as last_update,
       DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP()) as hours_stale
FROM my_table;

-- Check null rates
SELECT
  COUNT(*) as total_rows,
  SUM(CASE WHEN important_column IS NULL THEN 1 ELSE 0 END) as null_count,
  ROUND(null_count / total_rows * 100, 2) as null_percentage
FROM my_table;

-- Check row count vs yesterday
SELECT
  DATE(created_at) as date,
  COUNT(*) as row_count
FROM my_table
WHERE created_at >= DATEADD('day', -7, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESC;

Pros: No additional tools, full control, free
Cons: Manual effort, no alerting built-in, doesn't scale

Approach 2: dbt Tests

If you're using dbt for transformations, you can add data tests:

# schema.yml
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_amount
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 100000
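
Built-in and package tests cover column-level rules; for anything more bespoke, dbt also supports singular tests, which are just SQL files under tests/ that fail if they return any rows. A minimal sketch of a freshness-style check written that way (the file name, column, and 24-hour threshold are our own placeholders):

-- tests/assert_orders_are_fresh.sql
-- Fails the dbt run if the orders model has gone more than 24 hours without an update.
SELECT MAX(updated_at) AS last_update
FROM {{ ref('orders') }}
HAVING DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP()) > 24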

Pros: Integrated with your transformations, version controlled
Cons: Only runs when dbt runs, limited to test-time detection, requires dbt expertise

Approach 3: Dedicated Data Observability Tools

Tools like Sparvi, Monte Carlo, and Bigeye provide automated monitoring:

  • Automatic profiling: Understand your data without writing queries
  • ML-based anomaly detection: Learn normal patterns and alert on deviations
  • Schema change tracking: Get notified when structures change
  • Integrated alerting: Slack, email, PagerDuty notifications
  • Historical trends: See how data quality changes over time

Pros: Comprehensive, automated, proactive
Cons: Additional tool to manage, cost (though affordable options exist)

Key Metrics to Track in Snowflake

Regardless of which approach you choose, these are the essential metrics to monitor:

Essential Snowflake Data Quality Metrics

Metric              What It Catches                      Alert Threshold
Freshness           Stale data, failed syncs             2x normal update interval
Row Count           Partial loads, duplicates            ±30% from 7-day average
Null Rate           Missing data, schema issues          +10% increase from baseline
Unique Count        Duplicate keys, cardinality changes  Any unexpected duplicates
Value Distribution  Outliers, unit changes               Mean ±3 standard deviations
Schema              Column adds/removes/renames          Any change

Setting Up Snowflake Monitoring with Sparvi

Here's how to get comprehensive data quality monitoring running on your Snowflake warehouse in under 15 minutes:

Step 1: Connect Your Snowflake Account

Sparvi connects to Snowflake using a read-only service account. You'll need:

  • Account identifier (e.g., xy12345.us-east-1)
  • Warehouse name
  • Database and schema to monitor
  • Service account credentials (username/password or key pair)

We recommend creating a dedicated SPARVI_READER role with SELECT permissions on the schemas you want to monitor. See our Snowflake connection guide for detailed setup instructions.
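
Defer to the connection guide for the exact steps, but as a rough sketch, the read-only role setup typically looks something like this (warehouse, database, schema, user, and password values are all placeholders):

-- Dedicated read-only role for monitoring (all object names are placeholders).
CREATE ROLE IF NOT EXISTS SPARVI_READER;

GRANT USAGE ON WAREHOUSE my_warehouse TO ROLE SPARVI_READER;
GRANT USAGE ON DATABASE my_database TO ROLE SPARVI_READER;
GRANT USAGE ON SCHEMA my_database.my_schema TO ROLE SPARVI_READER;

-- SELECT on existing and future tables in the monitored schema.
GRANT SELECT ON ALL TABLES IN SCHEMA my_database.my_schema TO ROLE SPARVI_READER;
GRANT SELECT ON FUTURE TABLES IN SCHEMA my_database.my_schema TO ROLE SPARVI_READER;

-- Service account the monitoring tool connects as.
CREATE USER IF NOT EXISTS sparvi_service
  PASSWORD = '<choose-a-strong-password>'
  DEFAULT_ROLE = SPARVI_READER;
GRANT ROLE SPARVI_READER TO USER sparvi_service;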

Step 2: Auto-Profile Your Tables

Once connected, Sparvi automatically profiles your tables:

  • Row counts and growth patterns
  • Column statistics (nulls, uniques, distributions)
  • Data types and schema structure
  • Freshness patterns based on timestamp columns

This creates a baseline understanding of "normal" for your data.

Step 3: Enable Anomaly Detection

Sparvi uses your historical data to automatically detect anomalies:

  • Freshness alerts: Data hasn't updated when expected
  • Volume alerts: Row counts significantly higher or lower
  • Distribution alerts: Values outside normal ranges
  • Schema alerts: Columns added, removed, or modified

No manual threshold configuration required—though you can customize if needed.

Step 4: Add Custom Validation Rules

For business-specific checks, add custom validation rules:

  • order_amount > 0 (no negative orders)
  • status IN ('pending', 'shipped', 'delivered')
  • customer_id EXISTS IN customers.id (referential integrity)
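
Expressed as plain SQL, each rule is simply a query that should return zero rows; anything it returns is a violation. A minimal sketch of the three rules above (orders and customers are placeholder table names):

-- Rule 1: no negative or zero order amounts.
SELECT * FROM orders WHERE order_amount <= 0;

-- Rule 2: only expected status values (NULL statuses need a separate not-null check).
SELECT * FROM orders WHERE status NOT IN ('pending', 'shipped', 'delivered');

-- Rule 3: referential integrity; every order must point at a known customer.
SELECT o.*
FROM orders o
LEFT JOIN customers c ON c.id = o.customer_id
WHERE c.id IS NULL;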

Step 5: Configure Alerts

Route alerts to the right people:

  • Critical issues (freshness, schema changes) → Slack #data-alerts + PagerDuty
  • Warnings (volume anomalies) → Slack #data-team
  • Info (distribution drift) → Daily digest email

Best Practices for Snowflake Data Quality

1. Monitor Your Most Critical Tables First

Don't try to monitor everything at once. Start with tables that:

  • Feed executive dashboards
  • Power financial reports
  • Drive customer-facing features
  • Are frequently queried by other teams

You can expand coverage over time, but catching issues in critical data delivers immediate value.

2. Set Up Monitoring Before Problems Occur

Don't wait for a data incident to start monitoring. By then, you're in reactive mode. Set up monitoring proactively so you catch the next issue before it impacts the business.

3. Include Stakeholders in Alerts

Data quality isn't just a data engineering problem. When issues happen:

  • Analytics needs to know their dashboards might be wrong
  • Product needs to know customer data might be affected
  • Finance needs to know reports might need correction

Use tools that make data quality everyone's responsibility.

4. Document Known Issues and Quirks

Every data warehouse has quirks. Document them:

  • "The orders table has duplicates on the 1st of each month due to retry logic"
  • "customer_email can be null for guests"
  • "Historical data before 2023 has different schema"

This context helps new team members and prevents false positive alerts.

Conclusion

Snowflake is a powerful data warehouse, but it won't protect you from bad data. Without monitoring, issues will slip through and erode trust in your analytics.

The good news: setting up data quality monitoring doesn't have to be complex or expensive. Start with your most critical tables, automate the basics (freshness, volume, nulls), and expand from there.

Your future self—and your stakeholders—will thank you.

Ready to Monitor Your Snowflake Data?

Sparvi connects to Snowflake in minutes and automatically monitors your most important tables. No complex setup, no enterprise pricing—just reliable data quality monitoring built for growing teams.

Get Started with Sparvi

About Sparvi: We help small data teams (3-15 people) monitor Snowflake data quality without enterprise complexity. Learn more at sparvi.io.