Anomaly Detection

Catch Data Issues Before They Catch You

Every Sparvi monitor can run in ML / statistical mode, learning a baseline from your data and flagging deviations automatically. Use z-score, IQR, and moving-average detection across row counts, columns, and custom SQL, segmented by region, product, tenant, or any dimension you care about.

Recent Anomalies
Volume Drop: orders
Row count down 85% from expected
2h ago
Null Spike: users.email
Null rate increased from 2% to 15%
5h ago
Segment Drop: revenue / EMEA
EMEA revenue 62% below baseline; AMER and APAC normal
1d ago

What We Detect

Sparvi monitors multiple dimensions of your data to catch issues early.

Volume Anomalies

Row-count monitors detect when volumes deviate from expected patterns. Catch partial loads, duplicate data, and failed pipelines.

Example: A table that averages 10K rows/day suddenly has only 500.
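As a rough illustration of the example above, here is a minimal z-score sketch in Python. The function name, the sample row counts, and the cutoff of 3.0 are assumptions made for illustration; this is not Sparvi's implementation.

```python
import math
from statistics import mean, stdev

def zscore(history, latest):
    """Number of sample standard deviations between `latest` and the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is infinitely far away.
        return 0.0 if latest == mu else math.copysign(math.inf, latest - mu)
    return (latest - mu) / sigma

# Hypothetical daily row counts for a table that averages about 10K rows/day.
daily_rows = [10_120, 9_870, 10_050, 9_940, 10_210, 9_990, 10_080]

score = zscore(daily_rows, 500)
is_anomaly = abs(score) > 3.0  # 3.0 is an illustrative cutoff, not a Sparvi default
```

A load of only 500 rows sits far more than three standard deviations below this history, so the check flags it.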

Distribution Anomalies

Column metric monitors (avg, stddev, min, max) flag when values shift outside normal ranges. Catch calculation errors, unit changes, and data corruption.

Example: Average order value jumps from $50 to $5,000.
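One way to picture a check like this is the interquartile-range (IQR) rule, sketched below in Python. The sample values and the multiplier k=1.5 (the conventional Tukey fence) are assumptions for illustration, not Sparvi's settings.

```python
from statistics import quantiles

def iqr_bounds(history, k=1.5):
    """Return (low, high) fences: k * IQR below the first quartile and above the third."""
    q1, _, q3 = quantiles(history, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical daily average order values, in dollars.
avg_order_value = [48.2, 51.0, 49.5, 50.3, 52.1, 47.9, 50.8, 49.1]

low, high = iqr_bounds(avg_order_value)
latest = 5000.0
is_anomaly = not (low <= latest <= high)
```

Because the fences are built from quartiles rather than the mean, a single earlier outlier in the history does little to widen them.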

Null Rate Spikes

Null % monitors across columns catch upstream changes, ETL bugs, and data source issues without writing any SQL.

Example: Email column nulls jump from 2% to 40%.
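The metric itself is simple. The Python sketch below shows what a null rate measures, using made-up sample values; Sparvi computes this for you, so the helper here is purely illustrative.

```python
def null_rate(values):
    """Fraction of entries that are None (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)

# Hypothetical sample of an email column.
emails = ["a@example.com", None, "b@example.com", None, "c@example.com"]

rate = null_rate(emails)           # 2 of 5 values are missing
baseline_rate = 0.02               # assumed historical null rate of 2%
increase = rate - baseline_rate    # change in percentage points, as a fraction
```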

Freshness Issues

Use a custom SQL monitor on max(updated_at) or a watermark column. Catch pipeline failures, source outages, and scheduling issues.

Example: Hourly table hasn't updated in 6 hours.
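Once the query returns the latest timestamp, the freshness check is a lag comparison. Below is a minimal Python sketch of that comparison; the function name, the timestamps, and the two-hour tolerance are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

def is_stale(last_updated, now, max_lag):
    """True when the newest record is older than the allowed lag."""
    return now - last_updated > max_lag

# Illustrative timestamps: an hourly table whose newest row is 6 hours old.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_updated = now - timedelta(hours=6)

stale = is_stale(last_updated, now, max_lag=timedelta(hours=2))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.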

Segment Anomalies

Any monitor can be segmented by region, product, tenant, or any dimension. Each segment has its own baseline, so a problem in one slice never hides in the average.

Example: EMEA revenue down 60% while AMER and APAC stay flat.
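To show why per-segment baselines matter, the Python sketch below scores each segment against its own history. The revenue figures, the z-score method, and the cutoff of 3.0 are assumptions for illustration, not Sparvi's internals.

```python
from statistics import mean, stdev

def segment_anomalies(history_by_segment, latest_by_segment, cutoff=3.0):
    """Score each segment against its own history; return {segment: z} for outliers."""
    flagged = {}
    for segment, history in history_by_segment.items():
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: skipped in this simplified sketch
        z = (latest_by_segment[segment] - mu) / sigma
        if abs(z) > cutoff:
            flagged[segment] = z
    return flagged

# Hypothetical daily revenue (in $K) per region.
history = {
    "EMEA": [100, 104, 98, 101, 97, 102, 99],
    "AMER": [250, 245, 255, 248, 252, 251, 249],
    "APAC": [80, 82, 79, 81, 78, 80, 83],
}
latest = {"EMEA": 40, "AMER": 250, "APAC": 81}

flagged = segment_anomalies(history, latest)
```

Only EMEA is flagged here; AMER and APAC stay within their own normal ranges.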

Uniqueness Changes

Distinct count and distinct % monitors catch unintended duplicates, ID collisions, and cardinality drift.

Example: Primary key column suddenly has duplicates.
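For intuition, the Python sketch below computes a distinct percentage and lists the duplicated values. The helper names and sample IDs are made up for illustration.

```python
from collections import Counter

def distinct_pct(values):
    """Percentage of values that are distinct (100.0 for an empty column)."""
    if not values:
        return 100.0
    return 100.0 * len(set(values)) / len(values)

def duplicated(values):
    """Sorted list of values that appear more than once."""
    return sorted(v for v, n in Counter(values).items() if n > 1)

# Hypothetical primary-key column with two accidental repeats.
order_ids = [101, 102, 103, 103, 104, 101]

pct = distinct_pct(order_ids)   # 4 distinct values out of 6 rows
dupes = duplicated(order_ids)   # the IDs that collide
```

A true primary key should sit at 100% distinct, so any drop below that is worth an alert.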

How Anomaly Detection Works

1. Learn Baselines

Each monitor in statistical mode analyzes the configured baseline window of historical data, typically 14 days, to understand normal patterns for row counts, distributions, null rates, and more.
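Conceptually, learning a baseline means summarizing the most recent window of observations. The Python sketch below assumes one observation per day and a mean/standard-deviation summary; the `Baseline` type and `learn_baseline` function are hypothetical names, not Sparvi's API.

```python
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass
class Baseline:
    center: float  # mean of the window
    spread: float  # sample standard deviation of the window
    n: int         # number of data points actually used

def learn_baseline(values, window=14):
    """Summarize the most recent `window` observations."""
    recent = values[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two data points to estimate spread")
    return Baseline(center=mean(recent), spread=stdev(recent), n=len(recent))

# 20 days of made-up observations; only the last 14 are used.
observations = list(range(1, 21))
baseline = learn_baseline(observations)
```

Keeping `n` on the baseline lets a later step decide whether enough history exists to trust it.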

2. Monitor Continuously

On its configured schedule (every N minutes, daily, or weekly), the monitor compares the latest value against the baseline using z-score, IQR, or moving-average detection.
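As one example of these methods, a moving-average check compares the latest value with the average of the values just before it. The Python sketch below assumes a 7-point window and a 30% tolerance, both chosen for illustration rather than taken from Sparvi.

```python
from statistics import mean

def moving_average_deviation(series, window=7):
    """Relative deviation of the last value from the mean of the `window` values before it."""
    if len(series) < window + 1:
        raise ValueError("not enough history for the requested window")
    prior = series[-(window + 1):-1]
    avg = mean(prior)
    if avg == 0:
        raise ValueError("relative deviation is undefined for a zero average")
    return (series[-1] - avg) / avg

# Made-up series: steady at 100, then a drop to 50.
series = [100, 100, 100, 100, 100, 100, 100, 50]

deviation = moving_average_deviation(series)
is_anomaly = abs(deviation) > 0.3  # illustrative tolerance
```

Because the window slides forward with each check, this method adapts to gradual trends that a fixed baseline would eventually flag.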

3. Alert on Deviations

When a value falls outside the expected range and the minimum number of data points has been met, Sparvi creates an issue, tagged with the segment name if applicable, and routes alerts via Slack, email, or PagerDuty.
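The gating logic in this step can be sketched in a few lines of Python. The `Issue` type, the function name, and every parameter below are hypothetical; alert routing is left out because it depends on each integration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Issue:
    monitor: str
    segment: Optional[str]
    message: str

def maybe_create_issue(monitor, value, low, high, n_points, min_points, segment=None):
    """Return an Issue only if the baseline is mature and the value is out of range."""
    if n_points < min_points:
        return None  # too little history to trust the baseline yet
    if low <= value <= high:
        return None  # value is within the expected range
    label = f"{monitor} / {segment}" if segment else monitor
    return Issue(monitor, segment, f"{label}: {value} outside expected range [{low}, {high}]")

# Same out-of-range value, with too little history and then with enough.
too_early = maybe_create_issue("revenue", 40, 90, 110, n_points=3, min_points=7)
issue = maybe_create_issue("revenue", 40, 90, 110, n_points=14, min_points=7, segment="EMEA")
```

Checking the data-point count first avoids noisy alerts from a monitor that has only just started collecting history.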

Why Teams Choose Sparvi for Anomaly Detection

No manual threshold configuration: ML learns what's normal automatically
Configurable sensitivity (0.5–2.0) to balance signal vs noise
Instant alerting via Slack, email, or PagerDuty when monitors fire
Per-segment baselines so a bad slice never hides in the average
Integrated issue management to track and resolve problems
AI-powered suggestions for root cause analysis
Historical trend per monitor and per segment

Without Sparvi

  • Stakeholders discover issues in dashboards
  • Aggregates mask outages in individual segments
  • Decisions made on bad data

With Sparvi

  • Proactive alerts before stakeholders notice
  • Per-segment baselines surface localized issues immediately
  • Confidence in data-driven decisions

Frequently Asked Questions

What is anomaly detection in data quality?

Anomaly detection in data quality identifies data points, patterns, or values that deviate significantly from expected behavior. In Sparvi, a monitor in statistical evaluation mode raises an issue when the latest value of the monitored metric falls outside the baseline learned from history.

How does Sparvi detect data anomalies?

Each Sparvi monitor in ML / statistical mode learns a baseline from your historical data using z-score, IQR, or moving-average methods. When the monitored value deviates beyond your configured sensitivity, Sparvi creates an issue and routes alerts to Slack, email, or PagerDuty. For segmented monitors, each segment has its own baseline.

Do I need to configure thresholds manually?

Not for statistical monitors; the baseline is learned automatically. If you have a hard contract (like "revenue should never drop below $50K/day"), switch that monitor to threshold mode and set warning and critical values explicitly. The two modes can coexist across your monitor library.
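For a lower-bound contract like the revenue example, threshold mode reduces to two explicit comparisons. The Python sketch below uses a hypothetical function name and an assumed warning level of $60K; only the $50K critical floor comes from the example above.

```python
def threshold_status(value, warning, critical):
    """Classify a value against lower-bound thresholds (critical must be below warning)."""
    if critical >= warning:
        raise ValueError("for a lower-bound contract, critical must be below warning")
    if value < critical:
        return "critical"
    if value < warning:
        return "warning"
    return "ok"

# Daily revenue in dollars: critical below $50K, warning below an assumed $60K.
statuses = [threshold_status(v, warning=60_000, critical=50_000)
            for v in (45_000, 55_000, 70_000)]
```

Unlike a learned baseline, these limits never drift, which is exactly what a hard contract calls for.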

How quickly will I be alerted to anomalies?

Each monitor has its own schedule: every N minutes, daily, or weekly. The minimum interval is 5 minutes, and most teams find that 60-minute checks on critical tables strike the right balance between latency and warehouse cost.

Stop Discovering Issues After the Fact

Get proactive anomaly detection that catches data issues before they impact your business.

Start Free Trial