Introduction to Sparvi

What is Sparvi?

Sparvi is a data observability platform that helps you catch data issues early and resolve the ones that affect your business.

Sparvi Cloud

Sparvi Cloud is a data observability platform for data teams. It includes:

  • Interactive data exploration and lineage discovery
  • Monitors that watch row counts, column statistics, and custom SQL on a schedule
  • Segmented metrics so per-region or per-tenant problems never hide in the average
  • Issue management with stakeholder notifications
  • Team collaboration and automated monitoring
  • Enterprise security with SSO and role-based access

Sparvi Cloud helps data teams maintain high-quality data, prevent issues before they impact business operations, and build confidence in their data assets.

Core Concepts

Data Observability

Data observability provides comprehensive visibility into your data's health, quality, and relationships across your organization. Sparvi implements data observability through:

  • Monitors: Schedule a built-in metric (row count, null %, distinct count, min, max, avg, stddev) or any custom SQL query, and let Sparvi raise an issue when the value deviates from its learned baseline or crosses a threshold you set
  • Data Profiling: Automated analysis of data structure, completeness, and statistical patterns
  • Lineage Discovery: Understanding how data flows through your systems and who depends on it
  • Issue Management: Tracking and resolving data problems with stakeholder awareness
  • Business Impact Analysis: Understanding which teams and processes are affected by data issues

Monitors

A monitor is the unit of observability in Sparvi. Each monitor:

  • Targets one of three source types: a built-in table metric (row count), a built-in column metric (null %, distinct count, min, max, avg, stddev), or custom SQL (any query that returns a number)
  • Evaluates in one of two modes: ML / statistical (z-score, IQR, or moving-average detection against a learned baseline) or threshold (warning and critical levels you set yourself)
  • Runs on its own schedule: every N minutes or hours, daily, weekly, or manual-only
  • Can optionally be segmented by a dimension column so each segment value gets its own baseline and its own alert path
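
Threshold mode can be pictured with a minimal sketch. The `Threshold` class and `evaluate_threshold` function below are hypothetical names used for illustration, not part of Sparvi's API, and the sketch assumes that higher values are worse (as with null %).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Hypothetical threshold config: a value at or above a level triggers it."""
    warning: float
    critical: float


def evaluate_threshold(value: float, t: Threshold) -> Optional[str]:
    """Return 'critical', 'warning', or None for a single metric value."""
    # Check the more severe level first so it wins when both are crossed.
    if value >= t.critical:
        return "critical"
    if value >= t.warning:
        return "warning"
    return None


# Example: a null % monitor with illustrative levels of 5% and 20%.
null_pct = Threshold(warning=5.0, critical=20.0)
for observed in (2.0, 7.5, 31.0):
    print(observed, evaluate_threshold(observed, null_pct))
```

A metric where lower values are worse, such as row count, would flip the comparisons.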

An anomaly in Sparvi is a value that a monitor in statistical mode flags because it falls outside the learned baseline.
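
As a rough illustration of statistical mode, the sketch below applies textbook z-score and IQR checks to a short history of row counts. It is not Sparvi's detection code: the function names, the cutoffs (`z_max=3.0`, `k=1.5`), and the sample history are assumptions, and moving-average detection is left out.

```python
import statistics


def zscore_flag(history, value, z_max=3.0):
    """Flag a value more than z_max sample standard deviations from the mean."""
    mean = statistics.fmean(history)
    std = statistics.stdev(history)
    if std == 0:
        # A perfectly flat history: any change at all is a deviation.
        return value != mean
    return abs(value - mean) / std > z_max


def iqr_flag(history, value, k=1.5):
    """Flag a value outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr


# Illustrative daily row counts that stand in for a learned baseline.
history = [1000, 1020, 990, 1010, 1005, 995, 1015]

print(zscore_flag(history, 400), iqr_flag(history, 400))    # sudden drop
print(zscore_flag(history, 1008), iqr_flag(history, 1008))  # ordinary day
```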

Segmented Metrics

Any monitor can be segmented by a dimension column: for example, row count per region, signup count per channel, p95 latency per tenant, or revenue per product line. Sparvi tracks each segment independently:

  • Each segment has its own baseline (for statistical evaluation) or its own threshold (for threshold evaluation)
  • Issues name the offending segment so on-call work is actionable on the first read
  • The dashboard collapses per-segment values into one overall number using your choice of rollup (sum, average, min, or max) and shows all four side by side in the detail view
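
The four rollups are simple aggregations over the per-segment values. The sketch below shows the arithmetic only. The `rollup` and `all_rollups` helpers and the sample numbers are hypothetical, not Sparvi functions or real data.

```python
from statistics import fmean

# Map each rollup name to the aggregation it applies.
ROLLUPS = {"sum": sum, "average": fmean, "min": min, "max": max}


def rollup(segment_values, how):
    """Collapse {segment: value} into one number using the chosen rollup."""
    return ROLLUPS[how](list(segment_values.values()))


def all_rollups(segment_values):
    """Compute all four rollups, as a detail view might show side by side."""
    return {name: rollup(segment_values, name) for name in ROLLUPS}


# Illustrative row counts per region.
rows_by_region = {"us": 1200, "eu": 800, "apac": 400}

print(rollup(rows_by_region, "sum"))
print(all_rollups(rows_by_region))
```

The choice matters: a sum suits additive metrics such as row count, while a min or max keeps the worst segment visible for metrics such as null %.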

Validation Rules

Validation rules enforce hard contracts on your data. Sparvi Cloud provides:

  • SQL-Based Rules: Define validation criteria using familiar SQL syntax
  • Health Dashboards: Visual monitoring of validation performance over time
  • Automated Scheduling: Run validations on configurable schedules
  • Issue Integration: Failed validations automatically create tracked issues with business context

Validations answer pass/fail questions on known rules; monitors track numeric metrics over time and alert on deviation. Most teams use both.
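
To make the pass/fail idea concrete, the sketch below runs two SQL rules against an in-memory SQLite table. It assumes a convention where each rule counts violating rows and passes when the count is zero. That convention, the `run_validation` helper, and the `orders` table are illustrative assumptions, not Sparvi's documented rule format.

```python
import sqlite3


def run_validation(conn, rule_sql):
    """Run a rule that counts violating rows; the rule passes when the count is 0."""
    (violations,) = conn.execute(rule_sql).fetchone()
    return violations == 0


# Build a tiny in-memory table to validate (sample data, one bad row).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.99), (2, -5.0), (3, 42.0)],
)

# Each rule is plain SQL that returns a single count of violations.
no_negative_amounts = "SELECT COUNT(*) FROM orders WHERE amount < 0"
no_null_ids = "SELECT COUNT(*) FROM orders WHERE id IS NULL"

print("no_negative_amounts passed:", run_validation(conn, no_negative_amounts))
print("no_null_ids passed:", run_validation(conn, no_null_ids))
```

A monitor on the same table would instead record a number, such as the daily row count, and compare it with its history.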

Metadata Management

Sparvi automatically collects and tracks metadata changes across your data infrastructure:

  • Schema Change Detection: Identify new, modified, or deleted columns
  • Relationship Discovery: Find dependencies between tables, views, and downstream systems
  • Historical Tracking: Maintain a complete audit trail of all metadata changes
  • Impact Analysis: Understand which systems and teams are affected by changes
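
Schema change detection amounts to comparing two snapshots of a table's columns. The sketch below assumes each snapshot is a mapping from column name to type. The `diff_schema` function and the sample snapshots are hypothetical and do not reflect how Sparvi stores metadata.

```python
def diff_schema(old, new):
    """Compare two {column: type} snapshots and list what changed."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    # A column present in both snapshots counts as modified if its type differs.
    modified = sorted(c for c in old.keys() & new.keys() if old[c] != new[c])
    return {"added": added, "modified": modified, "deleted": deleted}


# Illustrative snapshots of the same table taken at two points in time.
yesterday = {"id": "INTEGER", "email": "VARCHAR", "signup_ts": "TIMESTAMP"}
today = {"id": "BIGINT", "email": "VARCHAR", "country": "VARCHAR"}

changes = diff_schema(yesterday, today)
print(changes)
```

Storing each diff with a timestamp would give the kind of audit trail described under Historical Tracking.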

Architecture Overview

Sparvi Cloud Architecture

  • Web Application: React-based frontend with comprehensive dashboards and workflows
  • API Layer: RESTful APIs for all platform functionality
  • Database Connectors: Native integrations with Snowflake, BigQuery, and dbt Core (Redshift and dbt Cloud coming H2 2026)
  • Automation Engine: Scheduled monitor evaluation, validation runs, and metadata collection
  • Notification System: Multi-channel alerting with Azure Communication Services
  • Enterprise Security: Multi-factor authentication, role-based access, user management

Key Features

  • Monitors: ML or threshold detection on row counts, column statistics, or custom SQL, optionally segmented by a dimension column
  • Data Explorer: Navigate database schemas, tables, and columns with comprehensive metadata
  • Lineage Discovery: Automatically map data relationships and business dependencies
  • Issue Management: Track and resolve data quality problems with business impact context
  • Validations: Custom SQL rules with scheduling and alerting
  • Team Collaboration: User management, notifications, and shared workflows
  • Enterprise Integrations: Snowflake, BigQuery, dbt Core, automated scheduling