Anomaly Detection Guide

Anomaly detection is a key feature of Sparvi that helps you identify unusual patterns, outliers, and unexpected changes in your data. This guide explains how Sparvi's anomaly detection works and how to use it to maintain data quality.

Understanding Anomaly Detection in Sparvi

Anomaly detection in Sparvi works by:

Analyzing current data to find statistical outliers and pattern deviations
Comparing with historical profiles to detect changes over time
Applying domain-specific rules to identify business anomalies
Ranking anomalies by severity to prioritize the most important issues
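
The last step, ranking by severity, can be pictured with a small sketch. The Anomaly record and the rank_by_severity helper are hypothetical names chosen for illustration, not part of Sparvi's API; the three severity levels are the high, medium, and low levels used throughout this guide.

```python
from dataclasses import dataclass

# Lower number = more urgent. The numeric ordering is an assumption made
# for this sketch; only the three level names come from the guide.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Anomaly:
    kind: str          # e.g. "row_count", "null_rate"
    description: str
    severity: str      # "high", "medium", or "low"


def rank_by_severity(anomalies):
    """Return anomalies sorted so the most severe come first."""
    return sorted(anomalies, key=lambda a: SEVERITY_ORDER[a.severity])


found = [
    Anomaly("null_rate", "Null rate for 'email' increased from 2% to 15%", "medium"),
    Anomaly("row_count", "Row count decreased by 25% (from 1000 to 750)", "high"),
]
ranked = rank_by_severity(found)  # the row count anomaly now comes first
```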

Sparvi can detect anomalies in several ways:

Automatically, during regular profiling operations
By comparison, when historical data is provided
By rule, through custom validation rules
By time series, through analysis of trends and seasonality patterns

Types of Anomalies

Sparvi detects different types of anomalies:

Statistical Outliers

Values that deviate significantly from the statistical norm:

Numeric Outliers: Values far from the typical value of a column (typically more than 3 standard deviations from the mean)
Temporal Outliers: Unusual spikes or drops in time-series data
Frequency Outliers: Unusual distribution of categorical values

Data Quality Anomalies

Issues that indicate potential data quality problems:

Null Rate Changes: Unexpected increases in null values
Duplicate Rate Changes: Unexpected increases in duplicate records
Format Inconsistencies: Deviations from established data formats
Referential Integrity Issues: Invalid references between tables

Schema Anomalies

Changes in the structure of your data:

Column Additions/Removals: New or missing columns
Data Type Changes: Changes in column data types
Constraint Changes: Changes in column nullability or uniqueness

Business Logic Anomalies

Violations of business rules or expectations:

Range Violations: Values outside of acceptable business ranges
Process Breaks: Indications of broken business processes
Relationship Anomalies: Unusual relationships between entities

Detecting Anomalies with Sparvi Cloud

Sparvi Cloud provides automated anomaly detection through the web interface:

Automatic Detection: Anomalies are detected automatically during data profiling
Historical Comparison: Sparvi Cloud tracks changes over time and alerts you to unusual patterns
Anomaly Dashboard: View all detected anomalies in the Anomaly Detection Dashboard
Configurable Thresholds: Customize detection sensitivity for different data patterns
Severity Ranking: Anomalies are automatically ranked by severity to help you prioritize
Business Impact Analysis: See which teams and processes are affected by detected anomalies
Notifications: Get alerted via Slack, email, or webhooks when anomalies are detected

Anomaly Types in Detail

Row Count Anomalies

Unusual changes in the number of rows in your tables. Sparvi detects when row counts increase or decrease beyond expected thresholds, which could indicate:

Data pipeline failures
Missing data loads
Unexpected data deletions
Data quality issues upstream

Example: "Row count decreased by 25% (from 1000 to 750)" - marked as high severity
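
The arithmetic behind the example can be sketched as follows. This is a minimal illustration, not Sparvi's implementation: the function names are hypothetical, the 20% default is the one listed under Custom Thresholds in this guide, and treating a change exactly at the threshold as an anomaly is an assumption.

```python
def row_count_change(previous, current):
    """Return the relative change in row count as a fraction (-0.25 = 25% drop)."""
    if previous == 0:
        raise ValueError("previous row count must be non-zero")
    return (current - previous) / previous


def is_row_count_anomaly(previous, current, threshold=0.20):
    """Flag increases or decreases at or beyond the threshold (inclusive by assumption)."""
    return abs(row_count_change(previous, current)) >= threshold


change = row_count_change(1000, 750)        # -0.25, i.e. a 25% decrease
flagged = is_row_count_anomaly(1000, 750)   # True with the 20% default
```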

Null Rate Anomalies

Unexpected changes in null values across your columns. Sparvi monitors null rates and alerts when they deviate from normal patterns, which could indicate:

Missing data from upstream sources
Changes in data collection processes
ETL pipeline issues
Schema changes affecting data population

Example: "Null rate for 'email' increased from 2% to 15%" - marked as medium severity
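
A null rate check along these lines might look like the sketch below. The helper names are hypothetical, and two assumptions are made that the guide does not confirm: the threshold is read as an absolute difference in percentage points, and only increases are flagged.

```python
def null_rate(values):
    """Fraction of values that are None (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def null_rate_increased(baseline_rate, values, threshold=0.10):
    """True when the null rate rose by at least `threshold` (in percentage points)."""
    return null_rate(values) - baseline_rate >= threshold


# 15 missing emails out of 100 rows, against a 2% baseline.
emails = ["user@example.com"] * 85 + [None] * 15
flagged = null_rate_increased(0.02, emails)  # True: a rise of 13 points
```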

Distribution Anomalies

Changes in the distribution of values within your data. Sparvi detects when value distributions shift significantly, which could indicate:

Skewed data processing
Changes in business operations
Data collection biases
Incorrect default values

Example: "Unusual distribution of 'status' values (98% are now 'pending')" - marked as high severity
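
One simple way to quantify such a shift is to compare each value's share of the column before and after. The sketch below does that; it is an assumption about how a shift could be measured, not a description of Sparvi's method, and the function names are hypothetical.

```python
from collections import Counter


def value_shares(values):
    """Map each distinct value to its share of the column (0.0 to 1.0)."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def shifted_categories(baseline, current, threshold=0.20):
    """Return {value: change in share} for values whose share moved by at least threshold."""
    before, after = value_shares(baseline), value_shares(current)
    changes = {}
    for value in set(before) | set(after):
        delta = after.get(value, 0.0) - before.get(value, 0.0)
        if abs(delta) >= threshold:
            changes[value] = delta
    return changes


baseline = ["pending"] * 40 + ["complete"] * 50 + ["failed"] * 10
current = ["pending"] * 98 + ["complete"] * 2
shifts = shifted_categories(baseline, current)  # 'pending' and 'complete' are reported
```

In this made-up data the share of 'pending' rises from 40% to 98%, so it is reported, while 'failed' moves by only 10 points and is not.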

Statistical Outliers

Values that deviate sharply from historical patterns. Sparvi identifies values that fall outside expected ranges, which could indicate:

Data entry errors
System bugs generating invalid values
Legitimate but unusual business events
Integration issues with external systems

Example: "Found 5 outlier values in 'amount' (beyond 3 std deviations)" - marked as medium severity
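
The 3 standard deviation rule can be sketched with Python's standard library. This version measures new values against the mean and sample standard deviation of historical values, which is one reasonable reading of "compared to historical patterns"; the function name and the sample data are hypothetical.

```python
from statistics import mean, stdev


def find_outliers(history, new_values, max_deviations=3.0):
    """Return new values more than `max_deviations` standard deviations from the historical mean."""
    center = mean(history)
    spread = stdev(history)  # sample standard deviation; needs at least 2 points
    if spread == 0:
        # A constant history: anything different from it is unusual.
        return [v for v in new_values if v != center]
    return [v for v in new_values if abs(v - center) / spread > max_deviations]


history = [100, 102, 98, 101, 99, 100, 103, 97]   # mean 100, sample std dev 2
outliers = find_outliers(history, [101, 95, 107, 250])
```

Here 107 and 250 are flagged because they lie more than 6 units (3 standard deviations) from the mean of 100, while 95 and 101 are not.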

Format Anomalies

Inconsistencies in data formats across your columns. Sparvi detects when data doesn't match established patterns, which could indicate:

Changes in data collection processes
Integration issues with external systems
Data validation failures
Migration or import errors

Example: "15% of 'phone_number' values don't match the established pattern" - marked as medium severity
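
A format check of this kind amounts to measuring what share of values fail a pattern. The sketch below uses Python's re module; the NNN-NNN-NNNN phone pattern and the mismatch_rate helper are hypothetical stand-ins for whatever pattern has been established for the column.

```python
import re

# Hypothetical "established pattern" for illustration: NNN-NNN-NNNN.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def mismatch_rate(values, pattern):
    """Fraction of non-null values that do not fully match the pattern."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    bad = sum(1 for v in present if pattern.fullmatch(v) is None)
    return bad / len(present)


phones = ["555-010-0001"] * 17 + ["5550100002", "555.010.0003", "n/a"]
rate = mismatch_rate(phones, PHONE_PATTERN)  # 3 of 20 values fail: 0.15
```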

Schema Shift Anomalies

Changes in your table schema structure. Sparvi monitors for schema changes including:

New columns added
Columns removed
Data type changes
Constraint modifications

Examples:

"New column 'discount_code' was added" - marked as medium severity
"Data type changed from 'DECIMAL(10,2)' to 'INTEGER'" - marked as high severity
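
Conceptually, schema shift detection is a comparison of two snapshots of a table's columns and types. The sketch below illustrates that idea with plain dictionaries; diff_schema is a hypothetical helper, and the VARCHAR(20) type for the new column is made up for the example.

```python
def diff_schema(previous, current):
    """Compare two {column: type} mappings and describe what changed."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    type_changes = {
        col: (previous[col], current[col])
        for col in previous.keys() & current.keys()
        if previous[col] != current[col]
    }
    return {"added": added, "removed": removed, "type_changes": type_changes}


before = {"id": "INTEGER", "amount": "DECIMAL(10,2)"}
after = {"id": "INTEGER", "amount": "INTEGER", "discount_code": "VARCHAR(20)"}
changes = diff_schema(before, after)
```

For these snapshots, changes reports 'discount_code' as added and 'amount' as changed from 'DECIMAL(10,2)' to 'INTEGER'.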

Working with Anomalies in Sparvi Cloud

Filtering and Viewing Anomalies

In the Anomaly Detection Dashboard, you can:

Filter anomalies by severity (high, medium, low)
Filter by anomaly type (row count, null rate, distribution, etc.)
Filter by affected columns or tables
View anomalies over specific time periods
Search for specific anomalies by description

Taking Action on Anomalies

When Sparvi Cloud detects an anomaly, you can:

Create an Issue: Convert the anomaly into a tracked issue with business context
Assign to Team Members: Use @mentions to notify the right people
Add Comments: Collaborate with your team to investigate and resolve
Track Resolution: Monitor the status of anomalies from detection to resolution
Document Findings: Keep a record of what caused the anomaly and how it was fixed

Anomaly Dashboard

The Anomaly Detection Dashboard provides:

Visual Trends: See anomaly patterns over time
Severity Breakdown: Understand the distribution of high, medium, and low severity anomalies
Type Analysis: View which anomaly types are most common in your data
Table Heatmap: Identify which tables have the most data quality issues
Business Impact: See which teams and processes are affected

Configuring Anomaly Detection

Custom Thresholds

In Sparvi Cloud, you can configure detection sensitivity through the settings interface:

Row Count Changes: Set the percentage threshold for row count changes (default: 20%)

Example: Alert only on 30% or greater changes

Null Rate Changes: Configure sensitivity for null rate changes (default: 10%)

Example: Alert on 5% changes for critical columns like email

Distribution Changes: Set thresholds for value distribution shifts (default: 20%)

Example: More sensitive monitoring for status or category columns
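
Thresholds are set in the Sparvi Cloud settings interface rather than in code, but their effect can be pictured with a short sketch. The Thresholds class and the exceeds helper are hypothetical; only the default values come from this guide, expressed here as fractions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Defaults taken from this guide, expressed as fractions.
    row_count_change: float = 0.20
    null_rate_change: float = 0.10
    distribution_change: float = 0.20


def exceeds(observed_change, threshold):
    """True when the absolute observed change reaches the threshold."""
    return abs(observed_change) >= threshold


defaults = Thresholds()
# Mirrors the examples above: 30% for row counts, 5% for null rates.
custom = Thresholds(row_count_change=0.30, null_rate_change=0.05)

# A 25% drop in rows alerts under the defaults but not under the looser setting.
alert_default = exceeds(-0.25, defaults.row_count_change)
alert_custom = exceeds(-0.25, custom.row_count_change)
```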

Expected Patterns

Define expected data formats for your columns:

Email format: Standard email validation patterns
Phone numbers: Custom phone number formats for your region
Postal codes: ZIP code or postal code patterns
Custom formats: Define your own regex patterns for specialized data
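
As an illustration of what such patterns can look like, the sketch below keeps a small pattern library in a dictionary and checks a record against it. The regular expressions are deliberately simple examples, not the validation rules Sparvi ships with, and the order_ref field and invalid_fields helper are hypothetical.

```python
import re

# Illustrative patterns only; real-world formats usually need stricter or
# region-specific rules.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "postal_code": re.compile(r"\d{5}(-\d{4})?"),   # US ZIP or ZIP+4
    "order_ref": re.compile(r"ORD-\d{6}"),          # hypothetical custom format
}


def invalid_fields(record, patterns=PATTERNS):
    """Return the names of fields in `record` that fail their expected pattern."""
    return sorted(
        name
        for name, pattern in patterns.items()
        if name in record and pattern.fullmatch(str(record[name])) is None
    )


record = {"email": "ana@example.com", "postal_code": "1234", "order_ref": "ORD-000123"}
problems = invalid_fields(record)  # only the 4-digit postal code fails
```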

Column-Specific Sensitivity

Configure different sensitivity levels for different columns:

Critical columns: Higher sensitivity (e.g., customer email, payment amounts)
Reference columns: Standard sensitivity (e.g., descriptions, notes)
Audit columns: Lower sensitivity (e.g., created_by, updated_at)
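
One way to think about sensitivity levels is as multipliers applied to a base threshold, so that critical columns alert on smaller changes. The sketch below shows that idea; the multiplier values, level names, and column names are assumptions for illustration and do not reflect how Sparvi stores these settings.

```python
# Hypothetical multipliers: a smaller multiplier tightens the threshold.
SENSITIVITY_MULTIPLIER = {"high": 0.5, "standard": 1.0, "low": 2.0}

# Column-to-level mapping based on the examples in this section.
COLUMN_SENSITIVITY = {
    "email": "high",
    "payment_amount": "high",
    "description": "standard",
    "created_by": "low",
    "updated_at": "low",
}


def threshold_for(column, base_threshold):
    """Scale the base threshold by the column's sensitivity (standard if unlisted)."""
    level = COLUMN_SENSITIVITY.get(column, "standard")
    return base_threshold * SENSITIVITY_MULTIPLIER[level]


email_threshold = threshold_for("email", 0.10)        # tighter than the base
audit_threshold = threshold_for("updated_at", 0.10)   # looser than the base
```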

Best Practices

Start Simple: Begin with default settings and refine thresholds as you learn your data patterns
Historical Comparison: Rely on the historical profiles that Sparvi Cloud maintains automatically, since anomaly detection is only meaningful against a baseline
Adjust Thresholds: Customize thresholds based on your data's natural variability
Column Sensitivity: Set different sensitivity levels for critical columns vs. less important ones
Severity Triage: Focus on high-severity anomalies first, then address medium and low ones
Regular Monitoring: Sparvi Cloud runs profiles on automated schedules to establish normal patterns
Document Findings: Use issue tracking to keep a record of anomalies and their resolutions
Team Collaboration: Use @mentions and comments to involve the right people quickly
Pattern Library: Build a library of expected patterns for format validation
Notification Strategy: Configure Slack, email, or webhook alerts for critical anomalies

Next Steps

After mastering anomaly detection:

Set up Validation Rules to complement anomaly detection with business-specific checks
Explore Data Profiling to understand comprehensive data quality metrics
Learn about Sparvi Cloud Database Connections to connect your data sources
