Anomaly Detection Guide

Anomaly detection is a key feature of Sparvi that helps you identify unusual patterns, outliers, and unexpected changes in your data. This guide explains how Sparvi's anomaly detection works and how to use it to maintain high data quality.

Understanding Anomaly Detection in Sparvi

Anomaly detection in Sparvi works by:

  1. Analyzing current data to find statistical outliers and pattern deviations
  2. Comparing with historical profiles to detect changes over time
  3. Applying domain-specific rules to identify business anomalies
  4. Ranking anomalies by severity to prioritize the most important issues

Sparvi can detect anomalies in several ways:

  • Automatically, during regular profiling operations
  • By comparison, when historical data is provided
  • By rule, through custom validation rules
  • By time-series analysis of trends and seasonality patterns

Types of Anomalies

Sparvi detects different types of anomalies:

Statistical Outliers

Values that deviate significantly from the statistical norm:

  • Numeric Outliers: Values far from the mean/median (typically beyond 3 standard deviations)
  • Temporal Outliers: Unusual spikes or drops in time-series data
  • Frequency Outliers: Unusual distribution of categorical values
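
To make the idea of a temporal outlier concrete, here is a minimal sketch that compares each point with the mean and standard deviation of the points just before it. It illustrates the general technique only; the function name, the 7-point window, and the threshold of 3 are assumptions for the example, not Sparvi's internal settings.

```python
from statistics import mean, stdev

def find_spikes(series, window=7, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` points.
    (Illustrative only; window and threshold are assumed values.)"""
    spikes = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if series[i] != mu:
                spikes.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes

daily_orders = [100, 102, 98, 101, 99, 103, 100, 250, 101, 97]
print(find_spikes(daily_orders))  # index 7 (the value 250) is flagged
```

Note that once the spike enters the trailing window it inflates the standard deviation, so the points right after it are judged against a noisier baseline.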

Data Quality Anomalies

Issues that indicate potential data quality problems:

  • Null Rate Changes: Unexpected increases in null values
  • Duplicate Rate Changes: Unexpected increases in duplicate records
  • Format Inconsistencies: Deviations from established data formats
  • Referential Integrity Issues: Invalid references between tables
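
As one example from the list above, a referential integrity check can be sketched as a search for child rows whose foreign key points at no existing parent. The table and field names below are hypothetical, and this is a generic illustration rather than the check Sparvi runs.

```python
def find_orphans(child_rows, parent_ids, fk_field):
    """Return child rows whose foreign key is set but matches no parent id."""
    known = set(parent_ids)
    return [
        row for row in child_rows
        if row.get(fk_field) is not None and row[fk_field] not in known
    ]

# Hypothetical data: customers 1-3 exist, one order references customer 4.
customers = [1, 2, 3]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 4},     # no such customer
    {"order_id": 12, "customer_id": None},  # null key: a null-rate concern, not an orphan
]
orphans = find_orphans(orders, customers, "customer_id")
print([row["order_id"] for row in orphans])
```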

Schema Anomalies

Changes in the structure of your data:

  • Column Additions/Removals: New or missing columns
  • Data Type Changes: Changes in column data types
  • Constraint Changes: Changes in column nullability or uniqueness

Business Logic Anomalies

Violations of business rules or expectations:

  • Range Violations: Values outside acceptable business ranges
  • Process Breaks: Indications of broken business processes
  • Relationship Anomalies: Unusual relationships between entities

Detecting Anomalies with Sparvi Cloud

Sparvi Cloud provides automated anomaly detection through the web interface:

  1. Automatic Detection: Anomalies are detected automatically during data profiling
  2. Historical Comparison: Sparvi Cloud tracks changes over time and alerts you to unusual patterns
  3. Anomaly Dashboard: View all detected anomalies in the Anomaly Detection Dashboard
  4. Configurable Thresholds: Customize detection sensitivity for different data patterns
  5. Severity Ranking: Anomalies are automatically ranked by severity to help you prioritize
  6. Business Impact Analysis: See which teams and processes are affected by detected anomalies
  7. Notifications: Get alerted via Slack, email, or webhooks when anomalies are detected

Anomaly Types in Detail

Row Count Anomalies

Unusual changes in the number of rows in your tables. Sparvi detects when row counts increase or decrease beyond expected thresholds, which could indicate:

  • Data pipeline failures
  • Missing data loads
  • Unexpected data deletions
  • Data quality issues upstream

Example: "Row count decreased by 25% (from 1000 to 750)" - marked as high severity
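
The arithmetic behind the example above can be sketched as follows. The 20% default comes from the Custom Thresholds section of this guide; the function names and the choice to treat the threshold as inclusive are assumptions made for illustration.

```python
def row_count_change(previous, current):
    """Signed relative change in row count (-0.25 means a 25% decrease)."""
    if previous == 0:
        raise ValueError("previous row count is zero; relative change is undefined")
    return (current - previous) / previous

def exceeds_threshold(previous, current, threshold=0.20):
    """True when the change, in either direction, reaches the threshold.
    Inclusive comparison is an assumption, not documented behavior."""
    return abs(row_count_change(previous, current)) >= threshold

print(row_count_change(1000, 750))   # -0.25
print(exceeds_threshold(1000, 750))  # True
```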

Null Rate Anomalies

Unexpected changes in null values across your columns. Sparvi monitors null rates and alerts when they deviate from normal patterns, which could indicate:

  • Missing data from upstream sources
  • Changes in data collection processes
  • ETL pipeline issues
  • Schema changes affecting data population

Example: "Null rate for 'email' increased from 2% to 15%" - marked as medium severity
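
A null rate check like the one in the example can be sketched as below. This guide does not say whether the threshold is measured in absolute percentage points or relative to the baseline; the sketch assumes absolute points, and the function names are hypothetical.

```python
def null_rate(values):
    """Fraction of entries that are None; 0.0 for an empty column."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)

def null_rate_alert(baseline_rate, values, threshold=0.10):
    """True when the null rate rose by at least `threshold` over the baseline.
    Assumption: the threshold is in absolute percentage points."""
    return null_rate(values) - baseline_rate >= threshold

emails = ["a@example.com"] * 85 + [None] * 15  # 15 of 100 values are null
print(null_rate(emails))
print(null_rate_alert(0.02, emails))  # baseline of 2%, as in the example above
```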

Distribution Anomalies

Changes in the distribution of values within your data. Sparvi detects when value distributions shift significantly, which could indicate:

  • Skewed data processing
  • Changes in business operations
  • Data collection biases
  • Incorrect default values

Example: "Unusual distribution of 'status' values (98% are now 'pending')" - marked as high severity
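
One common way to quantify a shift in categorical values is the total variation distance between the baseline and current proportions. The sketch below uses that measure as a stand-in; this guide does not state which measure Sparvi uses, and the sample data is invented.

```python
from collections import Counter

def proportions(values):
    """Map each distinct value to its share of the column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

def total_variation(baseline, current):
    """Half the sum of absolute share differences: 0 means identical
    distributions, 1 means no overlap at all."""
    p, q = proportions(baseline), proportions(current)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

# Hypothetical 'status' column before and after the shift.
before = ["pending"] * 30 + ["shipped"] * 50 + ["delivered"] * 20
after = ["pending"] * 98 + ["shipped"] * 1 + ["delivered"] * 1
shift = total_variation(before, after)
print(round(shift, 2))
```

Here the share of 'pending' rises from 30% to 98%, which gives a distance of about 0.68, well above a 20% threshold.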

Statistical Outliers

Individual values that deviate sharply from historical patterns. Sparvi identifies values that fall outside expected ranges (typically more than 3 standard deviations from the mean), which could indicate:

  • Data entry errors
  • System bugs generating invalid values
  • Legitimate but unusual business events
  • Integration issues with external systems

Example: "Found 5 outlier values in 'amount' (beyond 3 std deviations)" - marked as medium severity
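
The "beyond 3 std deviations" rule in the example corresponds to a z-score test, sketched below with the standard library. The function name and sample data are illustrative assumptions, and the sketch measures against the current values rather than a stored historical profile.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` population standard
    deviations away from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical 'amount' column: 30 ordinary values and one extreme one.
amounts = [48, 52, 50, 49, 51] * 6 + [500]
print(zscore_outliers(amounts))
```

With very small samples a single extreme value inflates the standard deviation enough to hide itself, so a z-score test of this kind needs a reasonable number of observations to be useful.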

Format Anomalies

Inconsistencies in data formats across your columns. Sparvi detects when data doesn't match established patterns, which could indicate:

  • Changes in data collection processes
  • Integration issues with external systems
  • Data validation failures
  • Migration or import errors

Example: "15% of 'phone_number' values don't match the established pattern" - marked as medium severity
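
A figure like the 15% in the example can be derived by counting the values that fail to match an expected pattern. The sketch below shows the calculation; the phone pattern and sample values are hypothetical and would need to be adapted to your own formats.

```python
import re

def mismatch_rate(values, pattern):
    """Share of non-null values that do not fully match `pattern`."""
    regex = re.compile(pattern)
    checked = [v for v in values if v is not None]
    if not checked:
        return 0.0
    bad = sum(1 for v in checked if not regex.fullmatch(v))
    return bad / len(checked)

# Hypothetical expected format: 555-123-4567
PHONE_PATTERN = r"\d{3}-\d{3}-\d{4}"
phones = ["555-123-4567"] * 17 + ["5551234567", "(555) 123-4567", "n/a"]
print(mismatch_rate(phones, PHONE_PATTERN))  # 3 of 20 values fail
```

Using fullmatch rather than search means a value must conform from start to end, so trailing junk after a valid number still counts as a mismatch.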

Schema Shift Anomalies

Changes in your table schema structure. Sparvi monitors for schema changes including:

  • New columns added
  • Columns removed
  • Data type changes
  • Constraint modifications

Examples:

  • "New column 'discount_code' was added" - marked as medium severity
  • "Data type changed from 'DECIMAL(10,2)' to 'INTEGER'" - marked as high severity
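
Schema changes of this kind can be found by comparing two snapshots of a table's columns and types. The following is a minimal sketch that assumes each snapshot is a plain mapping of column name to type string; that representation and the column names are assumptions, not Sparvi's storage format.

```python
def diff_schema(old, new):
    """Compare two {column: type} snapshots and report what changed."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "type_changed": {
            col: (old[col], new[col])
            for col in sorted(set(old) & set(new))
            if old[col] != new[col]
        },
    }

# Hypothetical snapshots mirroring the examples above.
previous = {"id": "INTEGER", "amount": "DECIMAL(10,2)"}
current = {"id": "INTEGER", "amount": "INTEGER", "discount_code": "VARCHAR(32)"}
changes = diff_schema(previous, current)
print(changes)
```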

Working with Anomalies in Sparvi Cloud

Filtering and Viewing Anomalies

In the Anomaly Detection Dashboard, you can:

  • Filter anomalies by severity (high, medium, low)
  • Filter by anomaly type (row count, null rate, distribution, etc.)
  • Filter by affected columns or tables
  • View anomalies over specific time periods
  • Search for specific anomalies by description

Taking Action on Anomalies

When Sparvi Cloud detects an anomaly, you can:

  1. Create an Issue: Convert the anomaly into a tracked issue with business context
  2. Assign to Team Members: Use @mentions to notify the right people
  3. Add Comments: Collaborate with your team to investigate and resolve
  4. Track Resolution: Monitor the status of anomalies from detection to resolution
  5. Document Findings: Keep a record of what caused the anomaly and how it was fixed

Anomaly Dashboard

The Anomaly Detection Dashboard provides:

  • Visual Trends: See anomaly patterns over time
  • Severity Breakdown: Understand the distribution of high, medium, and low severity anomalies
  • Type Analysis: View which anomaly types are most common in your data
  • Table Heatmap: Identify which tables have the most data quality issues
  • Business Impact: See which teams and processes are affected

Configuring Anomaly Detection

Custom Thresholds

In Sparvi Cloud, you can configure detection sensitivity through the settings interface:

Row Count Changes: Set the percentage threshold for row count changes (default: 20%)

  • Example: Alert only on 30% or greater changes

Null Rate Changes: Configure sensitivity for null rate changes (default: 10%)

  • Example: Alert on 5% changes for critical columns like email

Distribution Changes: Set thresholds for value distribution shifts (default: 20%)

  • Example: More sensitive monitoring for status or category columns

Expected Patterns

Define expected data formats for your columns:

  • Email format: Standard email validation patterns
  • Phone numbers: Custom phone number formats for your region
  • Postal codes: ZIP code or postal code patterns
  • Custom formats: Define your own regex patterns for specialized data

Column-Specific Sensitivity

Configure different sensitivity levels for different columns:

  • Critical columns: Higher sensitivity (e.g., customer email, payment amounts)
  • Reference columns: Standard sensitivity (e.g., descriptions, notes)
  • Audit columns: Lower sensitivity (e.g., created_by, updated_at)
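
Sparvi Cloud exposes these settings through its interface, but the underlying idea can be sketched in a few lines: start from the default thresholds listed under Custom Thresholds and scale them per column tier. The tier names and multipliers below are purely hypothetical and chosen only to illustrate the idea.

```python
# Defaults taken from the Custom Thresholds section of this guide.
DEFAULT_THRESHOLDS = {"row_count": 0.20, "null_rate": 0.10, "distribution": 0.20}

# Hypothetical multipliers: a smaller threshold means higher sensitivity.
TIER_MULTIPLIERS = {"critical": 0.5, "reference": 1.0, "audit": 2.0}

def threshold_for(metric, tier="reference"):
    """Effective alert threshold for a metric on a column of the given tier."""
    return DEFAULT_THRESHOLDS[metric] * TIER_MULTIPLIERS[tier]

print(threshold_for("null_rate", "critical"))  # 0.05: alert on 5% changes
print(threshold_for("null_rate", "audit"))     # 0.2: alert only on larger changes
```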

Best Practices

  1. Start Simple: Begin with default settings and refine thresholds as you learn your data patterns
  2. Historical Comparison: Let Sparvi Cloud build up the historical profiles it maintains automatically, so that anomaly detection has a meaningful baseline
  3. Adjust Thresholds: Customize thresholds based on your data's natural variability
  4. Column Sensitivity: Set different sensitivity levels for critical columns vs. less important ones
  5. Severity Triage: Focus on high-severity anomalies first, then address medium and low ones
  6. Regular Monitoring: Keep profiles running on Sparvi Cloud's automated schedules so that normal patterns are established
  7. Document Findings: Use issue tracking to keep a record of anomalies and their resolutions
  8. Team Collaboration: Use @mentions and comments to involve the right people quickly
  9. Pattern Library: Build a library of expected patterns for format validation
  10. Notification Strategy: Configure Slack, email, or webhook alerts for critical anomalies
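
If you export anomalies for your own reporting, the severity triage practice above amounts to a simple ordering. The sketch below assumes each anomaly is a dictionary with "severity" and "message" keys; that shape is hypothetical, while the messages reuse examples from this guide.

```python
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def triage(anomalies):
    """Sort anomalies so high severity comes first. The sort is stable,
    so anomalies of equal severity keep their original order."""
    return sorted(anomalies, key=lambda a: SEVERITY_ORDER[a["severity"]])

# Hypothetical exported anomalies.
detected = [
    {"severity": "medium", "message": "Null rate for 'email' increased from 2% to 15%"},
    {"severity": "high", "message": "Row count decreased by 25% (from 1000 to 750)"},
    {"severity": "medium", "message": "New column 'discount_code' was added"},
]
for anomaly in triage(detected):
    print(anomaly["severity"], "-", anomaly["message"])
```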

Next Steps

After mastering anomaly detection:

  1. Set up Validation Rules to complement anomaly detection with business-specific checks
  2. Explore Data Profiling to understand comprehensive data quality metrics
  3. Learn about Sparvi Cloud Database Connections to connect your data sources