Schema Validation
Schema validation is the automated verification that data conforms to an expected structure: required columns exist, data types are correct, null constraints are enforced, and relationships are valid. For example, if a schema declares that the customers table has columns (customer_id: integer, name: string, created_at: timestamp), validation ensures every row conforms. It catches structural errors such as missing columns, type mismatches (a string where an integer is expected), and unexpected columns.
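The checks above can be sketched in a few lines of Python. This is a minimal illustration, not a specific tool's API: the schema is an assumed dict mapping each column name to a Python type and a nullable flag.

```python
# Illustrative schema: column name -> (expected type, nullable flag).
EXPECTED_SCHEMA = {
    "customer_id": (int, False),   # non-null integer
    "name": (str, False),          # non-null string
    "created_at": (str, False),    # non-null ISO timestamp string
}

def validate_row(row: dict) -> list[str]:
    """Return a list of schema violations for one row (empty = valid)."""
    errors = []
    for column, (expected_type, nullable) in EXPECTED_SCHEMA.items():
        if column not in row:
            errors.append(f"missing column: {column}")
            continue
        value = row[column]
        if value is None:
            if not nullable:
                errors.append(f"null in non-nullable column: {column}")
        elif not isinstance(value, expected_type):
            errors.append(
                f"type mismatch in {column}: expected "
                f"{expected_type.__name__}, got {type(value).__name__}"
            )
    # Flag columns the schema does not declare.
    for column in row:
        if column not in EXPECTED_SCHEMA:
            errors.append(f"unexpected column: {column}")
    return errors
```

A conforming row produces an empty list; a row with a stringified ID, a null name, or an extra column produces one violation per problem.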
Schema validation emerged because data structures change and break downstream systems: a source system adds a column, a pipeline developer accidentally removes one, or a schema migration adds a constraint. Without validation, these changes propagate downstream undetected, causing query failures or silent data corruption. Schema validation detects them immediately.
Schema validation can be strict (any change must be approved) or flexible (new columns are allowed, missing columns are flagged). It can be enforced at different stages: at ingestion (reject data that doesn't match the schema), during transformation (assert that outputs match the expected schema), or continuously in production (monitor for schema drift). Tools such as Great Expectations and dbt, along with broader data quality platforms, offer schema validation capabilities.
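The strict-versus-flexible distinction can be expressed as a single comparison of column sets. This is a sketch under assumed representations (schemas as sets of column names; the function name is illustrative):

```python
def check_schema(expected: set[str], actual: set[str], strict: bool = True) -> dict:
    """Compare observed columns against the declared schema; return a drift report."""
    missing = expected - actual   # columns the load dropped
    added = actual - expected     # columns the source introduced
    if strict:
        # Strict mode: any deviation fails validation.
        ok = not missing and not added
    else:
        # Flexible mode: new columns are tolerated; missing columns still fail.
        ok = not missing
    return {"ok": ok, "missing": sorted(missing), "added": sorted(added)}
```

The same drift report can drive different policies: reject the load at ingestion in strict mode, or merely flag the added columns in flexible mode.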
Key Characteristics
- Validates column names, types, and constraints
- Detects schema changes and drift
- Can enforce strict or flexible schema requirements
- Runs at ingestion, transformation, or production stages
- Integrates with data quality platforms
- Prevents propagation of structural errors
Why It Matters
- Stability: Detects breaking schema changes early
- Reliability: Prevents queries from failing due to missing columns
- Clarity: Schema definitions document structure
- Automation: Validates without manual inspection
- Governance: Enforces change control for schemas
Example
Schema validation for an orders table expects: order_id (integer, non-null), customer_id (integer, non-null, foreign key to customers), amount (decimal, non-null), and created_at (timestamp, non-null). If a data load adds a column (new_field: string) or drops one (amount missing), validation flags the discrepancy.
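The orders scenario above can be sketched as a schema diff. The expected schema and type names here are taken from the example; the function and its report format are illustrative (foreign-key checks are out of scope for this sketch):

```python
# Declared schema for the orders table: column -> declared type.
EXPECTED_ORDERS = {
    "order_id": "integer",
    "customer_id": "integer",
    "amount": "decimal",
    "created_at": "timestamp",
}

def diff_schema(expected: dict, actual: dict) -> list[str]:
    """List discrepancies between declared and observed column/type pairs."""
    findings = []
    for col, typ in expected.items():
        if col not in actual:
            findings.append(f"missing column: {col} ({typ})")
        elif actual[col] != typ:
            findings.append(f"type change: {col} is {actual[col]}, expected {typ}")
    for col in actual:
        if col not in expected:
            findings.append(f"new column: {col} ({actual[col]})")
    return findings
```

Running the diff against a load that added new_field and dropped amount yields exactly the two discrepancies the example describes.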
Coginiti Perspective
SMDL provides a form of schema validation at the semantic layer. Entity definitions declare expected dimensions with specific types (text, number, date, datetime, bool), and Semantic SQL enforces type compatibility at query time. CoginitiScript #+test blocks can validate physical schemas by querying information_schema or system catalogs to assert expected columns, types, and constraints exist before pipeline logic executes. Publication with schema specification ensures materialized outputs conform to declared structure.
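The catalog-assertion pattern described above (check the physical schema before pipeline logic runs) can be approximated in plain Python. This sketch uses SQLite's PRAGMA table_info as a stand-in for information_schema, so it is self-contained; on a warehouse you would query information_schema.columns instead, and the table is the hypothetical orders table from the example:

```python
import sqlite3

# In-memory database standing in for a warehouse with a physical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER NOT NULL,
        customer_id INTEGER NOT NULL,
        amount NUMERIC NOT NULL,
        created_at TEXT NOT NULL
    )
""")

# Observed schema from the system catalog: {column: (declared_type, not_null)}.
# PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
observed = {
    row[1]: (row[2], bool(row[3]))
    for row in conn.execute("PRAGMA table_info(orders)")
}

expected = {
    "order_id": ("INTEGER", True),
    "customer_id": ("INTEGER", True),
    "amount": ("NUMERIC", True),
    "created_at": ("TEXT", True),
}

# Fail fast, before any pipeline logic, if the physical table has drifted.
assert observed == expected, f"schema drift detected: {observed}"
```

The design point is the same as a pre-flight test block: the assertion runs before transformations, so a drifted table stops the pipeline instead of corrupting its outputs.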
Related Concepts
Analytics Catalog
An analytics catalog is a specialized data catalog focused on analytics assets such as metrics, dimensions, dashboards, and saved queries, enabling discovery and governance of analytics-specific objects.
Business Metadata
Business metadata is contextual information that gives data meaning to business users, including definitions, descriptions, ownership, and guidance on appropriate use.
Data Catalog
A data catalog is a searchable repository of metadata about data assets that helps users discover available datasets, understand their content, and assess their quality and suitability for use.
Data Certification
Data certification is a formal process of validating and approving data quality, documenting that data meets governance standards and is safe for use in critical business decisions.
Data Contracts
A data contract is a formal agreement specifying the expectations between data producers and consumers, including schema, quality guarantees, freshness SLAs, and remediation obligations.
Data Governance
Data governance is a framework of policies, processes, and controls that define how data is managed, who is responsible for it, and how it should be used to ensure quality, security, and compliance.