Data Quality

Data quality is the degree to which data is accurate, complete, timely, and conforms to business requirements, enabling confident use for decision-making and analysis.

Data quality encompasses multiple dimensions: accuracy (does the data represent reality?), completeness (are there unexpected nulls or missing values?), timeliness (is the data current or stale?), consistency (does the data align with business rules?), and validity (does it conform to specified formats and ranges?). Poor quality in any one dimension undermines trust in the whole dataset: high-quality data supports confident decision-making, while low-quality data leads to incorrect conclusions.
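Two of these dimensions, timeliness and consistency, lend themselves to direct SQL checks. The sketch below assumes a hypothetical orders table with a loaded_at timestamp and an order_total that should equal the sum of its line items; each query returns offending rows, so an empty result means the check passes (interval syntax varies by dialect):

    -- Timeliness: flag the table as stale if nothing has loaded in 24 hours.
    SELECT MAX(loaded_at) AS last_load
    FROM orders
    HAVING MAX(loaded_at) < CURRENT_TIMESTAMP - INTERVAL '1 day';

    -- Consistency: every order total must equal the sum of its line items.
    SELECT o.order_id, o.order_total, SUM(li.amount) AS line_item_total
    FROM orders o
    JOIN order_line_items li ON li.order_id = o.order_id
    GROUP BY o.order_id, o.order_total
    HAVING o.order_total <> SUM(li.amount);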

Data quality deteriorates through multiple mechanisms: source system bugs (producing incorrect data), pipeline failures (losing or corrupting data), schema changes (breaking downstream expectations), or business rule violations (records that should be impossible nonetheless appear). Without active quality management, quality gradually degrades as systems age, and organizations often discover the problem only after bad decisions have already been made on flawed data.

Data quality management involves defining quality standards, measuring against them, detecting issues, and remediating root causes. Standards vary by use case: financial reporting may demand near-perfect accuracy, while marketing attribution might tolerate a 10% error rate. Quality is measured through metrics such as null rates, duplicate rates, and schema conformance; detection relies on validation rules and anomaly detection; remediation means fixing the data or adjusting the process that produced it so the issue does not recur.
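As a concrete sketch, the metrics named above can be computed in a single aggregate query. The customers table and its columns here are illustrative assumptions:

    -- Quality metrics for a hypothetical customers table:
    -- overall row count, null rate on email, duplicate rate on customer_id.
    SELECT
        COUNT(*) AS row_count,
        AVG(CASE WHEN email IS NULL THEN 1.0 ELSE 0.0 END) AS email_null_rate,
        1.0 - COUNT(DISTINCT customer_id) * 1.0 / COUNT(*) AS duplicate_id_rate
    FROM customers;

Tracking these numbers over time turns one-off checks into monitoring: a sudden jump in email_null_rate is often the first visible symptom of an upstream change.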

Key Characteristics

  • Multidimensional: accuracy, completeness, timeliness, consistency, validity
  • Measured through quality metrics and tests
  • Managed through validation rules and monitoring
  • Tied to business requirements and use cases
  • Requires root cause analysis and remediation
  • Continuous process, not one-time effort

Why It Matters

  • Confidence: High-quality data enables confident decisions
  • Efficiency: Reduces time spent investigating and fixing bad data
  • Compliance: Many regulations require demonstrating data quality
  • Trust: Quality is foundational to analytics adoption
  • Cost: Poor quality leads to wrong decisions and wasted resources

Example

A customer table should have: (1) unique customer IDs (no duplicates), (2) required fields such as email populated (no nulls), (3) a consistent phone number format, and (4) valid registration dates (none in the future). Data quality tests validate each of these rules; when a test fails, data stewards investigate and fix the root cause.
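Expressed as SQL, each rule becomes a query that returns violating rows, so an empty result is a pass. A minimal sketch against the same hypothetical customers table (the phone pattern and pattern-matching operator are dialect-specific assumptions):

    -- (1) Unique customer IDs: any ID appearing more than once is a violation.
    SELECT customer_id
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1;

    -- (2) Required fields: email must never be null.
    SELECT customer_id FROM customers WHERE email IS NULL;

    -- (3) Consistent phone format, e.g. digits with an optional leading +.
    SELECT customer_id, phone
    FROM customers
    WHERE phone NOT SIMILAR TO '[+]?[0-9]{7,15}';

    -- (4) Valid registration dates: none may be in the future.
    SELECT customer_id, registered_at
    FROM customers
    WHERE registered_at > CURRENT_TIMESTAMP;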

Coginiti Perspective

Coginiti addresses data quality at multiple levels. CoginitiScript #+test blocks define quality assertions that run within pipelines, returning pass/fail based on whether results are empty (pass) or contain rows (fail), with onFailure options to stop or continue execution. SMDL enforces semantic quality by typing dimensions and declaring measure aggregation rules, preventing misuse at query time. The Analytics Catalog's promotion workflow ensures that only reviewed and tested logic produces the data that downstream consumers rely on.
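As an illustration of the first mechanism, the duplicate-ID rule from the example above might be written as the sketch below. The exact #+test directive syntax is an assumption here (consult the CoginitiScript documentation), but the semantics follow the description above: an empty result passes, returned rows fail, and onFailure controls whether the pipeline stops.

    -- Hypothetical CoginitiScript test block: the directive syntax is an
    -- assumption; the pass/fail semantics (rows returned => failure) are
    -- as described above.
    #+test duplicate_customer_ids(onFailure: "stop")
        SELECT customer_id, COUNT(*) AS occurrences
        FROM customers
        GROUP BY customer_id
        HAVING COUNT(*) > 1;
    #+end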
