Hidden Partitioning

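Hidden partitioning is a table format feature in which partition values are derived from column values and recorded in table metadata. Queries benefit from partition pruning without readers depending on partition values encoded in file paths, and existing files do not have to be reorganized when the partition scheme changes.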

Traditional partitioning embeds partition values in directory structures (e.g., /year=2024/month=4/), which ties queries and writers to that layout and constrains future flexibility. Hidden partitioning decouples partition logic from physical file organization by storing partition information in table metadata. The query engine translates filters on the source columns (for example, a timestamp range) into partition predicates and evaluates them against that metadata, achieving the same pruning without exposing partition semantics in file names or requiring users to reference partition columns.
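
The translation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an actual table format API: `DataFile`, `month_transform`, and `plan_scan` are hypothetical names, and the file names are placeholders. It shows a timestamp filter being converted into a range of month partition values and checked against per-file metadata, with no path parsing involved.

```python
from dataclasses import dataclass
from datetime import datetime


def month_transform(ts: datetime) -> int:
    """Hypothetical month transform: months elapsed since 1970-01."""
    return (ts.year - 1970) * 12 + (ts.month - 1)


@dataclass(frozen=True)
class DataFile:
    """Hypothetical metadata entry: an opaque path plus its partition value."""
    path: str
    month: int  # derived by month_transform at write time


def plan_scan(files: list[DataFile], start: datetime, end: datetime) -> list[str]:
    """Keep files whose month partition can contain rows in [start, end]."""
    lo, hi = month_transform(start), month_transform(end)
    return [f.path for f in files if lo <= f.month <= hi]


files = [
    DataFile("00000-a1b2c3.parquet", month_transform(datetime(2024, 3, 15))),
    DataFile("00001-d4e5f6.parquet", month_transform(datetime(2024, 4, 2))),
    DataFile("00002-0a1b2c.parquet", month_transform(datetime(2024, 5, 20))),
]

# A filter such as: event_timestamp BETWEEN '2024-04-10' AND '2024-04-20'
selected = plan_scan(files, datetime(2024, 4, 10), datetime(2024, 4, 20))
print(selected)  # ['00001-d4e5f6.parquet']
```

The query author only ever mentions the timestamp; the partition value is an internal detail computed on both the write path and the planning path.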

This approach solves several practical problems. When partition requirements change (e.g., from monthly to daily granularity), hidden partitioning requires only a metadata update rather than an expensive rewrite of existing files. Files written under the old and new schemes coexist in the same table, and the engine plans each group of files against the scheme it was written with. Because the metadata layer tracks which files belong to which partitions, engines can prune files efficiently without parsing paths.
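
As a rough sketch of why a scheme change can be a metadata-only operation, the following Python models a table whose metadata keeps a history of partition schemes and tags each file with the scheme it was written under. The `Table` class, its methods, and the `TRANSFORMS` mapping are assumptions made for illustration; real table formats store this information differently.

```python
from dataclasses import dataclass, field
from datetime import date


# Hypothetical transforms: each maps a date to a partition value.
TRANSFORMS = {
    "month": lambda d: (d.year, d.month),
    "day": lambda d: (d.year, d.month, d.day),
}


@dataclass
class Table:
    """Hypothetical table metadata: a spec history plus a file list."""
    specs: list[str] = field(default_factory=lambda: ["month"])
    files: list[tuple[str, int, tuple]] = field(default_factory=list)

    def evolve(self, transform: str) -> None:
        # Metadata-only change: existing file entries are left untouched.
        self.specs.append(transform)

    def append(self, path: str, d: date) -> None:
        spec_id = len(self.specs) - 1  # new files use the latest spec
        value = TRANSFORMS[self.specs[spec_id]](d)
        self.files.append((path, spec_id, value))

    def plan(self, d: date) -> list[str]:
        # Each file is matched using the spec it was written with.
        return [
            path
            for path, spec_id, value in self.files
            if TRANSFORMS[self.specs[spec_id]](d) == value
        ]


table = Table()
table.append("f1.parquet", date(2024, 4, 3))  # written under the monthly spec
table.evolve("day")                            # no files rewritten
table.append("f2.parquet", date(2024, 4, 3))  # written under the daily spec
table.append("f3.parquet", date(2024, 4, 9))

print(table.plan(date(2024, 4, 9)))  # ['f1.parquet', 'f3.parquet']
```

The monthly file `f1.parquet` is still selected for 9 April because it may contain any day of that month, while the daily file `f2.parquet` is skipped.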

Hidden partitioning is a distinguishing feature of modern table formats like Apache Iceberg. It relies on metadata that records the partition values of each data file and, at a coarser level, the lower and upper bounds of partition values across groups of files, so query engines can push predicates down into planning and skip data that cannot match. Although the approach was introduced by Iceberg, its advantages have driven adoption of hidden partitioning concepts across the table format ecosystem.
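
The role of min/max partition bounds can be illustrated with a small Python sketch. It assumes a two-level layout in which files are grouped into manifests and each manifest carries lower and upper bounds for one partition field; `ManifestSummary` and `plan_equals` are hypothetical names, and the layout is a simplification rather than any format's actual metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestSummary:
    """Hypothetical summary: bounds of one partition field across a manifest."""
    name: str
    lower: str  # smallest partition value among the manifest's files
    upper: str  # largest partition value among the manifest's files
    files: tuple[tuple[str, str], ...]  # (path, partition value) per file


def plan_equals(manifests: list[ManifestSummary], value: str) -> list[str]:
    """Return paths of files whose partition value equals `value`."""
    selected = []
    for m in manifests:
        if not (m.lower <= value <= m.upper):
            continue  # whole manifest skipped without reading its entries
        selected.extend(path for path, part in m.files if part == value)
    return selected


manifests = [
    ManifestSummary("m0", "EU", "EU", (("00001-d4e5f6.parquet", "EU"),)),
    ManifestSummary(
        "m1", "US-EAST", "US-WEST",
        (("00000-a1b2c3.parquet", "US-EAST"), ("00003-9f8e7d.parquet", "US-WEST")),
    ),
]

print(plan_equals(manifests, "US-EAST"))  # ['00000-a1b2c3.parquet']
```

The bounds check discards manifest `m0` before any of its file entries are examined, which is what keeps planning cheap on tables with many files.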

Key Characteristics

  • Partition data without encoding values in file paths or directory structures
  • Store partition information in table metadata so engines can prune files during query planning
  • Eliminate file reorganization when evolving partition schemes
  • Let files written under different partition schemes coexist transparently in the same table
  • Enable predicate pushdown through metadata-based partition statistics
  • Simplify operations by decoupling physical organization from logical partitioning

Why It Matters

  • Reduces the operational overhead of schema and partition evolution
  • Eliminates expensive file reorganization projects for partition strategy changes
  • Supports flexible partition evolution without data rewrites
  • Improves query optimization through metadata-based partition elimination
  • Simplifies multi-engine access by hiding physical file organization details
  • Enables efficient partition management without manual maintenance scripts

Example

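```sql
-- Table created with hidden partitioning: partition values are derived
-- from columns via transforms, not supplied by the writer or the query
CREATE TABLE events (
  event_id INT,
  user_id INT,
  event_timestamp TIMESTAMP,
  region STRING
) USING iceberg
PARTITIONED BY (region, month(event_timestamp));

-- Readers locate files through metadata rather than by parsing paths, e.g.:
-- s3://warehouse/events/00000-a1b2c3.parquet
-- s3://warehouse/events/00001-d4e5f6.parquet

-- Metadata tracks each file's partition values (simplified illustration,
-- not the actual manifest layout):
-- {
--   "files": [
--     {"file": "00000-a1b2c3.parquet", "partition": {"region": "US-EAST", "event_timestamp_month": "2024-04"}},
--     {"file": "00001-d4e5f6.parquet", "partition": {"region": "EU", "event_timestamp_month": "2024-04"}}
--   ]
-- }

-- Query with partition filter (pruning happens via metadata)
SELECT * FROM events WHERE region = 'US-EAST';
-- Engine uses metadata to skip irrelevant files without parsing paths

-- A filter on the source column is pruned by month as well;
-- the query never references a separate partition column
SELECT * FROM events
WHERE event_timestamp >= TIMESTAMP '2024-04-01 00:00:00'
  AND event_timestamp <  TIMESTAMP '2024-05-01 00:00:00';
```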

Coginiti Perspective

Hidden partitioning aligns with Coginiti's philosophy of abstracting physical storage details from analytics consumers. Coginiti's semantic layer operates at the business concept level, defining dimensions and measures without reference to partitioning schemes. When Iceberg tables published through CoginitiScript use hidden partitioning, the combination provides two layers of abstraction: the table format hides partition mechanics from the query author, and the semantic layer hides physical column details from the analyst.
