
Projection Pushdown

Projection Pushdown is a query optimization technique that limits data scanning to only the columns needed, avoiding unnecessary I/O for unselected columns.

Projection pushdown pushes the list of required columns (the projection) down to the storage layer: if a query selects only customer_id and name from a customer table with 50 columns, pushdown ensures only those two columns are read. This is especially valuable in columnar systems, where each column is stored separately: reading two columns from a 50-column table reads roughly 2/50ths of the data, assuming the columns are of similar size. Without pushdown, all 50 columns are read from storage, transferred across the network, and brought into memory even though 48 of them are unused.
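The effect of separate column storage can be illustrated with a toy in-memory column store. This is a minimal sketch, not how any particular engine is implemented: the class ToyColumnStore, its bytes_read counter, and the filler column names are hypothetical, and every value is assumed to be exactly 8 bytes wide so that all columns have the same size.

```python
from typing import Dict, List, Sequence, Tuple


class ToyColumnStore:
    """Hypothetical column store: each column is kept as its own list."""

    def __init__(self, columns: Dict[str, List[str]]) -> None:
        self.columns = columns
        self.bytes_read = 0  # running total of bytes touched by scans

    def scan(self, projection: Sequence[str]) -> List[Tuple[str, ...]]:
        """Read only the projected columns and return the rows as tuples."""
        selected = []
        for name in projection:
            values = self.columns[name]
            # Count the bytes of every value in this column as "read".
            self.bytes_read += sum(len(v.encode("utf-8")) for v in values)
            selected.append(values)
        return list(zip(*selected))


# 50 columns x 1,000 rows; every value is an 8-character string (assumption).
num_rows = 1000
col_names = ["customer_id", "name"] + [f"col_{i:02d}" for i in range(48)]
table = {c: [f"{i:08d}" for i in range(num_rows)] for c in col_names}

# With pushdown: the scan is told which two columns the query needs.
with_pushdown = ToyColumnStore(table)
rows = with_pushdown.scan(["customer_id", "name"])

# Without pushdown: every column is read and the projection happens later.
without_pushdown = ToyColumnStore(table)
all_rows = without_pushdown.scan(col_names)

print(with_pushdown.bytes_read, without_pushdown.bytes_read)
```

Because all columns here are the same width, the pushed-down scan touches exactly 2/50ths of the bytes. With real data the ratio depends on the actual width and compression of each column.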

Projection pushdown is applied automatically by modern query optimizers: columns not referenced anywhere in the query (SELECT, WHERE, JOIN, GROUP BY, ORDER BY, and so on) are excluded from storage access. Columnar databases make this transparent: the optimizer knows that columns are stored separately, so it reads only the required ones. Row-based databases store whole rows together and cannot skip columns during a table scan, but they can get a similar benefit from covering indexes: when an index contains every column a query references, the database can answer from the index alone without accessing the main table (an index-only scan).
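The index-only scan behaviour in a row-based engine can be observed with SQLite through Python's standard sqlite3 module. This is a small sketch under assumptions: the table layout, the index name idx_customers_id_name, and the sample data are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, name TEXT, email TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(i, f"name{i}", f"user{i}@example.com", f"555-{i:04d}") for i in range(1000)],
)
# Hypothetical index that contains both columns the first query needs.
conn.execute("CREATE INDEX idx_customers_id_name ON customers (customer_id, name)")


def plan(sql: str) -> str:
    """Return the human-readable detail text of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Every referenced column is in the index, so the table itself is not read.
covered_plan = plan(
    "SELECT customer_id, name FROM customers WHERE customer_id = 42"
)
# email is not in the index, so SQLite must also visit the table row.
uncovered_plan = plan(
    "SELECT customer_id, name, email FROM customers WHERE customer_id = 42"
)

print(covered_plan)
print(uncovered_plan)
```

The first plan should mention a covering index, while the second should not, since fetching email requires a lookup in the main table.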

In practice, projection pushdown is one of the most effective optimizations for wide tables: a query selecting 10 columns from a 1000-column table reads about 1% of the data, a 100x reduction if column sizes are similar. Columnar systems benefit further because homogeneous column data compresses well, so each column that is read is also smaller on disk.

Key Characteristics

  • Limits storage access to required columns only
  • Avoids I/O for unselected columns
  • Works automatically in columnar systems
  • Reduces network bandwidth in distributed systems
  • Reduces memory requirements
  • Improves with column statistics and indexes

Why It Matters

  • Reduces I/O volume by scanning fewer columns
  • Reduces network bandwidth in distributed systems
  • Reduces memory and cache usage
  • Can substantially improve performance for I/O-bound queries on wide tables
  • Limits decompression to the selected columns only
  • Is automatically handled by modern optimizers

Example

Consider the query "SELECT customer_id, name FROM customers" on a customers table with 50 columns (customer_id, name, email, phone, address, billing_address, shipping_address, etc.). Without projection pushdown, storage reads all 50 columns for every row scanned. With pushdown, storage reads only two columns (customer_id, name). With 100 million customer records at 1 KB per row, the full table is about 100 GB. If all columns are roughly the same size, two of 50 columns is 4% of the data, so pushdown transfers about 4 GB to the query engine instead of 100 GB. That is a 25x reduction in data read and network usage; query latency improves by a similar factor only to the extent that the query is I/O bound.
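The arithmetic behind this kind of estimate can be written down explicitly. The sketch below assumes every column has the same width and that 1 KB means 1,000 bytes; the function name scanned_bytes is hypothetical. Under those assumptions, two of 50 columns account for 4% of each row.

```python
def scanned_bytes(num_rows: int, row_bytes: int,
                  total_columns: int, selected_columns: int) -> tuple:
    """Estimate bytes read without and with projection pushdown.

    Assumes all columns are the same width, which real tables rarely are.
    """
    full = num_rows * row_bytes
    projected = full * selected_columns // total_columns
    return full, projected


full, projected = scanned_bytes(
    num_rows=100_000_000,   # 100 million customer records
    row_bytes=1_000,        # 1 KB per row (decimal kilobyte, an assumption)
    total_columns=50,
    selected_columns=2,
)

print(f"without pushdown: {full / 1e9:.0f} GB")
print(f"with pushdown:    {projected / 1e9:.0f} GB")
print(f"reduction:        {full // projected}x")
```

If customer_id and name are narrower than the average column, as is common when other columns hold long addresses, the real reduction would be larger than this equal-width estimate.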

Coginiti Perspective

The semantic layer's MEASURE() function and explicit dimension references in Semantic SQL give the target engine clear signals about which columns are actually needed, supporting projection pushdown in columnar storage formats like Parquet. When CoginitiScript publishes results to Parquet on object storage, the file format preserves column-level metadata that downstream query engines use for projection pushdown. This means the combination of governed publication and columnar format produces storage that is optimized for selective column reads.

Related Concepts

  • Query Optimization
  • SELECT Clause
  • Columnar Storage
  • Index
  • Storage Layout
  • Column Pruning
  • Execution Plan
  • I/O Optimization
