Data Storage

Data Storage is the selection, configuration, and management of the systems and infrastructure that persist data in ways optimized for retrieval speed, cost efficiency, and scalability.

Data storage encompasses choices about where and how data is kept: cloud object storage (S3, GCS) for low-cost, durable storage; data warehouses (Snowflake, BigQuery) for fast analytics queries; databases for operational applications; and specialized systems like feature stores or vector databases for machine learning. Each storage system optimizes for different trade-offs: object storage is cheap but slow; data warehouses are fast but more expensive; databases support transactions but may not scale to analytics volumes.

Organizations rarely use a single storage system; instead, they use a tiered approach: hot storage for current data (data warehouse), warm storage for recent historical data, and cold storage (S3 Glacier) for archival data. This strategy optimizes cost by paying premium prices only for data that needs to be accessed quickly.
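The cost arithmetic behind tiering can be sketched with a quick back-of-the-envelope calculation. The per-GB prices below are illustrative assumptions for this sketch, not quoted vendor rates:

```python
# Back-of-the-envelope monthly storage cost for a tiered layout.
# Prices are illustrative placeholders (USD per GB-month), not real quotes.
PRICE_PER_GB = {
    "hot": 0.040,   # e.g. warehouse-managed storage
    "warm": 0.023,  # e.g. standard object storage
    "cold": 0.004,  # e.g. archival tier
}

def monthly_cost(gb_by_tier: dict) -> float:
    """Total monthly cost given GB stored in each tier."""
    return sum(PRICE_PER_GB[tier] * gb for tier, gb in gb_by_tier.items())

# 100 TB kept entirely hot vs. the same 100 TB split 10/20/70 across tiers:
all_hot = monthly_cost({"hot": 100_000})
tiered = monthly_cost({"hot": 10_000, "warm": 20_000, "cold": 70_000})
print(f"all hot: ${all_hot:,.0f}/mo, tiered: ${tiered:,.0f}/mo")
```

Under these assumed prices, the tiered layout cuts the monthly bill from $4,000 to $1,140, roughly a 70% reduction, which is why tiering is worth the added operational complexity.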

Storage technology evolved from on-premises databases toward cloud-native systems with separated compute and storage, enabling independent scaling. This separation lets organizations grow storage without proportionally growing compute costs, and query historical data on demand without keeping expensive compute running.

Key Characteristics

  • Supports multiple access patterns: sequential scans, random lookups, range queries
  • Separates storage capacity from compute resources in cloud systems
  • Implements durability and replication to prevent data loss
  • Provides tiered storage (hot, warm, cold) for cost optimization
  • Supports compression and encoding for space efficiency
  • Tracks data residency for compliance and performance
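The compression characteristic above can be demonstrated with nothing beyond the standard library. The repetitive sample payload is a toy stand-in for the columnar, often redundant data that compresses well in real analytical storage:

```python
import gzip

# Analytical data is often highly repetitive (timestamps, enums, codes),
# so even general-purpose compression shrinks it dramatically.
# This synthetic payload stands in for a real column of row values.
raw = ("2024-01-01,US,purchase,19.99\n" * 10_000).encode()
compressed = gzip.compress(raw)

ratio = len(raw) / len(compressed)
print(f"{len(raw):,} bytes -> {len(compressed):,} bytes ({ratio:.0f}x smaller)")
```

Columnar formats such as Parquet go further by applying type-aware encodings (dictionary, run-length) per column before general-purpose compression, which is why they dominate warehouse and lake storage.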

Why It Matters

  • Reduces total cost of ownership by choosing appropriate storage tier for each data type
  • Improves query performance by storing data in formats optimized for access patterns
  • Enables scalability to petabyte-scale data volumes without rearchitecting
  • Supports compliance by enabling data residency in specific geographies
  • Reduces costs by moving infrequently accessed data to cheaper storage
  • Enables organizations to retain historical data for trend analysis without excessive costs

Example

A media company stores raw video uploads in S3 (cheap, durable), transcoded videos in a CloudFront CDN (fast delivery to viewers), and metadata in Snowflake (fast analytics), and archives videos older than two years to S3 Glacier (cold storage). Recent analytics are queried from the hot warehouse; historical trends are materialized and cached. This tiered approach balances cost, performance, and compliance across the data lifecycle.
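The archival step in an example like this maps directly onto an S3 lifecycle rule. Below is a minimal sketch of such a rule, written as a Python dict in the shape accepted by S3's PutBucketLifecycleConfiguration API; the `videos/` prefix is a hypothetical naming convention for this example:

```python
# Lifecycle rule: transition video objects to Glacier two years (730 days)
# after creation. The "videos/" prefix is a hypothetical naming convention.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-videos-after-2-years",
            "Filter": {"Prefix": "videos/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 730, "StorageClass": "GLACIER"},
            ],
        }
    ]
}
```

Once a rule like this is attached to the bucket (for example via boto3's `put_bucket_lifecycle_configuration`), S3 moves matching objects to the archival tier automatically, with no application code involved.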

Coginiti Perspective

Inexpensive cloud storage has shifted the economics of data management: it now costs less to store data than to re-extract it. Coginiti leans into this by favoring ELT patterns that preserve raw data alongside modeled views, leaving it available to be remodeled for new analytical needs. CoginitiScript pipelines can materialize results as Parquet or CSV directly on object storage, or as Iceberg tables on Snowflake, Databricks, BigQuery, Trino, and Athena. Coginiti's object store browser and direct query capabilities let teams manage and analyze data files on object storage without requiring a separate warehouse, while the semantic layer provides consistent definitions across all storage tiers.
