
Object Storage

Object storage is a storage architecture that manages data as discrete objects, each with a key and metadata, accessed through HTTP APIs rather than through a file system or block device.

Object storage services (Amazon S3, Google Cloud Storage, Azure Blob Storage) organize data as objects in a flat namespace, each addressed by a key. Instead of navigating a directory hierarchy, you request an object by its key; a key such as "raw/2024/clip.mp4" only looks like a path. Objects are immutable: to change one, you replace the entire object rather than edit it in place. An object can hold any data (files, structured records, images, video). The infrastructure is distributed: objects are stored redundantly across multiple devices and facilities (for S3, multiple Availability Zones within a region), which yields very high durability (S3 is designed for 99.999999999%, or "11 nines") and high availability. Object storage is optimized for throughput and scale rather than latency: bulk operations such as scanning billions of objects in parallel are efficient, but each individual request has much higher latency than a read from local disk.
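To make the key/object model concrete, here is a minimal sketch: a hypothetical in-memory ToyObjectStore written in Python. It is not the S3, GCS, or Azure SDK, and it ignores networking, redundancy, and versioning. It shows three properties described above: a flat namespace where "folders" are only key prefixes, whole-object replacement on write, and metadata stored alongside each object.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredObject:
    """One object: an opaque byte payload plus key-value metadata."""
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory model of an object store (not a real SDK)."""

    def __init__(self) -> None:
        # Flat namespace: each key maps to one whole object; there are no directories.
        self._objects: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, metadata: dict[str, str] | None = None) -> None:
        # Writing to an existing key replaces the entire object (data and metadata);
        # there is no operation that edits part of an object in place.
        self._objects[key] = StoredObject(bytes(data), dict(metadata or {}))

    def get(self, key: str) -> StoredObject:
        # Retrieval is by exact key only; raises KeyError if the key is absent.
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Folders" are a naming convention: listing just filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("raw/2024/clip.mp4", b"version 1", {"content-type": "video/mp4"})
store.put("raw/2024/clip.mp4", b"version 2")  # replaces the whole object
store.put("processed/clip_720p.mp4", b"transcoded bytes")

print(store.get("raw/2024/clip.mp4").data)      # b'version 2'
print(store.get("raw/2024/clip.mp4").metadata)  # {}
print(store.list_keys("raw/"))                  # ['raw/2024/clip.mp4']
```

The second put also discards the original metadata: in this model, as with an overwriting PUT to S3, the new object replaces both the data and the metadata.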

Object storage became the foundation of cloud data architecture because it is inexpensive (a few cents per gigabyte per month for standard tiers, and less for infrequent-access and archive tiers), durable, and scales to exabytes. Data lakes, backup systems, and many cloud data warehouses are built on object storage. Organizations consolidate data in object storage and layer compute and analytics on top, rather than moving data between systems.

In practice, object storage is largely transparent to analytics users: querying data in S3 through a data warehouse or query engine feels like querying local tables, because the engine handles listing and fetching the underlying objects. The key idea is the separation of storage from compute: data stays in object storage while different compute engines (Spark, Snowflake, Athena) query the same data independently.
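The separation of storage from compute can be sketched in a few lines of Python. In this hypothetical setup, a plain dict stands in for a bucket and two toy functions stand in for independent engines; they are illustrations, not Spark, Snowflake, or Athena APIs. Both read the same CSV object directly, and neither copies it into storage of its own.

```python
import csv
import io

# Hypothetical stand-in for a bucket: key -> object bytes, written once.
bucket: dict[str, bytes] = {
    "analytics/views.csv": b"video_id,views\nv1,120\nv2,80\nv1,30\n",
}


def fetch(key: str) -> bytes:
    # In a real system this would be an HTTP GET against the object store.
    return bucket[key]


def engine_total_views(key: str) -> int:
    """Toy 'engine A': sums a column directly from the stored object."""
    rows = csv.DictReader(io.StringIO(fetch(key).decode("utf-8")))
    return sum(int(r["views"]) for r in rows)


def engine_views_by_video(key: str) -> dict[str, int]:
    """Toy 'engine B': a different computation over the same, uncopied object."""
    totals: dict[str, int] = {}
    for r in csv.DictReader(io.StringIO(fetch(key).decode("utf-8"))):
        totals[r["video_id"]] = totals.get(r["video_id"], 0) + int(r["views"])
    return totals


print(engine_total_views("analytics/views.csv"))     # 230
print(engine_views_by_video("analytics/views.csv"))  # {'v1': 150, 'v2': 80}
```

Because the object is the shared source of truth, adding another engine means pointing it at the same key rather than loading the data into a new system.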

Key Characteristics

  • Stores data as individual objects with metadata
  • Accessed via HTTP APIs (not file systems)
  • Immutable: replace objects rather than edit in place
  • Extremely durable through redundant storage across devices and facilities
  • Optimized for bulk throughput, not latency
  • Scales to exabytes of data
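The "11 nines" durability figure is easier to interpret as an expected annual loss. A worked estimate, assuming the figure is an annual, per-object durability d (the way AWS states its S3 design target) and a hypothetical bucket of N = 10 million objects:

```latex
\begin{aligned}
d &= 0.99999999999 = 1 - 10^{-11} \\
\mathbb{E}[\text{objects lost per year}] &= N\,(1 - d) = 10^{7} \times 10^{-11} = 10^{-4}
\end{aligned}
```

By linearity of expectation this holds whether or not losses are independent: on average, about one object is lost every 10,000 years. This is a design target rather than a guarantee, and it does not cover deletions or overwrites made by users or applications.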

Why It Matters

  • Reduces storage costs to a few cents per gigabyte per month
  • Provides durability suitable for long-term data retention
  • Enables separation of storage from compute infrastructure
  • Scales to massive data volumes without provisioning capacity in advance
  • Supports multiple compute engines querying same data
  • Offers region selection and cross-region replication for data residency, compliance, and performance

Example

A video platform stores all of its video content in S3: raw uploads are written to one bucket and never modified, processed versions (transcoded, lower bitrate) go to a second bucket, and analytics data (view counts, quality metrics) goes to a third. The CloudFront CDN caches frequently accessed videos, Spark jobs analyze viewing patterns directly on the S3 data without copying it, and cross-region replication automatically copies backups to another region. Total storage cost is much lower than with traditional storage because S3 charges a few cents per gigabyte per month, and even less for infrequent-access and archive tiers.

Coginiti Perspective

Coginiti has first-class object storage integration. The object store browser lets teams manage files on S3, Azure Blob, and GCS directly within the platform. CoginitiScript publishes results as Parquet or CSV to object storage via configured connections, and users can query data files on object storage without loading them into a warehouse first. This supports ELT patterns where raw data lands on object storage and is transformed in place or selectively loaded into analytical platforms.

Related Concepts

  • Cloud Storage
  • Cloud Data Warehouse
  • Data Lake
  • Block Storage
  • Data Durability
  • Cloud Infrastructure
  • Storage Redundancy
  • Immutable Data
