Data Modeling
Data Modeling is the design of database schemas and table structures that organize data to support efficient queries and analytics and to maintain semantic consistency across users and applications.
Data modeling determines how tables are structured, how they relate to each other, and how dimensions and facts are organized. Common approaches include star schemas (facts surrounded by dimensions), snowflake schemas (normalized dimensions), and third-normal-form (3NF) designs for operational databases. The choice of model affects query performance, storage efficiency, maintainability, and how intuitive data is for business users. A star schema optimizes analytics queries; a normalized schema optimizes storage but requires complex joins for analysis.
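The star schema described above can be sketched concretely. This is a minimal, illustrative example using SQLite in Python; the table and column names (orders_fact, customer_dim, date_dim) follow common conventions rather than any specific production schema.

```python
import sqlite3

# Toy star schema: one fact table surrounded by dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_id       INTEGER PRIMARY KEY,
    calendar_date TEXT,
    month         TEXT
);
CREATE TABLE customer_dim (
    customer_id INTEGER PRIMARY KEY,
    region      TEXT
);
CREATE TABLE orders_fact (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer_dim(customer_id),
    date_id     INTEGER REFERENCES date_dim(date_id),
    amount      REAL
);
""")
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                 [(1, "2024-01-05", "2024-01"), (2, "2024-02-10", "2024-02")])
conn.executemany("INSERT INTO customer_dim VALUES (?, ?)",
                 [(100, "EMEA"), (200, "APAC")])
conn.executemany("INSERT INTO orders_fact VALUES (?, ?, ?, ?)",
                 [(1, 100, 1, 50.0), (2, 100, 2, 30.0), (3, 200, 2, 20.0)])

# A typical analytics query: aggregate the fact table, slice by dimensions.
rows = conn.execute("""
    SELECT c.region, d.month, SUM(f.amount)
    FROM orders_fact f
    JOIN customer_dim c ON f.customer_id = c.customer_id
    JOIN date_dim d     ON f.date_id = d.date_id
    GROUP BY c.region, d.month
    ORDER BY c.region, d.month
""").fetchall()
print(rows)  # [('APAC', '2024-02', 20.0), ('EMEA', '2024-01', 50.0), ('EMEA', '2024-02', 30.0)]
```

Note that every analytical question resolves to the same shape of query: join the fact to the dimensions you want to slice by, then aggregate. That predictability is what makes star schemas friendly to both query optimizers and business users.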
Data modeling evolved as a discipline because poor schemas create confusion and poor query performance. The semantic layer in tools like dbt and Looker emerged to provide business-friendly names and definitions on top of underlying schemas, acknowledging that the most efficient schema for querying may not be the most intuitive for end users.
In practice, analytics teams often use star schemas with slowly-changing dimensions to track how attributes (like customer region) change over time. Operational databases use normalized schemas to reduce redundancy. Organizations increasingly use both: normalized raw data in data lakes, star schema analytics data in data warehouses, and semantic layers for business definitions.
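A semantic layer, at its core, maps business-friendly names to the physical SQL they resolve to. The sketch below is a toy illustration of that idea, not how dbt or Looker is implemented; the metric and dimension names are hypothetical.

```python
# Toy semantic layer: business terms on the left, physical SQL on the right.
SEMANTIC_MODEL = {
    "metrics": {
        "total_revenue": "SUM(orders_fact.amount)",
    },
    "dimensions": {
        "customer_region": "customer_dim.region",
    },
}

def compile_query(metric: str, dimension: str) -> str:
    """Translate business terms into a SQL string against the star schema."""
    m = SEMANTIC_MODEL["metrics"][metric]
    d = SEMANTIC_MODEL["dimensions"][dimension]
    return (
        f"SELECT {d}, {m} FROM orders_fact "
        f"JOIN customer_dim ON orders_fact.customer_id = customer_dim.customer_id "
        f"GROUP BY {d}"
    )

sql = compile_query("total_revenue", "customer_region")
print(sql)
```

The point is the separation of concerns: analysts ask for "total_revenue by customer_region", and the definition of each term lives in one governed place rather than being re-derived in every query.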
Key Characteristics
- Defines table structures, columns, and data types
- Specifies relationships and foreign keys between tables
- Organizes data into facts (measurements) and dimensions (attributes)
- Balances query performance with storage efficiency
- Establishes naming conventions and business definitions
- Supports version control and testing like other code
Why It Matters
- Improves query performance by organizing data for typical access patterns
- Reduces storage costs through efficient schema design
- Improves maintainability by creating consistent, documented structures
- Enables business users to navigate data intuitively through semantic naming
- Reduces time to analysis by providing pre-structured data
- Supports compliance by enabling audit trails and tracking changes over time
Example
A retail star schema: an orders_fact table contains transaction-level data (order_id, customer_id, date_id, amount), customer_dim tracks slowly changing customer attributes (region, segment, lifetime value), product_dim holds product details (category, brand), and date_dim enables efficient time-based analysis. A Type 2 slowly changing dimension (SCD Type 2) records when a customer's region changed, enabling historical analysis of cohort behavior by original region.
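The SCD Type 2 mechanics can be sketched as follows: each dimension row carries validity dates, and a change of attribute closes the current row and opens a new one. This is a minimal in-memory sketch; the field names (valid_from, valid_to, is_current) are common conventions, not taken from the example above.

```python
from datetime import date

def scd2_update(dim_rows, customer_id, new_region, change_date):
    """Apply a region change to a list of dimension-row dicts (SCD Type 2)."""
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["region"] == new_region:
                return dim_rows            # attribute unchanged: nothing to do
            row["valid_to"] = change_date  # close the old version
            row["is_current"] = False
    dim_rows.append({
        "customer_id": customer_id,
        "region": new_region,
        "valid_from": change_date,
        "valid_to": None,                  # open-ended current version
        "is_current": True,
    })
    return dim_rows

# Customer 100 starts in EMEA, then moves to APAC in mid-2024.
dim = [{"customer_id": 100, "region": "EMEA",
        "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True}]
scd2_update(dim, 100, "APAC", date(2024, 6, 1))

# Both versions survive: the closed EMEA row and the current APAC row.
current = [r for r in dim if r["is_current"]]
print(current[0]["region"])  # APAC
```

Because the old row is closed rather than overwritten, a query that joins the fact table on order date against the validity window recovers the region the customer was in at the time of each order.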
Coginiti Perspective
Coginiti's semantic layer formalizes data models as governed assets: metrics, dimensions, hierarchies, and relationships defined once and consumed consistently across tools. Rather than modeling being a one-time design exercise, Coginiti treats the semantic model as a living, version-controlled artifact that evolves alongside the data and can be accessed programmatically by both analysts and AI agents.
Related Concepts
More in Core Data Architecture
Batch Processing
Batch Processing is the execution of computational jobs on large volumes of data in scheduled intervals, processing complete datasets at once rather than responding to individual requests.
Data Architecture
Data Architecture is the structural design of systems, tools, and processes that capture, store, process, and deliver data across an organization to support analytics and business operations.
Data Ecosystem
Data Ecosystem is the complete collection of interconnected data systems, platforms, tools, people, and processes that organizations use to collect, manage, analyze, and act on data.
Data Fabric
Data Fabric is an integrated, interconnected architecture that unifies diverse data sources, platforms, and tools to provide seamless access and movement of data across the organization.
Data Integration
Data Integration is the process of combining data from multiple heterogeneous sources into a unified, consistent format suitable for analysis or operational use.
Data Lifecycle
Data Lifecycle is the complete journey of data from creation or ingestion through processing, usage, governance, and eventual deletion or archival.