Entity Resolution
Entity Resolution is the process of identifying and matching records that represent the same real-world entity across databases, data sources, or versions, enabling unified views and accurate analytics.
When organizations integrate data from multiple sources, they face a fundamental challenge: the same entity appears in different systems with different identifiers and variations. "John Smith" in the CRM might be "J. Smith" in accounting and "John Edward Smith" in HR. These are likely the same person, but systems treat them as different entities. Entity resolution (also called entity matching, record linkage, or deduplication) automatically identifies these matches using techniques like string similarity, machine learning, and domain rules.
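As a rough illustration of the string-similarity technique mentioned above, the sketch below scores the name variants from this paragraph using Python's standard-library `difflib.SequenceMatcher`. The `normalize` and `name_similarity` helpers are hypothetical names introduced here, and a character-level ratio is only one of many possible similarity measures; production systems typically combine several signals.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


crm, accounting, hr = "John Smith", "J. Smith", "John Edward Smith"

# Compare the CRM name against the other variants and an unrelated name.
for left, right in [(crm, accounting), (crm, hr), (crm, "Mary Jones")]:
    print(f"{left!r} vs {right!r}: {name_similarity(left, right):.2f}")
```

The variants of the same name score noticeably higher than the unrelated name, but a score alone does not prove two records are the same person, which is why similarity is usually paired with other evidence such as email or phone.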
Entity resolution is critical for analytics accuracy. Without resolution, metrics are distorted: customer lifetime value is split across duplicate records, churn rates are overstated (a duplicate record that goes inactive looks like a customer who left), and customer segmentation is unreliable. High-quality entity resolution enables unified customer views, accurate metrics, and reliable analytics. The challenge is balancing precision (avoiding false matches) and recall (finding all true matches).
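To make the precision and recall trade-off concrete, here is a minimal sketch that scores a set of predicted matches against a labeled set of true matches. The record identifiers (`crm:1`, `acct:7`, and so on) and the `pair` and `precision_recall` helpers are invented for illustration, and returning 1.0 when a set is empty is a convention chosen here, not a standard.

```python
def pair(a: str, b: str) -> frozenset:
    """Represent an unordered match between two record identifiers."""
    return frozenset((a, b))


def precision_recall(predicted: set, actual: set) -> tuple[float, float]:
    """Compute pairwise precision and recall for predicted matches."""
    true_positives = len(predicted & actual)
    # Assumed convention: an empty set yields 1.0 rather than dividing by zero.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical labeled truth: three pairs really are the same entity.
actual = {pair("crm:1", "acct:7"), pair("crm:1", "hr:3"), pair("crm:2", "acct:9")}
# Hypothetical matcher output: two correct pairs, one false match, one miss.
predicted = {pair("crm:1", "acct:7"), pair("crm:2", "acct:9"), pair("crm:4", "hr:5")}

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.67
```

Loosening the matcher would tend to recover the missed pair (higher recall) at the risk of more false matches (lower precision), and tightening it would do the reverse.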
Modern entity resolution combines multiple techniques: string matching for similar names, machine learning models trained on labeled examples, knowledge graphs capturing entity relationships, and manual review for ambiguous cases. The field is evolving rapidly as AI improves: language models can use surrounding context to avoid false matches that simpler algorithms would make.
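One common way to combine automated scoring with manual review is to route each candidate pair by its match score. The sketch below assumes a score in [0, 1] has already been produced by some matcher; the `triage` function and the 0.90 and 0.70 thresholds are illustrative assumptions that would need tuning against labeled data.

```python
from enum import Enum


class Decision(Enum):
    MATCH = "match"        # confident enough to merge automatically
    REVIEW = "review"      # ambiguous: send to a human reviewer
    NO_MATCH = "no_match"  # treat as different entities


def triage(score: float, match_at: float = 0.90, review_at: float = 0.70) -> Decision:
    """Route a candidate pair based on its match score (thresholds are assumed)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= match_at:
        return Decision.MATCH
    if score >= review_at:
        return Decision.REVIEW
    return Decision.NO_MATCH


# Hypothetical scores for three candidate pairs.
for score in (0.95, 0.78, 0.30):
    print(score, triage(score).value)
```

Widening the review band sends more pairs to people, which raises cost but protects precision on ambiguous or high-value matches.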
Key Characteristics
- Identifies and matches records representing the same entity across sources
- Uses techniques including string similarity, machine learning, and domain rules
- Handles variations in naming, formatting, and identifier schemes
- Balances precision (correct matches only) and recall (finding all true matches)
- Produces unified entity identifiers enabling single-customer-view analytics
- Often includes manual review workflows for ambiguous or high-value matches
Why It Matters
- Enables accurate analytics by eliminating duplicate entities that distort metrics
- Supports unified customer views essential for customer analytics and personalization
- Improves data quality by surfacing inconsistencies in entity representation
- Reduces customer service issues where systems treat the same customer differently
- Enables compliance by establishing canonical customer identities
- Facilitates fraud detection by connecting entities that appear disconnected
Example
An e-commerce company integrates web sales (keyed by customer email), in-store transactions (keyed by name), and customer service (keyed by phone number). Entity resolution links these sources: web customer "john.smith@email.com" is matched to in-store customer "John Smith" by comparing shared attributes such as email, phone, and address, and to service customer "+1-555-123-4567" by phone number. Result: a unified customer view instead of three separate records.
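The linking step in this example can be sketched with a small union-find structure that merges records sharing an email or phone number. The record layout, the identifiers (`web-1`, `store-1`, `service-1`, `web-2`), and the assumption that the in-store record also carries an email and phone are all hypothetical. Matching here is exact after trimming and lowercasing; real data would need stronger normalization, for example of phone formats.

```python
from collections import defaultdict


def resolve(records, keys=("email", "phone")):
    """Group record ids that share any identifying value, transitively."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (key, value) -> id of the first record carrying it
    for r in records:
        for key in keys:
            value = r.get(key)
            if not value:
                continue
            token = (key, value.strip().lower())
            if token in first_seen:
                union(first_seen[token], r["id"])
            else:
                first_seen[token] = r["id"]

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())


# Hypothetical records from the three sources, plus one unrelated customer.
records = [
    {"id": "web-1", "email": "john.smith@email.com"},
    {"id": "store-1", "name": "John Smith",
     "email": "John.Smith@email.com", "phone": "+1-555-123-4567"},
    {"id": "service-1", "phone": "+1-555-123-4567"},
    {"id": "web-2", "email": "mary.jones@email.com"},
]

print(resolve(records))
# [['service-1', 'store-1', 'web-1'], ['web-2']]
```

The web and service records share no attribute directly; they end up in the same cluster only because each matches the in-store record, which is the transitive linking that produces a single customer view.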
Coginiti Perspective
Coginiti supports entity resolution workflows through CoginitiScript and testing frameworks, enabling organizations to build and validate entity matching logic that feeds into semantic models. By formalizing entity resolution in code with test coverage, teams ensure that analytics consume deduplicated, canonical entities, producing reliable metrics and single-view analytics across integrated data sources.
Related Concepts
More in Knowledge Representation
Concept Modeling
Concept Modeling is the process of defining and structuring the fundamental ideas, entities, and relationships within a domain to create a shared understanding that can be used for analytics, integration, and AI reasoning.
Entity
An Entity is a distinct object or concept that can be uniquely identified and described using properties and relationships, serving as a fundamental unit in knowledge representation and data modeling.
Graph Database
A Graph Database is a specialized data system that stores and retrieves data organized as networks of connected entities and relationships, optimizing for traversal and pattern-matching queries over relational structure.
Knowledge Graph
A Knowledge Graph is a structured representation of information where entities (people, places, concepts) are nodes and relationships between them are edges, enabling semantic understanding and traversal of complex data.
Linked Data
Linked Data is a method of publishing structured information on the web using standard formats and linking that data to external sources, enabling automatic discovery and integration across diverse systems.
Ontology
An Ontology is a formal specification of concepts, categories, relationships, and rules that define and organize knowledge within a domain, enabling machines to understand meaning and relationships.