Newvick's blog

I gave a presentation at $work about data modeling, based on William Kent’s Data and Reality (read the second edition, not the third). As expected, the discussion turned out to be quite philosophical.

How difficult is it to define “one thing” in a database? It turns out there’s no absolutely correct answer.

(The rest of the post is the presentation itself)

Data Modeling: The Challenge of Vagueness


“Entities are a state of mind. No two people agree on what the real world view is.”

  • Metaxides

Introduction

  • Data modeling seems simple: map real-world things to database structures
  • Reality: It’s profoundly complex and philosophical
  • An information system models a “small, finite subset of the real world”
  • But what exactly is that subset, and how do we define it?

The Deceptive Simplicity

We expect:

  • One record in the employee file for each person employed
  • Clear correspondence between database constructs and real-world things

But this immediately runs into trouble…


Four Fundamental Questions

  1. What constitutes “one thing”?
  2. When are two things “the same thing”?
  3. How do we handle change while maintaining identity?
  4. What categories should we use to classify things?

Question 1: What is “One Thing”?

That appears at first to be a trivial, irrelevant, irreverent, absurd question.

It’s not.


The Parts Example

Consider a parts inventory system:

  • Does “part” mean one physical object?
  • Or does it mean one kind of part?

Part #A123: Quantity 500 (in Warehouse 1)
Part #A123: Quantity 200 (in Warehouse 2)

Is this one thing or many things?
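
One way to see the ambiguity is to count the same inventory three ways. This is only a sketch: the class names `PartType` and `StockLine` are made up for illustration, and the data is just the two lines shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartType:
    """One *kind* of part, identified by its catalog number."""
    part_number: str


@dataclass(frozen=True)
class StockLine:
    """A quantity of one kind of part held at one warehouse."""
    part: PartType
    warehouse: str
    quantity: int


stock = [
    StockLine(PartType("A123"), "Warehouse 1", 500),
    StockLine(PartType("A123"), "Warehouse 2", 200),
]

# Three defensible answers to "how many parts are there?"
kinds_of_part = len({line.part for line in stock})          # distinct catalog entries
stock_lines = len(stock)                                    # rows in the inventory file
physical_objects = sum(line.quantity for line in stock)     # items on the shelves
```

Under these assumptions the answers are 1, 2, and 700. None of them is wrong; they just count different "things".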


The Book Example

What is “one book”?

  • The abstract work (regardless of language or edition)
  • A specific edition
  • A specific physical copy
  • A specific printing

A library database, a bookstore database, and a publisher database may each pick a different answer.
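
A rough sketch of how the chosen level of identity changes the count. The `Copy` tuple, the `count_books` helper, and the shelf contents are all hypothetical; the point is only that the "key" decides how many books exist.

```python
from typing import NamedTuple


class Copy(NamedTuple):
    work: str       # the abstract work
    edition: int
    printing: int
    copy_id: str    # one physical copy


# Hypothetical shelf: four physical copies of a single work.
shelf = [
    Copy("Example Title", 1, 1, "c1"),
    Copy("Example Title", 1, 1, "c2"),
    Copy("Example Title", 1, 2, "c3"),
    Copy("Example Title", 2, 1, "c4"),
]


def count_books(copies, *fields):
    """Count distinct 'books' when identity is defined by the given fields."""
    return len({tuple(getattr(c, f) for f in fields) for c in copies})


works = count_books(shelf, "work")
editions = count_books(shelf, "work", "edition")
printings = count_books(shelf, "work", "edition", "printing")
copies = count_books(shelf, "work", "edition", "printing", "copy_id")
```

The same shelf holds 1, 2, 3, or 4 books depending on which fields make up the identity.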


The Warehouse Example

What is “one warehouse”?

  • A single building?
  • A group of buildings?
  • A floor within a building?

The IBM location in Santa Teresa has one building number but eight distinct towers called “building A”, “building B”, etc. How many buildings are there?


The Healthcare Example

What is “one patient record”?

  • All information about a person across their lifetime?
  • Information from one hospital visit?
  • Information related to one condition?
  • Information accessible to one provider?

Question 2: How Many Things Is It?

A single entity can be multiple things simultaneously in our data model.


The Soccer Player Example

When Joe Smith, playing halfback, scores a goal:

  • Data about two things is modified:
    • The number of goals by Joe Smith
    • The number of goals by a halfback

That human figure is represented as (and is) two things.
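
In code, the goal is one event that updates two separate tallies. A minimal sketch, with `record_goal` as an illustrative name rather than anything from the book:

```python
from collections import Counter

goals_by_player = Counter()
goals_by_position = Counter()


def record_goal(player, position):
    """One real-world event modifies data about two modeled things."""
    goals_by_player[player] += 1
    goals_by_position[position] += 1


record_goal("Joe Smith", "halfback")
```

After the call, both `goals_by_player["Joe Smith"]` and `goals_by_position["halfback"]` are 1, even though only one human kicked one ball.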


The Healthcare Example

A doctor in a hospital system might be:

  • An employee (HR system)
  • A care provider (clinical system)
  • A researcher (research database)
  • A resource (scheduling system)

Each with different attributes and relationships.


The Dual Role Example

Two related people (husband and wife) who work for the same company:

Each person must be considered twice:

  • Once as an employee
  • Once as a dependent of an employee

How many people are involved?
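
One common way to keep the count straight is to separate the person from the roles the person plays. The sketch below assumes that design; the names `Person`, `Employee`, and `Dependent` and the two sample people are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: int
    name: str


@dataclass(frozen=True)
class Employee:
    person: Person


@dataclass(frozen=True)
class Dependent:
    person: Person
    of_employee: Employee


alex = Person(1, "Alex")
sam = Person(2, "Sam")

employees = [Employee(alex), Employee(sam)]
# Each spouse is also a dependent of the other.
dependents = [Dependent(alex, employees[1]), Dependent(sam, employees[0])]

role_records = len(employees) + len(dependents)
people = len({r.person for r in employees + dependents})
```

Here there are four role records but only two people. A model without a shared `Person` would have no way to tell.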


Question 3: The Challenge of Change

How much can something change before it becomes something else?


The Car Example

If you and I start trading parts of our cars:

  • Tires, wheels, transmissions, suspensions, etc.

At what point have we exchanged cars?

The DMV’s arbitrary decision: the “essence” of a car is the engine block.


The Healthcare Example

Patient identity through time:

  • Different physical body (cells replace themselves)
  • Different mental states
  • Different capabilities
  • Different diagnoses

Is a patient with dementia the “same person” as before?


The Organization Example

Is it still the same company after changes in:

  • Employees? (Of course)
  • Management? (Yes)
  • Owners? (Maybe)
  • Buildings and facilities? (Yes)
  • Locations? (Probably)
  • Name? (Probably)
  • Principal business? (Maybe)

Versions and Time

  • When do we discard the old and let the new replace it?
  • When do we treat old and new as distinct things?
  • When do we try to do both?

“These several things are different versions of the same thing”
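
The three choices above map onto three storage strategies. This is a sketch under made-up data; `Versioned` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field

# Strategy 1: discard the old; the new value replaces it.
current_address = {"cust-1": "12 Old Road"}
current_address["cust-1"] = "34 New Street"

# Strategy 2: treat old and new as distinct things, each with its own identifier.
distinct_things = {
    "contract-2023": "terms v1",
    "contract-2024": "terms v2",
}


# Strategy 3: do both. One identity, with every version kept.
@dataclass
class Versioned:
    entity_id: str
    versions: list = field(default_factory=list)

    def revise(self, value):
        """Append a new version without losing the earlier ones."""
        self.versions.append(value)

    @property
    def current(self):
        return self.versions[-1]


doc = Versioned("spec-7")
doc.revise("draft")
doc.revise("final")
```

With strategy 1 the old address is gone, with strategy 2 nothing links the two contracts, and with strategy 3 `doc.current` is the latest value while `doc.versions` still holds the history.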


Question 4: Categories and Classification

What is it? In what categories do we perceive the thing to be?


The Employee Example

Does “employee” include:

  • Part-time employees?
  • Contract employees?
  • Employees of subsidiary companies?
  • Former employees?
  • Retired employees?
  • Employees on leave?
  • Someone who has accepted an offer but not started?
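
One way to make the answer explicit is to write each definition of "employee" down as a named convention. The statuses and the two conventions below are hypothetical; a real organization would pick its own.

```python
from enum import Enum


class Status(Enum):
    FULL_TIME = "full_time"
    PART_TIME = "part_time"
    CONTRACT = "contract"
    ON_LEAVE = "on_leave"
    RETIRED = "retired"
    OFFER_ACCEPTED = "offer_accepted"


# Hypothetical conventions: each application states which statuses count.
CONVENTIONS = {
    "payroll": {Status.FULL_TIME, Status.PART_TIME, Status.ON_LEAVE},
    "building_access": {Status.FULL_TIME, Status.PART_TIME, Status.CONTRACT},
}


def is_employee(status, convention):
    """Answer 'is this an employee?' relative to one named convention."""
    return status in CONVENTIONS[convention]
```

For example, `is_employee(Status.CONTRACT, "payroll")` is false while `is_employee(Status.CONTRACT, "building_access")` is true. The question has no answer until a convention is named.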

The Healthcare Example

What is a “patient”?

  • Someone currently admitted to the hospital?
  • Anyone who has ever received care?
  • Someone with an upcoming appointment?
  • Someone in the emergency waiting room?
  • An unborn fetus being monitored?

Fuzzy Boundaries

“A more amusing example is to imagine a continuum of physical objects between some given chair and table… There will be some strange objects in this continuum which cannot clearly be assigned to either class.”


The Role vs. Category Problem

Is something defined by:

  • What it is? (intrinsic nature)
  • What it’s used for? (role)
  • Where it is? (context)

The same hollow metal tube might be called a pipe, an axle, a lamp pole, a mop handle…


The Changing Category

Categories can change with time:

  • A dependent becomes an employee, then a customer
  • A slab of marble becomes a sculpture
  • A person becomes a patient, then recovers
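
If category membership changes over time, one option is to store it as dated periods rather than as a fixed type. A sketch, with `RolePeriod`, `roles_on`, and all dates invented for illustration:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RolePeriod:
    person_id: int
    role: str
    start: date
    end: Optional[date] = None  # None means the role is still held


def roles_on(periods, person_id, day):
    """Return the set of roles a person holds on a given day."""
    return {
        p.role
        for p in periods
        if p.person_id == person_id
        and p.start <= day
        and (p.end is None or day < p.end)
    }


# Hypothetical history for one person.
history = [
    RolePeriod(1, "dependent", date(2000, 1, 1), date(2015, 6, 1)),
    RolePeriod(1, "employee", date(2015, 6, 1), date(2020, 1, 1)),
    RolePeriod(1, "customer", date(2018, 3, 1)),
]
```

With this history, `roles_on(history, 1, date(2010, 1, 1))` gives `{"dependent"}`, and a day in 2019 gives both `"employee"` and `"customer"`. The person's identity stays fixed while the categories come and go.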

Practical Implications for Data Modeling


The Arbitrary Nature of Models

  • No model is “correct” in an absolute sense
  • Models are conventions agreed upon by users
  • Different applications may need different models
  • Integration requires reconciling these differences

Guidelines for Better Data Modeling

  1. Acknowledge ambiguity upfront
  2. Define clear conventions for your specific context
  3. Document assumptions about identity and categories
  4. Design for change and evolution
  5. Consider how different stakeholders view the same entities

Example: Healthcare Patient Model

Option 1: Person-centric

  • One record per person
  • Encounters and conditions as related entities
  • Good for: Longitudinal care, population health

Option 2: Encounter-centric

  • One record per hospital visit
  • Person as a related entity
  • Good for: Billing, operational metrics
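
The two options can hold exactly the same facts and still disagree on what "one record" is. A minimal sketch; every class and function name here is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Encounter:
    encounter_id: str
    reason: str


@dataclass
class PersonRecord:
    """Option 1: one record per person, encounters hang off it."""
    person_id: str
    encounters: list = field(default_factory=list)


@dataclass
class EncounterRecord:
    """Option 2: one record per visit, the person is a reference."""
    encounter_id: str
    reason: str
    person_id: str


def to_encounter_centric(person):
    """Re-express a person-centric record as encounter-centric records."""
    return [
        EncounterRecord(e.encounter_id, e.reason, person.person_id)
        for e in person.encounters
    ]


person = PersonRecord("p-1", [Encounter("e-1", "checkup"), Encounter("e-2", "fracture")])
visits = to_encounter_centric(person)
```

The same patient is one record under Option 1 and two records under Option 2. Neither is more correct; each just makes a different query cheap.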

The Philosophical Reality

“Before we go charging off to design or use a data structure, let’s think about the information we want to represent. Do we have a very clear idea of what that information is like? Do we have a good grasp of the semantic problems involved?”


Remember:

“Becoming an expert in data structures is like becoming an expert in sentence structure and grammar. It’s not of much value if the thoughts you want to express are all muddled.”


Conclusion: Embracing the Challenge

  • Data modeling is as much philosophy as technology
  • The goal isn’t perfect modeling (impossible) but useful modeling
  • Success comes from understanding the inherent vagueness and making deliberate choices

Discussion

  • What entities in our organization have ambiguous boundaries?
  • Where have we encountered “one thing vs. many things” problems?
  • How do we handle identity through change?
