Operational data modeling (blueprint)

Operational data modeling covers the design of data structures for transactional, application-serving databases — the storage that powers day-to-day product features. This complements the analytical/scale focus of the…

Guide · Updated 2026-07-03 · Source

1. Relational modeling

Normalization

Normal form	Rule	Why it matters
1NF	Atomic values; no repeating groups	Eliminates multi-valued columns; enables indexing
2NF	1NF + no partial dependencies on composite keys	Reduces data duplication in tables with composite PKs
3NF	2NF + no transitive dependencies	Each non-key column depends only on the primary key; reduces update anomalies
BCNF	Every determinant is a candidate key	Stronger form of 3NF; handles edge cases in key dependencies

Practical guidance: Most operational schemas should be in 3NF. Denormalize deliberately for read performance (and document the reason in an ADR).

Entity-Relationship modeling

Concept	Description
Entity	A distinct thing tracked by the system (User, Order, Product)
Attribute	A property of an entity (name, created_at, price)
Relationship	An association between entities (User places Order)
Cardinality	1:1, 1:N, M:N — determines foreign key placement and junction tables
Participation	Total (every entity must participate) vs partial (optional)

Naming conventions

Convention	Example	Rationale
snake_case for tables and columns	`order_items`, `created_at`	Wide RDBMS convention; case-insensitive across databases
Singular table names	`user`, `order`	Entity represents one thing; collection is implied
Explicit join table names	`user_role`	Alphabetical order of participating entities
`_id` suffix for foreign keys	`user_id`, `order_id`	Clear FK identification
Avoid reserved words	`user_account` not `user`	Prevents quoting issues across databases

2. Indexing strategies

Index type	Use case	Trade-off
B-tree (default)	Equality and range queries, ORDER BY, unique constraints	General-purpose; write overhead per index
Hash	Exact equality lookups only	Faster equality; no range support; not all RDBMS support
GIN (Generalized Inverted)	Full-text search, JSONB containment, array overlap	Fast reads on complex types; larger index, slower writes
GiST (Generalized Search Tree)	Geometric, range, and proximity queries	Spatial and range data; less efficient for simple equality
Partial / filtered	Index subset of rows matching a condition	Smaller index; faster for filtered queries; requires predicate match
Covering / include	Store extra columns in index to avoid table lookup	Faster reads; larger index; write overhead
Composite	Multi-column index for combined predicates	Column order matters — leftmost prefix rule applies

Query plan analysis

Step	Activity
1. EXPLAIN ANALYZE	Run the query with execution plan and actual timings
2. Identify bottlenecks	Look for sequential scans on large tables, nested loops on unindexed joins, sort operations
3. Check index usage	Verify expected indexes are used; check for index-only scans vs table lookups
4. Statistics freshness	Ensure table statistics are current (`ANALYZE` / `UPDATE STATISTICS`)
5. Iterate	Add/modify indexes, rewrite query, or restructure schema; re-measure

3. Transaction isolation and concurrency

Isolation level	Dirty reads	Non-repeatable reads	Phantom reads	Performance
Read Uncommitted	Possible	Possible	Possible	Highest
Read Committed	Prevented	Possible	Possible	Good (PostgreSQL default)
Repeatable Read	Prevented	Prevented	Possible (prevented in PostgreSQL)	Moderate
Serializable	Prevented	Prevented	Prevented	Lowest (potential serialization failures)

Concurrency patterns

Pattern	Description	When to use
Optimistic locking	Check version/timestamp at write time; retry on conflict	Low contention; read-heavy workloads
Pessimistic locking	Lock row(s) with `SELECT FOR UPDATE` before modification	High contention; critical sections
Advisory locks	Application-level locks using database primitives	Coordinating distributed processes; preventing duplicate work
SKIP LOCKED	Skip rows locked by other transactions	Job queues; work distribution without contention

4. Schema migration

Principles

Principle	Description
Version controlled	Every schema change is a numbered migration in source control
Forward-only	Avoid rollback migrations in production; use forward-fixing migrations
Backward compatible	New schema must work with both old and new application code during deployment
Tested	Migrations run in CI against a test database; verify data integrity
Small and incremental	One concern per migration; avoid combining schema and data changes

Expand-contract pattern

Safe schema changes for zero-downtime deployments:

Phase	Activity	Code state
Expand	Add new column/table alongside existing	Code writes to both old and new; reads from old
Migrate	Backfill data from old to new column/table	Code writes to both; reads from new
Contract	Drop old column/table; remove dual-write code	Code uses only new schema

Migration tooling

Tool	Language/framework	Approach
Flyway	JVM, SQL	SQL-based migrations, version tracking
Liquibase	JVM	XML/YAML/SQL changesets, rollback support
Alembic	Python (SQLAlchemy)	Code-generated migrations, branch management
Prisma Migrate	TypeScript/JavaScript	Schema-first, declarative migrations
golang-migrate	Go	SQL file migrations, database-agnostic
dbmate	Any	Plain SQL migrations, lightweight, database-agnostic

5. NoSQL data modeling

Document databases (MongoDB, DynamoDB, Firestore)

Pattern	Description	When to use
Embedding	Nest related data within a single document	1:1 or 1:few relationships; data accessed together; bounded growth
Referencing	Store IDs and fetch separately	M:N relationships; unbounded growth; data updated independently
Bucket pattern	Group time-series data into fixed-size buckets	IoT telemetry, logs, time-based queries
Polymorphic pattern	Different document shapes in same collection with a type discriminator	Flexible schemas; product catalogs; event stores

Key-value stores (Redis, DynamoDB)

Pattern	Description	Use case
Cache-aside	Application reads cache first; on miss, reads database and populates cache	Read-heavy, latency-sensitive
Write-through	Writes go to cache and database simultaneously	Strong consistency between cache and database
TTL-based expiration	Keys expire after configurable time	Session storage, rate limiting, temporary data
Sorted sets	Ordered collections with score-based ranking	Leaderboards, priority queues, time-based feeds

Graph databases (Neo4j, Amazon Neptune)

When to choose graph	When to avoid graph
Relationships are the primary query target	Simple key-value or document lookups
Traversal depth is variable (friends-of-friends, shortest path)	Fixed-depth joins (standard SQL)
Schema evolves frequently	Write-heavy, high-throughput transactional workloads
Social networks, recommendation, fraud detection, knowledge graphs	OLAP/analytical workloads (better in columnar stores)

Time-series databases (TimescaleDB, InfluxDB, QuestDB)

Concern	Guidance
Schema design	Wide vs narrow table; tags vs fields; time partitioning
Retention policies	Automatic downsampling (raw → hourly → daily); tiered storage
Query patterns	Aggregation over time windows; last-known-value; gap filling
Ingestion	Batch vs streaming; out-of-order handling; write amplification

6. Polyglot persistence

Using multiple database types, each optimized for its access pattern:

Access pattern	Database type	Example
Transactional CRUD	Relational (PostgreSQL, MySQL)	User accounts, orders, inventory
Full-text search	Search engine (Elasticsearch, Meilisearch)	Product search, log search
Caching	Key-value (Redis, Memcached)	Session store, API response cache
Graph traversal	Graph (Neo4j, Neptune)	Recommendations, social graph
Time-series	Time-series (TimescaleDB, InfluxDB)	Metrics, IoT telemetry, financial ticks
Document / flexible schema	Document (MongoDB, DynamoDB)	Content management, product catalogs
Analytical	Columnar (ClickHouse, BigQuery)	Business intelligence, event analytics

Synchronization strategies

Strategy	Description	Consistency
Dual write	Application writes to both stores	Inconsistent on failure (avoid without outbox)
Change data capture (CDC)	Capture database changes and propagate to secondary stores	Eventually consistent; reliable
Event sourcing	Events are the source of truth; projections build views in each store	Eventually consistent; auditable
ETL / scheduled sync	Periodic batch transfer from primary to secondary	Latency = batch interval; simple

External references

Topic	URL	Why it is linked
Use The Index, Luke	https://use-the-index-luke.com/	SQL indexing and performance — practical and thorough
PostgreSQL Documentation	https://www.postgresql.org/docs/	Authoritative reference for advanced SQL features
Designing Data-Intensive Applications	https://dataintensive.net/	Distributed data systems — the essential reference
Database Internals (Petrov)	https://www.databass.dev/	How databases work under the hood
MongoDB Data Modeling	https://www.mongodb.com/docs/manual/data-modeling/	Document database modeling patterns
DynamoDB Book (Alex DeBrie)	https://www.dynamodbbook.com/	DynamoDB single-table design patterns

Big data & data engineering