Handbook

Data architectures (blueprint)

Purpose: Catalog of data architecture patterns — structural approaches to organizing data storage, processing, and access at scale. Each pattern describes its structure, trade-offs, and fitness criteria.

Why it matters: Data architecture is the foundation of data strategy. It constrains what latency, consistency, ownership, and cost profiles are achievable before tool selection. Patterns here are reference models; a concrete system often blends ideas (for example, lakehouse storage with mesh-style ownership).

Audience: Teams adopting Big data & data engineering; architecture choices for a specific project are documented as ADRs in docs/adr/.

Architecture evolution (conceptual timeline)

Mesh is not a strict successor to lakehouse — it is primarily an operating model that can sit on top of lakehouse or warehouse technology.

Use the linked guides when you need comparison matrices, decision flowcharts, technology mapping, and anti-patterns beyond the one-line summaries in the table.

Architecture	Core idea	Best fit	Guide
Lambda	Dual path — batch layer for completeness + speed layer for low latency; serving layer merges	Systems needing both historical accuracy and real-time views; complex but flexible	Lambda, Kappa & unified data architectures
Kappa	Single stream-processing path; reprocess by replaying the event log	Stream-first systems where batch reprocessing from the log is sufficient	Lambda, Kappa & unified data architectures
Unified streaming / lakehouse	One engine and/or table-centric storage for batch + stream + serving (e.g., Flink, Delta/Iceberg/Hudi)	Reducing dual-path cost while keeping reprocessing and analytics	Lambda, Kappa & unified data architectures
Data mesh	Domain-oriented, decentralized data ownership; data as a product; federated governance	Large organizations with multiple domains; autonomous teams; data product thinking	Data mesh: domain-oriented decentralized architecture
Data lakehouse	Unified storage combining data lake flexibility with data warehouse reliability (Delta Lake, Iceberg, Hudi)	Organizations wanting to eliminate separate lake + warehouse; ML and BI on same storage	Lambda, Kappa & unified data architectures
Medallion (bronze/silver/gold)	Layered data quality — raw ingestion (bronze), cleaned/conformed (silver), business-ready (gold)	Lakehouse and data lake environments; progressive quality refinement	BIGDATA.md §3
Data warehouse (Kimball)	Dimensional modeling — facts and dimensions; star/snowflake schemas; bottom-up	BI and reporting; well-understood business processes; SQL-centric analytics	BIGDATA.md §1
Data warehouse (Inmon)	Enterprise data warehouse — normalized; top-down; single source of truth	Enterprise-wide integration; strict consistency requirements	BIGDATA.md §1

Decision guidance: Architecture selection depends on latency requirements, data volume, team structure, analytics use cases, and operational maturity. See Big data & data engineering body of knowledge §1 (principles) for the decision framework and governance/quality context in later sections.

Cross-reference: For pipeline patterns (batch ETL/ELT, CDC, micro-batch) and DataOps practices that implement these architectures, use Big data & data engineering body of knowledge §§3–4. Processing engine trade-offs are summarized in Data processing engines & platforms.

Medallion and dimensional warehouse rows link to the relevant body-of-knowledge sections in Big data & data engineering body of knowledge; add project-specific star schemas and conformed dimensions in docs/adr/ or modeling appendices as your repo convention requires.

Keep project-specific data architecture decisions in docs/adr/ and pipeline documentation in docs/development/, not in this file.