Data engineering techniques (blueprint)
**Purpose:** Deep, **project-agnostic** guides for data engineering techniques. Each guide covers principles, implementation patterns, trade-offs, and tooling so teams can reuse patterns without copying project-specific schemas or pipelines.
**Audience:** Teams adopting Big data & data engineering; project-specific data configuration stays in docs/product/data/ where that convention exists.
Techniques here complement architecture and product docs: modeling and quality rules apply regardless of whether you run Lambda, Kappa, or a lakehouse. Planned guides will extend pipeline and validation topics without duplicating BIGDATA.md.
| Technique | Focus | Guide |
|---|---|---|
| Operational data modeling | Relational modeling (normalization, ER), indexing strategies, query plan analysis, transaction isolation, schema migration, NoSQL modeling (document, key-value, graph, time-series), polyglot persistence | operational-data-modeling.md |
| Pipeline patterns | ETL/ELT, CDC, idempotency, orchestration integration | BIGDATA.md §3 |
| Data quality | Validation layers, testing data pipelines, observability for freshness and volume | BIGDATA.md §2 |
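The data-quality row above mentions validation layers plus freshness and volume observability. A minimal sketch of that kind of pattern-level gate, assuming records carry a timezone-aware `updated_at` field (the function name, field names, and thresholds are illustrative, not from any guide):

```python
from datetime import datetime, timedelta, timezone

def check_batch(rows, max_age_hours=24, min_rows=100):
    """Pattern-level quality gate: flag a batch that is too small
    (volume) or whose newest record is too old (freshness).
    Thresholds are illustrative defaults, not recommendations."""
    issues = []
    if len(rows) < min_rows:
        issues.append(f"volume: got {len(rows)} rows, expected >= {min_rows}")
    if rows:
        newest = max(r["updated_at"] for r in rows)
        age = datetime.now(timezone.utc) - newest
        if age > timedelta(hours=max_age_hours):
            issues.append(f"freshness: newest record is {age} old")
    return issues

# A small, two-day-old batch fails both checks.
stale = datetime.now(timezone.utc) - timedelta(hours=48)
batch = [{"id": i, "updated_at": stale} for i in range(10)]
problems = check_batch(batch)
print(problems)
```

In a real pipeline the same checks would run as a validation layer between ingestion and publication, with failures routed to alerting rather than printed.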
How to use this section: Read operational-data-modeling.md when designing schemas, choosing indexes, or comparing relational vs NoSQL shapes. When pipeline-pattern and data-quality guides exist, use them alongside ADRs for project-specific pipeline layout and SLAs.
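The pipeline-patterns row in the table names idempotency as a focus area. As a toy illustration of the idea, a merge keyed on a primary key with a version comparison can be replayed safely (the dict-backed store and field names here are hypothetical, standing in for a real table or sink):

```python
def apply_batch(store, batch):
    """Idempotent merge: upsert each record by primary key, keeping the
    record with the higher version. Replaying the same batch, e.g.
    after a retry, leaves the store in the same state."""
    for rec in batch:
        current = store.get(rec["id"])
        if current is None or rec["version"] >= current["version"]:
            store[rec["id"]] = rec
    return store

store = {}
batch = [{"id": 1, "version": 1, "name": "a"},
         {"id": 2, "version": 1, "name": "b"}]
apply_batch(store, batch)
apply_batch(store, batch)  # replay causes no duplicates or changes
print(len(store))  # 2
```

The same property is what lets orchestrators retry a failed task without manual cleanup; the forthcoming pipeline-patterns guide is the place for the full trade-off discussion.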
Suggested reading order for new practitioners:
1. BIGDATA.md §1–2 for vocabulary and governance framing.
2. operational-data-modeling.md before locking physical schemas.
3. Data architectures (blueprint) when choosing batch vs stream vs mesh posture.
Core knowledge: BIGDATA.md — data engineering principles, governance, pipeline patterns, DataOps.
Bridge: Big Data ↔ SDLC ↔ PDLC bridge — how data engineering maps to delivery and product lifecycles.
Architectures & engines: Data architectures (blueprint), Data technologies (blueprint) (including processing-engines.md).
Cross-reference: BIGDATA.md §2 (governance and quality dimensions) and §5 (competencies) align with the techniques catalog; mesh-specific ownership is covered in architectures/data-mesh.md.
Scope boundary: Blueprint technique guides stay pattern-level (what to consider, how to trade off). Concrete DDL, DAG definitions, and environment-specific connection strings belong in docs/development/ or your pipeline repo, with ADRs for durable decisions.
Related blueprints: For end-to-end data discipline context, start at Big data & data engineering, then drill into architectures and technologies as needed.
Keep project-specific data architecture decisions in docs/adr/ and pipeline documentation in docs/development/, not in this file.
Canonical source
Edit https://github.com/autowww/blueprints/blob/main/disciplines/data/bigdata/techniques/README.md first; regenerate with docs/build-handbook.py.