Data engineering techniques (blueprint)
Purpose: Deep, project-agnostic guides for data engineering techniques. Each guide covers principles, implementation patterns, trade-offs, and tooling so teams can reuse patterns without copying project-specific schemas…
Audience: Teams adopting the Big data & data engineering discipline. Project-specific data configuration stays in docs/product/data/ where that convention exists.
Techniques here complement the architecture and product docs: modeling and quality rules apply whether you run Lambda, Kappa, or a lakehouse. Planned guides will extend the pipeline and validation topics without duplicating the Big data & data engineering body of knowledge.
| Technique | Focus | Guide |
|---|---|---|
| Operational data modeling | Relational modeling (normalization, ER), indexing strategies, query plan analysis, transaction isolation, schema migration, NoSQL modeling (document, key-value, graph, time-series), polyglot persistence | Operational data modeling (blueprint) |
| Pipeline patterns | ETL/ELT, CDC, idempotency, orchestration integration | BIGDATA.md §3 (dedicated guide planned) |
| Data quality | Validation layers, testing data pipelines, observability for freshness and volume | BIGDATA.md §2 (dedicated guide planned) |
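
As a pattern-level illustration of the idempotency focus listed under Pipeline patterns, the sketch below shows one common way to make a load step safe to replay: upsert on a business key and apply a change only when its version is newer than the stored one. Everything in it is an assumption made for illustration: the `orders` table, its columns, the `apply_batch` helper, and the use of an in-memory SQLite database (which needs SQLite 3.24 or later for `ON CONFLICT ... DO UPDATE`). It is not a schema or API defined by this handbook.

```python
import sqlite3


def apply_batch(conn: sqlite3.Connection, rows) -> None:
    """Upsert (order_id, status, version) rows.

    Hypothetical helper: replaying the same batch leaves the table unchanged,
    because a row is only overwritten by a strictly newer version.
    """
    with conn:  # commit the whole batch as one transaction, roll back on error
        conn.executemany(
            """
            INSERT INTO orders (order_id, status, version)
            VALUES (?, ?, ?)
            ON CONFLICT(order_id) DO UPDATE SET
                status = excluded.status,
                version = excluded.version
            WHERE excluded.version > orders.version
            """,
            rows,
        )


# Hypothetical target table, created in memory for the demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id TEXT PRIMARY KEY, status TEXT NOT NULL, version INTEGER NOT NULL)"
)

batch = [("o-1", "created", 1), ("o-2", "created", 1), ("o-1", "shipped", 2)]

apply_batch(conn, batch)
apply_batch(conn, batch)  # simulated retry: the same batch is delivered twice

snapshot = conn.execute(
    "SELECT order_id, status, version FROM orders ORDER BY order_id"
).fetchall()
print(snapshot)
```

After both calls the table holds one row per `order_id`, and the stale `created` event for `o-1` does not overwrite the later `shipped` state. Real table definitions and connection details belong in docs/development/ or the pipeline repo, in line with the scope boundary for this section.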
How to use this section: Read Operational data modeling (blueprint) when designing schemas, choosing indexes, or comparing relational and NoSQL shapes. Once the pipeline-pattern and data-quality guides exist, use them alongside ADRs when settling project-specific pipeline layout and SLAs.
Suggested reading order for new practitioners:
- Big data & data engineering body of knowledge §1–2 for vocabulary and governance framing.
- Operational data modeling (blueprint) before locking physical schemas.
- Data architectures (blueprint) when choosing batch vs stream vs mesh posture.
Core knowledge: Big data & data engineering body of knowledge — data engineering principles, governance, pipeline patterns, DataOps.
Bridge: Big Data ↔ SDLC ↔ PDLC bridge — how data engineering maps to delivery and product lifecycles.
Architectures & engines: Data architectures (blueprint), Data technologies (blueprint) (including Data processing engines & platforms).
Cross-reference: Big data & data engineering body of knowledge §2 (governance and quality dimensions) and §5 (competencies) align with the techniques catalog; mesh-specific ownership is covered in Data mesh: domain-oriented decentralized architecture.
Scope boundary: Blueprint technique guides stay pattern-level (what to consider, how to trade off). Concrete DDL, DAG definitions, and environment-specific connection strings belong in docs/development/ or your pipeline repo, with ADRs for durable decisions.
Related blueprints: For end-to-end data discipline context, start at Big data & data engineering, then drill into architectures and technologies as needed.
Keep project-specific data architecture decisions in docs/adr/ and pipeline documentation in docs/development/, not in this file.