Data engineering techniques (blueprint)

Purpose: Deep, project-agnostic guides for data engineering techniques. Each guide covers principles, implementation patterns, trade-offs, and tooling so teams can reuse patterns without copying project-specific schemas…

Audience: Teams adopting the Big data & data engineering blueprint. Project-specific data configuration stays in docs/product/data/ where that convention exists.

Techniques here complement architecture and product docs: modeling and quality rules apply whether you run Lambda, Kappa, or a lakehouse. Planned guides will extend pipeline and validation topics without duplicating the Big data & data engineering body of knowledge.


| Technique | Focus | Guide |
| --- | --- | --- |
| Operational data modeling | Relational modeling (normalization, ER), indexing strategies, query plan analysis, transaction isolation, schema migration, NoSQL modeling (document, key-value, graph, time-series), polyglot persistence | Operational data modeling (blueprint) |
| Pipeline patterns | ETL/ELT, CDC, idempotency, orchestration integration | BIGDATA.md §3 |
| Data quality | Validation layers, testing data pipelines, observability for freshness and volume | BIGDATA.md §2 |

How to use this section: Read Operational data modeling (blueprint) when designing schemas, choosing indexes, or comparing relational and NoSQL shapes. Once the pipeline-pattern and data-quality guides exist, use them alongside ADRs for project-specific pipeline layout and SLAs.

Suggested reading order for new practitioners:

  1. Big data & data engineering body of knowledge §1–2 for vocabulary and governance framing.
  2. Operational data modeling (blueprint) before locking physical schemas.
  3. Data architectures (blueprint) when choosing batch vs stream vs mesh posture.

Core knowledge: Big data & data engineering body of knowledge — data engineering principles, governance, pipeline patterns, DataOps.

Bridge: Big Data ↔ SDLC ↔ PDLC bridge — how data engineering maps to delivery and product lifecycles.

Architectures & engines: Data architectures (blueprint), Data technologies (blueprint) (including Data processing engines & platforms).

Cross-reference: Big data & data engineering body of knowledge §2 (governance and quality dimensions) and §5 (competencies) align with the techniques catalog; mesh-specific ownership is covered in Data mesh: domain-oriented decentralized architecture.

Scope boundary: Blueprint technique guides stay pattern-level (what to consider, how to trade off). Concrete DDL, DAG definitions, and environment-specific connection strings belong in docs/development/ or your pipeline repo, with ADRs for durable decisions.

Related blueprints: For end-to-end data discipline context, start at Big data & data engineering, then drill into architectures and technologies as needed.


Keep project-specific data architecture decisions in docs/adr/ and pipeline documentation in docs/development/, not in this file.