Data science approaches (blueprint)

Purpose: Deeper, project-agnostic guides for data science methodologies and process frameworks. Each approach describes its structure, phases, and mapping to the lifecycle.

Audience: Teams adopting Data science & machine learning; project-specific ML decisions stay in docs/architecture/ and docs/adr/.

Methodology selection

Pick a process scaffold first, one that covers goals, data reality, and deployment, then layer operations (MLOps) on top when models must live in production under change control. CRISP-DM fits most greenfield analytics and ML projects. MLOps does not replace CRISP-DM; it is the engineering wrapper around deployment and monitoring. Experiment management and A/B testing address how you learn during development and how you prove impact in production; they complement a phase model rather than substitute for it. If you are unsure where to start, read CRISP-DM: Cross-Industry Standard Process for Data Mining for phase definitions, then MLOps: Machine Learning Operations once the path to production is real rather than hypothetical.
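
The selection logic above can be written down as a small rule set. The following is a minimal sketch, not a prescribed tool: `ProjectContext`, its field names, and `recommend_approaches` are hypothetical names introduced here for illustration, and treating CRISP-DM as the default scaffold is an assumption that follows the "fits most greenfield projects" guidance.

```python
from dataclasses import dataclass


@dataclass
class ProjectContext:
    """Hypothetical description of where a project stands."""
    in_model_development: bool = False
    production_under_change_control: bool = False
    must_prove_production_impact: bool = False


def recommend_approaches(ctx: ProjectContext) -> list[str]:
    """Return the approaches to combine, process scaffold first."""
    # Assumption: CRISP-DM is the default scaffold for every project.
    approaches = ["CRISP-DM"]
    if ctx.in_model_development:
        # Experiments complement the phase model; they do not replace it.
        approaches.append("Experiment management")
    if ctx.production_under_change_control:
        # MLOps wraps deployment and monitoring around the scaffold.
        approaches.append("MLOps")
    if ctx.must_prove_production_impact:
        approaches.append("A/B testing")
    return approaches


if __name__ == "__main__":
    ctx = ProjectContext(in_model_development=True,
                         production_under_change_control=True)
    print(recommend_approaches(ctx))
```

The list always starts with the scaffold and only grows from there, which mirrors the point that the other approaches are layered on top rather than chosen instead.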

Core knowledge: Data science & machine learning body of knowledge — ML lifecycle, statistics, evaluation, MLOps, responsible AI.

Bridge: Data Science ↔ SDLC ↔ PDLC bridge — how data science maps to delivery and product lifecycles.

Swimlane diagram template

The diagram is illustrative. In practice, experiment tracking starts early (often during Data Preparation or Modeling in CRISP-DM terms), and A/B tests may run only after a first deployment. The dependencies stay the same regardless: sound process → disciplined experiments → reliable production → measured impact.

| Approach | Focus | When to use | Guide |
| --- | --- | --- | --- |
| CRISP-DM | Six-phase cross-industry standard process for data mining and ML projects | Default starting point for most ML projects; industry-proven; tool-agnostic | CRISP-DM: Cross-Industry Standard Process for Data Mining |
| MLOps | Operationalizing ML: CI/CD for models, automated retraining, model monitoring | When ML models run in production and require ongoing management | MLOps: Machine Learning Operations |
| Experiment management | Systematic approach to running, tracking, and analyzing ML experiments | During model development; comparing approaches; hyperparameter optimization | DATA-SCIENCE.md §3 |
| A/B testing | Controlled online experimentation to measure model impact on product metrics | When validating model impact in production; PDLC P5 outcome measurement | DATA-SCIENCE.md §3 |
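
To make the A/B testing entry concrete, here is a minimal sketch of one common way to compare a control and a treatment model on a binary product metric such as conversion: a two-sided, pooled two-proportion z-test. The choice of test, the function name `two_proportion_z_test`, and the counts in the usage lines are assumptions made for illustration; DATA-SCIENCE.md §3 may prescribe a different analysis.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates.

    conv_a / n_a: conversions and users in control (A).
    conv_b / n_b: conversions and users in treatment (B).
    Returns (z statistic, p-value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

The sketch uses only the standard library. It assumes users were randomized independently into the two arms and that the sample size was fixed before the test started; checking the p-value repeatedly while the test runs inflates the false-positive rate.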

Related techniques: For encoding, feature stores, and evaluation metrics used inside these phases, see Feature engineering & feature stores and Model evaluation, selection & validation.

Keep project-specific model documentation in docs/product/ and experiment logs in docs/development/, not in this file.