
ML techniques (blueprint)

Purpose: Catalog of machine learning techniques — algorithm families and deep-dive guides on features and evaluation. Each category describes the approach, common algorithms, trade-offs, and use cases.

Audience: Teams adopting Data science & machine learning. Technique choices for a specific project are documented in that project's experiment logs and model cards, not here.

Technique selection in the ML workflow

Algorithm choice depends on problem type, data scale, interpretability, and latency. Feature engineering usually accounts for most of the quality gains in early iterations; evaluation design (splits, metrics, fairness) decides whether comparisons between candidates can be trusted. Use this folder to match your current workflow stage to the right reference: encoding and feature stores before or alongside modeling; metrics and validation while comparing candidates; the algorithm rows below when narrowing model families. The deep dives "Feature engineering & feature stores" and "Model evaluation, selection & validation" are intentionally tool-agnostic; record specific libraries and experiment IDs in project docs, not in these blueprints.
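To make the point about splits concrete, here is a minimal sketch of k-fold cross-validation using only the Python standard library, so it stays tool-agnostic. The function names (`k_fold_indices`, `cross_validate`, `fit_majority`, `accuracy`) and the toy data are illustrative assumptions, not part of this handbook. The idea shown is that every candidate is scored on the same seeded folds, which is what makes the resulting numbers comparable.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed: identical folds per candidate
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(fit, score, X, y, k=5, seed=0):
    """Return the mean held-out score; fit and score are supplied by the caller."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return mean(scores)


# Hypothetical baseline candidate: always predict the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def accuracy(model, X_test, y_test):
    return mean([1.0 if model == label else 0.0 for label in y_test])


X = list(range(10))       # toy feature values (unused by the baseline)
y = [1] * 8 + [0] * 2     # toy labels
baseline_score = cross_validate(fit_majority, accuracy, X, y, k=5)
```

Passing the same `k` and `seed` for every candidate keeps the folds fixed, so differences in score come from the candidates rather than from the split. A trivial baseline like the one above is a useful first row in any comparison.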

Core knowledge: Data science & machine learning body of knowledge — ML lifecycle (CRISP-DM aligned), statistics, model evaluation, MLOps, responsible AI.

Category	Core idea	Common algorithms and techniques	Deep dive
Feature engineering	Transform raw inputs into model-ready signals; train/serve consistency	Encoding, scaling, temporal & text features, feature stores	Feature engineering & feature stores
Evaluation & validation	Metrics, splits, tuning, fairness, explainability, production tests	Cross-validation, hyperparameter search, A/B and shadow tests	Model evaluation, selection & validation
Supervised — classification	Learn to assign labels from labeled examples	Logistic regression, decision trees, random forests, gradient boosting (XGBoost, LightGBM), SVM, neural networks	DATA-SCIENCE.md §1–3
Supervised — regression	Learn to predict continuous values from labeled examples	Linear regression, polynomial regression, gradient boosting, neural networks	DATA-SCIENCE.md §1–3
Unsupervised — clustering	Discover natural groupings in unlabeled data	K-means, DBSCAN, hierarchical clustering, Gaussian mixture models	DATA-SCIENCE.md §1–3
Unsupervised — dimensionality reduction	Reduce feature space while preserving information	PCA, t-SNE, UMAP, autoencoders	—
Deep learning	Multi-layer neural networks for complex pattern recognition	CNNs (vision), RNNs/LSTMs (sequences), Transformers (language, multi-modal)	DATA-SCIENCE.md §1–3
Natural language processing	Understanding and generating human language	Transformer models (BERT, GPT), word embeddings, named entity recognition, sentiment analysis	—
Computer vision	Understanding and analyzing images and video	CNNs, object detection (YOLO, Faster R-CNN), image segmentation, generative models	—
Time series	Forecasting and anomaly detection in temporal data	ARIMA, Prophet, LSTM, temporal convolutional networks, state-space models	—
Recommender systems	Predicting user preferences for items	Collaborative filtering, content-based filtering, matrix factorization, neural collaborative filtering	DATA-SCIENCE.md §1–3
Reinforcement learning	Learning optimal actions through environment interaction	Q-learning, policy gradient, actor-critic, multi-armed bandits	—
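
The "train/serve consistency" point in the Feature engineering row can be illustrated with a small sketch, again using only the Python standard library. The helper names (`fit_standardizer`, `standardize`) and the sample values are hypothetical. The pattern shown is: learn transformation parameters from training data once, store them with the model, and reuse exactly those parameters at serving time.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn scaling parameters from training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    # Guard against constant features, which would otherwise divide by zero.
    return {"mean": mu, "std": sigma if sigma > 0 else 1.0}


def standardize(values, params):
    """Apply previously fitted parameters; do not re-fit at serving time."""
    return [(v - params["mean"]) / params["std"] for v in values]


train = [10.0, 12.0, 14.0, 16.0]           # toy training feature
params = fit_standardizer(train)           # computed once, stored with the model
train_scaled = standardize(train, params)  # used to fit the model
served = standardize([13.0], params)       # serving reuses the stored parameters
```

Re-computing the mean and standard deviation on serving traffic would feed the model inputs on a different scale from the one it was trained on, which is one common source of train/serve skew.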

Selection guidance: Choose a technique based on problem type, data volume, interpretability requirements, latency constraints, and team expertise. See Data science & machine learning body of knowledge §3 (model evaluation) for the evaluation framework.

Keep project-specific model documentation in docs/product/ and experiment logs in docs/development/, not in this file.