Architecture

Layer Overview

┌─────────────────────────────────────────────────┐
│                  ProblemDefinition              │
│  Variables, Objectives, Constraints, Scenarios  │
└──────────────────────┬──────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────┐
│               SurrogateManager                  │
│  Optuna HPO → Ensemble Selection → Conformal CP │
└──────────────────────┬──────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────┐
│                  Optimizer                      │
│  pymoo (DE / GA / NSGA-II / NSGA-III)           │
└──────────────────────┬──────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────┐
│                  Analysis                       │
│  Summary (auto) + Detail (lazy): SHAP, PDP, ... │
└─────────────────────────────────────────────────┘

Data Flow

1. ProblemDefinition is constructed and validated. It is then passed to every layer as an immutable reference.
2. BoundDataset binds a DataFrame to the problem, validating column presence, dtypes, bounds, and missing values.
3. SurrogateManager trains one ensemble per surrogate_column (deduplicated across objectives and data constraints). Each ensemble is an Optuna-selected set of XGBoost/LightGBM models with conformal calibration.
4. Optimizer evaluates candidates on the surrogates, applies linear and data constraints, detects extrapolation, and returns Pareto-optimal points.
5. Analysis produces a summary automatically. The Analyzer object computes detail analyses lazily and caches the results.
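
The deduplication in the SurrogateManager step can be illustrated with a minimal sketch. It uses frozen standard-library dataclasses as a stand-in for the frozen Pydantic models, and every name here (`Objective`, `DataConstraint`, `Problem`, `unique_surrogate_columns`) is hypothetical, chosen for illustration rather than taken from the library's API. The point is only that an objective and a data constraint reading the same column lead to one surrogate, not two.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the library's frozen models (assumption, not the real API).
@dataclass(frozen=True)
class Objective:
    name: str
    surrogate_column: str


@dataclass(frozen=True)
class DataConstraint:
    name: str
    surrogate_column: str
    upper_bound: float


@dataclass(frozen=True)
class Problem:
    objectives: tuple[Objective, ...]
    data_constraints: tuple[DataConstraint, ...]


def unique_surrogate_columns(problem: Problem) -> list[str]:
    """Return each surrogate column once, in first-seen order."""
    columns = [o.surrogate_column for o in problem.objectives]
    columns += [c.surrogate_column for c in problem.data_constraints]
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(columns))


problem = Problem(
    objectives=(Objective("min_cost", "cost"), Objective("max_yield", "yield")),
    data_constraints=(DataConstraint("cost_cap", "cost", upper_bound=100.0),),
)

# "cost" is used by an objective and a constraint, but appears only once.
columns = unique_surrogate_columns(problem)
print(columns)  # ['cost', 'yield']
```

Under this sketch, one ensemble would be trained per entry of `columns`. Because the dataclasses are frozen, assigning to a field such as `problem.objectives` raises an error, which mirrors the immutability described in the first step.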

Key Design Decisions

Decision	Rationale
Immutable models (frozen Pydantic)	Prevents accidental mutation across layers
One surrogate per unique column	Objectives and data constraints sharing a column reuse the same surrogate (ADR-007)
Conformal prediction	Distribution-free prediction intervals with finite-sample coverage guarantees, assuming exchangeable data
Extrapolation detection	k-NN distance flags candidates outside the training domain
Lazy detail analysis	SHAP/PDP are expensive, so they are computed only when requested and then cached
Facade functions (`run`, `run_scenarios`)	Simple entry points hiding the layer orchestration
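
To make the conformal prediction row concrete, here is a minimal sketch of split conformal calibration with absolute residuals, one common variant. Whether the library uses this exact variant is an assumption. The function name `conformal_half_width` and the calibration numbers are hypothetical. Given residuals from a held-out calibration set, the sketch picks the half-width `q` so that `[prediction - q, prediction + q]` covers the true value with probability at least `1 - alpha`, provided calibration and test points are exchangeable.

```python
import math


def conformal_half_width(residuals: list[float], alpha: float) -> float:
    """Split-conformal half-width from held-out calibration residuals."""
    n = len(residuals)
    scores = sorted(abs(r) for r in residuals)
    # Finite-sample correction: take the ceil((n + 1) * (1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to guarantee coverage at this alpha.
        return math.inf
    return scores[rank - 1]


# Hypothetical residuals (true value minus surrogate prediction) on a calibration set.
calibration_residuals = [0.5, -1.0, 0.2, 2.0, -0.3, 0.8, -1.5, 0.1, 0.4]

q = conformal_half_width(calibration_residuals, alpha=0.2)
prediction = 10.0  # a surrogate prediction for a new candidate
interval = (prediction - q, prediction + q)
print(q, interval)  # 1.5 (8.5, 11.5)
```

The guarantee is marginal: it holds on average over candidates drawn like the calibration data, which is one reason to pair it with the extrapolation check for candidates far from the training domain.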