Architecture
Layer Overview
┌─────────────────────────────────────────────────┐
│ ProblemDefinition │
│ Variables, Objectives, Constraints, Scenarios │
└──────────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ SurrogateManager │
│ Optuna HPO → Ensemble Selection → Conformal CP │
└──────────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Optimizer │
│ pymoo (DE / GA / NSGA-II / NSGA-III) │
└──────────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Analysis │
│ Summary (auto) + Detail (lazy): SHAP, PDP, ... │
└─────────────────────────────────────────────────┘
Data Flow
- ProblemDefinition is constructed and validated. It flows through every layer as an immutable reference.
- BoundDataset binds a DataFrame to the problem, validating column presence, dtypes, bounds, and missing values.
- SurrogateManager trains one ensemble per
surrogate_column (deduplicated across objectives and data constraints). Each ensemble is an Optuna-selected set of XGBoost/LightGBM models with conformal calibration.
- Optimizer evaluates candidates on surrogates, applies linear and data constraints, detects extrapolation, and returns Pareto-optimal points.
- Analysis produces a summary automatically. The Analyzer object allows lazy, cached detail analyses.
Key Design Decisions
| Decision |
Rationale |
| Immutable models (frozen Pydantic) |
Prevents accidental mutation across layers |
| One surrogate per unique column |
Objectives and data constraints sharing a column reuse the same surrogate (ADR-007) |
| Conformal prediction |
Distribution-free uncertainty intervals with coverage guarantees |
| Extrapolation detection |
k-NN distance flags candidates outside the training domain |
| Lazy detail analysis |
SHAP/PDP are expensive — computed only when requested and cached |
Facade functions (run, run_scenarios) |
Simple entry points hiding the layer orchestration |