Configuration

Each pipeline layer (surrogate training, optimization, analysis) accepts an optional config object. The defaults work well for most cases, and every parameter listed below can be overridden.

Full Example

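import surrox

# `problem` and `df` are assumed to be defined earlier
# (the problem definition and the training data).
result, analyzer = surrox.run(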
    problem=problem,
    dataframe=df,
    surrogate_config=surrox.TrainingConfig(
        n_trials=100,
        ensemble_size=7,
        cv_folds=10,
        default_coverage=0.95,
    ),
    optimizer_config=surrox.OptimizerConfig(
        population_size=200,
        n_generations=500,
    ),
    analysis_config=surrox.AnalysisConfig(
        shap_background_size=200,
        pdp_grid_resolution=100,
    ),
)

TrainingConfig

Controls surrogate model training: the hyperparameter optimization (HPO) budget, ensemble construction, and conformal calibration.

Parameter	Default	Description
`n_trials`	50	Optuna HPO trials per target column
`cv_folds`	5	Cross-validation folds
`ensemble_size`	5	Maximum models in the ensemble
`calibration_fraction`	0.2	Data fraction held out for conformal calibration
`default_coverage`	0.9	Conformal prediction interval coverage
`study_timeout_s`	300	Optuna study timeout in seconds
`min_r2`	0.7	Minimum R² quality threshold (`None` to disable)
`random_seed`	42	Random seed
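
To make the interplay of `calibration_fraction` and `default_coverage` concrete, here is a minimal sketch of split conformal prediction in plain Python. It is not surrox's implementation: the helper names `holdout_split` and `conformal_half_width` are hypothetical, and the use of absolute residuals as the conformity score is an assumption.

```python
import math
import random


def holdout_split(rows, calibration_fraction=0.2, seed=42):
    """Shuffle rows and hold out a fraction of them for calibration."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_cal = round(len(shuffled) * calibration_fraction)
    return shuffled[n_cal:], shuffled[:n_cal]  # (training rows, calibration rows)


def conformal_half_width(calibration_residuals, coverage=0.9):
    """Interval half-width from absolute residuals on the calibration rows."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * coverage). Clipping to n is a
    # simplification: strictly, the interval is unbounded in that case.
    rank = min(n, math.ceil((n + 1) * coverage))
    return scores[rank - 1]


train_rows, calibration_rows = holdout_split(range(100))
# Made-up residuals (observed minus predicted) on 20 calibration rows.
residuals = [0.1 * i for i in range(-10, 10)]
half_width = conformal_half_width(residuals, coverage=0.9)
print(len(train_rows), len(calibration_rows), half_width)
```

In this sketch a prediction `y_hat` would be reported as `[y_hat - half_width, y_hat + half_width]`. Raising the coverage selects a higher-ranked residual, so the interval can only widen; a larger calibration fraction leaves fewer rows for training but gives a more stable quantile estimate.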

FeatureReductionConfig

Controls automatic feature reduction (importance screening and correlation grouping). It is nested inside TrainingConfig and set through its `feature_reduction` field.

Parameter	Default	Description
`enabled`	`True`	Enable automatic feature reduction
`importance_threshold`	0.01	Minimum relative importance to keep a feature (XGBoost-based screening)
`correlation_threshold`	0.9	Absolute correlation above which features are grouped via PCA

Feature reduction is skipped when there are fewer than 10 features or fewer than 100 samples. Features involved in monotonic constraints are never dropped or grouped.

surrox.TrainingConfig(
    feature_reduction=surrox.FeatureReductionConfig(
        enabled=True,
        importance_threshold=0.02,
        correlation_threshold=0.85,
    ),
)
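
The skip rule and the importance screening described above can be sketched as follows. This is an illustration only: `select_features` is a hypothetical helper, "relative importance" is assumed to mean each feature's share of the total importance, and the PCA-based correlation grouping is left out.

```python
def select_features(importances, protected=(), importance_threshold=0.01,
                    n_samples=0, enabled=True):
    """Return the feature names that survive importance screening.

    importances: dict mapping feature name -> non-negative importance.
    protected:   features under monotonic constraints; these are always kept.
    """
    names = list(importances)
    # Reduction is skipped when disabled or when the problem is small.
    if not enabled or len(names) < 10 or n_samples < 100:
        return names
    total = sum(importances.values())
    if total == 0:
        return names
    return [
        name for name in names
        if name in protected or importances[name] / total >= importance_threshold
    ]


# Ten informative features plus two weak ones, one of them protected.
importances = {f"x{i}": 1.0 for i in range(10)}
importances["noise"] = 0.001
importances["mono"] = 0.001
kept = select_features(importances, protected={"mono"}, n_samples=500)
print(kept)
```

Here `noise` falls below the threshold and is dropped, while `mono` is kept despite its low importance because it is marked as protected.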

See TrainingConfig for the full API.

OptimizerConfig

Controls the optimization strategy. The optimizer auto-selects between a global surrogate strategy (pymoo) for low-dimensional problems and a trust region strategy (TuRBO) for high-dimensional problems.
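
The auto-selection rule can be sketched as below. The `Strategy` enum and `resolve_strategy` function are stand-ins written for this example, not surrox's actual types, and the strict "greater than" comparison is an assumption based on the wording "above which" in the `dim_threshold` description.

```python
from enum import Enum
from typing import Optional


class Strategy(Enum):
    # Stand-in enum; the real library may expose these names differently.
    GLOBAL_SURROGATE = "global_surrogate"
    TRUST_REGION = "trust_region"


def resolve_strategy(n_decision_variables: int,
                     strategy: Optional[Strategy] = None,
                     dim_threshold: int = 15) -> Strategy:
    """Pick the optimization strategy, honoring an explicit choice first."""
    if strategy is not None:
        return strategy
    if n_decision_variables > dim_threshold:
        return Strategy.TRUST_REGION      # TuRBO for high-dimensional problems
    return Strategy.GLOBAL_SURROGATE      # pymoo for low-dimensional problems


print(resolve_strategy(8))
print(resolve_strategy(40))
print(resolve_strategy(40, strategy=Strategy.GLOBAL_SURROGATE))
```

An explicit `strategy` always wins, so `dim_threshold` only matters when `strategy` is left at `None`.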

Parameter	Default	Description
`strategy`	`None`	`GLOBAL_SURROGATE`, `TRUST_REGION`, or `None` (auto-select based on `dim_threshold`)
`dim_threshold`	15	Decision variable count above which TuRBO is auto-selected
`population_size`	100	Population size for pymoo (global strategy only)
`n_generations`	200	Number of generations for pymoo (global strategy only)
`extrapolation_k`	5	k-NN neighbors for extrapolation detection
`extrapolation_threshold`	2.0	Distance threshold for extrapolation flag
`constraint_confidence`	0.95	Conformal confidence for constraint evaluation
`seed`	42	Random seed
`turbo`	`TuRBOConfig()`	TuRBO-specific configuration (trust region strategy only)
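
One way to read `extrapolation_k` and `extrapolation_threshold` is as a k-nearest-neighbor distance check against the training data. The sketch below assumes Euclidean distance on already-scaled inputs and flags a candidate when the mean distance to its k nearest training points exceeds the threshold. The table does not state surrox's exact distance measure or scaling, so treat `is_extrapolating` as a hypothetical illustration.

```python
import math


def is_extrapolating(candidate, training_points, k=5, threshold=2.0):
    """Flag a candidate that lies far from its k nearest training points."""
    distances = sorted(math.dist(candidate, point) for point in training_points)
    nearest = distances[:k]
    mean_distance = sum(nearest) / len(nearest)
    return mean_distance > threshold


# Training data on a 5 x 5 integer grid.
training = [(float(i), float(j)) for i in range(5) for j in range(5)]
print(is_extrapolating((2.0, 2.0), training))    # inside the sampled region
print(is_extrapolating((10.0, 10.0), training))  # far outside it
```

Under this reading, a lower threshold or a larger `k` makes the flag more conservative, since more candidates near the edge of the data are marked as extrapolated.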

TuRBOConfig

Parameter	Default	Description
`n_initial`	`None`	Initial Sobol points (`None` = 2 × n_decision_variables)
`max_evaluations`	500	Total evaluation budget
`batch_size`	1	Candidates per iteration
`length_init`	0.8	Initial trust region side length in [0,1]^d
`length_min`	0.0078125	Minimum trust region length before a restart is triggered
`length_max`	1.6	Maximum trust region length
`success_tolerance`	3	Consecutive successes before the trust region expands
`failure_tolerance`	`None`	Consecutive failures before the trust region shrinks (`None` = ceil(dim / batch_size))
`n_restarts`	3	Maximum trust region restarts before termination
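
The length and tolerance parameters interact as a small state machine. The sketch below follows the usual TuRBO rule of doubling the side length on expansion and halving it on shrinkage. Those factors, and the `TrustRegion` class itself, are assumptions made for illustration rather than surrox's confirmed behavior.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrustRegion:
    dim: int
    batch_size: int = 1
    length: float = 0.8                       # length_init
    length_min: float = 0.0078125
    length_max: float = 1.6
    success_tolerance: int = 3
    failure_tolerance: Optional[int] = None   # None = ceil(dim / batch_size)
    successes: int = 0
    failures: int = 0

    def __post_init__(self):
        if self.failure_tolerance is None:
            self.failure_tolerance = math.ceil(self.dim / self.batch_size)

    def update(self, improved: bool) -> bool:
        """Record one iteration; return True when a restart is due."""
        if improved:
            self.successes, self.failures = self.successes + 1, 0
        else:
            self.successes, self.failures = 0, self.failures + 1
        if self.successes >= self.success_tolerance:
            self.length = min(2.0 * self.length, self.length_max)  # expand
            self.successes = 0
        elif self.failures >= self.failure_tolerance:
            self.length /= 2.0                                     # shrink
            self.failures = 0
        return self.length < self.length_min


region = TrustRegion(dim=4)
for _ in range(3):
    region.update(improved=True)
print(region.failure_tolerance, region.length)
```

With these defaults and halving, a run of failures starting from `length_init` needs seven consecutive shrinks before the length drops below `length_min`. A restart therefore takes at least `7 * failure_tolerance` unsuccessful iterations.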

See OptimizerConfig for the full API.

AnalysisConfig

Controls the post-optimization analysis.

Parameter	Default	Description
`shap_background_size`	100	Background samples for SHAP
`pdp_grid_resolution`	50	Grid points for PDP/ICE
`pdp_percentiles`	(0.05, 0.95)	Lower and upper percentile bounds of the PDP/ICE grid range, expressed as fractions
`monotonicity_check_resolution`	50	Grid resolution for monotonicity checks
`random_seed`	42	Random seed
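
As an illustration of how `pdp_grid_resolution` and `pdp_percentiles` could combine, the sketch below builds an evenly spaced grid between two percentiles of a feature's observed values. The helpers `percentile` and `pdp_grid` are hypothetical, and both the linear interpolation and the even spacing are assumptions.

```python
def percentile(sorted_values, q):
    """Linearly interpolated quantile q in [0, 1] of pre-sorted values."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def pdp_grid(values, resolution=50, percentiles=(0.05, 0.95)):
    """Evenly spaced grid between the lower and upper percentile bounds."""
    if resolution < 2:
        raise ValueError("resolution must be at least 2")
    ordered = sorted(values)
    low = percentile(ordered, percentiles[0])
    high = percentile(ordered, percentiles[1])
    step = (high - low) / (resolution - 1)
    return [low + i * step for i in range(resolution)]


grid = pdp_grid(range(101))
print(len(grid), grid[0], grid[-1])
```

Trimming the grid to the central percentiles keeps partial dependence curves away from sparsely sampled tails, where the surrogate's predictions are least reliable.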

See AnalysisConfig for the full API.