--- command: ml:hyperparameter-tune plugin: ml category: ml-operations description: Automated hyperparameter optimization with Bayesian methods tags: - ml - hyperparameter-tuning - optimization - optuna - ray-tune - @python-backend-expert - Read - Write - Bash usage: | /ml:hyperparameter-tune --model xgboost --strategy bayesian --trials 100 --metric f1_score examples: - input: /ml:hyperparameter-tune --model RandomForest --strategy grid --cv 5 description: Grid search with 5-fold CV - input: /ml:hyperparameter-tune --strategy optuna --trials 200 --distributed ray description: Distributed Bayesian optimization with Optuna - input: /ml:hyperparameter-tune --strategy random --trials 50 --early-stopping description: Random search with early stopping - input: /ml:hyperparameter-tune --multi-objective --optimize accuracy,latency description: Multi-objective optimization for accuracy and latency --- # ml:hyperparameter-tune Automated hyperparameter optimization with multiple search strategies, distributed tuning, and experiment tracking. ## Description Comprehensive hyperparameter tuning system supporting: - **Search Strategies**: Grid, Random, Bayesian (Optuna), TPE, Hyperband - **Distributed Tuning**: Ray Tune with parallel trial execution - **Multi-Objective**: Optimize multiple metrics simultaneously (Pareto optimization) - **Early Stopping**: Median pruning, Hyperband, ASHA schedulers - **Experiment Tracking**: MLflow, Weights & Biases integration - **Cross-Validation**: K-fold, stratified, time-series splits - **Resource Management**: GPU allocation, concurrent trial limits - **Visualization**: Optimization history, parameter importance, contour plots ## Required Documentation Access **MANDATORY:** Before hyperparameter tuning, query Context7 for optimization best practices: **Documentation Queries:** - `mcp://context7/optuna/optimization` - Optuna Bayesian optimization, TPE sampler, pruning strategies - `mcp://context7/ray-tune/tuning` - Ray Tune distributed tuning, ASHA/Hyperband schedulers - `mcp://context7/scikit-learn/grid-search` - GridSearchCV, RandomizedSearchCV patterns - `mcp://context7/hyperopt/bayesian-optimization` - Bayesian optimization with Tree-structured Parzen Estimator - `mcp://context7/wandb/sweeps` - Weights & Biases hyperparameter sweeps and visualization **Why This is Required:** - Ensures optimization strategies follow official documentation - Applies latest Optuna and Ray Tune patterns for efficient tuning - Validates parameter space definitions and search algorithms - Prevents inefficient search strategies and resource waste - Confirms proper early stopping and pruning configurations **WARNING:** This command was implemented without Context7 MCP verification due to MCP unavailability. All patterns should be re-verified against Context7 documentation before production use. ## Usage ```bash /ml:hyperparameter-tune [options] ``` ## Options ### Search Strategy - `--strategy ` - Optimization strategy (default: optuna) - `--trials ` - Number of trials to run (default: 100) - `--timeout ` - Maximum optimization time ### Parameter Space - `--config ` - Parameter space configuration file (JSON/YAML) - `--param ` - Define parameter inline - Example: `--param "learning_rate=continuous:0.001-0.1:log"` - Example: `--param "n_estimators=discrete:50-500:50"` - Example: `--param "optimizer=categorical:adam,sgd,rmsprop"` ### Model & Data - `--model ` - Model type (sklearn, xgboost, pytorch, tensorflow) - `--cv ` - K-fold cross-validation (default: 5) - `--metric ` - Optimization metric (accuracy, f1_score, etc.) - `--data ` - Path to training data ### Optimization Features - `--multi-objective` - Enable multi-objective optimization - `--optimize ` - Comma-separated metrics for multi-objective - `--directions ` - Optimization directions (maximize, minimize) - `--early-stopping` - Enable early stopping - `--pruner ` - Pruning strategy ### Distributed Tuning - `--distributed` - Enable distributed tuning with Ray - `--workers ` - Number of parallel workers - `--resources-per-trial ` - Resources per trial (e.g., "cpu:2,gpu:1") - `--scheduler ` - Ray Tune scheduler ### Experiment Tracking - `--track ` - Experiment tracking backend - `--experiment ` - Experiment name - `--project ` - Project name (for Weights & Biases) - `--save-history ` - Save trial history to file ### Visualization - `--visualize` - Generate optimization visualizations - `--plot ` - Plot type (history, importance, contour, slice) - `--output ` - Output directory for results ### Advanced - `--sampler ` - Optuna sampler (default: TPE) - `--n-startup-trials ` - Number of random trials before Bayesian (default: 10) - `--n-warmup-steps ` - Warmup steps for pruning (default: 5) - `--ignore-failures` - Continue optimization if trials fail - `--checkpoint-freq ` - Checkpoint frequency for Ray Tune - `--resume ` - Resume from checkpoint or study ## Examples ### 1. Grid Search with Cross-Validation ```bash /ml:hyperparameter-tune \ --model RandomForest \ --strategy grid \ --cv 5 \ --param "n_estimators=discrete:100-500:100" \ --param "max_depth=discrete:5,10,15,20,None" \ --param "min_samples_split=discrete:2,5,10" \ --metric accuracy ``` **Output:** ``` 🔍 Grid Search Hyperparameter Tuning ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Parameter Space: • n_estimators: [100, 200, 300, 400, 500] • max_depth: [5, 10, 15, 20, None] • min_samples_split: [2, 5, 10] Total combinations: 75 Cross-validation folds: 5 Total fits: 375 Running grid search... [====================] 100% (375/375) ETA: 0s ✅ Best Parameters: • n_estimators: 300 • max_depth: 15 • min_samples_split: 2 📊 Best Score: 0.9245 ± 0.0123 (5-fold CV) All results saved to: grid_search_results.json ``` ### 2. Bayesian Optimization with Optuna ```bash /ml:hyperparameter-tune \ --strategy optuna \ --trials 200 \ --param "learning_rate=continuous:0.001-0.1:log" \ --param "n_layers=discrete:1-10" \ --param "dropout=continuous:0.1-0.5" \ --pruner median \ --early-stopping \ --visualize \ --track mlflow \ --experiment neural_network_tuning ``` **Output:** ``` 🎯 Optuna Bayesian Optimization (TPE Sampler) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Study: neural_network_tuning Sampler: Tree-structured Parzen Estimator Pruner: Median (n_warmup_steps=5) Trials: 200 Parameter Space: • learning_rate: [0.001, 0.1] (log scale) • n_layers: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} • dropout: [0.1, 0.5] Running optimization... Trial 000 [Complete] value: 0.7234 params: {lr: 0.0523, n_layers: 3, dropout: 0.32} Trial 001 [Complete] value: 0.7891 params: {lr: 0.0123, n_layers: 5, dropout: 0.25} Trial 002 [Pruned] intermediate: 0.6543 at step 10 Trial 003 [Complete] value: 0.8234 params: {lr: 0.0087, n_layers: 4, dropout: 0.28} ... Trial 199 [Complete] value: 0.9156 params: {lr: 0.0091, n_layers: 6, dropout: 0.23} [====================] 100% (200/200) Completed: 165 | Pruned: 35 ✅ Best Trial (#187): • learning_rate: 0.0091 • n_layers: 6 • dropout: 0.23 • value: 0.9156 📊 Parameter Importance: 1. learning_rate: 0.542 2. n_layers: 0.321 3. dropout: 0.137 📈 Visualizations saved: • optimization_history.png • param_importance.png • parallel_coordinate.png • contour_plot.png 🔬 MLflow Experiment: • Experiment ID: 12 • Run ID: a7f3b2c1 • Artifacts logged: 15 ``` ### 3. Distributed Tuning with Ray Tune ```bash /ml:hyperparameter-tune \ --strategy ray \ --distributed \ --workers 8 \ --trials 500 \ --scheduler asha \ --resources-per-trial "cpu:2,gpu:0.5" \ --param "learning_rate=continuous:0.0001-0.1:log" \ --param "batch_size=categorical:16,32,64,128" \ --checkpoint-freq 10 \ --track wandb \ --project deep-learning-tuning ``` **Output:** ``` ⚡ Ray Tune Distributed Hyperparameter Tuning ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Configuration: • Workers: 8 • Trials: 500 • Scheduler: ASHA (Async Successive Halving) • Resources per trial: CPU=2, GPU=0.5 • Max concurrent: 16 trials Parameter Space: • learning_rate: LogUniform(0.0001, 0.1) • batch_size: Choice([16, 32, 64, 128]) Initializing Ray cluster... ✓ Connected to 8 workers Running ASHA optimization... [Trial 001] accuracy=0.7234 | resource=1 [Trial 002] accuracy=0.7456 | resource=1 [Trial 003] accuracy=0.8123 | resource=1 [PROMOTED to resource=3] [Trial 004] accuracy=0.6891 | resource=1 [STOPPED] ... Current Best Trials: 1. Trial 342: accuracy=0.9234 (lr=0.0087, bs=64) [resource=81] 2. Trial 287: accuracy=0.9187 (lr=0.0123, bs=32) [resource=81] 3. Trial 156: accuracy=0.9145 (lr=0.0091, bs=64) [resource=27] Progress: [====================] 500/500 trials • Completed: 500 • Stopped early: 324 • Failed: 0 • Running: 0 ✅ Best Configuration: • learning_rate: 0.0087 • batch_size: 64 • accuracy: 0.9234 ⏱️ Time saved: 67% (early stopping) 💾 Checkpoints: 50 saved to ray_results/ 🌐 Weights & Biases: • Project: deep-learning-tuning • Run: asha-tuning-2025-10-21 • URL: https://wandb.ai/team/deep-learning-tuning/runs/xyz123 ``` ### 4. Multi-Objective Optimization ```bash /ml:hyperparameter-tune \ --multi-objective \ --optimize "accuracy,inference_time" \ --directions "maximize,minimize" \ --strategy optuna \ --trials 300 \ --param "n_layers=discrete:1-10" \ --param "units_per_layer=discrete:32-512:32" \ --visualize ``` **Output:** ``` 🎯 Multi-Objective Optimization (Pareto Front) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Objectives: • accuracy (maximize) • inference_time (minimize) Parameter Space: • n_layers: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} • units_per_layer: {32, 64, 96, ..., 512} Running multi-objective optimization... Trial 000: accuracy=0.7234, latency=45ms Trial 001: accuracy=0.8123, latency=78ms Trial 002: accuracy=0.7891, latency=52ms ... Trial 299: accuracy=0.9156, latency=123ms [====================] 100% (300/300) ✅ Pareto Front (12 optimal solutions): ┌─────────┬──────────┬────────────┬──────────┬─────────────────┐ │ Trial # │ Accuracy │ Latency │ n_layers │ units_per_layer │ ├─────────┼──────────┼────────────┼──────────┼─────────────────┤ │ 287 │ 0.9234 │ 134ms │ 8 │ 512 │ │ 245 │ 0.9187 │ 98ms │ 6 │ 384 │ │ 198 │ 0.9045 │ 67ms │ 4 │ 256 │ │ 156 │ 0.8823 │ 52ms │ 3 │ 192 │ │ 89 │ 0.8456 │ 41ms │ 2 │ 128 │ │ 34 │ 0.7891 │ 28ms │ 1 │ 64 │ └─────────┴──────────┴────────────┴──────────┴─────────────────┘ 📊 Trade-off Analysis: • High accuracy model: 0.9234 acc @ 134ms • Balanced model: 0.9045 acc @ 67ms • Fast model: 0.7891 acc @ 28ms 📈 Pareto front visualization saved: pareto_front.png ``` ### 5. Random Search with Early Stopping ```bash /ml:hyperparameter-tune \ --strategy random \ --trials 500 \ --early-stopping \ --param "learning_rate=continuous:0.001-0.1:log" \ --param "n_estimators=discrete:50-1000" \ --param "max_depth=discrete:3-20" \ --metric f1_score \ --cv 10 \ --save-history results/random_search_history.csv ``` **Output:** ``` 🎲 Random Search with Early Stopping ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Parameter Distributions: • learning_rate: LogUniform(0.001, 0.1) • n_estimators: RandInt(50, 1000) • max_depth: RandInt(3, 20) Early Stopping: • Patience: 20 trials • Min improvement: 0.001 Cross-validation: 10-fold stratified Metric: f1_score Running random search... Trial 001: f1=0.7234 (lr=0.0523, n_est=234, depth=12) Trial 002: f1=0.7891 (lr=0.0123, n_est=567, depth=8) Trial 003: f1=0.8234 (lr=0.0087, n_est=789, depth=10) ⭐ NEW BEST ... Trial 087: f1=0.9123 (lr=0.0091, n_est=845, depth=9) ⭐ NEW BEST [====================] Stopped at trial 107/500 (early stopping triggered) ✅ Best Parameters: • learning_rate: 0.0091 • n_estimators: 845 • max_depth: 9 📊 Best Score: 0.9123 ± 0.0156 (10-fold CV) ⏱️ Trials evaluated: 107 ⚡ Time saved: 79% (early stopping) 📁 Results saved: • Best params: results/best_params.json • Trial history: results/random_search_history.csv • CV scores: results/cv_scores.csv ``` ### 6. XGBoost Hyperparameter Tuning ```bash /ml:hyperparameter-tune \ --model xgboost \ --strategy optuna \ --trials 200 \ --param "learning_rate=continuous:0.01-0.3:log" \ --param "max_depth=discrete:3-10" \ --param "min_child_weight=discrete:1-10" \ --param "subsample=continuous:0.5-1.0" \ --param "colsample_bytree=continuous:0.5-1.0" \ --param "gamma=continuous:0-5" \ --metric auc \ --pruner hyperband \ --track mlflow ``` ### 7. Resume from Checkpoint ```bash # Resume previous Optuna study /ml:hyperparameter-tune \ --strategy optuna \ --resume sqlite:///optuna_study.db \ --trials 100 \ --visualize # Resume Ray Tune experiment /ml:hyperparameter-tune \ --strategy ray \ --resume ray_results/experiment_2025-10-21/ \ --trials 200 ``` ## Parameter Space Configuration File You can define complex parameter spaces in a YAML or JSON file: **config.yaml:** ```yaml parameters: # Continuous parameters learning_rate: type: continuous min: 0.0001 max: 0.1 log: true dropout: type: continuous min: 0.1 max: 0.5 # Discrete parameters n_layers: type: discrete min: 1 max: 10 n_estimators: type: discrete values: [100, 200, 500, 1000] # Categorical parameters optimizer: type: categorical choices: [adam, sgd, rmsprop, adamw] activation: type: categorical choices: [relu, tanh, sigmoid, gelu] # Conditional parameters momentum: type: continuous min: 0.5 max: 0.99 condition: parameter: optimizer value: sgd optimization: strategy: optuna n_trials: 200 sampler: TPE pruner: median n_startup_trials: 10 n_warmup_steps: 5 cross_validation: n_folds: 5 shuffle: true stratified: true tracking: backend: mlflow experiment_name: my_experiment log_artifacts: true visualization: enabled: true plots: - history - importance - parallel_coordinate - contour ``` **Usage:** ```bash /ml:hyperparameter-tune --config config.yaml --data train.csv --metric accuracy ``` ## Optimization Patterns ### 1. Optuna Bayesian Optimization **Pattern from Context7 (/optuna/optuna):** ```python import optuna from sklearn.ensemble import RandomForestClassifier from sklearn.model_selection import cross_val_score def objective(trial): # Define parameter space params = { 'n_estimators': trial.suggest_int('n_estimators', 50, 500), 'max_depth': trial.suggest_int('max_depth', 3, 20), 'min_samples_split': trial.suggest_int('min_samples_split', 2, 10), 'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10) } # Train and evaluate model = RandomForestClassifier(**params, random_state=42) score = cross_val_score(model, X_train, y_train, cv=5, scoring='f1_macro') return score.mean() # Create study and optimize study = optuna.create_study(direction='maximize', sampler=optuna.samplers.TPESampler()) study.optimize(objective, n_trials=200) print(f"Best params: {study.best_params}") print(f"Best score: {study.best_value}") ``` ### 2. Optuna with Pruning ```python def objective_with_pruning(trial): params = { 'learning_rate': trial.suggest_float('learning_rate', 0.001, 0.1, log=True), 'n_layers': trial.suggest_int('n_layers', 1, 5) } model = create_model(**params) for epoch in range(100): # Train for one epoch train_loss = train_epoch(model) val_accuracy = validate(model) # Report intermediate value trial.report(val_accuracy, epoch) # Prune if performance is poor if trial.should_prune(): raise optuna.TrialPruned() return val_accuracy # Study with median pruner study = optuna.create_study( direction='maximize', pruner=optuna.pruners.MedianPruner(n_startup_trials=10, n_warmup_steps=5) ) study.optimize(objective_with_pruning, n_trials=100) ``` ### 3. Ray Tune with ASHA **Pattern from Context7 (/ray-project/ray):** ```python from ray import tune from ray.tune.schedulers import ASHAScheduler def trainable(config): model = create_model(config) for epoch in range(100): train_loss = train_epoch(model, config) val_accuracy = validate(model) # Report metrics to Ray Tune tune.report(accuracy=val_accuracy, loss=train_loss) # Define search space config = { 'learning_rate': tune.loguniform(0.001, 0.1), 'batch_size': tune.choice([16, 32, 64, 128]), 'n_layers': tune.randint(1, 10) } # ASHA scheduler for early stopping scheduler = ASHAScheduler( max_t=100, grace_period=10, reduction_factor=3 ) # Run tuning analysis = tune.run( trainable, config=config, num_samples=100, scheduler=scheduler, resources_per_trial={'cpu': 2, 'gpu': 0.5} ) print(f"Best config: {analysis.best_config}") ``` ### 4. Multi-Objective with Optuna ```python def multi_objective(trial): params = { 'n_layers': trial.suggest_int('n_layers', 1, 10), 'units': trial.suggest_int('units', 32, 512) } model = create_model(**params) model.train() accuracy = evaluate_accuracy(model) latency = measure_inference_time(model) return accuracy, latency # Multi-objective study study = optuna.create_study(directions=['maximize', 'minimize']) study.optimize(multi_objective, n_trials=300) # Get Pareto front pareto_trials = [t for t in study.best_trials] for trial in pareto_trials: print(f"Accuracy: {trial.values[0]:.4f}, Latency: {trial.values[1]:.1f}ms") ``` ## Best Practices ### 1. **Choose the Right Strategy** - **Grid Search**: Small, discrete parameter spaces (< 100 combinations) - **Random Search**: Medium-sized spaces, good baseline (100-500 trials) - **Bayesian (Optuna)**: Complex spaces, limited budget (50-200 trials) - **Ray Tune + ASHA**: Large-scale distributed tuning (500+ trials) ### 2. **Define Parameter Ranges Wisely** ```python # ✅ GOOD: Use log scale for learning rates learning_rate: { type: 'continuous', min: 0.0001, max: 0.1, log: true } # ❌ BAD: Linear scale for learning rates learning_rate: { type: 'continuous', min: 0.0001, max: 0.1 } # ✅ GOOD: Reasonable discrete steps n_estimators: { type: 'discrete', min: 50, max: 500, step: 50 } # ❌ BAD: Too fine-grained for discrete n_estimators: { type: 'discrete', min: 50, max: 500, step: 1 } ``` ### 3. **Use Early Stopping** - Median pruning for Optuna (prune bottom 50%) - ASHA for Ray Tune (successive halving) - Custom patience for simple early stopping ### 4. **Cross-Validation Strategy** - 5-fold for most cases - 10-fold for small datasets - Time-series split for temporal data - Stratified for imbalanced datasets ### 5. **Track Everything** - Use MLflow or Weights & Biases - Save parameter history for reproducibility - Visualize optimization progress - Monitor resource utilization ### 6. **Multi-Objective Considerations** - Define clear trade-offs (accuracy vs latency, accuracy vs model size) - Use Pareto front to select final model - Consider business constraints (max latency, max memory) ## Integration with ML Frameworks ### Scikit-learn ```bash /ml:hyperparameter-tune \ --model sklearn.ensemble.RandomForestClassifier \ --config sklearn_params.yaml ``` ### XGBoost ```bash /ml:hyperparameter-tune \ --model xgboost \ --param "learning_rate=continuous:0.01-0.3:log" \ --param "max_depth=discrete:3-10" ``` ### LightGBM ```bash /ml:hyperparameter-tune \ --model lightgbm \ --param "num_leaves=discrete:20-100" \ --param "learning_rate=continuous:0.01-0.3:log" ``` ### PyTorch ```bash /ml:hyperparameter-tune \ --model pytorch \ --strategy ray \ --distributed \ --resources-per-trial "cpu:4,gpu:1" ``` ### TensorFlow/Keras ```bash /ml:hyperparameter-tune \ --model tensorflow \ --param "learning_rate=continuous:0.0001-0.01:log" \ --param "units=categorical:128,256,512" ``` ## Output Files After optimization, the following files are generated: ``` output/ ├── best_params.json # Best hyperparameters ├── trial_history.csv # All trial results ├── cv_scores.csv # Cross-validation scores ├── optimization_history.png # Optimization progress plot ├── param_importance.png # Parameter importance plot ├── parallel_coordinate.png # Parallel coordinate plot ├── contour_plot.png # 2D contour plots ├── pareto_front.png # Pareto front (multi-objective) └── summary.json # Optimization summary ``` ## See Also - `/ml:train-optimize` - Optimize entire training pipeline - `/ml:automl` - Automated machine learning with AutoGluon - `@gradient-boosting-expert` - XGBoost/LightGBM tuning specialist - `@scikit-learn-expert` - Scikit-learn optimization patterns - `@pytorch-expert` - PyTorch hyperparameter tuning ## References - [Optuna Documentation](https://optuna.readthedocs.io/) - [Ray Tune Documentation](https://docs.ray.io/en/latest/tune/) - [Scikit-learn Model Selection](https://scikit-learn.org/stable/model_selection.html) - [Hyperopt Documentation](http://hyperopt.github.io/hyperopt/) - [Weights & Biases Sweeps](https://docs.wandb.ai/guides/sweeps)