I build ML/AI tools for complex industrial applications and operational intelligence.
Energy grid operators face a critical daily decision that affects all New Yorkers: which plants should start tomorrow? When too many are turned on, fuel is wasted and startup operations are costly. Conversely, when too few are turned on, there is an increased risk of emergency energy purchases at 10x normal prices. Traditional forecasting methods hit 5-7% error rates, forcing operators to hedge with expensive backup units on an as-needed basis. I built an XGBoost model on real NYISO hourly data that achieves 1.7% MAPE (mean absolute percentage error) with strong peak performance (2.3% on high-load hours), enabling confident unit commitment decisions worth $15-25M annually.
The model uses real NYISO hourly load data with peak-weighted training (high-demand hours get 3x importance in the loss function) and achieves performance competitive with best-in-class utility systems. This isn’t about perfect predictions; it’s about confidence intervals that matter for operations. With 1.7% error on a 20 GW system, operators get ±340 MW confidence versus ±1,000 MW with traditional methods at 5% error. That difference reduces the need for hedging and avoids unnecessary plant startups.
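The confidence figures above are simple arithmetic: a percentage error applied to the system load. A minimal sketch of that calculation follows; the function name `error_band_mw` is hypothetical, and treating MAPE as a symmetric ± band is a simplification rather than a formal confidence interval.

```python
def error_band_mw(system_load_mw: float, error_pct: float) -> float:
    """Approximate +/- band in MW implied by a percentage error on a given load."""
    return system_load_mw * error_pct / 100.0


SYSTEM_LOAD_MW = 20_000  # 20 GW, the system size used in the text

model_band = error_band_mw(SYSTEM_LOAD_MW, 1.7)        # about 340 MW
traditional_band = error_band_mw(SYSTEM_LOAD_MW, 5.0)  # about 1,000 MW

print(f"Model: +/-{model_band:.0f} MW, traditional: +/-{traditional_band:.0f} MW")
```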
Key Metrics:
Forecast Dashboard: Historical validation, future projections, and performance breakdown
The model maintains strong performance across all hours, with particularly good results during peak periods, when accurate predictions matter most for capacity planning. Feature importance analysis shows recent lags (1h, 24h) drive 45% of predictions, with rolling averages and temporal patterns accounting for the rest. The model shows no signs of overfitting, and performance is consistent across weekday and weekend patterns.
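For reference, overall and peak-hour MAPE can be computed along these lines. This is a sketch, not the evaluation code behind the reported numbers: the helper names are hypothetical, and defining "high-load hours" as the top 25% of actual load is an assumption.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def peak_mape(actual, predicted, top_fraction=0.25):
    """MAPE restricted to the highest-load hours.

    Assumption: 'high-load' means the top `top_fraction` of actual load values.
    """
    n_peak = max(1, round(len(actual) * top_fraction))
    # Indices of the hours with the largest actual load
    peak_idx = sorted(range(len(actual)), key=lambda i: actual[i], reverse=True)[:n_peak]
    return mape([actual[i] for i in peak_idx], [predicted[i] for i in peak_idx])
```

Reporting both numbers side by side shows whether accuracy degrades at the hours that matter most for capacity decisions.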
Business Applications: Daily peak forecasting and accuracy by load level
Data: NYISO public hourly load data covering 480 hours of grid operations. Loads range from 15,884 to 23,817 MW, representing real operational conditions.
Feature Engineering: Built 10 features focused on temporal patterns and historical context:
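The kinds of features described in this write-up (1h and 24h lags, rolling averages, temporal patterns) can be sketched as below. This is illustrative only and does not reproduce the actual 10-feature set: the function name `build_features`, the 24-hour rolling window, and the specific calendar fields are assumptions.

```python
from datetime import datetime, timedelta


def build_features(timestamps, loads, lags=(1, 24), window=24):
    """Build one feature row per hour from an hourly load series.

    Only past values are used (lags and a trailing mean over the previous
    `window` hours), so no information from the target hour leaks in.
    """
    rows = []
    start = max(max(lags), window)  # first hour with a full history
    for t in range(start, len(loads)):
        row = {f"lag_{k}h": loads[t - k] for k in lags}
        row[f"rolling_mean_{window}h"] = sum(loads[t - window:t]) / window
        row["hour"] = timestamps[t].hour
        row["day_of_week"] = timestamps[t].weekday()  # Monday = 0
        row["is_weekend"] = int(timestamps[t].weekday() >= 5)
        row["target"] = loads[t]
        rows.append(row)
    return rows


# Usage with synthetic data: 48 hours starting Monday 2024-01-01
ts = [datetime(2024, 1, 1) + timedelta(hours=i) for i in range(48)]
load = [float(i) for i in range(48)]
features = build_features(ts, load)
```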
Model: XGBoost regressor (500 trees, depth 8) with peak-weighted training. The top 25% of load hours get 3x importance during training to ensure the model learns critical high-demand patterns. A bias correction estimated on the validation set is applied, plus a range adjustment to ensure predictions span realistic load variability.
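A minimal sketch of the peak weighting and bias correction described above, in plain Python. The helper names are hypothetical, and the bias correction shown (mean signed error on the validation set, added to later predictions) is one common form, assumed here rather than taken from the original code. The range adjustment is not sketched because its details are not given.

```python
def peak_weights(loads, top_fraction=0.25, peak_weight=3.0):
    """Per-hour training weights: top `top_fraction` of load hours get `peak_weight`."""
    n_peak = max(1, round(len(loads) * top_fraction))
    threshold = sorted(loads, reverse=True)[n_peak - 1]
    # Note: hours tied with the threshold value are all treated as peak hours
    return [peak_weight if x >= threshold else 1.0 for x in loads]


def bias_correction(val_actual, val_predicted):
    """Mean signed error on the validation set (assumed additive correction)."""
    return sum(a - p for a, p in zip(val_actual, val_predicted)) / len(val_actual)


def apply_bias(predictions, bias):
    """Shift raw predictions by the validation-set bias."""
    return [p + bias for p in predictions]
```

With the `xgboost` scikit-learn interface, weights like these would typically be passed as `sample_weight` to `XGBRegressor.fit`, with `n_estimators=500` and `max_depth=8` matching the configuration described above.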
Why XGBoost: It handles non-linear temporal patterns efficiently, works well with tabular time series, provides interpretable feature importance, and delivers production-ready performance. I considered an LSTM, but XGBoost proved more reliable and faster to iterate on at this scale of data.
Better forecasting translates directly to better decisions. The primary application is day-ahead unit commitment, where operators decide which plants to start 24 hours ahead. With 1.7% error versus the traditional 5%, you get ±340 MW confidence instead of ±1,000 MW. This reduces the need to hedge with expensive backup units.
Operational Value Breakdown:
As New York pushes toward 70% renewable energy by 2030, accurate load forecasting becomes even more critical. Solar and wind add variability to net load (demand minus renewables), making confident forecasting essential for grid reliability during the energy transition.
Next Steps: The current model uses historical load and temperature proxies. Adding real weather forecasts and probabilistic outputs (P10/P50/P90 confidence intervals) could improve accuracy to 1.2-1.5% MAPE. The methodology is production-ready with daily retraining, bias correction, and comprehensive validation. Timeline for deployment: 6-12 months, including weather integration and API development.
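Probabilistic outputs such as P10/P50/P90 are usually trained and scored with the quantile (pinball) loss. The sketch below shows that loss only; it is not part of the current model, and the function name is hypothetical.

```python
def pinball_loss(actual, predicted, quantile):
    """Average pinball loss for a single quantile level (e.g. 0.1, 0.5, 0.9).

    Under-forecasts are penalized by `quantile`, over-forecasts by
    `1 - quantile`, so a P90 forecast is pushed to sit above most outcomes.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        diff = a - p
        total += quantile * diff if diff >= 0 else (quantile - 1.0) * diff
    return total / len(actual)
```

For a P90 forecast, under-forecasting by 10 MW costs nine times as much as over-forecasting by 10 MW, which matches the operational asymmetry between running short and running long.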