Energy Infrastructure

Rescuing Britain's Energy Forecasting: From 18-Month Failure to OfGem Milestone Success

We rescued NESO's failing energy forecasting platform after 18 months of no progress, mobilising a 3-person Quality Cloud Engineer (QCE) pod to deliver a modern Azure ML platform. The platform achieved a 92% cost reduction and 99% faster provisioning, and enabled daily model retraining on 40TB+ of weather and energy data, meeting critical OfGem regulatory deadlines.

Published on: April 1, 2023 | Last Updated: August 22, 2025 | 10 min read

PISR: Problem, Impact, Solution, Result

  • Problem: NESO's critical Platform Energy Forecasting (PEF) project was at severe risk of missing a crucial OfGem regulatory milestone after 18 months of failed delivery. The original delivery partner had developed nothing functional on Azure, leaving NESO unable to build and run essential machine learning models for renewable energy forecasting. Legacy models ran on expensive, shared on-premise infrastructure with rigid data pipelines and manual processes that created bottlenecks and single points of failure.

  • Business Impact: Missing the OfGem deadline would result in significant regulatory penalties and compromise NESO's ability to manage Britain's renewable energy transition. The failed programme had consumed 18 months with zero functional delivery whilst burning substantial Azure costs. Data scientists remained constrained by legacy systems that limited model training frequency, created data drift risks, and prevented the advanced ML techniques needed for accurate renewable energy forecasting.

  • Our Solution: ClearRoute mobilised a 3-person Quality Cloud Engineer enablement pod over 12 months to rescue and rebuild the programme. We restructured the team approach, implemented modern engineering practices with clear metrics, and delivered a complete Azure ML platform with microservices-based data ingestion. The solution enabled daily model retraining processing 40TB+ of weather and energy data (including first-ever access to Met Office GRIB format data), automated testing and infrastructure, and modern MLOps practices.

  • Tangible Result: We met the critical OfGem milestone whilst achieving a 92% reduction in monthly Azure costs and 99% faster platform provisioning. The new platform processes 40TB+ of weather and energy data (a 160x increase from legacy capacity), retrains models daily, revises forecast output every 30 minutes, and provides dramatically improved ML capabilities for renewable energy forecasting. The transformation established a foundation for NESO's ongoing energy forecasting modernisation essential to Britain's net-zero goals.


The Challenge

Business & Client Context

  • Primary Business Goal: Meet critical OfGem regulatory milestone whilst modernising NESO's energy forecasting capabilities to support Britain's transition to renewable energy through advanced machine learning models and automated forecasting systems.
  • Pressures: Critical OfGem deadline with potential regulatory penalties for non-compliance, 18 months of failed delivery burning significant budget, and urgent need to replace legacy ML infrastructure that couldn't support the advanced forecasting required for increasing renewable energy integration.
  • Technology Maturity: Legacy models running on expensive shared on-premise servers, manual testing strategies, no CI/CD pipelines, rigid data ingestion processes creating bottlenecks, and absence of MLOps practices essential for modern energy forecasting.

Current State Assessment: Key Pain Points

  • Project Management Dysfunction: Poor backlog structure with no focus on business value, requirements analysis happening within sprints creating ambiguity, lack of agile maturity with no progress metrics, and scope confusion due to changing solution architecture approaches.

  • Engineering Capability Gaps: No defined build pipelines or CI/CD approach, manually driven test strategy without automation consideration, incomplete end-to-end system design impacting infrastructure decisions, lack of pre-production environments, and complete absence of DevSecOps mindset.

  • Legacy ML Infrastructure Limitations: Models ran on very expensive shared on-premise servers that limited team access. Constrained model training and inference processes prevented rapid iteration, and hard-set data ingestion pipelines acted as triggers with a high risk of failure. The infrastructure could not process the diverse data sources essential for accurate renewable energy forecasting: it was limited to basic flat files totalling under 250GB, versus the 40TB+ of advanced weather data needed for modern forecasting.

Baseline Metrics (Where Available)

Metric Category | Baseline | Notes
OfGem Milestone Risk | Critical failure risk | 18 months with no functional delivery
Azure Monthly Costs | High baseline costs | Inefficient cloud resource utilisation
Platform Provisioning Time | Manual, slow process | No automation or standardisation
ML Model Training Frequency | Infrequent, manual | Legacy infrastructure constraints
Data Processing Capacity | ~250GB flat files only | No access to GRIB weather model data
Team Velocity | Low with oversized team | Poor structure and practices

Solution Overview

Engagement Strategy & Phases

Phase 1: Assessment & Clean Slate (Months 1-2): Assessed the failed delivery approach and made the decision to start fresh rather than attempt to salvage fundamentally flawed architecture. Established new team structure focusing on cross-functional capabilities rather than role-based silos.

Phase 2: Azure ML Platform Foundation (Months 3-8): Implemented modern Azure ML platform with microservices-based data ingestion, established full CI/CD pipelines, deployed Infrastructure as Code using Azure Bicep, and introduced automated testing strategies with DevSecOps practices.

Phase 3: MLOps & Production Delivery (Months 9-12): Delivered complete MLOps capability enabling daily model retraining, integrated with Met Office's advanced weather prediction APIs (GRIB format - first time NESO had access to raw numerical weather model data), and achieved production readiness to meet the critical OfGem milestone.

Architectural Overview

Before State: The Approach

After State: ClearRoute Azure ML Platform

Data Processing Scale Transformation

The platform transformation enabled NESO to process diverse energy and weather data sources at unprecedented scale over a 12-month period:

Data Source | Volume | Files Processed | Significance
Met Office API (GRIB) | 40 TB | 45.4 million | Raw numerical weather model data - wind speeds at multiple altitudes, solar radiation, pressure systems. First time NESO accessed this format, enabling precise renewable generation forecasting
ECMWF | 2 TB | 280,000 | European weather model data providing ensemble forecasts for uncertainty quantification
Met Office Flat Files | 192 GB | 38,000 | Traditional processed weather data (legacy format)
Elexon Flat Files | 32 GB | 68,000 | GB electricity market data - actual generation, demand patterns
Sheffield Solar | 14 GB | 240,000 | UK solar generation actuals and estimates
Elexon BOA | 4 GB | 40,000 | Balancing and settlement data for grid operations

This represents a 160x increase in data processing capability (from ~250GB legacy capacity to 40TB+ of advanced weather and energy data), with the most critical advancement being access to GRIB format weather data - the same raw numerical model outputs that meteorologists use for weather prediction, now applied to renewable energy forecasting.
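
The 160x figure and the file counts can be checked against the data source table. The following is a minimal sketch, assuming decimal units (1 TB = 1,000 GB) and the approximate volumes and counts quoted in the table; the variable names are illustrative only.

```python
# Volumes (GB) and file counts as quoted in the data source table.
# Assumption: decimal units, 1 TB = 1,000 GB.
SOURCES = {
    "Met Office API (GRIB)": (40_000, 45_400_000),
    "ECMWF": (2_000, 280_000),
    "Met Office Flat Files": (192, 38_000),
    "Elexon Flat Files": (32, 68_000),
    "Sheffield Solar": (14, 240_000),
    "Elexon BOA": (4, 40_000),
}
LEGACY_CAPACITY_GB = 250  # the "~250GB" legacy figure

total_gb = sum(volume for volume, _ in SOURCES.values())
total_files = sum(files for _, files in SOURCES.values())
grib_gb, grib_files = SOURCES["Met Office API (GRIB)"]

# Ratio of the GRIB feed alone to the legacy flat-file capacity.
increase_factor = grib_gb / LEGACY_CAPACITY_GB
# Fraction of the total volume that comes from the GRIB feed.
grib_share = grib_gb / total_gb

print(f"Total volume: {total_gb / 1000:.2f} TB across {total_files / 1e6:.1f} million files")
print(f"GRIB vs legacy capacity: {increase_factor:.0f}x")
print(f"GRIB share of total volume: {grib_share:.0%}")
```

On these figures, the 160x multiple corresponds to the 40 TB GRIB feed measured against the ~250GB legacy capacity, while the combined sources come to roughly 42 TB and 46 million files.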

QCE Disciplines Applied

  • Platform Engineering: Delivered a comprehensive Azure ML platform with microservices-based data ingestion capable of processing 40TB+ of weather and energy data (including 45 million GRIB files), replacing expensive on-premise infrastructure with cost-optimised cloud solutions that achieved 92% monthly cost reduction.

  • Quality Engineering: Implemented MLOps practices with automated testing, model validation, and deployment gates, ensuring reliable daily model retraining and 30-minute forecast output cycles essential for accurate renewable energy forecasting.

  • Developer Experience: Established complete CI/CD pipelines with Azure DevOps, Infrastructure as Code, and self-service capabilities that enabled data scientists to focus on model development rather than infrastructure management, dramatically improving productivity with a properly structured team.
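
To illustrate what a deployment gate in a daily retraining pipeline can look like, here is a minimal sketch. It is not the project's implementation and does not use the Azure ML API: the `ModelEvaluation` type, the `passes_deployment_gate` function, the choice of mean absolute error as the metric, the 2% tolerance, and the example error values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEvaluation:
    """Hold-out evaluation result for one model version (hypothetical shape)."""
    version: str
    mean_absolute_error: float  # lower is better


def passes_deployment_gate(
    candidate: ModelEvaluation,
    production: ModelEvaluation,
    max_regression: float = 0.02,
) -> bool:
    """Return True if the candidate may replace the production model.

    The candidate passes when its error is no more than `max_regression`
    (a fraction, 2% by default) worse than the production model's error.
    """
    allowed_error = production.mean_absolute_error * (1 + max_regression)
    return candidate.mean_absolute_error <= allowed_error


# Illustrative values only, not results from the engagement.
production = ModelEvaluation(version="prod", mean_absolute_error=100.0)
better = ModelEvaluation(version="retrained-a", mean_absolute_error=95.0)
worse = ModelEvaluation(version="retrained-b", mean_absolute_error=110.0)

print(passes_deployment_gate(better, production))  # candidate within tolerance
print(passes_deployment_gate(worse, production))   # candidate regressed too far
```

A gate of this kind lets an automated daily retrain promote a model without manual review while still blocking a retrained model that performs noticeably worse than the one in production.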


The Results: Measurable & Stakeholder-Centric Impact

Headline Success Metrics

Metric | Before Engagement | After Engagement | Improvement
OfGem Milestone Achievement | Critical failure risk | Milestone met on time | Project rescued
Monthly Azure Costs | High baseline spend | Optimised cloud resources | 92% reduction
Platform Provisioning Time | Manual, slow process | Automated provisioning | 99% faster
Data Processing Scale | ~250GB flat files | 40TB+ including GRIB weather models | 160x increase
ML Model Training Frequency | Infrequent, manual | Daily automated retraining | Continuous improvement
Team Structure | Oversized, dysfunctional | Right-sized, cross-functional | Effective delivery
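
The percentage improvements can also be read as multiples of the baseline. Because the case study does not publish absolute baseline figures, only ratios can be derived. The sketch below assumes "99% faster" means provisioning takes 99% less time than before; the helper name `reduction_to_factor` is illustrative.

```python
def reduction_to_factor(percent_reduction: int) -> float:
    """Convert an 'N% reduction' into the equivalent 'X times smaller' factor."""
    if not 0 <= percent_reduction < 100:
        raise ValueError("percent_reduction must be in [0, 100)")
    remaining = 100 - percent_reduction  # percentage of the baseline still left
    return 100 / remaining


cost_factor = reduction_to_factor(92)          # monthly Azure costs
provisioning_factor = reduction_to_factor(99)  # platform provisioning time

print(f"Costs: {cost_factor}x lower than baseline")
print(f"Provisioning: {provisioning_factor}x quicker than baseline")
```

Under that reading, a 92% cost reduction leaves 8% of the original monthly spend (12.5x lower), and 99% faster provisioning leaves 1% of the original time (100x quicker).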

Value Delivered by Stakeholder

  • For the Programme Manager:

    • Rescued a critical programme at risk of missing OfGem regulatory deadlines after 18 months of failed delivery, proving the project was achievable with the right approach and team structure.
    • Achieved 92% reduction in monthly Azure costs whilst delivering significantly improved capabilities, demonstrating that efficiency and performance improvements could be delivered simultaneously.
    • Met the critical OfGem milestone, avoiding regulatory penalties and enabling NESO's renewable energy forecasting modernisation.
  • For Data Scientists and Forecasting Teams:

    • Delivered modern Azure ML platform enabling daily model retraining with 40TB+ of diverse weather and energy data (a 160x increase from legacy ~250GB capacity), compared to infrequent manual processes on shared legacy infrastructure.
    • Enabled use of cutting-edge ML techniques and automated model validation, dramatically improving forecast accuracy and reducing data drift risks that had plagued legacy systems.
    • Provided self-service ML capabilities with 30-minute forecast output cycles, allowing rapid iteration and experimentation essential for improving renewable energy forecasting accuracy.
  • For NESO Leadership and Operations:

    • Achieved OfGem regulatory compliance whilst establishing a modern, cost-efficient ML platform that supports Britain's renewable energy transition requirements.
    • Demonstrated that proper engineering practices and team structure deliver more value than oversized, poorly structured teams.
    • Established MLOps foundation with automated testing, deployment, and monitoring that ensures reliable, continuous improvement of forecasting capabilities essential for grid stability with increasing renewable penetration.

Key Technical Achievements

  • MLOps Platform Delivery: Successfully delivered comprehensive Azure ML platform with daily model retraining capabilities, processing 40TB+ of weather and energy data across 46 million files, and 30-minute forecast output cycles that transformed NESO's forecasting capabilities.
  • Cost Optimisation Achievement: Achieved 92% reduction in monthly Azure costs whilst dramatically improving platform capabilities, proving that modern cloud-native approaches deliver both performance and efficiency gains.
  • Regulatory Compliance Success: Met critical OfGem milestone after 18 months of failed delivery, avoiding regulatory penalties and enabling NESO to support Britain's renewable energy transition.

Lessons, Patterns & Future State

What Worked Well

  • Clean Slate Approach: Rather than attempting to salvage the previous delivery partner's work, we recognised that the fundamental architecture and approach were so flawed that starting fresh was more efficient than remediation.
  • Right-Sized Team Structure: A 3-person QCE enablement pod delivered more in 12 months than the previous oversized team delivered in 18 months, demonstrating the power of proper team composition and practices.
  • MLOps-First Design: Building a modern Azure ML platform with automation and self-service capabilities from the start unlocked daily model retraining and advanced ML techniques that legacy infrastructure couldn't support.

Challenges Overcome

  • Tight OfGem Deadline: Delivered complete MLOps platform within critical timeline after 18 months of previous failures, requiring intense focus on automated delivery and proven patterns rather than experimental approaches.
  • Legacy System Integration: Successfully migrated from expensive on-premise ML infrastructure to cost-optimised Azure ML platform whilst maintaining operational continuity for critical energy forecasting.
  • Team Restructuring Resistance: Overcame organisational inertia around team structure and practices, proving that properly structured, cross-functional teams with modern engineering practices could dramatically outperform traditional role-based silos.

What We'd Do Differently

  • Earlier Cost Baseline Establishment: While we achieved 92% cost reduction, earlier baseline cost analysis would have enabled even more aggressive optimisation targets and clearer ROI demonstration from project start.
  • Parallel Environment Setup: Building the production deployment process in parallel with development work would have surfaced its complexity earlier and accelerated the final delivery timeline.
  • Stakeholder MLOps Education: More upfront investment in stakeholder education about MLOps benefits and modern ML platform capabilities might have reduced resistance to the architectural changes required.

Key Takeaway for Similar Engagements

When facing critical regulatory deadlines with failed delivery programmes, focus on proven MLOps patterns and right-sized team structures rather than attempting to fix fundamentally flawed approaches. Modern cloud-native ML platforms can deliver both dramatic cost savings and improved capabilities simultaneously when implemented with proper engineering practices.

Replicable Assets Created

  • Azure ML Platform Pattern: Complete MLOps platform template with daily model retraining, automated validation, and cost-optimised resource management suitable for regulated industries
  • Team Restructuring Playbook: Proven approach for transforming oversized, role-based teams into efficient cross-functional QCE pods
  • Cost Optimisation Framework: Azure resource optimisation patterns that achieved 92% cost reduction whilst improving capabilities
  • Regulatory Deadline Recovery Process: Proven methodology for rescuing failing programmes under tight compliance deadlines

Client's Future State

The Azure ML platform we delivered enabled NESO to meet their critical OfGem milestone and established the foundation for ongoing energy forecasting modernisation. The platform's ability to process 40TB+ of weather and energy data - including first-time access to Met Office GRIB format data with 45 million files of raw numerical weather models - provides NESO with the same meteorological capabilities used for professional weather prediction. Combined with daily model retraining and 30-minute forecast cycles, this positions NESO to significantly improve renewable energy forecasting accuracy, essential for grid stability as Britain's renewable capacity continues to expand toward net-zero goals.

Internal Learning: MLOps Delivery and Team Structure

This engagement demonstrates that well-structured QCE teams can deliver complex MLOps platforms whilst achieving significant cost optimisation. The transformation from a large, dysfunctional team to a focused 3-person pod proves that modern engineering practices and proper team structure matter more than headcount. Future similar engagements should leverage proven Azure ML patterns and focus on automation-first approaches rather than attempting to fix legacy team structures and processes.