Renewable Energy Analytics Startup · 2.4B Events/Day
Reduced prediction latency from six hours to fifteen minutes at 2.4B events/day
A Series A renewable energy analytics startup needed to process terabytes of turbine telemetry with near-real-time yield predictions. A streaming architecture on Kafka, Spark Structured Streaming, and Delta Lake reduced prediction latency from six hours to fifteen minutes and enabled a new premium tier of customer commitments.
The client operated a forecasting product for utility-scale wind and solar operators — its revenue model tied directly to the accuracy and timeliness of the yield predictions it delivered to asset managers who balanced grid commitments against anticipated generation. The product had launched successfully on a batch architecture: sensor data from across three thousand turbines and sixty-eight solar sites was aggregated hourly, joined against weather data, run through an ensemble of forecasting models overnight, and delivered to operators the following morning. For the first eighteen months this was acceptable. But as the energy markets moved toward fifteen-minute settlement intervals and intra-day balancing, the client's largest customers began asking for predictions that were structurally impossible to serve from a six-hour batch.
When we joined the engagement, the company had already attempted one re-platforming and failed. An internal team had begun migrating to a Kafka-based ingestion layer but had underestimated the complexity of maintaining historical model reproducibility alongside real-time inference. The half-completed migration left both systems running in parallel, the original batch pipeline and the partially functional stream, with a growing maintenance burden and data-consistency incidents that the data engineering team were fighting on a weekly basis. The CTO's first ask of us was not to build something new, but to produce an honest assessment of what had been built and what it would take to actually finish it.
We spent the first three weeks auditing the in-flight architecture. Our recommendation to the CTO and board was that the existing Kafka layer was structurally sound and should be preserved, but that the stream processing layer — which had been implemented on a mix of Kafka Streams and bespoke Python consumers — needed to be consolidated onto a single framework with first-class support for stateful computation. We recommended Spark Structured Streaming running on Databricks, with Delta Lake as the storage format to provide both the streaming ingestion path and the historical reprocessing path the model training pipeline required. Crucially, Delta Lake's time-travel capability meant that any prediction could be reproduced against the exact sensor history that had been available at prediction time — a requirement that had been underappreciated in the earlier attempt and was now a contractual obligation with two of the client's largest customers.
The feature store became the other structural piece of the architecture. The forecasting models consumed several hundred features per prediction — rolling averages of power output, weather-interpolated irradiance, historical availability, maintenance windows — and the existing system recomputed these features inside the prediction code path, which was a significant contributor to the batch latency. We deployed Feast as the feature store with Delta Lake as the offline store and Redis as the online serving layer. Features were materialised continuously from the streaming pipeline, versioned, and served to the inference path with sub-second latency. The model serving layer itself was MLflow-orchestrated with canary deployment so that any model drift introduced by a new training run could be caught against a traffic slice before full rollout.
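The offline-compute / online-serve split described above can be sketched in a few lines of plain Python, with a dict standing in for the Redis online store; the window size, asset identifiers, and feature name are illustrative, not the client's actual Feast configuration:

```python
import statistics
from collections import defaultdict, deque

class RollingFeatureStore:
    """Sketch of continuous feature materialisation: each event updates a
    bounded window per asset, and the derived feature is pushed to a
    low-latency key-value layer (Redis in the real system; a dict here)."""

    def __init__(self, window: int = 4) -> None:
        self._windows = defaultdict(lambda: deque(maxlen=window))
        self.online: dict[tuple[str, str], float] = {}  # stand-in for Redis

    def ingest(self, asset_id: str, power_mw: float) -> None:
        w = self._windows[asset_id]
        w.append(power_mw)
        # materialise the rolling-average feature and publish it online
        self.online[(asset_id, "power_mw_rolling_avg")] = statistics.fmean(w)

    def get_online(self, asset_id: str, feature: str) -> float:
        return self.online[(asset_id, feature)]

store = RollingFeatureStore(window=3)
for reading in (2.0, 4.0, 6.0, 8.0):
    store.ingest("turbine-007", reading)
# the window keeps only the last three readings: (4 + 6 + 8) / 3
assert store.get_online("turbine-007", "power_mw_rolling_avg") == 6.0
```

The point of the split is that the inference path does a single key lookup rather than recomputing several hundred features inline, which is where the batch system was losing its latency budget.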
The adversarial case for the architecture was the grid emergency. If a regional grid operator needed a revised generation forecast within fifteen minutes of a major weather event, could the architecture serve it? We ran quarterly drills, injecting simulated sudden-weather scenarios into the streaming pipeline and measuring end-to-end latency from event ingestion to revised forecast delivery. The architecture consistently delivered revised predictions within four to seven minutes of the triggering event, well inside the fifteen-minute commercial commitment the client had made.
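The drill measurement itself reduces to a simple check: collect per-event latencies from ingestion timestamp to revised-forecast delivery and compare the distribution against the fifteen-minute commitment. A minimal sketch, with illustrative thresholds and field names (the real drills were instrumented in the streaming pipeline):

```python
import math

def drill_report(latencies_s: list[float], slo_s: float = 15 * 60) -> dict:
    """Summarise a drill run: end-to-end latency from event ingestion to
    revised-forecast delivery, checked against the commercial SLO.
    Uses nearest-rank p95; all numbers here are illustrative."""
    ordered = sorted(latencies_s)
    p95 = ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]
    return {
        "worst_s": ordered[-1],
        "p95_s": p95,
        "within_slo": ordered[-1] <= slo_s,
    }

# simulated drill: revised forecasts landed between four and seven minutes
drill = drill_report([240.0, 300.0, 360.0, 410.0, 420.0])
assert drill["within_slo"] and drill["worst_s"] == 420.0
```

Asserting on the worst case rather than a percentile is deliberate for drills: a grid-emergency commitment is only met if every injected scenario comes back inside the window.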
In production, the pipeline sustained two point four billion events per day across three cloud regions with a ninety-ninth-percentile ingestion-to-prediction latency of eleven minutes. Client operators reported a twelve per cent improvement in realised energy yield because intra-day balancing decisions could now be made on current data rather than on morning forecasts that had aged eighteen hours by the close of trading. The client closed its Series B seven months after the architecture went live, with the real-time capability as a central element of the investor pitch. The reference architecture has since been open-sourced in part and has become a recruiting asset for the client's data engineering hiring pipeline.