Series B Digital Lending Platform
Re-architected a failing monolith into a microservices platform that now serves 150,000+ concurrent users
A Series B lender's single-binary platform was buckling under load and blocking FCA reauthorisation. Eleven months of structured modernisation took them from 10,000 concurrent users to 150,000, cut infrastructure spend by 40%, and unlocked a further £30M of ARR that the regulator had effectively frozen.
When the commercial team first briefed us, the platform was processing roughly £4.2 billion in originated loan volume annually on a codebase that had not been structurally modified since 2019. Every release required a full-system deploy. Peak concurrency hit a hard ceiling at ten thousand active sessions, at which point garbage collection in the core Java application would stall transaction workers long enough to cause cascading 504 errors on the consumer-facing API. The business impact was unambiguous: during the five-hour window between 17:00 and 22:00 on weekday evenings, the only hours most retail borrowers were available, the platform was dropping between three and seven per cent of applications outright, and internal monitoring suggested another twelve per cent were abandoned because of latency. With FCA reauthorisation nine months away, the board needed a credible remediation plan, not a roadmap.
Our first ninety days were deliberately unglamorous. Before committing to an architectural direction, we embedded three senior engineers and a principal architect into the client's production environment with read-only access across their AWS organisation, their Splunk estate, their CI pipeline, and their ticketing system. The objective was a forensic assessment of where the monolith was actually failing — as distinct from where the team believed it was failing. The distinction mattered. Internal opinion held that the database was the bottleneck; the evidence from thirty-one days of structured observability told a different story. Three specific code paths — the credit-decision orchestrator, the affordability scoring service, and the document-signing webhook handler — were consuming seventy-one per cent of total request time. These three became the anchor tenants of the migration plan.
We recommended, and the client's CTO approved, a strangler-fig migration onto AWS ECS Fargate across fourteen bounded contexts. The sequence was chosen not by technical preference but by commercial risk: the three high-traffic code paths came first, so that measurable latency improvements would be visible to the risk committee before the expensive work began. Each bounded context was extracted into its own service with a documented contract, deployed behind an API gateway with canary traffic shaping, and instrumented with OpenTelemetry traces propagated end-to-end. The ledger itself, which in the legacy system had been a sequence of synchronous writes to a PostgreSQL primary with four read replicas, was re-engineered as an event-driven architecture on Kafka with idempotent transaction handlers and full ISO 20022 message compliance. Because every handler could safely process the same event more than once, a partially completed transaction could be resumed rather than rolled back if the credit-decision service timed out mid-application, eliminating one of the most common sources of customer-facing errors in the previous system.
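To make the idempotent-handler idea concrete, here is a minimal in-memory sketch; it is an illustration, not the client's implementation. The names `LedgerEvent`, `IdempotentLedger`, and `event_id` are hypothetical, and the sketch assumes every message carries a unique identifier. A production handler would also need to persist the processed-id record and the ledger write in the same durable transaction, which is omitted here.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class LedgerEvent:
    """Hypothetical ledger message; event_id is assumed unique per event."""
    event_id: str
    account: str
    amount: Decimal


@dataclass
class IdempotentLedger:
    """In-memory stand-in for a ledger fed by an at-least-once event stream."""
    balances: dict = field(default_factory=dict)
    seen: set = field(default_factory=set)

    def handle(self, event: LedgerEvent) -> bool:
        """Apply the event exactly once; return False for a redelivery."""
        if event.event_id in self.seen:
            # Duplicate delivery (for example after a retry): do nothing.
            return False
        current = self.balances.get(event.account, Decimal("0"))
        self.balances[event.account] = current + event.amount
        self.seen.add(event.event_id)
        return True
```

Redelivering the same event after a timeout leaves the balance unchanged, which is what allows a consumer to resume from its last known position instead of rolling the whole application back.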
The rollout was governed by the regulatory calendar rather than by engineering preference. We moved three per cent of production traffic to the new platform in week eighteen of the engagement, held at that level for three weeks of intensive synthetic and real-user monitoring, then stepped through fifteen, forty, seventy, and finally one hundred per cent over a further six weeks. At each step we ran full reconciliation between the legacy ledger and the event-sourced ledger; any divergence greater than one basis point of transaction volume triggered an automatic rollback. Over the full cutover, two rollbacks occurred, and in both cases the cause was diagnosed and the fix re-deployed within twelve hours. The FCA's technical assessors were briefed continuously through the reauthorisation process and given observability access to the production environment for the final month.
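The reconciliation gate described above can be sketched as a simple threshold check. This is a hedged illustration rather than the tooling used on the engagement: the function names are hypothetical, and it assumes "divergence" means the absolute difference between the two ledgers' totals, expressed as a fraction of the legacy total.

```python
from decimal import Decimal

# One basis point is one hundredth of one per cent, i.e. 0.0001 as a fraction.
ONE_BASIS_POINT = Decimal("0.0001")


def divergence_ratio(legacy_total: Decimal, event_total: Decimal) -> Decimal:
    """Absolute difference between ledgers as a fraction of the legacy total."""
    if legacy_total == 0:
        # Assumption: any non-zero volume against an empty legacy ledger
        # counts as unbounded divergence.
        return Decimal("0") if event_total == 0 else Decimal("Infinity")
    return abs(legacy_total - event_total) / abs(legacy_total)


def should_roll_back(
    legacy_total: Decimal,
    event_total: Decimal,
    threshold: Decimal = ONE_BASIS_POINT,
) -> bool:
    """Return True when divergence strictly exceeds the threshold."""
    return divergence_ratio(legacy_total, event_total) > threshold
```

On a reconciled volume of £1,000,000, for example, a £50 mismatch is half a basis point and would pass, while a £200 mismatch is two basis points and would trigger the rollback. Using `Decimal` rather than floats keeps the comparison exact at the threshold boundary.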
The outcomes were measured against the same metrics the client had historically reported to their investors. Concurrent user capacity increased from approximately ten thousand to over one hundred and fifty thousand, as verified in load testing; peak evening application completion rose from ninety-three per cent to ninety-nine point four per cent; and the infrastructure cost per originated loan dropped by forty per cent as Fargate autoscaling replaced permanently provisioned EC2 capacity. The FCA reauthorisation was cleared ninety-four days ahead of the renewal deadline with no conditions attached. In the twelve months following go-live, the client grew ARR from £18M to £48M, and the engineering leadership reported that their deploy frequency rose from bi-weekly to, on average, seventeen times per day. The platform has since been used as a reference architecture in two further FCA lender authorisations that our team has supported.