
Why Most Software Releases Fail — And Why Testing Isn’t the Real Problem

Anthony Adeloye
Founder & Principal Consultant, CalyTeQ
13 January 2026
8 min read

The organisations experiencing the most production incidents are not the ones doing the least testing. They are often the ones doing the most — but measuring the wrong things. The gap between QA activity and release confidence is where incidents are born.

The testing paradox

I have been working in software quality engineering since 1998. In that time, I have seen testing volumes increase dramatically — more automation, more tooling, more dashboards — while the frequency of significant production incidents has not decreased at the same rate. In some BFSI organisations, it has increased.

This is not a coincidence. It is a structural problem. And it is one that senior technology leaders are increasingly aware of, even if they cannot always name it precisely. The symptom is a familiar one: your QA team delivers a green dashboard before every release, and yet your on-call engineers are still firefighting at 2am.

The cause is almost never a lack of testing. It is a lack of genuine confidence in what the testing means.

“More testing does not equal more confidence. It can equal more data — which is a very different thing.”

What “release confidence” actually means

When a BFSI board or a CTO asks “are we ready to release?”, they are not asking for a test results summary. They are asking a specific risk question: what is the probability that this release causes a significant incident, and what is the business impact if it does?

Most QA processes are not designed to answer that question. They are designed to answer a different one: have we completed the agreed testing activities? That is a compliance question, not a risk question. Answering it correctly tells you whether the team did what it said it would do. It tells you almost nothing about whether the system is safe to release.

The distinction matters enormously in regulated financial services environments, where the cost of a Sev1 production incident is not just technical. It is operational, reputational, regulatory, and — in extreme cases — existential.
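
The risk question above (how likely is a significant incident, and what does it cost if it happens?) can be made concrete with a little arithmetic. The following is a minimal sketch, not a prescribed method: the `ReleaseRisk` class and the 25% probability are hypothetical, and the cost bounds reuse the direct-cost range for a Sev1 incident quoted later in this article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseRisk:
    """Hypothetical inputs for one release (illustrative names only)."""

    incident_probability: float  # estimated chance of a significant incident, 0..1
    impact_low_gbp: float        # lower bound of direct cost if the incident occurs
    impact_high_gbp: float       # upper bound of direct cost if the incident occurs

    def expected_cost_range(self) -> tuple[float, float]:
        """Probability-weighted direct cost as a (low, high) pair in GBP."""
        if not 0.0 <= self.incident_probability <= 1.0:
            raise ValueError("incident_probability must be between 0 and 1")
        return (
            self.incident_probability * self.impact_low_gbp,
            self.incident_probability * self.impact_high_gbp,
        )


# Assumed figures for illustration: a 25% incident probability and the
# £15,000 to £150,000 direct-cost range cited later in the article.
risk = ReleaseRisk(
    incident_probability=0.25,
    impact_low_gbp=15_000,
    impact_high_gbp=150_000,
)
low, high = risk.expected_cost_range()
print(f"Expected direct cost: £{low:,.0f} to £{high:,.0f}")
```

A completed test plan supplies neither input. The probability estimate and the impact range both have to come from a risk assessment of the specific release.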

The five failure modes I see repeatedly

Across 28 years of quality engineering work at Deutsche Bank, Commerzbank, Fujitsu, EY, Sky UK, and the Scottish Government, the same patterns recur. BFSI production incidents are almost always traceable to one or more of these five structural failures:

  1. Automation coverage that misses the critical path. High automation coverage percentages that do not correspond to the highest-risk user journeys. A system with 85% automation coverage might have zero automated coverage of the payment confirmation flow that processes 60% of transaction value.
  2. Performance assumptions that have never been validated. NFRs defined at the start of a programme and never formally tested against the production configuration. The test environment behaved correctly. Production, with real data volumes and real network conditions, did not.
  3. Governance that produces evidence but not assurance. Sign-off processes that confirm checklists were completed, not that the system is ready. The distinction is subtle but the consequences are not.
  4. Defect triage that accepts too much risk. Sev3 defects accepted for release because “we’ll fix them in the next sprint.” Three of those accepted Sev3s interact in production in a way nobody anticipated, and the result is a Sev1.
  5. The trust gap between delivery and leadership. Delivery teams who believe the system is not ready but lack the evidence framework to make that case to a programme director under deadline pressure. The release proceeds. The incident follows.
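
The first failure mode can be illustrated with a small calculation. This is a sketch under stated assumptions: the journey names, scenario counts, and value shares are invented, chosen only so that the headline figure matches the 85% example and the payment confirmation flow carries 60% of transaction value. It compares the raw coverage percentage with a coverage figure weighted by the value each journey carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journey:
    """One user journey (all field values below are hypothetical)."""

    name: str
    value_share: float  # fraction of total transaction value; shares sum to 1.0
    scenarios: int      # test scenarios identified for this journey (must be > 0)
    automated: int      # how many of those scenarios are automated


def raw_coverage(journeys):
    """Headline figure: automated scenarios divided by all scenarios."""
    total = sum(j.scenarios for j in journeys)
    return sum(j.automated for j in journeys) / total


def value_weighted_coverage(journeys):
    """Each journey's automated fraction, weighted by the value it carries."""
    return sum(j.value_share * j.automated / j.scenarios for j in journeys)


def uncovered_high_value(journeys, min_share=0.10):
    """Journeys with no automation at all despite a material value share."""
    return [
        j.name
        for j in journeys
        if j.automated == 0 and j.value_share >= min_share
    ]


journeys = [
    Journey("payment_confirmation", value_share=0.60, scenarios=15, automated=0),
    Journey("account_overview", value_share=0.25, scenarios=50, automated=50),
    Journey("profile_settings", value_share=0.15, scenarios=35, automated=35),
]

print(f"Raw automation coverage: {raw_coverage(journeys):.0%}")
print(f"Value-weighted coverage: {value_weighted_coverage(journeys):.0%}")
print(f"Uncovered high-value journeys: {uncovered_high_value(journeys)}")
```

With these invented numbers the dashboard would report 85% coverage, while the value-weighted figure is 40% and the payment confirmation flow is flagged as entirely uncovered.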

The pattern in numbers

DORA (DevOps Research and Assessment) research has, in several editions of its State of DevOps report, put the change failure rate of its lowest-performing group at 46% or higher: close to half of all production changes cause incidents requiring hotfixes or rollbacks. Elite performers have been reported at around 5% in recent editions. The exact figures vary by report year, but the gap is not testing volume. It is the quality of the risk assessment that happens before each release.
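
For readers who want to compare their own figures, change failure rate is the share of production changes that went on to need remediation. A minimal sketch follows, assuming a simple in-memory deployment log; the `Deployment` record and the sample history are hypothetical, not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """One production change (hypothetical record shape)."""

    release_id: str
    needed_remediation: bool  # True if a hotfix or rollback followed


def change_failure_rate(deployments):
    """Fraction of deployments that required a hotfix or rollback."""
    if not deployments:
        raise ValueError("no deployments to measure")
    failed = sum(1 for d in deployments if d.needed_remediation)
    return failed / len(deployments)


# Invented history: 20 releases, of which R3, R7 and R12 needed remediation.
history = [Deployment(f"R{i}", i in {3, 7, 12}) for i in range(1, 21)]
rate = change_failure_rate(history)
print(f"Change failure rate: {rate:.0%}")
```

The metric is only as reliable as the remediation flag, so a team that patches quietly without logging a hotfix will understate its rate.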

Why AI makes this problem harder before it makes it easier

The emergence of AI-assisted testing tools has added a new dimension to this challenge. AI can generate test cases at a speed and volume that human testers cannot match. It can identify coverage gaps, suggest edge cases, and analyse test results far faster than a manual review.

All of this is genuinely useful. But it compounds the existing problem if the underlying framework is wrong. More AI-generated test cases, applied to a system without a robust risk-prioritised coverage strategy, produce more data that still does not answer the release confidence question.

The value of AI in quality engineering is not in generating volume. It is in accelerating the analysis that a senior engineer then interprets. The interpretation — the judgement about what the evidence means, and what risk it represents to a specific organisation releasing a specific system into a specific regulatory environment — remains a human responsibility.

This is not a popular thing to say in an industry that is deeply excited about AI. It is, however, accurate.

“AI in quality engineering accelerates the collection of evidence. It does not replace the judgement required to interpret it at board level.”

What needs to change

The shift required is not primarily a technical one. It is a framing shift — from quality as a delivery activity to quality as a risk management function. Three changes make the most practical difference:

The commercial case is straightforward

A Sev1 production incident in a retail banking environment costs, conservatively, between £15,000 and £150,000 in direct costs — engineering time, incident management, regulatory reporting, customer remediation. The reputational cost is harder to quantify and typically larger.

A structured release risk baseline engagement costs a fraction of a single incident. An Executive Release Assurance assessment for a major release costs less than the overtime bill for the incident response team if that release fails.

The question is not whether senior quality engineering advisory is affordable. It is whether the cost of not having it is acceptable. In most cases, for most BFSI organisations, it is not.



Anthony has 28 years of quality engineering experience across Banking, Financial Services, Insurance, Government, and Enterprise technology. He has held senior roles at Deutsche Bank, Commerzbank, Fujitsu, EY, Sky UK, and IBM, leading performance engineering, test automation, and quality strategy programmes across the UK, Germany, and Luxembourg. CalyTeQ is his advisory practice — built to bring senior-level quality engineering expertise to BFSI organisations without the overhead of a large firm.


Release risk is a solvable problem.

A structured quality baseline or Executive Release Assurance engagement gives you a clear, evidence-based picture of where you stand — and what to do about it.
