Last month’s global IT outage, which affected 8.5 million Windows PCs, once again highlights the vulnerabilities in the computer systems of some of our largest companies and organisations. The failure, triggered by an update to CrowdStrike cybersecurity software, disrupted airlines, online transactions, cash machines, many high street banks and card payments at retailers such as Morrisons. Even NHS GP and cancer-treatment appointments were cancelled.
The CrowdStrike crisis is not an isolated incident. Major tech outages have recently impacted customers and operations at McDonald’s, Greggs, Deliveroo, Tesco and Barclays. A new Roq survey (attached) has found that a third of Brits have experienced a technology failure within the banking sector in the past year.
Stephen Johnson, CEO and founder of Quality Engineering consultancy Roq, says it is now imperative for companies and organisations to invest significantly more resources and effort into ensuring robust, future-proof systems underpin everything they do.
“The issues with CrowdStrike have been heightened due to it being a cybersecurity product. However, it didn’t cause a security breach. The real challenge is that operating systems like Windows, security platforms, and sensitive data are akin to the skeletal frame of a human body or the core infrastructure of society. When they are compromised, the effect is devastating.
“We have seen in the UK, and across the world, how a lack of investment in core infrastructure – roads, trains, the NHS, and teachers – has long-term impacts, and the remedy is very expensive. Our taxes will increase to address these issues. Technology, despite being a substantial part of organisational budgets, is similarly feeling the impact of long-term underinvestment.
“Unfortunately, in a competitive commercial landscape, speed often takes precedence over quality. However, in critical infrastructure like CrowdStrike, neglecting quality can lead to significant repercussions. As AI becomes increasingly prevalent, the associated risks will only grow. You can tell when a piece of code doesn’t work, but how will you know when an algorithm is truly working as it should?
“We’ll probably never know what validation took place at CrowdStrike when someone decided to go live with this code. But the assumption is that something was missed, unplanned for, or a defect was ignored. Each of these possibilities indicates a significant oversight in Quality Assurance, which led to severe consequences for millions of people across the globe on Friday. Until the quality of technology is seen as a serious risk factor at board level, we’ll continue to encounter these issues.
“There are plenty of steps to take to ensure that products are technically fit, but money and time seem to be bigger priorities. Executives will feel the repercussions of this. I personally know many people who were caught in the crossfire of this with nothing they could really do. However, like most significant issues, this could be an early symptom of what is to come.”
Steps to improve Quality Engineering and system robustness:
- Implement fast, automated testing systems: Quickly identify and resolve issues by integrating automated testing tools that can perform continuous testing throughout the development cycle.
- Conduct thorough analysis and validation of systems: Ensure robustness by performing comprehensive system analysis, stress testing, and validation to identify potential failure points.
- Prioritise long-term investment in technology: Allocate resources for ongoing technology upgrades and maintenance to prevent underinvestment impacts that could lead to system failures.
- Adopt a culture of quality over speed: Encourage a shift in mindset where quality is valued over rapid deployment, reducing the likelihood of costly errors and outages.
- Foster cross-functional collaboration: Encourage collaboration between development, QA, and operations teams to ensure a cohesive approach to quality engineering and system robustness.
- Implement a robust change management process: Ensure all changes are thoroughly reviewed, tested, and documented to minimise the risk of introducing new issues into the system.
By addressing these areas, organisations can better protect themselves and their customers from the significant risks posed by technology failures. The CrowdStrike incident is a serious reminder of the importance of robust Quality Engineering in our increasingly technology-dependent world.
About Roq
Roq is an outcomes-focused quality engineering consultancy which has now gained platinum status from Investors in People. The firm provides an independent view on all things quality, working with some of the world’s largest organisations on their most important technology initiatives. Roq’s goal is to help organisations to realise the benefits of high-functioning, high-quality software applications delivered at a pace that aligns with their business imperatives. It has been proudly providing services since 2009.
It is a co-founder of the Quality Engineering Forum, which was formed in 2023 by a group of seasoned quality engineering professionals united in their goal to elevate standards to enable organisations to improve technology delivery. Its members include Alliance Healthcare, BBC, Zenith Intelligent Vehicle Solutions, Roq, and S&A Group. The Forum recently launched a Quality Engineering Charter.