CrowdStrike or cloud strife?

Opinion: The rise of big-tech monopolies in cloud computing is causing the type of vulnerabilities responsible for the CrowdStrike meltdown, says Angus Dowell.

The concept of ‘systemic risk’ was used in the years after the 2008 financial crisis to describe the risk from dominant and hyper-interconnected financial institutions. The interconnectedness of banks, insurers, and financial markets meant that the failure of one key player had widespread repercussions. Some institutions were deemed ‘too big to fail’.

Systemic risk isn’t particular to financial markets. It was on full display on July 18, in the usually hidden world of cloud-computing markets, when CrowdStrike, a third-party security software provider operating on Microsoft’s cloud infrastructure, released a software update that contained a disastrous bug.

Organisations, including numerous Fortune 500 companies, use CrowdStrike’s cybersecurity software to detect and block hacking threats. But when CrowdStrike issued the flawed update, it quickly spread to customers using Microsoft Windows around the world, causing widespread system crashes.

The update induced what many called the ‘blue screen of death’, bringing down everything from airlines to healthcare and banking systems, highlighting the fragility of our interconnected digital world.

Outages appear to be becoming more common. Last week, New Zealand experienced another set of locally specific outages from Microsoft, leaving the police and other organisations without services such as email for a time.

Costs from outages such as this can be enormous. According to Parametrix, a cloud monitoring organisation that analysed the CrowdStrike crash, the banking and healthcare sectors are estimated to have lost $1.15 billion and $1.94 billion respectively.

Microsoft was quick to distance itself from the problem, emphasising that the bug was not a flaw in its own software but rather in CrowdStrike’s update. It argues the responsibility lies with CrowdStrike for issuing the faulty update and with customers for not preparing their IT environments for such an event.

This explanation sidesteps a critical issue: exacerbating the problem was Microsoft’s dominance in the cloud-computing market.

Microsoft’s cornering of roughly 25 percent of the global cloud market, alongside Amazon’s 34 percent and Google’s 10 percent, means that any issue in their ecosystem can have far-reaching consequences.

The dominance of these big-tech players forces companies to rely on the small number of proprietary operating systems on offer. The result is a high level of homogeneous integration across their ecosystems and, with it, significant vulnerabilities.

These are vulnerabilities that at least one executive raised concerns about prior to the meltdown. As CrowdStrike vice-president Drew Bagley put it, “their IT stack may include just a single provider for operating system, cloud, productivity, email, chat, collaboration, video conferencing, browser, identity, generative AI and increasingly security as well”. He went on, “this means that the building materials, the supply chain and even the building inspector are all the same”.

This situation underscores the need for strong intervention to create competitive markets to reduce systemic technological risks.

Another key amplifier of risk is the loss of institutional IT knowledge that comes from organisations migrating to the cloud. Leading up to the 2008 financial crisis, complex financial instruments and practices were poorly understood by many market actors and regulators. This lack of knowledge contributed to the inability to foresee and mitigate the risks.

The rise of cloud computing, while offering numerous efficiencies, has also led to a decline in institutional IT knowledge that could help mitigate risk. Companies and public institutions increasingly outsource their IT needs to cloud providers, leading to a loss of in-house expertise and a diminished capacity to respond to IT crises independently.

This dependency on big-tech cloud providers makes organisations more vulnerable to disruptions like the CrowdStrike incident, as they lack the necessary skills and knowledge to manage and mitigate such issues.

These interlocking vulnerabilities are not easily addressed. The dominance of Microsoft, Amazon and Google over more than two-thirds of cloud computing markets means they have outsized power to keep things the way they are. It’s not just that CrowdStrike relies on Microsoft’s hugely powerful and geographically distributed cloud infrastructure, but that Microsoft uses its power to subject firms to anti-competitive practices.

As more and more observers are pointing out, the big-tech players mobilise a complex web of licensing restrictions and payment models that restrict customers from switching cloud providers, creating what’s called ‘vendor lock-in’ – barriers to exit high enough that customers are effectively trapped by the juggernauts.

Up until a few months ago, for example, Amazon Web Services, the cloud provider arm of Amazon, charged ‘egress fees’ – fees charged to customers wishing to exit Amazon’s cloud to migrate to a new cloud provider. The move to drop egress fees was likely a response to pressure from a European Union announcement in January on the development of a European Data Act designed to promote fair competition among cloud providers.

When all was said and done after the 2008 crisis, relatively few people were held legally accountable for their involvement. There simply wasn’t the kind of legal recourse needed to identify the actors responsible for a crisis of systemic proportions and hold them to account. In the case of the CrowdStrike and Microsoft outage, a similar issue exists in the lack of mechanisms to define responsibility in the highly opaque and distributed digital ecosystems of big tech.

Cloud computing is here to stay and undoubtedly brings important capabilities and efficiencies, but we should question whether we are organising and obliging cloud markets to reduce vulnerability to systemic technological risks.

Some might argue that risk from systemic failures brought by big tech’s monopolisation of cloud markets is outweighed by the innovations and cost reductions they deliver. This is a fallacy, and reflects a worrying version of the ‘too big to fail’ argument.

Big-tech monopolies work, in part, because their monopoly power allows them to externalise the cost of failures. Monopolists tend to force firms out of the market, and redesign competition in their market ecosystems according to terms that don’t threaten their own power. Cloud monopolies, in short, leave society at risk, while big tech is free to continue feasting.

If we are to take seriously what CrowdStrike’s Drew Bagley means when he says “we can no longer tolerate solutions or architectures that risk crumbling from a single point of failure”, then solutions clearly point toward breaking up big tech’s monopoly power in cloud markets.

Angus Dowell is a doctoral candidate at the School of Environment, Faculty of Science, where he is researching the relationships between cloud computing and the public sector. 

This article reflects the opinion of the author and not necessarily the views of Waipapa Taumata Rau University of Auckland.

This article was first published on Newsroom, Why cloud monopolies are a risk to society, 9 August, 2024