When it comes to your cloud estate, building and maintaining a cost-efficient culture that continues to deliver is not always easy. It requires ongoing effort and organisation-wide acceptance of your chosen approach. But it is worth it. Just look at the benefits: streamlined operations, budget freed up for re-investment or the development of new services, satisfied customers, and reliable ongoing performance and service delivery.
FinOps has emerged as a guiding light for organisations navigating the complex financial landscape of the public cloud. It is a methodology that promises cost optimisation, improved accountability, and better cloud resource management. And it delivers on those promises – to a degree.
But if you are deeply entrenched in the cloud, you may have noticed something unsettling: FinOps often focuses on treating the symptoms of cloud cost overruns, rather than addressing the root causes.
I think of FinOps almost as a skilled doctor. It is adept at diagnosing the ailment – identifying which services are draining your budget, where inefficiencies lie, and which teams are overspending. It can even prescribe effective treatments: rightsizing instances, reserving capacity, or leveraging spot instances. But just like a doctor who focuses only on managing symptoms, FinOps can miss the underlying disease.
Typically, issues stem from product performance and resource bloat – both of which affect the confidence of the operations team. However, the real culprit behind many cloud cost overruns is often architectural. Your cloud infrastructure may be inherently costly because of design choices made early on. Perhaps you are relying heavily on expensive managed services, or your application architecture is not well suited to the cloud's elasticity, either because it came from a lift-and-shift migration out of an on-prem data centre or because the application was poorly designed in the first place.
FinOps tools are fantastic at identifying these issues. But they rarely provide the solutions. That is because architectural optimisation is a different beast altogether. It requires a deeper understanding of architecture – including applications, workloads and the intricacies of cloud architecture, which are skills that are not typically found in solution architects. It is also about leveraging the flexibility of the cloud to your advantage, making cloud service provider (CSP) pricing models work for you to deliver maximum value.
The only way to know if you have the right level of resources is by having the correct visibility of your systems. How are they being utilised? Do you see underutilisation or is it about right? Do they have a regular usage pattern? Is this usage pattern changing over time? Is it impacted by seasonality?
All these questions can be answered by having the right observability in place. It is not just about technical metrics either. A common mistake I see over and over is engineers sizing resources for spikes in CPU utilisation or IOPS, assuming that the spike is a valid workload or that it is driven by business demand. Often, though, there is no correlation between business demand and usage.
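One way to test that assumption is to put a technical metric and a business metric side by side and measure how closely they move together. The following is only a minimal sketch: the hourly samples, the choice of orders per hour as the business metric, and the 0.7 threshold are all hypothetical and would need replacing with data and judgement from your own observability tooling.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly samples: CPU utilisation (%) and orders processed.
cpu_percent = [25, 24, 26, 80, 25, 27, 24, 26]
orders_per_hour = [110, 100, 120, 105, 115, 125, 100, 118]

r = pearson(cpu_percent, orders_per_hour)

# Assumed rule of thumb: only treat the spike as business-driven
# when the two series are strongly positively correlated.
spike_is_business_driven = r > 0.7
print(f"correlation={r:.2f}, business-driven={spike_is_business_driven}")
```

A weak or negative correlation does not prove the spike is waste, but it is a prompt to find out what is actually causing it before sizing for it.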
For example, you have a workload that generally runs at 25% CPU utilisation, but there is a daily spike to 80% utilisation that lasts for 30 minutes. That daily CPU spike turns out to be a backup job, and your system is currently sized for that backup. The agreed backup window is three hours, yet the job is completing in 30 minutes. There is an opportunity to downsize here, which would increase utilisation, lengthen the backup job, and reduce costs. And that is OK, because we are matching deployed resources to the business requirement. Declining to downsize because you see 80% utilisation, without understanding the cause of that spike and the business value it adds, is a mistake that leads to increased costs.
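The arithmetic behind that example can be sketched roughly as follows. This is a back-of-the-envelope model, not a sizing tool: it assumes the backup is CPU-bound, that its work spreads evenly if given more time, and that the 25% baseline load stays constant. The function name and the 80% target peak utilisation are my own illustrative choices.

```python
def min_capacity_fraction(baseline_util, spike_util, job_hours,
                          window_hours, target_peak_util=0.8):
    """Estimate the fraction of current capacity needed if a batch job
    is allowed to stretch across its full permitted window.

    Utilisation values are fractions of *current* capacity (0.25 == 25%).
    Assumes the job is CPU-bound and its work spreads evenly over time.
    """
    if not 0 < job_hours <= window_hours:
        raise ValueError("job_hours must be positive and fit within window_hours")
    if not 0 < target_peak_util <= 1:
        raise ValueError("target_peak_util must be in (0, 1]")

    job_load = spike_util - baseline_util      # share of capacity the job uses today
    job_work = job_load * job_hours            # total work, in capacity-hours
    stretched_load = job_work / window_hours   # same work spread over the window
    return (baseline_util + stretched_load) / target_peak_util


# Figures from the example: 25% baseline, 80% spike for 30 minutes,
# three-hour backup window.
fraction = min_capacity_fraction(0.25, 0.80, job_hours=0.5, window_hours=3.0)
print(f"Estimated capacity needed: {fraction:.0%} of current")
```

Under those assumptions the workload would need a little over 40% of its current capacity, which suggests the next size down is worth testing. In practice you would leave headroom in the backup window and validate with a trial run rather than trust the model alone.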
When organisations rely solely on FinOps to manage cloud costs, they often find themselves in a frustrating cycle:
This reactive approach is exhausting and ultimately unsustainable. It is like constantly putting out fires without ever addressing the faulty wiring that is causing them. Surely, it is better to fix the faulty wiring instead of firefighting all the time. It is also a lot safer.
To truly conquer cloud costs, you need a more thoughtful approach that combines the strengths of FinOps with observability and business metrics to provide a complete picture.
Here is what that looks like:
FinOps is undeniably valuable. It provides essential visibility and control over your cloud spending. But it is not a silver bullet. To achieve long-term, sustainable cloud cost control, you need to go deeper. By addressing the root causes of cloud cost overruns through architectural optimisation, you can break the cycle of reactive cost management and build a cloud environment that is both efficient and affordable.
To find out more about our approach to cloud cost optimisation, or to take a deeper dive into how to make it work for your organisation, download our latest whitepaper.