Thought of the Week: Key Drivers for an SRE Practice

Date 29 June 2026

Author Team Capacitas

In conversations with customers and network peers, we hear that many companies are considering setting up a dedicated SRE team or realigning existing responsibilities. According to a report from Catchpoint, 50% of organisations have dedicated SRE teams or roles, and the number of vacancies for Site Reliability Engineers has increased dramatically.

This supports the view that system reliability, performance, and availability remain among the top drivers for establishing a stronger foundation of SRE practices.

Key drivers for an SRE practice

The scale and complexity of IT Systems are key determinants. Increasing scale and complexity undoubtedly expose much more risk.
Operational risks are not proactively mitigated during development and tend to be resolved reactively.
The impact of operational failure on the business is substantial in terms of revenue loss and reputation.
The frequency and severity of production incidents are high. Development teams are spending too much time firefighting. Incident management is not fixing issues properly.
Service-Level Objectives (SLOs) for high-priority systems either do not exist or are not measured. Actionable insights are not being generated and operational issues are not exposed proactively. Management of SLOs is not happening.
Production monitoring and alerting are not set up properly, which leads to poor insight into performance, availability and reliability risk. Reporting is very weak. There is little or no observability in test environments.
Development teams miss chances to improve time to market and are not taking advantage of transformative activities such as automation frameworks, testing frameworks, automated deployment, and Infrastructure as Code. Releases often overrun and the release cycle is slow.
Non-functional testing (performance/scalability/efficiency, resilience/recovery, security) is executed poorly if at all, and is not underpinned by testing frameworks.
Cross-functional collaboration between Service Management, Operations, and Development teams is poor and the benefits of close cooperation are not realised.

If any of these factors describes operational challenges you are experiencing, it might be time to examine your organisational capability and implement a remediation plan to plug key gaps.
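As a starting point for the SLO driver in the list above, measuring an objective can be quite simple. The following is a minimal sketch, assuming a request-based availability SLO; the names `SloReport` and `evaluate_slo`, and the idea of feeding in raw request counts, are illustrative assumptions rather than part of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    """Hypothetical summary of one SLO measurement window."""
    availability: float      # fraction of requests that succeeded
    allowed_failures: float  # error budget, expressed as a request count
    budget_remaining: float  # fraction of the error budget left (negative if overspent)
    slo_met: bool            # True when availability is at or above the target


def evaluate_slo(total_requests: int, failed_requests: int, target: float) -> SloReport:
    """Compare observed availability against a target such as 0.999."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")

    availability = (total_requests - failed_requests) / total_requests
    # The error budget is the share of requests the target allows to fail.
    allowed_failures = (1.0 - target) * total_requests
    budget_remaining = 1.0 - failed_requests / allowed_failures
    return SloReport(
        availability=availability,
        allowed_failures=allowed_failures,
        budget_remaining=budget_remaining,
        slo_met=availability >= target,
    )


if __name__ == "__main__":
    # Illustrative numbers only: one million requests, 250 failures, 99.9% target.
    report = evaluate_slo(total_requests=1_000_000, failed_requests=250, target=0.999)
    print(report)
```

Tracking the remaining error budget, rather than only a pass or fail flag, is one way to surface operational risk before the objective is actually breached.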

About the Author

Frank Warren

Frank is a Principal Consultant specialising in capacity planning, performance engineering and cloud cost optimisation. Frank leads numerous high-profile ecommerce clients, helping them achieve their business peaks while saving on cloud costs and improving performance.

If you would like to have a chat about optimising your cloud bill, feel free to reach out for a no-commitment chat. You can contact us via the website at https://www.capacitas.co.uk/book-a-diagnostic-session or reach out via email at contact@capacitas.co.uk

It is also worth having a look at some of our recent case studies, where we have saved our clients millions of pounds in cloud spend.

About the author

Team Capacitas

Capacitas is a cloud and AI value partner. We translate rapid technological change into enduring commercial advantage by converting every unit of compute into enterprise value.
