Pattern: SRE Team

The SRE (Site Reliability Engineering) team helps the development teams maintain and improve the application (not the platform or infrastructure).

SRE Team

A big company has a large, mission-critical application with very high demands for quality and availability, and significant resources for creating dedicated improvement teams.

In This Context

Once a platform is built and in production, attention often shifts away from improving internal processes and runtime performance. Left unattended, both degrade over time, reducing quality and performance.


Create a team that spends 50% of its time on reliability and 50% on continuous improvement of internal processes and development practices. This SRE team is concerned with overall site (platform) availability. However, each individual service has its own operational needs as well. It can be helpful to add SREs to each build squad (or at least each tribe) to focus on service availability.


Runtime stability and quality improve continuously, and the level of automation increases.