Manchester / Full time

Site Reliability Engineer (SRE)

Apply software engineering principles to operations, improving system reliability, scalability, and resilience through automation.

About the Company

This organisation operates complex, technology-driven systems where uptime, performance, and scalability are critical. Reliability is treated as a shared responsibility between engineering and operations, with SREs playing a key role in bridging the two.

The culture values automation, learning from failure, and data-driven decision-making. Engineers are encouraged to reduce manual work and build systems that can operate reliably at scale.

Typical benefits include flexible working, strong technical autonomy, and opportunities to work on high-impact infrastructure challenges.

The Role

As a Site Reliability Engineer, you will focus on ensuring systems are reliable, observable, and scalable. You will use software engineering approaches to automate operational tasks, improve monitoring, and reduce the risk of outages.

The role combines coding, systems thinking, and operational responsibility, requiring both technical depth and a strong understanding of how systems behave in production.

Key Responsibilities

  • Design and implement reliability-focused tooling and automation
  • Monitor system performance and availability
  • Respond to and analyse production incidents
  • Improve observability through logging, metrics, and alerts
  • Work with engineering teams to design resilient systems
  • Reduce toil through automation and process improvement
  • Conduct post-incident reviews and implement improvements
  • Define and maintain reliability standards and practices

What We’re Looking For

  • Strong engineering background with an operational mindset
  • Experience with production systems and incident response
  • Ability to write code to automate operational tasks
  • Understanding of distributed systems and failure modes
  • Calm, analytical approach under pressure
  • Strong documentation and communication skills

Tools & Environment

You are likely to work with:

  • Programming or scripting languages
  • Monitoring, logging, and alerting platforms
  • Cloud infrastructure and container platforms
  • CI/CD pipelines and automation tools

How Success Is Measured

  • System reliability and uptime
  • Reduction in manual operational work
  • Quality of monitoring and alerting
  • Effectiveness of incident response
  • Improvements driven by post-incident learning

Benefits & Progression

SREs often progress into senior engineering, platform architecture, or reliability leadership roles. Benefits typically include flexible working, training budgets, and the opportunity to work on complex, large-scale systems.

Benefits

  • 28 days holiday
  • Health benefits
  • Flexible hours
  • Great culture
