Senior Site Reliability Engineer
Own the reliability, performance and scalability of our production systems. You will shape our infrastructure practices and help the engineering team ship with confidence.
Responsibilities
- Design and maintain cloud infrastructure across AWS, Scaleway and Hetzner
- Build and improve CI/CD pipelines and deployment workflows
- Define and implement observability and alerting with Grafana and Prometheus
- Lead incident response and build a post-mortem culture
- Champion reliability best practices across the engineering organisation
Requirements
- 5+ years in SRE, DevOps or infrastructure engineering
- Strong experience with Kubernetes and infrastructure-as-code (Terraform, Pulumi or equivalent)
- Solid Linux, networking and cloud platform fundamentals
- Experience operating production monitoring and alerting stacks
- Comfort with on-call rotations and incident management
Nice to have
- Experience with Cloudflare Workers or edge computing platforms
- Open-source contributions to infrastructure tooling
- Background supporting AI or data-intensive workloads