In This Role You Will:

Play a critical role in evolving our infrastructure as we develop solutions to complex technical problems involving reliability, latency, bandwidth and, most importantly, security.
Be an integral part of improving observability, monitoring and alerting throughout the platform.
Help coordinate work across different areas of the company to ensure the most efficient path of execution.
Centralize, wherever possible, common streams of work that are currently duplicated across developer teams.
Focus heavily on writing tooling to replace manual, repetitive work in a scalable way.

To Succeed You Need:

Experience working with cloud solutions (GCP or AWS).
Deep understanding of, and demonstrable experience with, modern monitoring tools such as Prometheus, Datadog, Grafana and Telegraf.
3+ years of experience with infrastructure-as-code tools. Experience with complex Terraform deployments is a plus.
Solid background with configuration management tools. Experience with SaltStack is a plus.
Experience using GitOps and CI to make changes, preferably with GitHub Actions.
Experience with messaging systems such as Kafka.

Site Reliability Engineer