Site Reliability Engineer - Permanent Opportunity

  • Sector:

  • Job type:

    Permanent

  • Contact:

    Aaron Van Kan

  • Job ref:

    4625

  • Published:

    about 2 months ago

MA Tech are working closely with a core partner looking to expand their first-in-class SRE team in Dublin. You will be a member of a team responsible for developing and running the services and infrastructure behind products known throughout the world.

We are looking to speak with candidates interested in being embedded in development teams, working to ensure that service design is reliable and scalable, with uptime at its core.

You will have experience working at scale in high-volume, critical production service environments. You will bring a strong focus on, and a passion for, automation, along with an understanding of fundamental technologies, public cloud platforms, and container technologies such as Kubernetes.

Get in touch to learn more about this company and opportunity.

 

Responsibilities

  • Create and support scalable services
  • Contribute improvements to the availability, scalability, latency, and efficiency of services
  • Be a part of a full-service, cross-disciplinary development team, participating in the full development process, including design, capacity planning, and production deployments, while promoting site reliability engineering best practices
  • Contribute to our deployment and automation tools, and to the platform, so that problems are detected and addressed more efficiently and prevented from recurring
  • Define and measure production title availability, accounting for known downtime and service-level outages
  • Debug problems at scale for mission-critical services, and help the platform and service teams to implement lasting fixes to recurring issues
  • Be a part of our on-call rotation, which is a responsibility you'll share with your engineering team
  • Help to shape architecture: influence and create new designs, architectures, standards, and methods for large-scale distributed systems
  • Influence a culture of service ownership
  • Engage in training and mentoring to help develop other engineers with this mindset

 

Requirements

  • Minimum 5 years relevant work experience, including in a high-volume or critical production service environment
  • Experience working at scale - thousands of servers running a high-volume or critical production service environment
  • Automation / scripting skills and a desire to automate all the things
  • Comfortable with at least one scripting language, e.g. Python or Ruby
  • Experience with at least one major database, e.g. MySQL, Cassandra
  • Solid understanding of fundamental technologies, e.g. TCP/IP, Linux/Unix internals
  • Experience in configuration management systems, e.g. Ansible, Puppet, Terraform


Desirable

  • Experience working with public cloud providers and cloud technologies, e.g. Amazon, GCP
  • Experience working with container orchestration, e.g. Kubernetes
  • Experience in monitoring and metrics systems, e.g. Nagios, Zabbix, Graphite, ELK
  • Background in Software Engineering is advantageous
  • Experience with the Python scripting language