Title: Reliability Engineering Lead

Location: 

Hyderabad, TG, IN; Bengaluru, KA, IN

Description: 

We are looking for bright engineers to join our Platform SRE (Site Reliability Engineering) team in Hyderabad. The Platform SRE team is responsible for supporting applications and services in production environments, working closely with development groups to identify and deliver stability projects, and experimenting with tools and processes to increase the stability of applications and services.

What you’ll do:

  • Supervise SREs, ensuring that the production applications the team supports are stable, reliable, and well documented
  • Understand the high-level components of the Arcesium platform and support internal and external clients with any issues
  • Spend 50% of your time supporting regular operations and 50% on engineering work to make the platform more stable and build automation to reduce toil
  • Support internal and external clients with technical issues on a 24x7 rotational coverage basis
  • Apply full-stack development skills to analyze and debug complex issues across tiers, from the front end to the mid-tier to infrastructure
  • Optimize capacity utilization
  • Partner with engineering teams to ensure that applications are designed with operability and scale in mind
  • Participate in the evaluation of new software, automation, and infrastructure solutions
  • Identify top errors and reliability issues, and drive root cause analysis to prevent repeat incidents
  • Serve as an escalation point for production issues during your shift or as required

What you’ll need:

  • 6-11 years of experience handling systems in large-scale production environments
  • Experience with a variety of tools that help manage, understand, and debug large, complex distributed systems
  • Good knowledge of Unix systems, web technologies, databases, networking, and public cloud systems such as AWS
  • Good communication and collaboration skills, and attention to detail
  • Expertise in troubleshooting issues and bugs
  • Programming experience (Python/Java/Go/C/shell)
  • Prior SRE/DevOps experience is desirable