Roles and Responsibilities:
Responsible for uptime/availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
Responsible for the overall operability, resiliency, performance, and capacity of owned production services.
Collaborate with engineers to execute strategic infrastructure changes based on the product roadmap.
Identify recurring problem areas and build monitoring and automation tools to mitigate them.
Develop dashboards, visualizations, and regular monitoring for our infrastructure components and core applications.
Automate reports on system capacity, uptime, and other system metrics.
Gain expert-level knowledge of our applications and services.
Provide mentorship to a team of highly passionate and skilled engineers.
Participate in a weekly on-call rotation.
Conduct regular quarterly SRE service reviews to assess workload.
Write tools to assist with work such as capacity planning or improving the ability to debug production issues across distributed systems.
Contribute to a culture of learning and responsibility by writing detailed post-mortem RCA reports.
Tackle live production issues when on call, with assistance from the rest of the team.
Required Skills:
4 years of software, site reliability, systems, or service engineering experience on large-scale cloud services.
Proven experience with cloud platforms: AWS (preferred) or GCP.
Current software development expertise in multiple programming languages (C#, C, Python, Java, et al.).
Experience crafting, analyzing, and troubleshooting distributed systems.
Expert knowledge of Redis and MySQL.
Able to write programs and develop applications when the need arises.
Proficient in Linux/UNIX, RDBMS, JMeter, load balancers, certificates, DNS, proxies, networking concepts, and shell and Python scripting.
Knowledge of any configuration management tool and a working understanding of enterprise and Internet security are a plus.
Attention to detail and accuracy, and the ability to spot long-term trends in a production web environment.
Outstanding interpersonal, analytical, and communication skills.
Most importantly, takes ownership and enjoys being an SRE.