Lead Site Reliability Engineer :
About the Team : The SRE team includes expert software and systems engineers who are custodians of the availability, scalability and performance of the SaaS products.
We build tools and frameworks for monitoring and load testing, and sometimes build full platform features that other products use.
We undertake architecture reviews and help the individual product teams to identify performance bottlenecks.
We tend to look at the application from a system perspective, bottom-up rather than top-down.
Our engineers have the freedom to pick the challenges that they work on and own the task to completion.
Work with cross-product teams to ensure high availability of the system
Identify system bottlenecks and recommend solutions to availability issues
Undertake system-level debugging in Linux independently
Undertake root cause analysis when incidents or issues are identified
Identify repetitive system tasks and automate them
Guide new SRE engineers on aspects of system debugging
Qualifications :
Must Have :
Bachelor's degree in computer science
8+ years of experience, with at least 5 of those years in system engineering or system debugging roles for a SaaS product
Working experience in debugging high-performance systems
Proven ability to work with multiple teams, multi-task and prioritise
Good working knowledge of the IP protocol stack and ability to use tools to monitor and sniff packets
Experience in at least one of the following languages, and willingness to learn new ones : C / C++, Java, Go
Nice to have :
Working experience in building massively scalable, high-performance services
Excellent written and verbal communication skills
Expertise in a cloud platform such as AWS or Azure is preferable