Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all
non-exceptional service conditions. SREs will focus on maximum availability, reliability, security, and performance for FireEye cloud services.
You will be an integral part of our Global Site Reliability Engineering team in Bengaluru.
Design, write and deliver software to improve the availability, scalability, latency, and efficiency of FireEye’s cloud services
Influence and create new designs, architectures, standards and methods for large-scale distributed systems
Collaborate with a world-class engineering team to propose features that solve recurring patterns of customer complaints
Engage in service capacity planning and demand forecasting, software performance analysis and system tuning
Participate in the on-call rotation; collaborate and provide guidance in retrospectives.
Find scalability bottlenecks and areas for performance improvements
Strong team player with a high degree of flexibility.
Systematic problem-solving approach, coupled with a strong sense of ownership and drive
Bachelor’s degree in software engineering, computer science, computer engineering, or related technical field
Experience in Cloud Software Engineering, Cloud Site Reliability Engineering, and Cloud Operations
Experience with Amazon Web Services and/or any other public cloud
Experience with Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) technology stacks.
Blend of both Development and SRE mindset (i.e. software and infrastructure)
Experience with Go / Python (strongly preferred), Perl or Ruby, or Java / C++ (at least one OOP language), specifically for systems automation
Experience in cloud provisioning code development and tools (Terraform, etc.)
Networking: knowledge and understanding of network theory, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing
Firm grasp of at least one modern programming language (Java / Go / Python / Ruby), beyond basic scripting (Shell, Perl, Bash)
Solid experience using configuration management frameworks (e.g. Ansible / Chef / Puppet)
Experience releasing software through tooling (Git, Jenkins, custom scripts, Docker)
Experience with algorithms, data structures, complexity analysis and software design.
Systematic problem-solving skills, coupled with a strong sense of ownership and drive.
Procedural and troubleshooting documentation skills
Expertise in designing, analyzing and troubleshooting large-scale distributed systems
Familiarity with running web services at scale; understanding of Unix systems internals, networking, and AWS.
Understanding of Unix / Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and
client-server protocols along the way
Good knowledge of virtualization technologies and container technologies
Experience with containers and HA clusters; experience with Docker and Amazon ECS / Kubernetes / Mesosphere / Docker Swarm a plus
Basic understanding of the following: Jira, Splunk