Sr Staff Site Reliability Engineer
FireEye is the intelligence-led security company. Working as a seamless, scalable extension of customer security operations, FireEye offers a single platform that blends innovative security technologies, nation-state grade threat intelligence, and world-renowned Mandiant® consulting.
With this approach, FireEye eliminates the complexity and burden of cyber security for organizations struggling to prepare for, prevent, and respond to cyber attacks.
FireEye has over 9,000 customers across 103 countries, including more than 50 percent of the Forbes Global 2000.
The Role:
Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions.
SREs will focus on maximizing availability, reliability, security, and performance for FireEye cloud services. You will be an integral part of our Cloud SIEM, Site Reliability Engineering team in Bangalore.
Design, write and deliver software to improve the availability, scalability, latency, and efficiency of FireEye’s cloud services
Influence and create new designs, architectures, standards and methods for large-scale distributed systems
Collaborate with a world-class engineering team to propose features that solve recurring patterns of customer complaints
Engage in service capacity planning and demand forecasting, software performance analysis and system tuning
Participate in the on-call rotation; participate, collaborate, and provide guidance in retrospectives.
Find scalability bottlenecks and areas for performance improvements
Strong team player with a high degree of flexibility.
Systematic problem-solving approach, coupled with a strong sense of ownership and drive
Bachelor’s degree in software engineering, computer science, computer engineering, or related technical field
10+ years of experience in Cloud Software Engineering, Cloud Site Reliability Engineering, and Cloud Operations
Experience with Amazon Web Services and/or any other public cloud
Experience with Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) technology stacks.
Experience with containers and HA clusters; experience with Docker and Amazon ECS / Kubernetes is mandatory
Good knowledge of virtualization technologies and container technologies
Firm grasp of at least one modern programming language (Java / Go / Python / Ruby), beyond basic scripting (Shell, Perl, Bash)
Solid experience using configuration management frameworks (e.g. Ansible / Chef / Puppet)
Manage capacity, build security into every layer, and reduce cost
Implement secure networking, key management, user management, access management, process management, and image management.
Maintain services once they are live by measuring and monitoring availability, latency and overall system reliability.
Proven experience handling large infrastructure and distributed systems such as Kafka and Elasticsearch.
Release software through tooling (git, Jenkins, custom scripts, Docker)
Familiarity with observability platforms (application telemetry, tracing, and log aggregation).
Expertise in designing, analyzing and troubleshooting large-scale distributed systems
Experience with Unix/Linux OS internals and administration (e.g. filesystems, inodes, system calls) or networking (e.g. TCP/IP, routing, network topologies and hardware, SDN)
Basic understanding of most of the following: Jira, Splunk
Experience with algorithms, data structures, complexity analysis and software design.