Staff Site Reliability Engineer
FireEye, Inc
Bangalore, India
4d ago

The Role:

Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions. SREs will focus on maximum availability, reliability, security, and performance for FireEye cloud services.

You will be an integral part of our Global Site Reliability Engineering team in Bangalore.

Responsibilities:

  • Design, write and deliver software to improve the availability, scalability, latency, and efficiency of FireEye’s cloud services
  • Influence and create new designs, architectures, standards and methods for large-scale distributed systems
  • Collaborate with a world-class engineering team to propose features that solve recurring patterns of customer complaints
  • Participate in the on-call rotation; participate, collaborate, and provide guidance in retrospectives
  • Find scalability bottlenecks and areas for performance improvements
  • Strong team player with a high degree of flexibility.
  • Systematic problem solving approach, coupled with a strong sense of ownership and drive

Requirements:

  • Bachelor’s degree in software engineering, computer science, computer engineering, or related technical field
  • Experience in Cloud Software Engineering, Cloud Site Reliability Engineering, and Cloud Operations
  • Experience with Amazon Web Services and/or OCI, or any other public cloud
  • Programming languages: Java / Go / Python, beyond basic scripting (Shell, Bash)
  • Infrastructure as Code: Terraform, Packer
  • Monitoring tools: Elasticsearch, Telegraf, Grafana, Influx, Kapacitor
  • Knowledgeable in distributed systems and redundancy / high-availability and performance optimizations
  • Networking: knowledge and understanding of network theory, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing
  • Exposure to troubleshooting tools like tcpdump

  • Solid experience using configuration management frameworks (e.g. Ansible)
  • Experience releasing software through tooling (git, Jenkins, custom scripts, Docker)
  • Experience in supporting a large-scale, web-facing SaaS product
  • Systematic problem solving skills, coupled with a strong sense of ownership and drive.
  • Ready to join the on-call rotation (PagerDuty)

Additional Qualifications:

  • Expertise in designing, analyzing and troubleshooting large-scale distributed systems
  • Familiarity with running web services at scale; understanding of Unix systems internals, networking, and AWS
  • Understanding of Unix / Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way

  • Good knowledge of virtualization technologies and container technologies
  • Experience with containers and HA clusters; experience with Docker and Amazon ECS, Kubernetes, or Docker Swarm is a plus
  • Basic understanding of most of the following: Jira, Splunk