Sr. Site Reliability Engineer
Springboard
Bengaluru, Karnataka, India
3d ago

The Company

Springboard is redefining professional education for the 21st century through immersive, mentor-supported courses in cutting-edge fields like data science and design.

Our self-paced, online offerings give anyone, anywhere access to world-class learning resources, with an emphasis on project-based learning, industry-relevant curriculum, and tangible outcomes.

Through this hybrid approach, we’ve helped thousands of learners revamp their careers and, by extension, their lives.

The Opportunity

As a Senior Site Reliability Engineer (SRE) at Springboard, you will be a key member of our cloud infrastructure and tech-operations initiatives.

You will utilise your diverse background in operations, cloud, systems engineering, and monitoring to ensure the uptime, reliability, efficiency and health of our web services on staging and production.

You’ll learn quickly, be hands-on, own key processes, and make continuous improvements to the quality of our services and operations, as we scale.

Responsibilities

  • Play a significant role in SRE initiatives at Springboard by:
    • Being one of the primary people responsible for the reliability, health, and performance of our cloud services.
    • Learning, advocating and adopting processes and industry best practices.
    • Gaining deep knowledge and understanding of Springboard’s application ecosystem and services.
    • Setting a high bar for reliability, quality and operational efficiency through continuous improvement.
    • Thinking, innovating and engineering solutions to detect and solve complex problems that are hard to solve using conventional tools.

  • Serve as a mentor for junior engineers: you will utilise your excellent communication skills, empathy and training skills to groom junior engineers towards building a strong SRE function at Springboard.
  • Analyse, design, and implement strategies across our Google Cloud infrastructure with emphasis on security, traffic management, cluster configuration, monitoring and operations.
  • Conduct system tests and put processes in place to monitor the security, performance, and availability of the service.
  • Provide continuous feedback from the production environment to the development team.
  • Set up telemetry (logs, metrics and events) on production systems as well as deployment pipelines to create continuous real-time feedback mechanisms, providing insights to the development team.

  • Own Infrastructure Operations:
    • Handle and address requests from the engineering team for configuration changes and permissions/access, based on laid-out security policies.
    • Define processes and implement systems to streamline operations so that they can be handled smoothly and audited.

  • Make recommendations to the development team on areas related to the reliability, maintainability, availability, security and performance of the system, as well as the efficiency of the team.
  • Evaluate solutions (homegrown, third-party open-source, and SaaS) for tools that should be used to build efficiency within the infrastructure.
  • Identify potential bottlenecks given the rate of growth and scale.

  • DevOps:
    • Develop scripts and CLI tools for day-to-day tasks.
    • Contribute to the CI/CD pipeline for smoother integration and deployment of tech.

You:

  • Are passionate about enabling teams to build, test and deploy software faster and more reliably. Ideally, you have 4+ years of experience in a diverse mix of SRE, DevOps, System Administration or equivalent software-engineering roles.
  • Must be experienced in working with containers, orchestration tools and supportive monitoring & configuration tools within the ecosystem.
  • Must have strong expertise in Docker and Kubernetes, having managed infrastructure in high-availability production environments.
  • Must be an experienced script developer, comfortable developing shell scripts (bash, zsh) and working in at least one scripting language (Lua, Python, Perl, Ruby, JavaScript); Python and JavaScript preferred.
  • Must be open and willing to learn new languages and technologies.

  • Must be experienced with Google Cloud Platform (GCP) and its admin functionalities.
  • Must be knowledgeable about integrating, configuring, deploying and managing centrally provided common cloud services (e.g. IAM, networking, logging, operating systems, containers).

  • Must be an experienced *nix power-user with foundational operating system knowledge and system-administration experience.
  • Must be passionate about SRE, with a strong desire to learn new technologies, mentor junior engineers, and grow yourself whilst keeping pace with the company’s growth.
  • Are knowledgeable about network, server, and application-status monitoring, and comfortable deploying and configuring monitoring tools and dashboards to suit evolving needs (e.g. Nagios, Datadog, Prometheus, Grafana).
  • Are experienced in working day-to-day with Git (version control), with knowledge of semantic versioning, release and change management.
  • Are knowledgeable about automated configuration management and deployment tools (such as Puppet, Chef, Ansible).
  • Are deeply interested in identifying, innovating, exploring and solving complex problems related to system performance and scale.
  • Are a preferred candidate if you are Google Cloud Certified (for example, Cloud Architect or Cloud DevOps Engineer).