The Company

Springboard is redefining professional education for the 21st century through immersive, mentor-supported courses in cutting-edge fields like data science and design.
Our self-paced, online offerings give anyone, anywhere access to world-class learning resources, with an emphasis on project-based learning, industry-relevant curriculum, and tangible outcomes.
Through this hybrid approach, we’ve helped thousands of learners revamp their careers and, by extension, their lives.

The Opportunity

As a Senior Site Reliability Engineer (SRE) at Springboard, you will be a key member of our cloud infrastructure and tech-operations initiatives.
You will utilise your diverse background in operations, cloud, systems engineering, and monitoring to ensure the uptime, reliability, efficiency, and health of our web services in staging and production.
You’ll learn quickly, be hands-on, own key processes, and make continuous improvements to the quality of our services and operations as we scale.
Being one of the primary people responsible for reliability, health, and performance of our cloud services.
Learning, advocating and adopting processes and industry best-practices.
Gaining deep knowledge and understanding of Springboard’s application ecosystem and services.
Setting a high bar for reliability, quality and operational efficiency through continuous improvement.
Thinking, innovating and engineering solutions to detect and solve complex problems that are hard to solve using conventional tools.
You will utilise your excellent communication, empathy, and training skills to mentor junior engineers and help build a strong SRE function at Springboard.
Analyse, design, and implement strategies across our Google Cloud infrastructure, with emphasis on security, traffic management, cluster configuration, monitoring, and operations.
Conduct system tests and put processes in place to monitor the security, performance, and availability of the service.
Provide continuous feedback from the production environment to the development team.
Set up telemetry (logs, metrics, and events) on production systems and deployment pipelines to create continuous, real-time feedback mechanisms that provide insights to the development team.
Handle requests from the engineering team for configuration changes and permissions/access, based on established security policies.
Define processes and implement systems to streamline operations so that they can be handled smoothly and audited.
Evaluate solutions (homegrown, third-party open-source, and SaaS) for tools that should be used to build efficiency within the infrastructure.
Identify potential bottlenecks given our rate of growth and scale.
Develop scripts and CLI tools for day-to-day tasks.
Contribute to the CI/CD pipeline for smoother integration and deployment of tech.
g. IAM, networking, logging, operating systems, containers)