Senior Site Reliability Engineer

Oracle is seeking a motivated Senior Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment.

This position requires broad knowledge of Linux administration, software development, cloud computing, networking, cloud security, and performance analysis and monitoring in order to ensure the stability, security, performance, and reliability of our infrastructure.

The Site Reliability Engineer is expected to work with multiple service and product development teams, identifying cross-team issues that create operational risk across the organization and resolving them through a mix of engineering, development, and troubleshooting expertise and general operational guidance.

This role also requires excellent communication and organizational skills. The candidate is expected to collaborate with service owners, other engineers, and developers to deliver a superior support experience to the development community.


  • Solve complex problems related to cloud and Linux infrastructure, and build automation to prevent problems from recurring.
  • Identify automation opportunities and drive their implementation to improve service health, availability, and reliability
  • Architect, design, configure, deploy, and script end-to-end service monitoring, alerting and self-healing capabilities for production services
  • Understand the end-to-end configuration, technical dependencies, and characteristics of production infrastructure and services
  • Quickly grasp and analyze complex, rapidly changing new technologies and integrate them into automation and infrastructure support
  • Act as the escalation point for complex or critical issues that may not have a documented procedure, and provide root cause analysis (RCA)
  • Author functional and technical documentation and standard operating procedures (SOPs)
  • Collaborate with development teams in defining and implementing improvements in service architecture.
  • Partner with DevOps, Oracle Cloud Infrastructure deployment, and development teams to identify and resolve issues.
  • Articulate technical characteristics of services and technology areas and guide cross-functional teams to engineer and add capabilities to internal tools.
  • Take responsibility for the design and delivery of mission-critical automation, with a focus on security, resiliency, scale, and performance.
  • Work with the Database Engineering Global SRE team and lead global projects.

Knowledge and Skills

  • 6 to 12 years of experience in Site Reliability Engineering and in implementing automation
  • Experience in Linux administration, with good knowledge of kernel-level debugging
  • Experience in debugging operating system performance issues and performance tuning
  • Excellent troubleshooting skills for resolving critical application, networking and system administration issues
  • Experience working with fault-tolerant, highly available, high-throughput, distributed, scalable systems
  • Expertise in developing scripts, utilities, and tools to automate routine or manually intensive tasks
  • Experience in application, compute, storage, and database troubleshooting to improve application reliability, scalability, and availability
  • Experience in cloud infrastructure technologies
  • Experience in operations and problem management
  • Experience with ML- and AI-based development
  • Experience with monitoring tools such as Prometheus and Grafana
  • Development experience in Python and experience building infrastructure with Terraform
  • Solid experience with configuration management tools such as Ansible and Chef
  • Experience in managing 24/7 high-availability production applications
  • Experience working with global teams across different time zones
  • Strong logical-thinking skills, intellectual curiosity, and a strong drive for self-development
  • Ability to be a good team player, and the desire to learn and implement new cloud technologies as needed
  • Good understanding of Agile software development principles, including the use of common tools such as JIRA
  • Good understanding of cloud security and compliance management, including patching
  • Multi-OS knowledge and expertise is preferred
  • Excellent organizational, verbal, and written communication skills

Qualifications Required

  • 6 to 12 years of experience working on an IT operations or infrastructure team
  • Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, or a related area is preferred