Druva is the global leader in Cloud Data Protection and Management, delivering the industry’s first data management-as-a-service solution. It aggregates data from endpoints, servers and cloud applications and leverages the public cloud to offer a single pane of glass for data protection, governance and intelligence. This dramatically increases the availability and visibility of business-critical information while reducing the risk, cost and complexity of managing and protecting it.
Druva’s award-winning solutions intelligently collect data and unify backup, disaster recovery, archival and governance capabilities onto a single, optimized data set.
As the industry’s fastest-growing data protection provider, Druva is trusted by over 4,000 global organizations and protects over 40 PB of data.
Please do visit us at :
Role and Responsibility :
The CloudOps Engineer is responsible for 24 / 7 availability for Druva, a cloud SaaS
Support and sustain customer facing AWS Production
Front line support for Cloud Monitoring Infrastructure and Application
Respond, troubleshoot and resolve production alerts
Communicate and troubleshoot operational issues supporting a complex environment
Initiate Incident Response for cloud outages
Analyze trends to proactively prevent incidents
Respond to product escalations from Support as well as Engineering
Scale infrastructure capacity on production
Participate in Cloud Updates and Maintenance
Assist in identifying and remediating security vulnerabilities
Must feel comfortable working in a fast-paced, dynamic and flexible environment
Participate in an on-call rotation and operate effectively in a global 24x7 environment
Must be able to work extended hours as needed including being available for off hours production support
Experience : 5 - 9 years
Ability to learn new technologies quickly with some support and guidance
Strong Linux / Unix administration
Knowledge of Cloud providers including Amazon AWS, Google Cloud Platform, or Microsoft Azure
Knowledge of Configuration Management (SaltStack) for complex software management
Monitor and analyze system logs and perform root cause analysis (RCA)
Monitor site reliability and performance
Scripting knowledge (Shell, Python)
High-level understanding of standard networking protocols and components such as HTTP, DNS, TCP/IP, ICMP, the OSI model, subnetting and load balancing
Ability to think outside-of-the-box to generate creative solutions to problems
Requires the ability to multitask and work well under pressure
Requires excellent communication skills, both verbal and written