Position Overview :
We love travel - do you? Come join our exciting team and help us deliver exceptional technical solutions for the world's #1 online travel company.
Bring your skills, experience, enthusiasm and passion. You'll see your work in the hands of millions of people worldwide as we continue to deliver the future of amazing customer experiences.
You will also contribute to creating a fun and innovative environment where brilliant people want to work. The Partner Availability Resiliency Optimization team ensures the health and availability of Expedia’s partner applications and services.
What you'll do :
Support the production environment by detecting service errors and escalating them to engineers and developers
Perform application log analysis and develop enhancements that allow applications to be better instrumented for availability and performance (uptime, success rate, and user experience)
Enhance monitoring and automate new health measurements
Identify and partner with application teams for performance improvements based on findings
Create custom alerts, dashboards and reports needed for the development, engineering, and operations teams
Proactively monitor the health of websites, applications, and related services
Contribute to incident response on critical applications and infrastructure issues
Identify trends and patterns on dashboards
Work with development, application engineering and operations teams to monitor and analyze user experience patterns and understand their business and technology requirements
Support new partner launches
Work towards automation of manual tasks whenever possible
Create, update, understand, and follow processes and knowledge documents with integrity
Who you are : You'll fit the role if you have :
2-4 years of experience
Experience in Web Operations, Application Support, or a similar position OR Bachelor's / Master's degree in Computer Science, Mathematics, Statistics, Engineering or a related discipline
Monitoring, Event Management, and Analytics experience.
Demonstrated ability to construct complex SQL queries or write automated scripts for testing and analysis is great!
Programming experience in languages and frameworks such as Python, R, PHP, AngularJS, Node.js, Java, or C#.
Understanding of design, development, and integration on cloud platforms (e.g. AWS) in a continuous delivery environment.
Understanding of large-scale, complex systems from a reliability perspective.
Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other Site Reliability Engineers, Developers and Program Managers.
Drive to ensure a quality user experience and a strong customer-centric point of view.
Experience with cloud- and enterprise-based monitoring tools such as Splunk, CloudWatch, Catchpoint, or Seyren is a plus.