Technology Asset Inventory SRE (Site Reliability Engineering)
The Enterprise System Management (ESM) department is seeking a Site Reliability Engineer to drive the reliability engineering, operations and customer support services for Morgan Stanley's suite of technology asset inventory products. ESM is a cornerstone of the Application Infrastructure organization in Morgan Stanley's Technology Division.
Reporting to the Global SRE Lead for Technology Asset Inventory (TAI) and IT Service Management (ITSM) products, this role requires executing on SRE capabilities for TAI.
The TAI portfolio of products is critical to the running of technology at Morgan Stanley, from enabling the configuration for monitoring and observability, to regulatory-driven risk management of systems, to billing and chargeback of technology services, to the tracking and tracing of all physical and logical assets on the network, including forensic analysis of undeclared devices.
The role requires delivering reliable systems without wasteful operational effort in a product area of ongoing investment and innovation.
This is a prod-side, operational role requiring participation in an on-call rotation, and also a role that requires influencing skills among a technical stakeholder group.
Candidates from any sector are welcome and financial services experience is not required. The successful candidate will likely be either a devops software engineer or an infrastructure specialist today.
Linux troubleshooting skills and object-orientated development experience are essential, along with either Python experience or the willingness and aptitude to learn Python quickly.
Responsibilities include:
Maximizing the availability and performance of supported systems through optimized and automated plant management, ongoing problem management, and architecture reviews with dev-side peers
Reduction of the cost of support (hours of effort) through the elimination of operational issues, optimization and automation of tasks, development of operational tools and driving client self-service to minimize constraints
Identification and prioritization of technical debt that is impacting client productivity, reliability or the efficiency of the ops team
Complex troubleshooting of front-to-back development environment issues
Minimizing the escalation rate to the dev-side team members to ensure the department has the greatest possible flow of feature delivery
Removing friction to the operational onboarding of new solutions in this ecosystem
Being operationally responsive, including sharing on-call rotation with the rest of the global team (with a time-off in lieu system)
Skills required:
An interest in using or experimenting with the latest technologies
Strong Linux troubleshooting skills
Competence in at least one object-orientated programming language, Python preferred
Ability to self-organize
Skills to communicate with and influence the products' dev teams
The drive to seek out and execute on opportunities to increase productivity for the team
Skills desired:
Prior experience automating operational tasks
Prior experience tuning monitoring and observability
Hands-on experience with Splunk or Grafana
Working with service level indicators / objectives (SLIs, SLOs) and Error Budgets
Delivering in an agile / devops team