Cloud Engineering team, this role will implement an enterprise-wide Kubernetes platform and a simple, automated platform for cloud adoption of containers, reducing expenses as part of the IT department's mission.
You will be a leader in providing technical guidance and direction for the implementation, modernization, enhancement, and management of the enterprise-wide Kubernetes / Docker CaaS environments.
You will have autonomous control over day-to-day activities allocated to the team as part of agile development of new services.
Programming and orchestrating the deployment of feature sets into Kubernetes containers.
Building and configuring Kubernetes-based infrastructure, including CoreDNS, kube-proxy, and Ingress.
Automating the build of containerized systems with CI/CD tooling, Helm charts, and more.
Managing deployments and rollbacks of applications.
Implementing monitoring and metrics with Prometheus.
Troubleshooting and optimizing containerized workload deployments for clients.
Automating operational tasks and assisting in the transition to service ownership models.
Collaborating across project teams to simplify and improve software lifecycle processes.
Evaluating technical options for meeting user needs and ensuring that system requirements are identified, prioritized, and incorporated in an effective, efficient manner.
Performing proof-of-concept (POC) technical evaluations of new technologies for use in the cloud.
Setting technical direction and determining work priorities based on interactions with the Cloud Architecture teams and with cloud and software vendors.
Communicating with management and executive leadership to ensure the team's goals and priorities align with the department's.
Leading problem resolution for business-impacting failures and driving the resolution to meet platform service level objectives.
This includes assembling resources from existing team members and engaging other engineering resources from peer teams to restore service after disruptions.
Deciding on the recovery method, establishing the recovery plan, and driving and implementing that plan for the affected platform during the crisis bridge.
Performing detailed root cause analysis involving the necessary resources.