Changing the world through digital experiences is what Adobe’s all about. We give everyone from emerging artists to global brands everything they need to design and deliver exceptional digital experiences! We’re passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen.
We’re on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity.
We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!
The Challenge
Are you comfortable with dev, comfortable with ops, and looking for a job that doesn’t have DevOps in the title?
Do you have an intimate understanding of the operational challenges of running services at scale, and are you also committed to overcoming those challenges with software instead of manpower?
Adobe needs a Site Reliability Engineer (SRE) who knows how to balance going fast and going big with operating safely. Our mission is to progress, protect, and provide for the software and systems behind all of Marketo: An Adobe Company, with an ever-watchful eye on system availability, latency, performance, and capacity.
SRE is a mindset and a set of engineering approaches focused on building highly reliable systems and eliminating toil through automation.
We hire people from both systems and software backgrounds. Strong candidates will have experience with both. The engineer role within SRE is at the heart of fulfilling SRE’s mission: build a highly reliable, scalable, and measurable customer experience for the continued growth of Marketo’s infrastructure.
We use both multi-cloud (Azure / AWS / GCP) and on-premises environments. We are looking for someone who is ambitious, has a passion for quality, and wants to help critical services succeed without compromising security.
Our SRE and Engineering teams are distributed, split between Denver, Colorado; San Mateo, California; and Bucharest, Romania.
We rely heavily on tools like Slack, JIRA, and video conferencing to collaborate. Flexibility to join meetings with colleagues around the world is expected.
The successful candidate must be able to prioritize tasks and work independently.
What you’ll do
Engage proactively with product and engineering to drive and improve the whole lifecycle of operational readiness, from inception and design through deployment, operation, and refinement.
Write software layers, scripts, deployment frameworks, tracers, monitors, and self-healing / auto-remediation tools, and automate processes.
Build and maintain software modules for use and re-use in cloud and on-premises systems automation.
Maintain business continuity by identifying and driving opportunities to make systems highly resilient and human-free.
Work closely with the software engineering team to ensure accurate monitoring and metrics are built into applications before they go to production.
Maintain up-to-date documentation on deployments, processes, and standard operating procedures / runbooks, with the goal of minimizing runbooks through automation.
When complex issues arise despite the self-healing and automation you have built, get involved in troubleshooting and root-cause analysis across the stack: hardware, software, database, network, and so on.
Participate in a shared, follow-the-sun on-call schedule managed across SRE and Engineering.
Be an evangelist and promote lean-ops culture by applying self-service, self-healing, and automation.
Work with the product management team to define SLAs and SLOs and implement SLIs for core capabilities.
Improve the observability of software by implementing the right monitoring, tracing, and logging.
What you need to succeed
Experience designing for and dealing with a large production environment (minimum of 7 years).
A Bachelor’s or Master’s degree in computer science engineering or a related field.
Experience developing, running, and / or consuming cloud technologies such as AWS, Azure, and Google Cloud Platform, and related tooling: Terraform, configuration management, etc.
Recent large-scale experience developing, running, and / or consuming on-premises platforms and related tooling: VMware, Ansible, Chef or Puppet, configuration management, etc.
Programming (Python and Bash are our preferred scripting / shell languages) and automation skills.
Troubleshooting and system engineering exposure in Linux production environments. Experience with Linux, Internet protocols, and large-scale operations.
Experience with CI / CD tooling: Jenkins, Spinnaker, GitLab runners, Azure DevOps, etc.
Experience designing, deploying, and maintaining monitoring solutions such as Splunk, Prometheus, Check MK, etc.
Familiarity with the AWS / Azure Well-Architected Frameworks and practical experience applying resiliency and reliability patterns such as Circuit Breaker, Bulkhead, etc.
Great communication, interpersonal, and teamwork skills.
Ability to work independently and own problem statements end-to-end.
Experience with relational databases such as MySQL and Postgres, and document stores such as MongoDB.
Experience deploying applications in containers using Docker and Kubernetes.
Strong intuition about system design, robustness, and scalability.
Decent experience with Windows.