Amazon Web Services is the largest consumer cloud offering in the world, powering cutting-edge science, rapidly growing start-ups, and industry-leading companies.
The AWS Reliability Engineering team is building systems to ensure these AWS customers can rely on the highest-availability, lowest-latency cloud platform on the planet. We work closely with the teams who own the largest AWS products, building systems to detect and mitigate operational issues before they impact customers. We are looking for a knowledgeable and experienced Senior Software Development Engineer to help us succeed in this mission.
As a Senior Software Development Engineer at AWS Reliability, you will lead the team in the design and implementation of systems that automate fault containment, problem diagnosis, and issue resolution across multiple hugely distributed, always-on architectures. These systems will take metric and dependency data from multiple sources, analyze it, and correlate it with customer impact to determine the root cause of an issue without human intervention. As the scale and complexity of AWS grow, this is the best way we can offer our customers a stable and reliable cloud computing platform. We succeed when these systems detect, diagnose, and repair operational defects without customer impact or human intervention.
You will work with teams across AWS to drive adoption of the software the team has built, and influence systems development practices for new and existing products. You will define availability goals for service teams across AWS, along with strategies to make these goals attainable with minimal effort. Your goal will be to remove human error from the day-to-day operations of the massive, always-on, distributed systems that make up AWS.
Within your first year on the AWS Reliability team, you will have met with senior technical leaders from across AWS, designed and implemented at least one new system, and dived deep into the causes of at least one historical external customer-impacting event to determine how to prevent a similar event from ever happening again. As your career continues to develop, you will influence the growth and direction not only of the Reliability team, but of the AWS group as a whole.
If this sounds like the right challenge for you, then please apply today!