Senior Systems Engineer, AWS Managed Operations (MO) id-4018
About the position
The AWS Managed Operations (MO) organization, founded in April 2023, is dedicated to reducing operational load and toil through long-term engineering projects. The organization is building a best-in-class engineering and operations team that will manage the day-to-day operations for AWS Regions. The primary goal is to improve the availability, reliability, latency, performance, and efficiency of AWS operations. We are seeking highly motivated Senior Systems Engineers who can balance the daily operations of AWS's software systems with long-term software engineering initiatives aimed at reducing operational toil.
Candidates should have a passion for continuous learning and a deep understanding of the diverse systems and technologies that make up one of the world's largest cloud providers. As part of the AWS Utility Computing (UC) organization, you will support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services within AWS. The role involves not only maintaining existing systems but also improving them so that they meet the evolving needs of our customers.
AWS values diverse experiences and encourages candidates from all backgrounds to apply, even if they do not meet every qualification listed in the job description. At AWS, we foster an inclusive team culture that promotes learning and curiosity. Our employee-led affinity groups create a supportive environment that celebrates our differences. We offer ongoing events and learning experiences, such as the Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, which inspire us to embrace our uniqueness. We are committed to mentorship and career growth, providing resources to help you develop into a well-rounded professional. We also prioritize work-life balance, offering flexible work hours and arrangements so that success at work does not come at the expense of personal life.
The AWS Operations Management (AWSOM) team is on a mission to launch a new offering that will enhance security, availability, performance, and efficiency across AWS Regions globally. We are focused on eliminating operational toil through automation, enabling us to manage day-to-day operations at scale. Our team increases collaboration between development and operations while prioritizing customer needs. AWSOM will take on operational responsibility for all Utility Computing (UC) services in AWS commercial and sovereign regions, allowing service teams to focus on rapid innovation for our customers. We will be responsible for service availability, latency, performance, efficiency, change management, and monitoring, and we will directly influence customer experience by recommending resilience and reliability improvements in products.
Responsibilities
- Manage day-to-day operations for AWS Regions.
- Improve availability, reliability, latency, performance, and efficiency of AWS operations.
- Balance daily operations with long-term software engineering initiatives.
- Support the development and management of Compute, Database, Storage, IoT, Platform, and Productivity Apps services.
- Eliminate operational toil through automation.
- Collaborate with service teams to enhance service availability and performance.
- Influence customer experience by recommending improvements for resilience and reliability.
Requirements
- 9+ years of relevant systems engineering or systems administration experience with operating systems, networking, and storage systems.
- 5+ years of Systems Engineering, DevOps, Site Reliability Engineering (SRE), or Enterprise Production experience in Windows/Linux or similar environments.
- 3+ years of experience operating in a 24/7 production environment.
- 3+ years of experience with a scripting language such as Perl, Python, Ruby, or PowerShell.
Nice-to-haves
- Bachelor's Degree in Computer Information Systems, Computer Engineering or a related discipline.
- 15+ years of experience with cloud computing technologies.
- 6+ years of networking experience.
- 6+ years of experience with support procedures and methodologies for production computing environments.
- Experience developing systems automation in a scripting language such as Perl, Python, Ruby, or PowerShell.
- Experience with Agile engineering practices (Scrum, continuous delivery, etc.).
- Meets/exceeds Amazon's leadership principles requirements for this role.
- Meets/exceeds Amazon's functional/technical depth and complexity for this role.
Benefits
- Flexible work hours and arrangements
- Mentorship and career advancement resources
- Employee-led affinity groups for inclusion
- Ongoing learning experiences and events