Site Reliability Engineer 3

Remote · USA · Full-time

About the position

The Site Reliability Engineering team designs and builds the global infrastructure on which MongoDB deploys its services, focusing on the flagship MongoDB Atlas platform. As customers grow and globalize, services must satisfy demands for low-latency requests around the globe and comply with various data sovereignty requirements. The SRE Team’s mission is to build this increasingly complex infrastructure, while continually lowering the operational burden associated with it and increasing internal visibility into the health of the system. They are strong believers in infrastructure-as-code and self-healing systems. The SRE Team is fully integrated with all other engineering teams, working closely together with a soft and traversable boundary between their areas of responsibility. Candidates based in New York City are sought for a hybrid working model.

MongoDB is built for change, empowering customers and people to innovate at the speed of the market. They have redefined the database for the AI era, enabling innovators to create, transform, and disrupt industries with software. MongoDB’s unified database platform, the most widely available, globally distributed database on the market, helps organizations modernize legacy workloads, embrace innovation, and unleash AI. Their cloud-native platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available across AWS, Google Cloud, and Microsoft Azure. With offices worldwide and nearly 60,000 customers, including 75% of the Fortune 100 and AI-native startups, relying on MongoDB for their most important applications, they are powering the next era of software.

MongoDB’s Leadership Commitment guides decisions, interactions, and success. To drive personal growth and business impact, they are committed to developing a supportive and enriching culture for everyone, offering employee affinity groups, fertility assistance, and a generous parental leave policy to support employee wellbeing and professional and personal journeys. MongoDB is committed to providing necessary accommodations for individuals with disabilities within their application and interview process and provides equal employment opportunities to all employees and applicants.

Responsibilities

Design and build the infrastructure for a global cloud service that comprises hundreds of thousands of MongoDB clusters, processes a billion metrics per day, and replicates tens of billions of database writes to our backup service
Design, implement, and troubleshoot the automation and monitoring of services that seamlessly span the globe - including several cloud providers
Become an expert in infrastructure performance, helping us optimize from the application level all the way through the firmware
Build for resilience. Our goal is that nobody’s pager goes off, ever. Are we there yet? No. Are we really close? Very. While we work on that - participate in a weekly on-call rotation
Improve our infrastructure capabilities, optimizing for cost, simplicity, and maintainability

Requirements

3+ years of experience running a mission-critical service at scale in a Linux environment
Firm grasp of at least one modern programming language, beyond basic scripting
Familiarity with web and network protocols and standards (HTTP, TLS, DNS, etc.)
Bachelor’s degree in Computer Science or equivalent experience
Experience writing automation tools & eagerness to "automate all the things"

Nice-to-haves

Experience building large applications from scratch, complete with CI/CD infrastructure
Experience in networking, security, hardware or OS performance tuning
Experience with at least one of the major cloud providers (Amazon Web Services, Google Cloud, Microsoft Azure)
Experience managing Kubernetes clusters or some other container orchestration infrastructure
Experience with observability of large-scale distributed systems

Benefits

Fertility assistance
Generous parental leave policy
Employee affinity groups

