Senior Site Reliability Engineer

Remote · USA · Full-time

Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

As a Senior Site Reliability Engineer, you will champion all things pertaining to reliability at Okta for Auth0. Working closely with the Product Engineers, Quality Engineers, Platform Engineers and Architecture teams, your primary focus will be on ensuring production systems remain operational at all times, while continually setting and achieving long-term performance, reliability and scalability goals in a platform with an exponential growth plan for the coming years.

With Okta’s increased dedication to ensuring customer availability expectations are exceeded in every way, you will play a key role as we evolve our system architecture to meet the demands of enormous growth and support the hundreds of millions of users who rely on us to provide uninterrupted access to business-critical enterprise and consumer applications.

Skills

Exceptional communication skills, including technical writing in English
Systematic problem-solving approach, coupled with a strong sense of ownership and drive
Understanding of microservices, cloud infrastructure (AWS, Azure), databases (SQL, NoSQL, key/value), containers (Docker, Kubernetes), web technologies (WebSockets, HTTP) and networking (SSL, routing, VPN)
Live and breathe SLIs, SLOs, error budgets and SLAs
Strong belief in automating everything and reducing toil for yourself and teammates
Love working as part of a team, while being able to work effectively in a remote environment where tasks may be self-driven
Knowledge of Datadog or another observability platform is desired
Ability to handle 24/7 on-call duty independently, on a rotational basis

Responsibilities

Work with other teams to run, own and improve incident response processes
Participate in regular on-call rotations to ensure 24/7 coverage of all critical systems
Use existing monitoring tools to identify problems and resolve them and/or escalate to service teams
Implement changes to enable or improve infrastructure resilience, monitoring, and alerting

Experience

3+ years as a Site Reliability Engineer or in a Cloud Operations/DevOps role
2+ years using Go, shell scripting and Terraform
2+ years as a software developer in a SaaS environment
3+ years in a production environment supporting large-scale, mission-critical applications

The Okta Experience

Supporting Your Well-Being
Driving Social Impact
Developing Talent and Fostering Connection + Community

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.
