[Remote] DevOps Engineer - Atlanta, GA, Birmingham, AL, Louisville, KY, Richmond, VA, Charlotte, NC

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. Dice is seeking an experienced Site Reliability Engineer (SRE) / DevOps Engineer with expertise in incident management and cloud-native platforms. The role involves ensuring the reliability and performance of distributed systems, managing incident response, and implementing automation and governance strategies.

Responsibilities

Manage and improve platform reliability, availability, and performance across production environments
Lead and participate in incident management, root cause analysis, remediation planning, and post-incident reviews
Drive change control processes and ensure operational governance standards are followed
Monitor and manage error budgets while implementing reliability improvements
Design, build, and maintain scalable cloud infrastructure and automation frameworks
Deploy and manage containerized applications using Kubernetes and Docker
Develop and maintain CI/CD pipelines to support efficient software delivery
Implement Infrastructure as Code (IaC) solutions for automated provisioning and configuration management
Establish observability strategies using monitoring, logging, and alerting platforms
Collaborate with development, infrastructure, security, and business teams to ensure platform stability
Troubleshoot complex production issues across cloud, networking, infrastructure, and application layers
Continuously improve operational processes, automation, and system resilience

Skills

7+ years of experience in Site Reliability Engineering (SRE), DevOps, Cloud Infrastructure, or Production Operations
Strong experience managing workloads in cloud environments: Microsoft Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP)
Hands-on experience with: Kubernetes, Docker, CI/CD Pipelines, Infrastructure as Code (IaC)
Strong scripting and automation expertise using: Python, Bash, PowerShell, Go (Golang)
Experience with observability and monitoring platforms: Datadog, Grafana, Prometheus, Splunk
Strong understanding of: Networking concepts, Linux Administration, Windows Administration, Distributed Systems, Cloud-Native Architectures
Experience with: Incident Response, Production Troubleshooting, Operational Governance
Experience implementing reliability engineering best practices and SRE methodologies
Experience supporting large-scale enterprise production environments
Familiarity with high-availability and disaster recovery architectures
Experience automating operational workflows and infrastructure management
Knowledge of security best practices within cloud environments
Experience working in Agile and DevOps-driven organizations

Company Overview

Dice is a job-search platform for technology professionals and is part of DHI Group. Founded in 1990, it is headquartered in Santa Clara, California, USA, and has a workforce of 201-500 employees. Its website is http://www.dice.com.
