[Remote] Senior Site Reliability Engineer (SRE) / Platform Reliability Engineer
Note: This is a remote position open to candidates in the USA. Mastech Digital is seeking an experienced Site Reliability Engineer (SRE) to lead reliability engineering initiatives for large-scale, mission-critical healthcare platforms. The role involves defining reliability KPIs, driving observability strategies, and leading incident response for enterprise platforms.
Responsibilities
- Define and monitor reliability KPIs, SLIs, and SLOs
- Drive observability and monitoring strategies across distributed systems
- Lead incident response, RCA, and reliability improvements
- Build automation for infrastructure and CI/CD pipelines
- Partner with stakeholders on SLA and service-level management
- Support modernization of enterprise platforms
Skills
- Proven experience implementing SRE frameworks in large enterprise environments
- Strong background supporting complex distributed systems
- Java, Spring Boot
- Azure, GCP, GKE
- Kubernetes, CI/CD, JFrog
- MongoDB, SQL
- AppDynamics, Splunk, Grafana
- Experience with Prometheus is a plus
- Healthcare or PBM (pharmacy benefit management) platform experience
- Platform Engineering or Reliability Engineering background