Software Engineer, Site Reliability

Remote · USA · Full-time

fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at scale without compromise. For developers and enterprises, fal is the foundation that makes generative media not just possible, but practical: a unified platform where high-performance inference, orchestration, and observability come together to unlock new categories of AI-native products. As generative media reshapes industries across a market projected to grow by hundreds of billions over the next decade, fal is becoming the ecosystem that ambitious teams build on.

You are a seasoned SRE who keeps production infrastructure running at scale. You own the reliability and availability of customer-facing systems, from Kubernetes clusters to deployment pipelines to the networking layer that connects it all. You think in SLOs, automate ruthlessly, and treat every incident as a chance to make the system better.

Key Responsibilities

- Own and operate our Kubernetes infrastructure: cluster lifecycle, upgrades, networking, and multi-tenant isolation for customer workloads
- Build and maintain CI/CD pipelines and deployment infrastructure
- Use AI extensively to automate the analysis and resolution of production issues, and to improve software development speed, reliability, and maintainability
- Build dashboards, alerting, and anomaly detection across our systems
- Define and enforce SLOs and build out incident response processes
- Manage and improve our networking, load balancing, and service mesh configurations
- Drive reliability improvements across the stack through automation, runbooks, and chaos engineering

Requirements

- 5+ years of experience managing critical production systems and software development workflows
- Strong production experience setting up and operating Kubernetes at scale, using infrastructure-as-code (Terraform, Ansible)
- Deep knowledge of Linux networking, container networking (CNI plugins, VXLAN, BGP), and DNS
- Experience building CI/CD systems and GitOps workflows (FluxCD, ArgoCD)
- Proficiency in Python and either Go or Bash for tooling and automation
- Strong experience with logging, monitoring, and alerting (Prometheus, Grafana, Loki, Thanos, VictoriaMetrics, Datadog)
- Excellent communication and ability to drive technical decisions across teams
- Self-starter who executes quickly, takes ownership, and constantly seeks improvement

Nice to have

- Experience managing GPU and AI/ML workloads
- Experience with kernel-based monitoring and routing (eBPF, XDP)
- Experience with security tooling (Falco, Coroot, SIEM)
- Experience with bare metal Kubernetes networking (Calico, Cilium, MetalLB)
- Experience with distributed storage systems (Ceph, Longhorn, etc.)

Location

Turkey

What we offer at fal

- Interesting and challenging work
- A lot of learning and growth opportunities
- Regular team events and offsites
