[Remote] Senior Software Engineer, Infrastructure AI
Note: This is a remote position open to candidates in the USA. Airwallex is a unified payments and financial platform for global businesses, empowering over 200,000 clients with integrated solutions. The company is seeking a Senior Software Engineer for its Infrastructure AI team to design and build AI agents that automate infrastructure operations and make SRE, DevOps, and DBA workflows more efficient.
Responsibilities
- Build goal-oriented infrastructure AI agents: Design and implement autonomous agents on the Quartermaster platform that handle SRE, DevOps, and DBA workflows — including incident investigation and remediation, infrastructure provisioning, deployment orchestration, database operations, and capacity management
- Design agent architectures for safety and reliability: Build robust agent loops with planning, reasoning, tool-use, error recovery, and human-in-the-loop escalation. Infrastructure agents touch production systems — they must be safe, auditable, and predictable
- Integrate agents with infrastructure tooling: Give agents the ability to interact with real systems — Kubernetes, Terraform, cloud APIs (AWS, GCP, Aliyun), monitoring/observability platforms, databases, and CI/CD pipelines — through well-designed tool interfaces
- Evaluate and improve agent performance: Define metrics for agent effectiveness (task completion rate, accuracy, time-to-resolution, escalation rate). Build evaluation frameworks to measure whether agents reliably achieve their goals
- Collaborate cross-functionally: Partner with SRE, DevOps, and platform engineering teams to identify the highest-value workflows for agent automation. Understand real operational pain points and translate them into agent capabilities
- Shape the future of infrastructure AI at Airwallex: Contribute to the technical vision for how AI agents will transform infrastructure operations. Stay at the forefront of agentic AI developments and bring new ideas into the team
Skills
- 5+ years of professional software engineering experience, with strong proficiency in backend languages such as Python, Go, Java, or Kotlin
- Hands-on experience building agentic AI systems: You have designed and built AI agents that autonomously plan, reason, and execute multi-step tasks using LLMs. This includes agent loop design, tool-use orchestration, and managing agent state and memory. This is a core requirement, not a nice-to-have
- Infrastructure and cloud knowledge: Practical experience with cloud platforms (AWS, GCP, or Aliyun), container orchestration (Kubernetes), infrastructure-as-code (Terraform), and CI/CD systems. You understand the domain the agents will operate in
- Strong system design skills with the ability to architect reliable, safe, and scalable agent systems that interact with production infrastructure
- Excellent communication and collaboration skills, with a track record of working effectively across cross-functional teams in a global environment
- Experience with agent orchestration frameworks (e.g., LangGraph, CrewAI, AutoGen, or custom agent frameworks) and LLM APIs (OpenAI, Anthropic, etc.)
- Experience with observability and monitoring systems (Grafana, Prometheus, Datadog, ELK stack) — either building them or building agents that interact with them
- Experience with database administration or automation (schema migrations, query optimization, backup/recovery) — knowledge that helps build effective DBA agents
- Understanding of SRE practices: incident response, runbooks, postmortems, SLOs/SLIs — the operational domain these agents will automate
- A forward-thinking mindset about where AI agents are headed. You have opinions on agent reliability, safety, evaluation, and the path toward increasingly autonomous infrastructure systems