This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior Software Engineer – AI Infrastructure based in Brazil.
This role sits at the core of a high-scale AI systems environment, focused on building and operating the infrastructure that powers production-grade intelligent agents. You will design and implement robust backend systems that support model inference, orchestration, and execution at massive scale. The work is deeply technical and performance-driven, requiring strong systems thinking and a passion for reliability under heavy load. You will help define the foundations of AI infrastructure used by millions of users globally. The environment is fast-paced, production-oriented, and highly collaborative across infrastructure and applied AI teams. This is an opportunity to shape the primitives that make safe, scalable agent systems possible in real-world applications.
Accountabilities:
- Design and build the core infrastructure layer powering AI agent systems in production environments.
- Develop and maintain high-performance backend services (primarily in Rust) for inference, orchestration, and execution workloads.
- Architect distributed systems capable of handling high throughput, low latency, and global-scale traffic.
- Build and improve ML infrastructure, including model deployment, monitoring, evaluation, and lifecycle management.
- Implement observability, reliability, and failure recovery mechanisms for critical agent-driven workflows.
- Optimize system performance across latency, cost, and scalability constraints.
- Collaborate closely with applied AI and infrastructure teams to productionize experimental systems.
- Contribute to architectural decisions shaping long-term platform evolution.
Requirements:
- 5+ years of experience building and operating large-scale production systems.
- Strong expertise in Rust or similar systems programming languages.
- Deep understanding of distributed systems, reliability engineering, and performance optimization.
- Proven experience supporting high-throughput or large user-base production environments.
- Hands-on experience with ML infrastructure, model serving, or MLOps in production.
- Strong knowledge of observability tools, monitoring practices, and incident/failure management.
- Experience working cross-functionally with infrastructure and applied engineering teams.
- Strong ownership mindset and ability to operate in high-stakes production environments.
- Nice to have: experience with LLM/agent systems, cloud-native infrastructure, async systems, or evaluation frameworks.
Benefits:
- Competitive compensation package aligned with global tech market standards.
- Fully remote work with flexible arrangements.
- Opportunity to work on cutting-edge AI infrastructure at global scale.
- High-impact role with strong ownership over core production systems.
- Collaborative, engineering-driven culture focused on technical excellence.
- Learning and growth opportunities in AI systems, distributed systems, and platform engineering.
- Inclusive and diverse work environment.