This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior SRE / Cloud Engineer based in Brazil.
This role sits at the core of a modern, AI-driven cloud ecosystem, responsible for ensuring the reliability, scalability, and performance of infrastructure supporting advanced intelligent applications. You will work in highly distributed environments built on Oracle Cloud Infrastructure, contributing directly to the stability of production systems powering AI agents and data-intensive workloads. The position blends platform engineering and Site Reliability Engineering practices, with a strong focus on automation, observability, and operational excellence. You will collaborate with cross-functional teams to design resilient architectures and continuously improve system efficiency. This is a high-impact role where infrastructure decisions directly influence product performance, user experience, and cost optimization. The environment is fast-paced, innovation-oriented, and deeply integrated with cloud-native and AI technologies.
Accountabilities:
In this role, you will be responsible for designing, operating, and continuously improving cloud infrastructure and reliability practices.
- Design, implement, and evolve cloud infrastructure on Oracle Cloud Infrastructure (OCI) to support AI-driven applications and services
- Manage and optimize Kubernetes environments (preferably Oracle Kubernetes Engine - OKE), ensuring high availability and scalability
- Build and maintain Infrastructure as Code (IaC) using Terraform, ensuring consistency and automation across environments
- Implement observability solutions, including monitoring, logging, tracing, and alerting for distributed systems
- Manage cloud networking components such as VCNs, security rules, and service gateways
- Drive automation of operational processes, including deployments, configuration, and secrets management
- Define and improve SLOs/SLIs, ensuring system reliability and performance targets are met
- Contribute to FinOps initiatives, optimizing infrastructure usage and controlling cloud costs
Requirements:
We are looking for a professional with strong cloud engineering and SRE expertise, particularly in OCI and Kubernetes environments.
- Solid experience with Oracle Cloud Infrastructure (OCI) in production environments
- Strong hands-on experience with Kubernetes, preferably Oracle Kubernetes Engine (OKE)
- Proficiency in Terraform and Infrastructure as Code (IaC) best practices
- Experience with cloud networking concepts such as VCN, NSG, and service gateways
- Strong knowledge of observability tools (OCI Monitoring, Logging, APM or equivalents)
- Experience applying SRE principles, including reliability engineering, incident response, and SLO/SLI definition
- Background in CI/CD pipelines, automation, and distributed system operations
- Strong analytical thinking, problem-solving skills, and collaborative mindset
Nice to have:
- Experience with FinOps practices in cloud or AI-intensive environments
- Knowledge of service mesh technologies and secure service-to-service communication (mTLS)
- Experience with streaming platforms such as Kafka or OCI Streaming
- Exposure to GPU-based inference workloads (e.g., vLLM, NVIDIA Triton)
- Familiarity with Generative AI, LLMs, or agent-based architectures
Benefits:
- Comprehensive health and dental insurance
- Meal and food allowances
- Childcare assistance
- Extended parental leave
- Wellness partnerships and access to fitness platforms
- Profit-sharing program (PLR)
- Life insurance coverage
- Continuous learning and upskilling platforms
- Discount club and partner perks
- Mental, physical, and wellbeing support platforms
- Parenting and maternity education programs
- Language learning and online course partnerships
- Additional corporate benefits and development resources