This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an AI DevOps & Reliability Engineer based in Canada.
This is a high-impact engineering role at the intersection of platform reliability, DevOps automation, and AI-driven operations. You will be responsible for shaping how software is built, deployed, and operated across a large-scale, high-traffic SaaS environment. The role combines central platform ownership with hands-on embedding within engineering teams to improve delivery speed, system resilience, and operational maturity. You will design and evolve CI/CD pipelines, deployment automation, and infrastructure standards that enable safe, continuous releases. A key focus is driving the adoption of AI into DevOps workflows, including incident response, observability, and runbook automation. This is a 0-to-1 environment where you will influence architecture, culture, and engineering practices at scale.
Accountabilities:
In this role, you will own the end-to-end delivery and reliability ecosystem, building platforms and practices that enable fast, safe, and scalable software delivery across engineering teams.
- Design, build, and evolve CI/CD pipelines, deployment automation, and release frameworks that enable continuous and on-demand production delivery
- Define and enforce engineering standards for progressive delivery, rollback strategies, quality gates, and deployment safety mechanisms
- Build and manage self-service environments (dev, staging, and ephemeral) that replicate production and accelerate development cycles
- Drive AI-augmented DevOps practices, including automated runbooks, intelligent alerting, and AI-assisted incident response workflows
- Champion Infrastructure as Code and GitOps practices to ensure scalable, repeatable, and secure infrastructure and deployments
- Own operational reliability practices including observability, incident response, SLO/SLI definition, and on-call readiness
- Partner directly with engineering teams in an embedded model to improve delivery maturity and operational excellence
- Track and improve engineering performance using DORA metrics and other reliability indicators
Requirements:
The ideal candidate brings deep DevOps and platform engineering expertise, combined with strong hands-on experience in modern infrastructure and AI-enabled operations.
- 7+ years of experience in DevOps, platform engineering, SRE, or infrastructure-focused roles in high-scale environments
- Strong hands-on experience with Kubernetes and AWS in production systems
- Deep expertise in Infrastructure as Code tools such as Terraform and/or CloudFormation
- Proven experience designing and operating CI/CD pipelines with strong governance, automation, and quality controls
- Experience implementing GitOps workflows using tools such as Argo CD or Flux
- Hands-on experience operating high-scale systems including Kafka and distributed data infrastructure
- Strong software engineering and automation skills using Python, Bash, or similar languages
- Experience with observability tooling such as Prometheus, Grafana, PagerDuty, and related monitoring stacks
- Practical experience with incident management, on-call rotations, and reliability engineering best practices
- Demonstrated experience integrating AI tools or agentic workflows into DevOps or SRE processes
- Strong communication skills with the ability to influence, mentor, and collaborate across engineering teams
Benefits:
- Competitive base salary with performance-based annual bonus
- Equity opportunities for eligible roles
- Fully remote work within Canada
- Comprehensive health, dental, and vision coverage
- Generous paid time off and flexible work arrangements
- Learning and development support, including courses and training programs
- Parental leave and family support benefits
- Opportunity to work on high-impact systems in a fast-scaling engineering environment
- Strong culture of ownership, autonomy, and technical excellence