Principal Applied AI Developer, Foundation Models Infrastructure

Canada | Full-time | CA$153k–CA$224k | Posted Jul 5, 2026

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Principal Applied AI Developer, Foundation Models Infrastructure, based in Canada.

This is a senior technical leadership role focused on building and scaling the core infrastructure that powers foundation model and machine learning workflows at enterprise scale. You will design and evolve cloud-native platforms that support the full lifecycle of ML systems, including training, inference, evaluation, deployment, and monitoring. The role combines deep hands-on engineering with strategic technical leadership, requiring you to influence architecture decisions across multiple teams. You will operate at the intersection of AI research, platform engineering, and product development, translating ambiguous requirements into scalable, production-ready systems. Working in a highly collaborative environment, you will partner with researchers, ML engineers, product managers, and platform specialists. Your work will directly shape developer experience, system reliability, and the scalability of AI capabilities across global products. This is a high-impact role for someone passionate about large-scale distributed systems and applied AI infrastructure.

Accountabilities:

In this role, you will define and build the foundational infrastructure that enables scalable AI and ML development across the organization:

Define and drive technical strategy for foundation model and ML infrastructure across the platform ecosystem
Architect and build large-scale, resilient, secure, and cost-efficient systems supporting ML lifecycle stages (training, inference, deployment, evaluation, monitoring)
Develop and evolve developer-facing APIs, self-service tools, and workflows to accelerate ML development and deployment
Work hands-on with cloud-native technologies such as Kubernetes, AWS, Ray, and SageMaker to support distributed compute workloads
Translate ambiguous research, product, and business requirements into clear technical designs and executable engineering plans
Lead cross-team technical initiatives, aligning stakeholders and influencing architectural direction without formal authority
Establish and maintain platform standards for reliability, observability, SLAs/SLOs, versioning, governance, and production readiness
Improve system performance, scalability, cost efficiency, and operational reliability across ML workloads
Drive CI/CD, infrastructure-as-code, automated testing, and developer productivity improvements
Lead root-cause analysis of production issues and implement long-term platform-level solutions
Mentor senior engineers and elevate engineering excellence across teams
Collaborate with security, privacy, research, and product teams to ensure safe and responsible AI deployment practices

Requirements:

To succeed in this role, you bring deep expertise in large-scale systems, AI infrastructure, and technical leadership:

8+ years of software engineering experience, including strong exposure to distributed systems, cloud infrastructure, or ML platforms
Advanced experience designing and operating production-grade ML or AI infrastructure (training, inference, deployment, observability)
Strong hands-on expertise with Kubernetes and cloud-native architectures
Experience with distributed computing frameworks such as Ray, SageMaker, or equivalent ML infrastructure tools
Proven ability to lead complex, cross-functional technical initiatives and influence architecture decisions without direct authority
Strong background in CI/CD pipelines, infrastructure-as-code, automated testing, and production system operations
Experience translating ambiguous technical or business requirements into scalable system designs
Deep understanding of system design trade-offs across performance, reliability, cost, and scalability
Experience mentoring senior engineers and driving engineering best practices across teams
Excellent communication skills with the ability to engage both technical and non-technical stakeholders
Bonus: experience with AI-assisted development tools, agentic workflows, and modern AI engineering patterns
Bonus: familiarity with model governance, responsible AI practices, and ML lifecycle management at scale

Benefits:

Competitive annual base salary range: $153,000 – $224,400 CAD
Performance-based annual bonus and potential equity or stock grants
Comprehensive health, dental, and vision coverage
Hybrid or flexible work arrangements depending on location and team needs
Strong focus on innovation, learning, and professional development
Opportunity to work on large-scale foundation model infrastructure and cutting-edge AI systems
Inclusive, global engineering culture focused on collaboration and technical excellence
Access to mentorship, leadership development, and internal mobility opportunities
Exposure to high-impact projects shaping AI-driven products used worldwide

How Jobgether works: We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.