Principal Applied AI Developer, Foundation Model Infrastructure

Canada | Full-time | Posted Jul 4, 2026

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Principal Applied AI Developer, Foundation Model Infrastructure, based in Canada.

This is a senior technical leadership role at the forefront of large-scale AI and machine learning infrastructure, focused on enabling the full lifecycle of foundation models across training, inference, deployment, and production operations. You will design and evolve resilient, secure, and highly scalable platform services that power machine learning workflows used by researchers, engineers, and product teams. The role combines deep hands-on engineering with strategic influence, requiring you to define technical direction and drive cross-team initiatives in a complex, cloud-native environment. You will work closely with AI researchers, ML engineers, security, and platform teams to translate ambiguous requirements into robust, production-ready systems. A key part of your mission is to improve developer experience through self-service platforms, APIs, and automation. You will also play a central role in advancing AI/ML infrastructure standards and enabling safe, trusted AI at global scale. This is a high-impact position where your work directly shapes how advanced AI systems are built, deployed, and operated.

Accountabilities

Define and drive the technical strategy for foundation model infrastructure within a large-scale machine learning platform.
Design, build, and evolve distributed platform services supporting the full ML lifecycle, including training, inference, evaluation, deployment, monitoring, and operations.
Architect scalable, secure, observable, and cost-efficient infrastructure for enterprise-grade AI and ML workloads.
Develop and maintain developer-facing APIs, self-service tools, and workflows that enable efficient and safe model development and deployment.
Work hands-on with Kubernetes, AWS, Ray, SageMaker, and cloud-native technologies to support distributed compute and production inference systems.
Establish platform standards for reliability, SLAs/SLOs, observability, incident response, versioning, governance, and production readiness.
Lead cross-functional technical initiatives, align stakeholders, and influence architecture decisions without direct managerial authority.
Drive continuous improvements in performance, scalability, security, cost efficiency, and developer productivity across ML systems.
Lead root-cause analysis of production issues and implement durable, platform-level fixes.
Mentor senior engineers and promote engineering excellence, ownership, and accountability across teams.
Collaborate with research, product, security, and platform teams to enable trusted, responsible AI deployment practices.

Requirements

8+ years of software engineering experience, including significant work on large-scale distributed systems, cloud platforms, or ML infrastructure.
Strong hands-on expertise with Kubernetes and cloud-native architectures (AWS preferred).
Experience building and operating production-grade ML systems supporting training, inference, deployment, or observability.
Deep understanding of distributed computing and ML tooling such as Ray, SageMaker, or equivalent platforms.
Proven ability to lead complex, cross-team technical initiatives and influence architecture without formal authority.
Experience translating ambiguous research, product, or business needs into concrete engineering designs.
Strong background in CI/CD, infrastructure as code, automated testing, monitoring, and production operations.
Experience mentoring engineers and raising engineering standards across teams.
Strong communication skills, with the ability to engage both technical and non-technical stakeholders.
Familiarity with AI-assisted development tools and modern engineering productivity workflows is an asset.
Experience with ML governance, model versioning, observability, and responsible AI practices is highly valued.
Degree in Computer Science, Engineering, Machine Learning, or equivalent practical experience.

Benefits

Competitive base salary aligned with experience and market benchmarks
Annual bonus opportunities and potential equity/stock grants
Comprehensive health, dental, and vision insurance coverage
Retirement savings plans and financial wellbeing programs
Flexible work arrangements, including remote or hybrid options
Paid time off, holidays, and wellness programs
Learning and development support, including training and certifications
Opportunity to work on cutting-edge foundation model and AI infrastructure systems at global scale

How Jobgether works: We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.