Principal Applied AI Developer, Foundation Models Infrastructure

Jobgether · Lever
Canada · Full-time · CA$153k–CA$224k · Posted Jul 5, 2026

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Principal Applied AI Developer, Foundation Models Infrastructure, based in Canada.

This is a senior technical leadership role focused on building and scaling the core infrastructure that powers foundation model and machine learning workflows at enterprise scale. You will design and evolve cloud-native platforms that support the full lifecycle of ML systems, including training, inference, evaluation, deployment, and monitoring. The role combines deep hands-on engineering with strategic technical leadership, requiring you to influence architecture decisions across multiple teams. You will operate at the intersection of AI research, platform engineering, and product development, translating ambiguous requirements into scalable, production-ready systems. Working in a highly collaborative environment, you will partner with researchers, ML engineers, product managers, and platform specialists. Your work will directly shape developer experience, system reliability, and the scalability of AI capabilities across global products. This is a high-impact role for someone passionate about large-scale distributed systems and applied AI infrastructure.

Accountabilities:

    In this role, you will define and build the foundational infrastructure that enables scalable AI and ML development across the organization:

    • Define and drive technical strategy for foundation model and ML infrastructure across the platform ecosystem
    • Architect and build large-scale, resilient, secure, and cost-efficient systems supporting ML lifecycle stages (training, inference, deployment, evaluation, monitoring)
    • Develop and evolve developer-facing APIs, self-service tools, and workflows to accelerate ML development and deployment
    • Work hands-on with cloud-native technologies such as Kubernetes, AWS, Ray, and SageMaker to support distributed compute workloads
    • Translate ambiguous research, product, and business requirements into clear technical designs and executable engineering plans
    • Lead cross-team technical initiatives, aligning stakeholders and influencing architectural direction without formal authority
    • Establish and maintain platform standards for reliability, observability, SLAs/SLOs, versioning, governance, and production readiness
    • Improve system performance, scalability, cost efficiency, and operational reliability across ML workloads
    • Drive CI/CD, infrastructure-as-code, automated testing, and developer productivity improvements
    • Lead root-cause analysis of production issues and implement long-term platform-level solutions
    • Mentor senior engineers and elevate engineering excellence across teams
    • Collaborate with security, privacy, research, and product teams to ensure safe and responsible AI deployment practices

Requirements:

    To succeed in this role, you bring deep expertise in large-scale systems, AI infrastructure, and technical leadership:

    • 8+ years of software engineering experience, including strong exposure to distributed systems, cloud infrastructure, or ML platforms
    • Advanced experience designing and operating production-grade ML or AI infrastructure (training, inference, deployment, observability)
    • Strong hands-on expertise with Kubernetes and cloud-native architectures
    • Experience with distributed computing frameworks such as Ray, SageMaker, or equivalent ML infrastructure tools
    • Proven ability to lead complex, cross-functional technical initiatives and influence architecture decisions without direct authority
    • Strong background in CI/CD pipelines, infrastructure-as-code, automated testing, and production system operations
    • Experience translating ambiguous technical or business requirements into scalable system designs
    • Deep understanding of system design trade-offs across performance, reliability, cost, and scalability
    • Experience mentoring senior engineers and driving engineering best practices across teams
    • Excellent communication skills with the ability to engage both technical and non-technical stakeholders
    • Bonus: experience with AI-assisted development tools, agentic workflows, and modern AI engineering patterns
    • Bonus: familiarity with model governance, responsible AI practices, and ML lifecycle management at scale

Benefits:

    • Competitive annual base salary range: $153,000 – $224,400 CAD
    • Performance-based annual bonus and potential equity or stock grants
    • Comprehensive health, dental, and vision coverage
    • Hybrid or flexible work arrangements depending on location and team needs
    • Strong focus on innovation, learning, and professional development
    • Opportunity to work on large-scale foundation model infrastructure and cutting-edge AI systems
    • Inclusive, global engineering culture focused on collaboration and technical excellence
    • Access to mentorship, leadership development, and internal mobility opportunities
    • Exposure to high-impact projects shaping AI-driven products used worldwide

How Jobgether works: We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
