This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Sr. Principal Software Scientist based in the United States.
This is a senior technical leadership role focused on advancing the state of generative AI and large-scale foundation models for real-world applications in mobility and intelligent systems. You will be responsible for designing, training, and scaling next-generation transformer-based architectures, while shaping the core technical direction of advanced AI systems. The role sits at the intersection of deep research and production-scale engineering, requiring both theoretical depth and hands-on execution. You will work on complex challenges such as training stability, scaling laws, distributed training, and multimodal model design. Operating in a fast-paced, research-driven environment, you will collaborate with global ML systems and engineering teams to push the boundaries of model performance and efficiency. This position offers significant ownership over model architecture decisions and the opportunity to define foundational AI capabilities used at global scale. It is ideal for experts who thrive in ambiguity and want to build cutting-edge AI systems from first principles.
Accountabilities
- Lead the design and development of large-scale transformer and hybrid foundation models, defining architecture choices across text, multimodal, and emerging generative AI paradigms. You will own key decisions that shape next-generation model capabilities.
- Build and train large models from first principles, focusing on architecture innovation rather than incremental adaptation of existing codebases, ensuring scalability and robustness at production scale.
- Diagnose and resolve training instability issues, including divergence, optimizer failures, and gradient pathologies, ensuring stable and efficient large-scale model training.
- Define and evaluate scaling strategies across compute, data, and model size, applying scaling laws to optimize performance and efficiency trade-offs in foundation model development.
- Design and experiment with loss functions and alignment strategies, including next-token prediction, contrastive learning, and RLHF/DPO/GRPO approaches, to improve convergence and generalization.
- Architect distributed training systems using techniques such as FSDP, ZeRO-3, tensor and pipeline parallelism, and mixed precision, in collaboration with ML infrastructure teams.
- Drive innovation in model architectures including MoE routing, multimodal fusion, and hybrid or state-space approaches, while considering inference efficiency and KV cache optimization.
Requirements
- Extensive hands-on experience in deep learning, with strong theoretical grounding in transformer architectures, optimization dynamics, and representation learning.
- Proven track record of training large-scale foundation models from scratch, with deep understanding of distributed training systems and scaling challenges.
- Strong ability to reason about optimization behavior beyond hyperparameter tuning, including debugging instability and improving convergence at scale.
- Deep expertise in transformer internals, attention mechanisms, and advanced architectural techniques such as GQA, RoPE, ALiBi, and MoE.
- Strong understanding of scaling laws, compute/data trade-offs, and model efficiency considerations in large AI systems.
- Experience working with distributed training frameworks (e.g., FSDP, ZeRO, tensor/pipeline parallelism) and mixed precision techniques (bf16, fp8).
- Comfort operating in ambiguous, research-heavy environments where architectural decisions must be explored, validated, and iterated rapidly.
Benefits
- Competitive compensation package with an estimated salary range of $185,000 to $280,000 USD, based on experience and qualifications.
- Annual bonus eligibility and equity opportunities for eligible roles.
- Comprehensive health coverage including medical, dental, vision, life, and disability insurance.
- Paid time off and paid holidays to support work-life balance.
- Retirement savings plan with employer contributions (e.g., RRSP/401(k) equivalent depending on location).
- Remote or hybrid work flexibility depending on role requirements.
- Strong focus on learning, research growth, and access to cutting-edge AI development environments.
- Opportunity to contribute to high-impact AI systems deployed at global scale in the automotive and mobility sector.