Sr. Principal Software Scientist

United States | Full-time | $185k–$280k | Posted Jul 4, 2026

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Sr. Principal Software Scientist based in the United States.

This is a senior technical leadership role focused on advancing the state of generative AI and large-scale foundation models for real-world applications in mobility and intelligent systems. You will be responsible for designing, training, and scaling next-generation transformer-based architectures, while shaping the core technical direction of advanced AI systems. The role sits at the intersection of deep research and production-scale engineering, requiring both theoretical depth and hands-on execution. You will work on complex challenges such as training stability, scaling laws, distributed training, and multimodal model design. Operating in a fast-paced, research-driven environment, you will collaborate with global ML systems and engineering teams to push the boundaries of model performance and efficiency. This position offers significant ownership over model architecture decisions and the opportunity to define foundational AI capabilities used at global scale. It is ideal for experts who thrive in ambiguity and want to build cutting-edge AI systems from first principles.

Accountabilities

Lead the design and development of large-scale transformer and hybrid foundation models, defining architecture choices across text, multimodal, and emerging generative AI paradigms. You will own key decisions that shape next-generation model capabilities.
Build and train large models from first principles, focusing on architecture innovation rather than incremental adaptation of existing codebases, ensuring scalability and robustness at production scale.
Diagnose and resolve training instability issues, including divergence, optimizer failures, and gradient pathologies, ensuring stable and efficient large-scale model training.
Define and evaluate scaling strategies across compute, data, and model size, applying scaling laws to optimize performance and efficiency trade-offs in foundation model development.
Design and experiment with loss functions and alignment strategies, including next-token prediction, contrastive learning, and RLHF/DPO/GRPO approaches, to improve convergence and generalization.
Architect distributed training systems using techniques such as FSDP, ZeRO-3, tensor and pipeline parallelism, and mixed precision training, in collaboration with ML infrastructure teams.
Drive innovation in model architectures including MoE routing, multimodal fusion, and hybrid or state-space approaches, while considering inference efficiency and KV cache optimization.

Requirements

Extensive hands-on experience in deep learning, with strong theoretical grounding in transformer architectures, optimization dynamics, and representation learning.
Proven track record of training large-scale foundation models from scratch, with deep understanding of distributed training systems and scaling challenges.
Strong ability to reason about optimization behavior beyond hyperparameter tuning, including debugging instability and improving convergence at scale.
Deep expertise in transformer internals, attention mechanisms, and advanced architectural techniques such as GQA, RoPE, ALiBi, and MoE.
Strong understanding of scaling laws, compute/data trade-offs, and model efficiency considerations in large AI systems.
Experience working with distributed training frameworks (e.g., FSDP, ZeRO, tensor/pipeline parallelism) and mixed precision techniques (bf16, fp8).
Comfort operating in ambiguous, research-heavy environments where architectural decisions must be explored, validated, and iterated rapidly.

Benefits

Competitive compensation package with an estimated salary range of $185,000 to $280,000 USD, based on experience and qualifications.
Annual bonus eligibility and equity opportunities for eligible roles.
Comprehensive health coverage including medical, dental, vision, life, and disability insurance.
Paid time off and paid holidays to support work-life balance.
Retirement savings plan with employer contributions (e.g., RRSP/401(k) equivalent depending on location).
Remote or hybrid work flexibility depending on role requirements.
Strong focus on learning, research growth, and access to cutting-edge AI development environments.
Opportunity to contribute to high-impact AI systems deployed at global scale in the automotive and mobility sector.

How Jobgether works: We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.