As a Senior GPU & AI Infra Expert focusing on Cloud Service Providers (CSPs) in China, you will be a core technical pillar of NVIDIA’s CSP SA team, responsible for driving GPU/AI Infra technical strategy, system-level solution optimization, and high-value customer engagement. You will work closely with major Chinese CSPs to address their critical demands in large-scale AI training/inference, Agentic AI, gaming AI, and distributed computing infrastructure. You will accelerate the mass deployment and performance maximization of NVIDIA GPU software/hardware stacks, and bridge the gap between fast-iterating CSP workloads and NVIDIA’s global engineering roadmap. This role requires deep expertise in GPU architecture, AI system optimization, and cluster networking, a track record of open-source AI infra contributions, and a strong ability to deliver high-value technical outcomes for hyperscale data center workloads.
What you'll be doing:
Partner with Sales, BD and CPM teams to land NVIDIA GPU and AI Infra technologies in top-tier Chinese CSP accounts, driving technical adoption and sustainable business growth.
Serve as the primary technical authority on NVIDIA GPU system and AI infrastructure solutions for Chinese CSPs, providing end-to-end consultation on GPU cluster architecture design, AI workload deployment, heterogeneous computing tuning, and full-stack software optimization.
Unlock the value of Vera CPU + GPU co-optimization for RL training and Agentic AI workloads: eliminate CPU-GPU data movement bottlenecks and optimize the latency and throughput of end-to-end agent training and reasoning pipelines in CSP AI factory scenarios.
Lead open-source system architecture contributions for NVIDIA AI infra stacks, upstream optimized patches to key open-source projects, build China-localized best practices, and help shape industry technical standards.
Conduct in-depth GPU workload bottleneck analysis; implement system-level, kernel-level and framework-level tuning for AI training, inference, RL and gaming workloads; and deliver production-ready reference designs and tuning guidelines for CSP mass deployment.
Act as the key technical liaison between Chinese CSP customers and NVIDIA global engineering, product and R&D teams: collect high-value local workload requirements, drive product roadmap iteration, and ensure full compliance with NVIDIA global technical policies and export compliance rules.
Lead technical workshops, hands-on training, PoC and production pilot projects for key CSP accounts; quantify and demonstrate the business value of GPU/AI Infra; and accelerate technology adoption and large-scale rollout.
Monitor cutting-edge industry trends, including Agentic AI, LLM inference optimization, cloud gaming AI, and next-gen data center system architectures, and provide strategic technical insights that inform team and product strategy.
Mentor junior SA team members, standardize CSP technical engagement and solution delivery processes, and capture high-value technical best practices in reusable form.
What we need to see:
Bachelor’s/Master’s/PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field; equivalent industry experience is highly valued.
8+ years of hands-on experience in GPU architecture, AI system optimization, large-scale data center infrastructure, or hyperscale cloud computing, with solid experience in AI training/inference, distributed computing or HPC workloads.
Deep understanding of GPU microarchitecture, CUDA programming model, GPU memory hierarchy and system scheduling mechanisms; proficient in performance profiling, bottleneck analysis and end-to-end AI workload tuning.
Strong programming proficiency in C/C++ and Python; familiarity with CUDA kernels, compiler toolchains, AI framework optimization (PyTorch/TensorRT) and large-scale distributed system tuning.
Proven hands-on experience working with major Chinese CSPs or global hyperscalers, with in-depth knowledge of their public cloud AI service architectures, cluster operation mechanisms and core workload characteristics.
Excellent technical communication and presentation skills, capable of explaining complex GPU system and AI infra technologies to technical engineers, architecture teams and business stakeholders.
Strong cross-functional collaboration capability, able to work efficiently in a global matrix team and prioritize multiple high-value technical projects under fast-paced business demands.
Familiarity with NVIDIA full-stack products (GPU data center hardware, TensorRT-LLM, Dynamo, NCCL, CUDA software stack) is a significant plus.
Hands-on engineering capability is mandatory; candidates must be results-oriented, self-driven and able to independently own end-to-end technical project delivery.
Committed, proactive, and capable of sustaining high-quality technical output for long-term strategic CSP projects.
Ways to stand out from the crowd:
Hands-on experience with Vera/Grace CPU + GPU heterogeneous co-optimization, and familiarity with AI agent and RL training system tuning.
In-depth experience in Dynamo LLM inference optimization, including KV Cache management, intelligent scheduling planner and dynamic resource scaling.
Open-source contribution experience in AI infra, GPU optimization libraries, or distributed computing frameworks, with a public record of upstream contributions.
Solid experience in Agentic AI, RL post-training or long-context LLM workload optimization on GPU clusters.
Familiarity with semiconductor and data center technology export compliance requirements in the China market.
Proven track record of independently leading CSP technical PoC, pilot verification and large-scale production deployment projects with measurable business outcomes.
With competitive salaries and a generous benefits package, we are widely considered to be one of the world’s most desirable employers! We have some of the most forward-thinking and hardworking people in the world working for us and, due to outstanding growth, our best-in-class engineering teams are rapidly growing. If you're a creative and autonomous person with a real passion for technology, we want to hear from you.