Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds, more than 10 times faster than GPU-based hyperscale cloud inference services.
This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.
Cerebras works with the leading model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.
About the role
Cerebras is building the world's fastest AI inference platform. Every day, we serve billions of inference tokens for leading AI companies including Cognition, Mistral, AlphaSense, IFM, Block, and others on the industry's largest AI accelerator systems.
As demand for AI continues to accelerate, intelligent capacity management becomes one of the company's most strategic challenges. Every customer commitment, model launch, and infrastructure investment depends on making the right capacity decisions at the right time.
We're looking for an experienced Technical Program Manager to lead capacity planning and fleet strategy for our Inference Service organization. This is a highly visible role working directly with Engineering, Product, Infrastructure, SRE, Operations, and executive leadership to maximize utilization of one of the world's most advanced AI inference fleets.
What you'll own
Capacity planning and forecasting. Build and maintain the 6 / 12 / 26-week rolling capacity model across every cluster. Work with the product team to translate customer contracts and sales-pipeline asks into capacity requirements. Forecast model replicas, system-hours, and spares by customer and by model. Reconcile against actuals weekly. Maintain the source-of-truth doc.
New datacenter capacity bring-up. Collaborate with datacenter infrastructure and operations teams to support new datacenter bring-up and ensure production readiness. Drive engineering efforts and related automation to ensure on-time, high-quality delivery.
Allocation and cluster placement. Partner closely with the SRE and product teams to run the weekly capacity review across customers, models, and clusters. Decide model placement and rebalancing: which customer tenants land where, which clusters absorb new launches, and which freezes are in effect. Run the weekly capacity and utilization report for Inference Service leadership. Once capacity is allocated, drive the downstream work of deploying models across it with the SRE team.
Drive capacity planning tool adoption. Partner with the console engineering team to drive stakeholder adoption of the in-house capacity planning and allocation tool, including user acceptance testing, issue resolution, change tracking, pilot testing, and deployment. More broadly, contribute to the continuous process improvement and development of internal capacity management tools.
Incident tracking and postmortems. Proactively identify and mitigate capacity bottlenecks, risks, and dependencies. When an SLA drops because of capacity misallocation, drive the resolution and postmortem.
Key Responsibilities
· Run weekly capacity planning and daily capacity and deployment tracking with the Engineering, Product, and Operations teams. Own fleet utilization reporting and forecasting
· Drive capacity planning for new customer deployments and major model launches
· Drive continuous improvement and stakeholder adoption of the new capacity management platform
· Drive org-level strategic initiatives related to capacity expansion, improving fleet efficiency, and maximizing effective utilization of available systems
· Lead planning around major infrastructure events that affect capacity and fleet utilization, including but not limited to new customer commits, new model releases, and changes to DC/cluster architecture. Update capacity plans and forecasts accordingly.
· Maintain Jira epics and Confluence pages related to capacity planning, reporting, and change management to ensure execution transparency across teams
Qualifications
· 5+ years of technical program management (TPM) or product operations experience in cloud infrastructure, large-scale ML serving, or hyperscaler capacity planning
· Experience leading large cross-functional programs involving Engineering, Product, and Operations
· Comfort with the inference serving stack: model replicas, batching, prefill/decode, KV cache, accelerator scheduling
· Strong data fluency: SQL, Grafana, and enough Python or Flux to pull your own numbers without waiting for an analyst
· Track record of running a recurring cross-functional ritual involving senior engineers and the leadership team (LT)
· Direct experience with AI accelerator fleet operations, such as Habana, TPU pods, Inferentia, or Trainium
Why Join Cerebras
People who are serious about software make their own hardware. At Cerebras, we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:
· Build a breakthrough AI platform beyond the constraints of the GPU.
· Publish and open source their cutting-edge AI research.
· Work on one of the fastest AI supercomputers in the world.
· Enjoy job stability with startup vitality.
· Enjoy a simple, non-corporate work culture that respects individual beliefs.
Find out more about what it's like to work at Cerebras here!
Apply today and join the forefront of groundbreaking advancements in AI!
Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.