Senior MLOps Engineer

KAYAK · Ashby
Berlin Office · Full-time · Posted Jul 3, 2026

KAYAK, part of Booking Holdings (NASDAQ: BKNG), is a leading travel search engine. With billions of queries across our platforms, we help people find their perfect flight, stay, rental car and vacation package. We're also transforming business travel with a new corporate travel solution, KAYAK for Business.

As an employee of KAYAK, you will be part of a travel company that operates a portfolio of global metasearch brands including momondo, Cheapflights and HotelsCombined, among others. From start-up to industry leader, innovation is in our DNA and every employee has an opportunity to make their mark. Our focus is on building the best travel search engine to make it easier for everyone to experience the world.

Every machine learning model KAYAK ships depends on reliable, scalable infrastructure to move from experiment to production — and that's exactly what this role makes possible. KAYAK is seeking a Senior MLOps Engineer who will focus on the design and implementation of our machine learning infrastructure and production lifecycle. This is a senior, hands-on role where you will bridge the gap between data science and production engineering.

You will join the Machine Learning Platform team and be responsible for building and maintaining scalable infrastructure and automated pipelines for model training, deployment, and monitoring, ensuring our ML models are reliable, reproducible, and performant. You will work closely with the Data Science, ML Engineering, and Operations teams to transform experimental code into robust, production-ready services at scale.

This role requires commuting to the Berlin office three days a week.

In this role, you will:

  • Build and maintain ML infrastructure end-to-end: Extend and operate the infrastructure that powers every model we ship — including CI/CD pipelines, model orchestration, and automated training pipelines designed to scale reliably without manual intervention.

  • Own model deployment and serving: Help define and evolve the standards and tooling for model serving, ensuring low latency and high availability across our ML services.

  • Develop core MLOps capabilities: Establish and maintain essential infrastructure that functions as reliable, self-service systems for the entire machine learning organization — with a focus on feature stores, model registries, and automated monitoring for performance and data drift.

  • Operationalize infrastructure for the ML team: Collaborate with Operations to enable Kubernetes (k8s) autoscaling and GPU provisioning, turning these into accessible, self-service tools for ML practitioners — including standing up and operating a Kubernetes-based development cluster and taking models from experimentation to GPU-backed production.

  • Improve platform reliability and performance: Partner with Operations to design resilient monitoring using advanced observability tooling. Define service-level objectives and implement automation to reduce manual interventions and improve system reliability.

  • Empower Data Scientists through standardized, optimized workflows: Amplify the impact of the ML team by building clear, well-supported "golden paths" — standardized workflows that streamline the model development lifecycle and let Data Scientists focus on modeling while you handle the infrastructure.

Please apply if you have:

  • Experience building and operating ML platforms in production environments.

  • Solid working knowledge of containerization and orchestration (Docker, Kubernetes), Linux internals, and model serving at scale.

  • Familiarity with ML lifecycle tooling, including orchestration frameworks, feature stores, model registries, and drift or performance monitoring.

  • Experience owning production systems: defining service-level objectives (SLOs), building observability (for example, using tools such as Prometheus, Grafana, or Datadog), participating in incident response, and diagnosing large-scale failures systematically. You look for opportunities to automate repetitive work rather than absorb it.

  • Comfort writing production-quality code in Python or a comparable language.

  • Experience modernizing production infrastructure with attention to reliability, risk, and cost — including thoughtful sequencing of work to maintain availability and continuity for live systems.

  • The ability to take ownership of technical outcomes, advocate for decisions using data, and communicate clearly in writing and in person — to both technical and non-technical audiences.

Benefits and Perks

  • Work from (almost) anywhere for up to 20 days per year

  • Focus on mental health and well-being:

    • Company-paid therapy sessions through Spring Health

    • Company-paid subscription to Headspace

    • One company-wide week off per year – the whole team fully recharges (and returns without a pile-up of work!)

    • No-meeting Fridays

  • Paid parental leave

  • Paid volunteer time

  • Focus on your career growth:

    • Development Dollars

    • Leadership development

    • Access to thousands of on-demand e-learnings

  • Travel Discounts

  • Employee Resource Groups

  • 6 weeks paid vacation + a day off for your birthday

  • Free lunch 2 days per week

  • Pension plan contributions

  • Public transportation subsidies

  • Bike leasing program

  • Monthly social events, Thursday happy hours, sports teams

  • An awesome office in Friedrichshain, Berlin

Inclusion

At KAYAK, we want everyone to have the space to grow, share ideas and do great work. That's why we're focused on hiring the best talent from all walks of life and experiences, supporting them well and making sure no one feels like they have to fit a mold to belong here.

Need any adjustments for the interview, application or on the job? No problem – just give us a heads-up. We've got you.
