KAYAK, part of Booking Holdings (NASDAQ: BKNG), is a leading travel search engine. With billions of queries across our platforms, we help people find their perfect flight, stay, rental car and vacation package. We're also transforming business travel with a new corporate travel solution, KAYAK for Business.
As an employee of KAYAK, you will be part of a travel company that operates a portfolio of global metasearch brands including momondo, Cheapflights and HotelsCombined, among others. From start-up to industry leader, innovation is in our DNA and every employee has an opportunity to make their mark. Our focus is on building the best travel search engine to make it easier for everyone to experience the world.
In this role, you'll join KAYAK's Operations team in our Concord, MA office and play a key role in the day-to-day support of our development and production environments. You'll work closely with engineering, security, and platform teams to keep our infrastructure reliable, performant, and ready to scale.
If you’re a team player who loves to learn, engage with new tech, and help build better, we’d love to hear from you!
In this role, you will:
Receive, triage, and prioritize inbound tickets from developers and business teams, ensuring timely resolution and clear communication throughout the lifecycle of each request.
Serve as a primary point of contact for infrastructure-related incidents, driving root cause analysis (RCA) and implementing corrective actions to prevent recurrence.
Monitor, audit, and continuously improve the health and performance of our production hosting platform using tools such as LogicMonitor, Kibana, and Elasticsearch.
Proactively identify anomalies, performance bottlenecks, and capacity risks in our infrastructure, escalating and coordinating remediation with relevant engineering teams.
Maintain, test, and refine backup systems and data retention policies to ensure business continuity and compliance with internal standards.
Develop and document operational runbooks, standard operating procedures (SOPs), and post-incident reports to build institutional knowledge and improve team efficiency.
Collaborate cross-functionally with software engineering, security, and platform teams to support infrastructure changes, deployments, and release processes.
Contribute to longer-term infrastructure improvement projects — from automation initiatives to platform migrations — as a hands-on team resource.
Participate in on-call rotations to support 24/7 production environment availability, responding to critical alerts and escalations as needed.
Identify opportunities to automate repetitive operational tasks using scripting (Bash, Python, etc.) to reduce toil and improve team velocity.
Support capacity planning efforts by tracking resource utilization trends and making data-informed recommendations for scaling infrastructure.
In this role, you will not:
Perform end-user IT support tasks, such as replacing mice and keyboards.
Manage logins, groups, and email accounts for internal users.
Please apply if you have:
A Bachelor’s degree in Computer Science, Information Systems, or a related technical field (or equivalent practical experience).
3+ years of hands-on experience in an infrastructure, systems, or platform operations role in a production environment.
Strong working knowledge of Linux systems administration (RHEL, CentOS, Ubuntu, or similar) and proficiency in shell scripting (Bash); Python scripting experience is a strong plus.
Solid understanding of datacenter operations, including physical/virtual server management, networking fundamentals (DNS, TCP/IP, load balancing), and storage systems.
Demonstrable experience with monitoring and observability tooling — such as LogicMonitor, Datadog, Prometheus, or equivalent — including alerting, dashboarding, and threshold tuning.
Hands-on experience with log aggregation and analysis tools such as Kibana / Elasticsearch (ELK stack).
Familiarity with ticketing and incident management workflows using Jira / Atlassian products or similar platforms (e.g., ServiceNow, PagerDuty).
Experience supporting or operating cloud infrastructure (AWS, GCP, or Azure) — including compute, storage, and networking services.
Working knowledge of containerization and orchestration technologies (Docker, Kubernetes) in a production context.
Exposure to infrastructure-as-code or configuration management tools (Terraform, Ansible, Chef, or similar).
Benefits and Perks
Work from (almost) anywhere for up to 20 days per year
Focus on mental health and well-being:
Company-paid therapy sessions through SpringHealth
Company-paid subscription to HeadSpace
A company-wide week off each year, so the whole team fully recharges (and returns without a pile-up of work!)
No meeting Fridays
Paid parental leave
Generous paid vacation + time off for your birthday
Paid volunteer time
Focus on your career growth:
Development Dollars
Leadership development
Access to thousands of on-demand e-learnings
Travel Discounts
Employee Resource Groups
Competitive retirement and health plans
Free lunch 2 days per week
Fun quarterly events such as boat trips, arcades, ski trips, Thursday happy hours, and more
There are a variety of factors that go into determining a salary range, including but not limited to external market benchmark data, geographic location, and years of experience sought/required. The range for this Massachusetts-based role is $100,000 to $105,000, excluding the annual bonus and recurring RSU grants.
We offer a competitive base salary and benefits including: health benefits; flexible spending account; retirement benefits; life insurance; paid time off (including PTO, paid sick leave, medical leave, bereavement leave, floating holidays and paid holidays); and parental leave benefits.
Inclusion
At KAYAK, we want everyone to have the space to grow, share ideas and do great work. That's why we're focused on hiring the best talent from all walks of life and experiences, supporting them well and making sure no one feels like they have to fit a mold to belong here.
Need any adjustments for the interview, the application, or on the job? No problem, just give us a heads-up. We've got you.