Strategic Operations Engineer III

Jobgether · Lever
United States · Full-time · $123k–$175k · Posted Jul 3, 2026

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Strategic Operations Engineer III based in the United States.

This role sits at the intersection of engineering operations, reliability, and AI-driven systems management, with a focus on keeping large-scale cloud infrastructure resilient, observable, and continuously improving. You will shape how incident, problem, and change management processes are run, ensuring high availability and rapid recovery across complex distributed systems. The environment is highly technical and data-driven, with a strong emphasis on automation, AI-enabled insights, and operational excellence at scale. You will partner closely with engineering teams to improve system reliability, reduce monitoring noise, and speed up resolution of critical incidents. This is a high-impact role that influences both day-to-day operational stability and long-term platform resilience. It is ideal for someone who thrives in fast-paced environments where engineering rigor and structured operational thinking matter equally.

Accountabilities:

    • Lead end-to-end incident management processes, including detection, triage, escalation, coordination, and resolution of high-severity production issues.
    • Drive major incident management (MIM) communications and ensure clear, timely updates across stakeholders during critical events.
    • Develop and improve incident response playbooks, runbooks, and automation to reduce MTTR and improve operational consistency.
    • Own and evolve problem management practices, leveraging data and AI/ML insights to identify recurring issues and drive long-term remediation.
    • Lead change management processes, including Change Advisory Board (CAB) governance, risk evaluation, and enforcement of safe, compliant deployment practices.
    • Enhance observability and monitoring systems to reduce alert fatigue and improve signal quality across large-scale environments.
    • Apply AIOps methodologies to detect anomalies, enable predictive alerting, and improve root cause analysis and operational workflows.

Requirements:

    • 5+ years of experience in IT operations, Site Reliability Engineering (SRE), or similar infrastructure-focused roles in large-scale environments.
    • Strong expertise in incident, problem, and change management frameworks (ITIL or equivalent).
    • Hands-on experience improving operational processes, governance models, and production reliability in high-availability systems.
    • Solid understanding of AI/ML concepts such as anomaly detection, predictive analytics, and data-driven operational insights.
    • Experience with AIOps platforms or building automation and AI-driven operational solutions for monitoring and incident response.
    • Proficiency with operational tooling such as Jira, ServiceNow, FireHydrant, Moogsoft, or similar platforms.
    • Strong communication, analytical, and stakeholder management skills, with the ability to drive cross-functional alignment.

Benefits:

    • Competitive salary range ($123,000 – $175,000 USD) with performance-based compensation considerations
    • Comprehensive health, dental, and vision insurance coverage
    • Remote-first work environment across the United States
    • Opportunity to work on large-scale, high-availability cloud infrastructure systems
    • Strong focus on automation, AI-driven operations, and continuous improvement
    • Collaborative engineering culture with emphasis on ownership and operational excellence
    • Commitment to diversity, equity, inclusion, and employee belonging

How Jobgether works: We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.