Senior Manager of Site Reliability Engineering - Securitized Products, Production Management - NA

JPMorganChase·Oracle Recruiting
United States · New York, NY · Full-time · Posted Jul 2, 2026

Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership.


As a Senior Manager of Site Reliability Engineering at JPMorgan Chase within the Corporate Investment Bank, Markets team, you are the non-functional requirement owner and champion for the applications in your remit. You are a key influencer in your team’s strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact. 

Job responsibilities

  • Manage day-to-day execution of SRE functions (workload prioritization, shift coverage, triage quality, escalations, runbooks, and handoffs) to ensure consistent and timely outcomes during market hours
  • Drive reuse-first adoption of enterprise-authorized AI capabilities within the work environment to improve reliability operations and customer experience outcomes, with human-in-the-loop validation and appropriate handling of sensitive data
  • Provide North America leadership for production management teams supporting trading desks across multiple Markets lines of business; ensure reliable day-to-day operations and sustained stability improvements
  • Lead and coordinate L1/L2 investigations and incident response; ensure clear ownership, high-quality communications, and follow-through to root cause and prevention
  • Act as a key technology partner to the trading desks: monitor operational signals, drive rapid engagement, translate business impact into technical action, and communicate clearly under pressure
  • Drive adoption of SRE practices across delivery teams, ensuring best practices are implemented and demonstrated empirically via stability and reliability metrics (e.g., SLOs, error budgets, incident trends)
  • Own and evolve observability (dashboards/alerts/SLOs, instrumentation, monitoring strategy) and use data to prioritize resiliency, performance, and scalability improvements
  • Deliver automation and tooling that reduces operational toil and improves support effectiveness (faster diagnosis, safer remediation, repeatable fixes, and self-service workflows)
  • Establish and enforce operational standards for delivery teams (operational readiness, testing discipline, release safety, rollback strategy, post-incident actions) and hold teams accountable for closing gaps
  • Establish team standards for AI-assisted reliability workflows across automation and delivery practices, ensuring traceability/auditability, resiliency, and security controls

Required qualifications, capabilities, and skills

  • Formal training or certification on site reliability engineering concepts and 5+ years of applied experience. In addition, 2+ years of experience leading technologists to manage and solve complex technical items within your domain of expertise
  • Demonstrated experience supporting front-office / trading desk workflows or similarly time-sensitive production environments, with comfort operating during market hours
  • Proven production management / SRE leadership experience (support rotations, incident response, root cause analysis, post-incident actions, reliability improvements)
  • Experience leading teams in the safe use of enterprise-authorized AI capabilities within the work environment for reliability engineering workflows, including validation habits and awareness of data sensitivity
  • Ability to set and reinforce organization-level practices for reviewing AI-assisted recommendations and escalating uncertain decisions while maintaining resiliency, security, and auditability outcomes
  • Experience leading technologists in a player/coach capacity, including guiding support staff and influencing senior engineers and delivery teams
  • Strong engineering fundamentals: distributed systems thinking, debugging, performance analysis, and pragmatic tradeoffs under pressure
  • Practical AWS experience supporting production services (troubleshooting, deployment, operational visibility)
  • Strong stakeholder management and communication skills, especially during incidents and high-pressure periods
  • Proficiency in at least one programming language (e.g., Python, Java/Spring Boot, .NET) and ability to automate/engineer solutions that reduce toil

Preferred qualifications, capabilities, and skills

  • Ability to code and demonstrate data fluency
  • Prior Markets experience (Fixed Income preferred; experience supporting multiple lines of business is a plus)
  • Hands-on experience with AWS CLI, CloudWatch, and cloud-native operational patterns
  • Experience with Datadog (metrics/logs/traces, alert tuning, SLOs) and/or comparable observability stacks
  • Experience applying AI-assisted tooling to reduce operational toil (e.g., incident summarization, runbook assistance), with appropriate controls and governance
  • Experience coaching and developing engineers through individualized mentoring, and ensuring knowledge is documented and shared via internal forums and communities of practice