Senior Manager of Site Reliability Engineering - Securitized Products, Production Management - NA

United States · New York, NY · Full-time · Posted Jul 2, 2026

Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership.

As a Senior Manager of Site Reliability Engineering at JPMorgan Chase within the Corporate Investment Bank, Markets team, you are the non-functional requirement owner and champion for the applications in your remit. You are a key influencer in your team’s strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.

Job responsibilities

Manage day-to-day execution of SRE functions (workload prioritization, shift coverage, triage quality, escalations, runbooks, and handoffs) to ensure consistent and timely outcomes during market hours
Drive reuse-first adoption of enterprise-authorized AI capabilities within the work environment to improve reliability operations and customer experience outcomes, with human-in-the-loop validation and appropriate handling of sensitive data.
Provide North America leadership for production management teams supporting trading desks across multiple Markets lines of business; ensure reliable day-to-day operations and sustained stability improvements
Lead and coordinate L1/L2 investigations and incident response; ensure clear ownership, high-quality communications, and follow-through to root cause and prevention
Act as a key technology partner to the trading desks: monitor operational signals, drive rapid engagement, translate business impact into technical action, and communicate clearly under pressure
Drive adoption of SRE practices across delivery teams, ensuring best practices are implemented and demonstrated empirically via stability and reliability metrics (e.g., SLOs, error budgets, incident trends)
Own and evolve observability (dashboards/alerts/SLOs, instrumentation, monitoring strategy) and use data to prioritize resiliency, performance, and scalability improvements
Deliver automation and tooling that reduces operational toil and improves support effectiveness (faster diagnosis, safer remediation, repeatable fixes, and self-service workflows)
Establish and enforce operational standards for delivery teams (operational readiness, testing discipline, release safety, rollback strategy, post-incident actions) and hold teams accountable for closing gaps
Establish team standards for AI-assisted reliability workflows across automation and delivery practices, ensuring traceability/auditability, resiliency, and security controls.

Required qualifications, capabilities, and skills

Formal training or certification on site reliability engineering concepts and 5+ years of applied experience. In addition, 2+ years of experience leading technologists to manage and solve complex technical items within your domain of expertise
Demonstrated experience supporting front-office / trading desk workflows or similarly time-sensitive production environments, with comfort operating during market hours
Proven production management / SRE leadership experience (support rotations, incident response, root cause analysis, post-incident actions, reliability improvements)
Experience leading teams in the safe use of enterprise-authorized AI capabilities within the work environment for reliability engineering workflows, including validation habits and awareness of data sensitivity.
Ability to set and reinforce organization-level practices for reviewing AI-assisted recommendations and escalating uncertain decisions while maintaining resiliency, security, and auditability outcomes.
Experience leading technologists in a player/coach capacity, including guiding support staff and influencing senior engineers and delivery teams
Strong engineering fundamentals: distributed systems thinking, debugging, performance analysis, and pragmatic tradeoffs under pressure
Practical AWS experience supporting production services (troubleshooting, deployment, operational visibility)
Strong stakeholder management and communication skills, especially during incidents and high-pressure periods
Proficiency in at least one programming language (e.g., Python, Java/Spring Boot, .NET) and ability to automate/engineer solutions that reduce toil

Preferred qualifications, capabilities, and skills

Ability to code and demonstrate data fluency
Prior Markets experience (Fixed Income preferred; experience supporting multiple lines of business is a plus)
Hands-on experience with AWS CLI, CloudWatch, and cloud-native operational patterns
Experience with Datadog (metrics/logs/traces, alert tuning, SLOs) and/or comparable observability stacks
Experience applying AI-assisted tooling to reduce operational toil (e.g., incident summarization, runbook assistance), with appropriate controls and governance.
Experience coaching and developing engineers through individualized mentoring, and ensuring knowledge is documented and shared via internal forums and communities of practice