Software Engineering III - AI/ML Engineer

JPMorganChase · Oracle Recruiting
London, United Kingdom · Full-time · Posted Jun 29, 2026

As a Site Reliability Engineer for AI/ML Data Platforms, you will be instrumental in building scalable, resilient and market-leading data solutions. You will engage in root cause analysis, production changes, budgetary considerations, and staffing challenges. Your experience will be vital in managing and mentoring team members to drive strategic change, both within your team and in partnership with colleagues across JPMorgan Chase & Co.'s global network of innovators.

Job Responsibilities:

  • Apply expertise in application development and support across multiple technologies, such as Databricks, Snowflake, AWS, and Kubernetes.
  • Coordinate incident management coverage to ensure effective resolution of application issues.
  • Collaborate with cross-functional teams to perform root cause analysis and implement production changes.
  • Use enterprise-authorized AI coding assistants to improve code quality, delivery speed, and productivity across complex deliverables (e.g., code generation and refactoring, unit test creation, documentation), validating their outputs through peer review, automated testing, and secure coding standards; contribute learnings and reusable patterns that improve the effectiveness of the wider team.
  • Apply knowledge of the tools in the Software Development Life Cycle toolchain, including enterprise-authorized AI-assisted development and automation capabilities, to increase the value delivered through automation.
  • Develop and support AI/ML solutions for troubleshooting and incident resolution.

Required Skills and Capabilities:

  • Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability practices within an application or platform
  • Proficiency in running production incident calls and managing incident resolution.
  • Experience with observability practices such as white-box and black-box monitoring, service level objective (SLO) alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
  • Strong understanding of SLIs, SLOs, SLAs, and error budgets
  • Hands-on experience using enterprise-authorized AI-assisted software development tools within the work environment (e.g., for coding, test creation, troubleshooting, or documentation), with a demonstrated ability to critically evaluate, validate, and refine AI-generated outputs for correctness, performance, and security.
  • Understanding of responsible AI use in engineering workflows, including data sensitivity considerations, secure handling of inputs/outputs, and adherence to resiliency and security expectations; ability to guide peers on safe and effective usage within team practices.
  • Proficiency in Python or PySpark for AI/ML modeling.
  • Ability to reduce toil by building tools that automate repetitive tasks.
  • Hands-on experience in system design, resiliency, testing, operational stability, and disaster recovery
  • Awareness of risk controls and compliance with departmental and company-wide standards.
  • Ability to work collaboratively in teams and build meaningful relationships to achieve common goals.

Preferred Qualifications:

  • 4+ years of experience in an SRE or production support role working with AWS Cloud, Databricks, Snowflake, or similar technologies.
  • AWS and Databricks certifications.
