Lead Software Engineer - Data Engineer
Be an integral part of an agile team that's constantly pushing the envelope to enhance, build, and deliver top-notch technology products.
As a Lead Software Engineer at JPMorganChase within the Commercial & Investment Bank (CIB) – Regulatory Reporting Team, you are an integral part of an agile team that works to enhance, build, and deliver trusted market-leading technology products in a secure, stable, and scalable way. As a core technical contributor, you are responsible for delivering critical technology solutions across multiple technical areas within various business functions in support of the firm’s business objectives.
Job responsibilities
- Executes creative software solutions, design, development, and technical troubleshooting with the ability to think beyond routine or conventional approaches to build solutions or break down technical problems, with a focus on data engineering and Spark-based ETL/ELT
- Develops secure, high-quality production code in Python/PySpark and Spark SQL, and reviews and debugs code written by others, tracing Spark jobs, SQL logic, and data issues end-to-end
- Drives team adoption of enterprise-authorized AI-assisted engineering practices to improve code quality, delivery speed, and operational outcomes (e.g., AI-assisted code review/refactoring, test strategy acceleration, incident/root-cause analysis support), while establishing consistent validation standards (secure coding, peer review, automated testing) and promoting reuse of effective patterns across the team.
- Applies knowledge of tools within the Software Development Life Cycle toolchain, including enterprise-authorized AI-assisted development and automation capabilities, to improve the value realized by automation.
- Identifies opportunities to eliminate or automate remediation of recurring issues to improve overall operational stability of software applications and systems, including data pipeline reliability and lakehouse maintenance automation
- Leads evaluation sessions with external vendors, startups, and internal teams to drive outcomes-oriented probing of architectural designs, technical credentials, and applicability for use within existing systems and information architecture (e.g., EMR/Databricks, lakehouse/table formats, catalog/governance patterns)
- Leads communities of practice across Software Engineering to drive awareness and use of new and leading-edge technologies, especially around Spark performance, Iceberg best practices, and data platform operations
- Adds to team culture of diversity, opportunity, inclusion, and respect
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 5+ years of applied experience
- 5+ years of applied experience building production data engineering and/or software engineering solutions (design, development, testing, operations)
- Hands-on practical experience delivering system design, application development, testing, and operational stability for large-scale data pipelines
- Advanced proficiency in one or more programming languages, including Python, with strong hands-on experience in PySpark
- Advanced proficiency in Spark SQL and strong SQL fundamentals (data modeling, query optimization, execution plan analysis)
- Demonstrated experience leading effective use of approved AI-assisted software development tools (e.g., for coding, code review, test acceleration, troubleshooting) with the ability to set team expectations for validating AI outputs for correctness, performance, and security
- Strong understanding of responsible AI use in engineering workflows, including data sensitivity considerations, secure handling of inputs and outputs, and adherence to resiliency and security expectations; experience coaching engineers on safe, compliant adoption within delivery practice
- Experience with AWS data management patterns, including S3 and AWS Glue Data Catalog (metadata governance, table schema hygiene, discoverability); experience with other cloud-based data platforms will also be considered
- Required platform experience: delivering and operating Spark workloads on EMR and/or Databricks (tuning, troubleshooting, monitoring, and cost and performance optimization)
- Required lakehouse expertise: production experience with Apache Iceberg, including table design and ongoing operations such as partitioning strategy and file layout optimization, schema evolution and compatibility controls, compaction and small-file mitigation, snapshot retention management and metadata maintenance, safe backfills and rewrites, and reprocessing patterns
- Proficiency in automation and continuous delivery methods (CI/CD, automated testing, and repeatable deployments for data pipelines)
Preferred qualifications, capabilities, and skills
- Kafka familiarity (topic design, producer/consumer patterns, schema evolution/compatibility, and operational considerations) is a plus
- Experience with Delta Lake concepts and trade-offs vs. Iceberg
- Experience with Spark Structured Streaming and streaming ETL patterns
- Working knowledge of Java (interoperability or leveraging existing JVM-based components)
- Experience using AI-assisted engineering tools and workflows (e.g., GitHub Copilot, Claude), including spec-driven development, prompt-assisted refactoring, and code review, following enterprise-safe usage patterns