Lead Data Engineer - Pipelines, Spark Streaming and Spark Offline

Tampa, FL | Full-time | Posted Jul 3, 2026

Join us as we embark on a journey of collaboration and innovation, where your unique skills and talents will be valued and celebrated. Together we will create a brighter future and make a meaningful difference.

As a Lead Data Engineer at JPMorganChase within the Commercial & Investment Bank, you are an integral part of an agile team that works to enhance, build, and deliver data collection, storage, access, and analytics solutions in a secure, stable, and scalable way. As a core technical contributor, you are responsible for maintaining critical data pipelines and architectures across multiple technical areas within various business functions in support of the firm’s business objectives.

Job responsibilities

Collaborate with all of JPMorgan’s lines of business and functions to deliver software solutions
Experiment with, architect, develop, and productionize efficient data pipelines, data services, and data platforms that contribute to the business
Design and implement highly scalable, efficient, and reliable data processing pipelines, and perform analysis to generate insights that drive and optimize business results
Design and develop features and entities for ML models and rules using Spark or other big data environments
Act on previously identified opportunities to converge physical, IT, and data security architecture to manage access
Apply reuse-first, AI-assisted practices within delivery and operational routines (e.g., backup/recovery validation and access control review support), ensuring traceability/auditability and alignment with resiliency and security expectations

Required qualifications, capabilities, and skills

Formal training or certification in data engineering concepts and 5+ years of applied experience
Demonstrated experience using enterprise-authorized AI capabilities within the work environment to support data engineering workflows with strong validation habits and awareness of data sensitivity
Ability to review and validate AI-assisted outputs (e.g., model/design summaries or operational checklists) before use, escalating when uncertain and following data handling requirements
Experience programming in Python and PySpark
Experience across the data lifecycle, including building data frameworks and working with data lakes
Experience with batch and real-time data processing using Spark or Flink, and with batch and real-time feature engineering using Spark, Flink, or Databricks
Working knowledge of AWS Glue and EMR for data processing, and of real-time data processing and feature engineering using Flink, Databricks Delta Live Tables, or Spark Streaming
Experience working with Databricks and Delta Live Tables
Experience building services using Glue, Lambda, EMR, or Flask, and deploying them on AWS EKS or Kubernetes
Working experience with both relational and NoSQL databases
Experience with batch and real-time ETL data pipelines, data warehousing, and NoSQL databases

Preferred qualifications, capabilities, and skills

Expertise in Amazon Web Services (AWS), Docker, and Kubernetes for cloud-native and containerized data solutions
Experience with big data technologies such as Hadoop, Spark, Kafka, and Flink
Experience in distributed system design and development