Lead Data Engineer - Pipelines, Spark Streaming and Spark Offline
Join us as we embark on a journey of collaboration and innovation, where your unique skills and talents will be valued and celebrated. Together we will create a brighter future and make a meaningful difference.
As a Lead Data Engineer at JPMorganChase within the Commercial & Investment Bank, you are an integral part of an agile team that works to enhance, build, and deliver data collection, storage, access, and analytics solutions in a secure, stable, and scalable way. As a core technical contributor, you are responsible for maintaining critical data pipelines and architectures across multiple technical areas within various business functions in support of the firm’s business objectives.
Job responsibilities
- Collaborate with all of JPMorgan’s lines of business and functions to deliver software solutions
- Experiment with, architect, develop, and productionize efficient data pipelines, data services, and data platforms that contribute to the business
- Design and implement highly scalable, efficient, and reliable data processing pipelines, and perform analysis and generate insights to drive and optimize business results
- Design and develop features and entities for ML and rules using Spark or any big data environment
- Act on previously identified opportunities to converge physical, IT, and data security architecture to manage access
- Apply reuse-first, AI-assisted practices within delivery and operational routines (e.g., backup/recovery validation and access control review support), ensuring traceability/auditability and alignment to resiliency and security expectations
Required qualifications, capabilities, and skills
- Formal training or certification in data engineering concepts and 5+ years of applied experience
- Demonstrated experience using enterprise-authorized AI capabilities within the work environment to support data engineering workflows with strong validation habits and awareness of data sensitivity
- Ability to review and validate AI-assisted outputs (e.g., model/design summaries or operational checklists) before use, escalating when uncertain and following data handling requirements
- Strong programming skills in Python and PySpark
- Experience across the data lifecycle, including building data frameworks and working with data lakes
- Experience with batch and real-time data processing using Spark or Flink, and with batch and real-time feature engineering using Spark, Flink, or Databricks
- Working knowledge of AWS Glue and EMR for data processing, and of real-time data processing and feature generation using Flink, Databricks Delta Live Tables, or Spark Streaming
- Experience working with Databricks and Databricks Delta Live Tables
- Experience building services using AWS Glue, Lambda, EMR, or Flask, and deploying them on AWS EKS or Kubernetes
- Working experience with both relational and NoSQL databases
- Experience with ETL data pipelines for both batch and real-time processing, data warehousing, and NoSQL databases
- Expertise in Amazon Web Services (AWS), Docker, and Kubernetes for cloud-native and containerized data solutions
- Experience in big data technologies: Hadoop, Spark, Kafka, Flink
- Experience in distributed system design and development