Director, Infrastructure Engineering

Phoenix, AZ | Full-time | Posted Jun 25, 2026

As part of our diverse tech team, you can architect, code and ship software that makes us an essential part of our customers’ digital lives. Here, you can work alongside talented engineers in an open, supportive, inclusive environment where your voice is valued, and you make your own decisions on what tech to use to solve challenging problems. American Express offers a range of opportunities to work with the latest technologies and encourages you to back the broader engineering community through open source. And because we understand the importance of keeping your skills fresh and relevant, we give you dedicated time to invest in your professional development. Find your place in technology on #TeamAmex.

The American Express Platform Services team is looking for innovators to help us build world-class applications, cloud platforms, and infrastructure supported by integrated CI/CD, observability, and security capabilities.

The Director Infrastructure Engineering - Head of Private Cloud (OpenShift, IaC, Data Middleware Services) Operations – US is responsible for leading the strategy, execution, and continuous improvement of cloud operations across OpenShift, Redis, Kafka, Elasticsearch, Terraform, and other platform services. This role ensures secure, reliable, scalable, and cost-effective cloud environments that support enterprise applications and digital transformation initiatives.

The ideal candidate combines strong technical depth in cloud infrastructure with operational excellence, financial governance (FinOps), DevOps, Site Reliability Engineering, automation leveraging GenAI/AgenticAI, and people leadership.

Key Responsibilities:

Cloud Operations Leadership

Lead and manage Private Cloud operations for production and non-production environments.
Establish and enforce operational standards, SLAs, and SLOs.
Drive incident, problem, and change management processes.
Ensure high availability, performance, and resilience of cloud platforms.

Cloud Infrastructure & Reliability

Oversee infrastructure design, deployment, monitoring, and optimization.
Implement Infrastructure as Code (IaC) using Terraform.
Drive SRE principles including reliability engineering and automation.
Manage disaster recovery and business continuity strategies.

Automation & DevOps Enablement

Champion automation-first operational models.
Leverage GenAI/AgenticAI to automate common platform operations, including customer support.
Integrate CI/CD pipelines with cloud infrastructure.
Reduce manual operational overhead through scripting and tooling.
Enable platform engineering capabilities for internal teams.

Financial Governance

Own cloud cost management, forecasting, and optimization.
Implement tagging standards and chargeback/showback models.
Drive cost-efficiency initiatives across workloads.

Vendor & Stakeholder Management

Manage relationships with service providers.
Collaborate with application teams, architecture, security, and enterprise IT.
Support cloud migration and modernization programs.

Team Leadership & Development

Build, mentor, and retain high-performing cloud operations teams.
Define hiring strategy and succession planning.
Establish performance metrics and career development plans.
Foster a culture of accountability, innovation, and continuous improvement.

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or related field (Master’s preferred).
8+ years of experience in Platform Engineering & Operations, API Support, or Site Reliability Engineering (SRE), with a proven track record of leading teams that manage large-scale cloud infrastructure with a focus on reliability and resilience.
Deep hands-on experience with any Kubernetes platform (multi-cloud preferred).
Strong experience with:
- Infrastructure as Code (Terraform, CloudFormation, ARM)
- Container platforms (OpenShift/Kubernetes)
- Monitoring tools (Prometheus, OpenTelemetry, Loki)
- CI/CD pipelines (Jenkins, GitHub Actions)
Strong understanding of cloud networking, security, and architecture.
Experience managing large-scale, mission-critical production environments.
Proven experience in financial management and cloud cost optimization.
Relevant certifications preferred.
Experience with DevOps practices and methodologies, including CI/CD pipelines, configuration management, and infrastructure as code.
Experience leveraging GenAI and AgenticAI for platform automation and self-healing.
Experience with observability tools such as Prometheus, Splunk, ELK, Dynatrace.
Strong analytical and problem-solving skills, with the ability to troubleshoot complex issues and drive resolution in a fast-paced environment.
Excellent communication and leadership skills, with the ability to effectively collaborate with cross-functional teams and influence decision-making at all levels of the organization.

Employment eligibility to work with American Express in the United States is required as the company will not pursue visa sponsorship for these positions.