Data Center Production Operations Engineer (Third Shift)

Meta·DEJOBS
New Albany, OH · $111k–$159k · Posted Jul 3, 2026
**Summary:** Meta is seeking a Data Center Production Operations Engineer to support the reliability, efficiency, and scalability of our global data center infrastructure. In this role, you will be responsible for the day-to-day operational health of server fleets and production systems that underpin Meta's family of apps and services. You will work at the intersection of hardware lifecycle management, systems reliability, and operational process improvement, ensuring that production environments meet the demands of billions of users worldwide.

**Required Skills:** Data Center Production Operations Engineer (Third Shift) Responsibilities:

1. Manage and maintain large-scale server fleets across data center environments, including hardware triage, failure analysis, and coordinating repair and replacement workflows
2. Monitor production systems health using observability tooling and telemetry data to proactively identify and resolve infrastructure anomalies before they impact service availability
3. Develop and refine operational runbooks, escalation procedures, and incident response playbooks specific to data center server environments
4. Collaborate with hardware engineering, network operations, and capacity planning teams to support server deployment, decommissioning, and lifecycle transitions
5. Analyze failure trends and operational data to identify systemic issues in server hardware or firmware, and drive root cause analysis and corrective action
6. Contribute to automation initiatives that reduce manual toil in server provisioning, health checks, and fleet management workflows, including leveraging AI-integrated tooling
7. Partner with cross-functional teams to evaluate and implement process improvements that increase operational efficiency and reduce mean time to resolution for production incidents
8. Communicate infrastructure status, incident timelines, and risk assessments to engineering and operations stakeholders through clear written and verbal updates
9. Support capacity readiness activities by validating server acceptance criteria and coordinating with data center technicians during hardware bring-up and commissioning
10. Identify gaps in monitoring coverage or operational tooling and propose solutions that improve fleet visibility and production reliability
11. Participate in 24/7 on-call rotation
12. Ability to travel up to 15% of the time
13. Required to work a shifted schedule (includes nights and weekends)

**Minimum Qualifications:**

14. 6+ years of experience in data center operations, site operations, or production infrastructure engineering supporting large-scale server environments
15. 6+ years of experience with server hardware components including CPUs, memory, storage, and network interface cards, including hands-on troubleshooting and failure diagnosis
16. Experience using systems monitoring and observability platforms to track fleet health, identify anomalies, and drive incident resolution in production data center environments
17. Experience developing or improving operational processes, runbooks, or automation scripts to support server fleet management at scale
18. Experience collaborating with hardware engineering, network, and capacity teams to coordinate infrastructure deployments and lifecycle activities

**Preferred Qualifications:**

19. Experience contributing to post-incident reviews and translating findings into durable operational improvements that reduce recurrence across a server fleet
20. Background in capacity planning or hardware acceptance testing processes within a large-scale cloud or hyperscale data center organization
21. Familiarity with server firmware management, BIOS configuration, and out-of-band management interfaces such as IPMI or Redfish in hyperscale data center environments
22. Experience with scripting languages such as Python or Bash to automate data center operations tasks including health checks, inventory management, or alerting workflows

**Public Compensation:** $111,010/year to $158,995/year + bonus + equity + benefits

**Industry:** Internet

**Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@meta.com.
