Global Production Systems Engineer

Meta·DEJOBS
New Albany, OH · Posted Jun 27, 2026
**Summary:** Meta is seeking a forward-thinking, experienced Production Systems Engineer to join the Data Center Operations team. Our data centers, and the tens of thousands of servers installed in them, are the foundation upon which our rapidly scaling infrastructure efficiently operates and upon which our innovative services are delivered. Meta is at the leading edge of the global data center industry in terms of both how data centers are designed and how they are operated.

This role requires prioritizing competing workstreams based on operational impact and adjusting plans as infrastructure needs evolve. The candidate we seek is a forward-thinking IT professional with deep experience in using multiple diverse software tools to identify automation solutions that address complex operational issues. The role is deeply cross-functional and considers the technical needs of frontline users to identify and automate diagnostic tooling, which enables high-quality, efficient delivery of production servers. The candidate should be able to perform deep data analysis to drive decisions on the top priorities for automating repairs on servers in a hyperscale environment. This role requires driving solutions through code and collaborating effectively with globally distributed teams via clear written and verbal communication. Experience managing servers, programming in scripting languages, and administering Linux systems is required.

**Required Skills:** Global Production Systems Engineer Responsibilities:

1. Identify and root cause systemic issues in the fleet and drive resolutions. Deliver maximum server fleet uptime and utilization rates by leveraging data to understand hardware failure conditions and root causes
2. Write and review code, develop documentation, and debug the hardest problems, live, on some of the largest and most complex systems in the world
3. Own and develop diagnostic tooling requirements to run the fleet
4. Own and drive the escalation process for Data Center Operations to identify, root cause, and solve complex tooling and hardware issues affecting the fleet
5. Execute operational validation and verification activities for new product integration
6. Through consistent collaboration with cross-functional tooling teams, help determine root causes and provide input into their development process, with an operations-centric view of how open issues are affecting the fleet
7. Build cross-functional relationships and influence policies and procedures to improve global data center operations
8. Mentor team members to evaluate and identify better ways to resolve issues and define updates to tools and processes
9. Travel up to 25% to support global data center operations

**Minimum Qualifications:**

1. Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
2. 6+ years of experience in production systems engineering, infrastructure engineering, or systems software development for large-scale hardware environments
3. 6+ years of experience with hardware lifecycle management, fleet automation, or data center operations systems spanning compute, storage, or networking infrastructure
4. Experience developing systems software or tooling in Python, PHP, C, or C++ for Linux-based production environments at scale
5. Experience in configuration and maintenance of applications such as web servers, load balancers, relational databases, storage systems, and messaging systems
6. Experience communicating technical designs and infrastructure decisions through written documentation and cross-functional stakeholder alignment across engineering and operations teams

**Preferred Qualifications:**

1. Experience designing or operating configuration management and infrastructure-as-code systems for large heterogeneous hardware fleets
2. Experience supporting global, multi-site data center infrastructure deployments, including hardware qualification and regional rollout coordination
3. Familiarity with distributed systems monitoring, alerting, and automated remediation pipelines at hyperscale

**Public Compensation:** $144,000/year to $204,000/year + bonus + equity + benefits

**Industry:** Internet

**Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment. Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@meta.com.
