Research Scientist

Meta·DEJOBS
Pittsburgh, PA · Posted Jul 1, 2026
**Summary:** Reality Labs at Meta is seeking a Research Scientist with expertise in multi-modal understanding to advance AI-powered interactions. We're building next-generation capabilities that integrate vision, language, audio, and sensor modalities. This is a unique opportunity to conduct cutting-edge multi-modal research with direct product impact.

**Required Skills:** Research Scientist Responsibilities:

1. Lead the design, development, and optimization of multi-modal models that integrate vision, language, audio, and sensor inputs
2. Set technical direction for multi-modal research projects
3. Conduct research and experiments to improve cross-modal alignment and fusion strategies
4. Collaborate with cross-functional teams (engineering, HCI, product) to transition multi-modal research into production
5. Explore and adopt novel model optimization, quantization, and efficiency techniques
6. Stay current with state-of-the-art advances in multi-modal learning, vision-language models, and related fields

**Minimum Qualifications:**

7. Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
8. Currently has, or is in the process of obtaining, a PhD in Computer Science, Machine Learning, Computer Vision, or a related technical field. Degree must be completed prior to joining Meta
9. Demonstrated expertise in multi-modal learning, including architecture design, training, and cross-modal alignment techniques
10. Programming experience in Python and hands-on experience with deep learning frameworks such as PyTorch
11. Experience developing machine learning models at scale from inception to impact
12. 5+ years of research experience working autonomously on ML problems involving multiple modalities (vision, language, audio, or sensor data)

**Preferred Qualifications:**

13. Deep expertise in vision-language models, cross-modal attention mechanisms, or contrastive learning approaches
14. First-authored publications at peer-reviewed AI conferences (e.g., CVPR, NeurIPS, ICML, ICLR, ACL, ECCV)
15. Experience with on-device or edge multi-modal model optimization (quantization, sparsity, distillation)
16. Demonstrated software engineering experience via internship, work experience, or widely used contributions in open source repositories
17. Experience bringing multi-modal AI products from research to production
18. Proven track record of developing multi-modal models that fuse vision, language, and/or audio for real-world applications

**Public Compensation:** $184,000/year to $257,000/year + bonus + equity + benefits

**Industry:** Internet

**Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment. Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@meta.com.
