Research Scientist, Multi-Modal

Meta·DEJOBS

Pittsburgh, PA · Posted Jul 1, 2026

**Summary:** Meta is seeking a creative, skilled, and motivated Research Scientist to advance the state of the art in multi-modal understanding. You will work on developing models that reason across vision, language, and other modalities to enable richer AI experiences across Meta's family of apps and products. You will collaborate with research scientists, software engineers, and data scientists to design technical solutions in a fast-paced multidisciplinary environment.

**Required Skills:** Research Scientist, Multi-Modal Responsibilities:

1. Develop and advance multi-modal models that integrate vision, language, audio, and other modalities
2. Research novel architectures and training methods for cross-modal reasoning and understanding
3. Design and prototype interactive experiences that leverage multi-modal AI capabilities
4. Collaborate across teams to develop concepts that advance the entire research pipeline (hardware, software, data collection, machine learning, etc.)
5. Publish research findings at top-tier conferences and contribute to the broader research community

**Minimum Qualifications:**

6. Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
7. Currently has, or is in the process of obtaining, a PhD degree in Computer Science, Machine Learning, or relevant technical field. Degree must be completed prior to joining Meta
8. Experience in multi-modal learning, combining vision, audio, language, or related areas
9. Experience working with PyTorch or TensorFlow
10. Experience with transformer architectures and large-scale model training
11. Technical knowledge across machine learning, deep learning, and statistical modeling
12. Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment

**Preferred Qualifications:**

13. First-authored publications at leading conferences such as NeurIPS, ICML, and CVPR, or similar
14. Experience with large language models (LLMs) and their integration with other modalities
15. Experience transferring multi-modal research into shipping products
16. Experience working and communicating cross-functionally in a team environment
17. Research experience in vision-language models, multi-modal transformers, or cross-modal representation learning

**Public Compensation:** $122,000/year to $181,000/year + bonus + equity + benefits

**Industry:** Internet

**Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@meta.com.