Research Engineer, Computer Vision

Meta·DEJOBS
Pittsburgh, PA · Posted Jul 1, 2026
**Summary:** As a Research Engineer focused on Multi-Modal Understanding, you will develop advanced algorithms that integrate computer vision with other modalities such as language, audio, and sensor data. You will also drive the curation of multi-modal datasets and ground truth annotation pipelines to support model training and evaluation. You will work closely with our research team to bring innovative multi-modal solutions to production, bridging the gap between visual perception and holistic contextual understanding for immersive applications.

**Required Skills:** Research Engineer, Computer Vision Responsibilities:

1. Design and implement multi-modal understanding systems that combine vision, language, and other sensory inputs to enable richer contextual awareness
2. Develop algorithms for cross-modal learning, fusion, and reasoning to improve human-AI interaction
3. Lead the curation and management of multi-modal datasets, ensuring data quality and diversity across vision, language, and sensor modalities
4. Design and oversee ground truth annotation workflows and quality assurance processes for multi-modal data
5. Complete medium to large features spanning multiple tasks independently with minimal to no guidance
6. Collaborate with researchers and engineers across computer vision and machine learning teams to drive multi-modal innovation
7. Develop well-organized code with proper testing and documentation, building production-ready multi-modal systems

**Minimum Qualifications:**

1. Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
2. Proven experience with C++ and/or Python, including experience with modern features
3. Experience working with deep learning frameworks such as PyTorch and TensorFlow
4. Demonstrated experience working collaboratively in cross-functional teams

**Preferred Qualifications:**

1. Master's degree in Computer Science, Computer Vision, Machine Learning, or related field
2. Experience with vision-language models or multi-modal transformers
3. Publications or contributions to multi-modal understanding research
4. Familiarity with large language models and their integration with visual understanding systems
5. Experience with data curation, annotation tools, or ground truth labeling pipelines

**Public Compensation:** $121,992/year to $181,000/year + bonus + equity + benefits

**Industry:** Internet

**Equal Opportunity:** Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment. Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@meta.com.
