Senior Software Engineer — AI Evaluation & Benchmarks
Role responsibilities
Design and implement coding benchmarks to evaluate AI models, and build scalable data pipelines for AI evaluation workflows. Analyze AI-generated code and create structured evaluation scenarios that rigorously test models' reasoning and debugging capabilities.
Requirements
Candidates must have at least 4 years of professional software engineering experience and expert proficiency in Python. Experience with LLM coding benchmarks and version control systems is also required, as are strong written communication skills.
Key skills
Python, Software Engineering, Data Pipelines, AI Evaluation, Debugging, Code Quality, Version Control, CI/CD, Unit Testing, Machine Learning, Security Engineering, Open Source Contributions, JavaScript, Go, C++
Keywords
AI, Software Engineering, Python, Data Pipelines, Benchmarks, Machine Learning, Debugging, Code Quality, Version Control, CI/CD, Unit Testing, Security Engineering, Open Source, JavaScript, Go, C++