Senior Software Engineer — AI Evaluation & Benchmarks
Role responsibilities
Design and implement coding benchmarks for evaluating AI models, and build scalable data pipelines for AI evaluation workflows. Analyze AI-generated code and create structured evaluation scenarios that rigorously test models' reasoning and debugging capabilities.
Requirements
Candidates must have at least 4 years of professional software engineering experience and expert proficiency in Python. Experience with LLM coding benchmarks and version control systems, as well as a strong command of modern development workflows, is also required.
Key skills
Python, Software Engineering, Data Pipelines, AI Evaluation, Version Control, LLM Coding Benchmarks, CI/CD, Unit Testing, Debugging, Code Quality, Communication, Autonomy, Security Engineering, Open-Source Contributions, Machine Learning, Programming Languages
Keywords
Software Engineering, AI Evaluation, Benchmarks, Data Pipelines, Python, LLM, Version Control, CI/CD, Unit Testing, Debugging, Code Quality, Machine Learning, Open-Source, Security Engineering, Programming Languages, Scalable Infrastructure