Senior Software Engineer — AI Evaluation & Benchmarks

Alignerr · LinkedIn
United States · Contractor · Posted Jun 29, 2026

Role responsibilities

Design and implement coding benchmarks for evaluating AI models, and build scalable data pipelines for AI evaluation workflows. Analyze AI-generated code and create structured evaluation scenarios that rigorously test models' reasoning and debugging capabilities.

Requirements

Candidates must have at least 4 years of professional software engineering experience and expert proficiency in Python. Experience with LLM coding benchmarks and version control systems, along with a strong command of modern development workflows, is also required.

Key skills

Python, Software Engineering, Data Pipelines, AI Evaluation, Version Control, LLM Coding Benchmarks, CI/CD, Unit Testing, Debugging, Code Quality, Communication, Autonomy, Security Engineering, Open-Source Contributions, Machine Learning, Programming Languages

Keywords

Software Engineering, AI Evaluation, Benchmarks, Data Pipelines, Python, LLM, Version Control, CI/CD, Unit Testing, Debugging, Code Quality, Machine Learning, Open-Source, Security Engineering, Programming Languages, Scalable Infrastructure