Senior Software Engineer — AI Evaluation & Benchmarks

Alignerr · LinkedIn

United States · Contractor · Posted Jun 29, 2026

Role responsibilities

Design and implement coding benchmarks for evaluating AI models and build scalable data pipelines for AI evaluation workflows. Analyze AI-generated code and create structured evaluation scenarios to rigorously test reasoning and debugging capabilities.

Requirements

Candidates must have at least 4 years of professional software engineering experience and expert proficiency in Python. Experience with LLM coding benchmarks and version control systems, along with a strong command of modern development workflows, is also required.

Key skills

Python, Software Engineering, Data Pipelines, AI Evaluation, Version Control, LLM Coding Benchmarks, CI/CD, Unit Testing, Debugging, Code Quality, Communication, Autonomy, Security Engineering, Open-Source Contributions, Machine Learning, Programming Languages

Keywords

Software Engineering, AI Evaluation, Benchmarks, Data Pipelines, Python, LLM, Version Control, CI/CD, Unit Testing, Debugging, Code Quality, Machine Learning, Open-Source, Security Engineering, Programming Languages, Scalable Infrastructure