AI/ML Engineer

C-Serv·Workable

United Kingdom · Posted Jun 29, 2026

Most engineers get to use large language models. You will get to build the systems that run them in production, at scale, under real latency and reliability constraints, for one of the most recognised names in application security and traffic management.

Our client is expanding its AI Core group and is looking for a Senior or Staff AI & ML Engineer in the UK. This is hands-on, build-focused work at the centre of how the business turns generative AI from a prototype into a dependable product. You will design and ship multi-agent systems, retrieval-augmented generation pipelines, anomaly detection, fine-tuned models, and the real-time inference layer that serves them.

If you want ownership over a serving stack rather than a single notebook, and you want your work measured by what holds up in production rather than what demos well, this is the seat.

What you will own

• Building and shipping generative AI features end to end: from model selection and fine-tuning through to the inference path that serves them.

• Designing multi-agent, RAG, and anomaly-detection architectures that are accurate, observable, and cost-aware at scale.

• Owning the real-time inference layer on Triton and TensorRT, optimising for latency, throughput, and GPU efficiency.

• Standing up the surrounding microservices in Python and FastAPI, containerised and orchestrated for reliability.

• Setting the technical bar: making architecture decisions, raising code quality, and mentoring engineers around you.

• Partnering with research and product teams to take ideas from experiment to a service customers depend on.

Requirements

• 5 to 10 years of relevant experience, including a proven track record as a technical lead who mentors others.

• Strong, current Python engineering, with production services built and shipped (FastAPI or similar).

• Genuine hands-on GenAI depth: LLMs, RAG, agentic or multi-agent workflows, anomaly detection, and fine-tuning (for example LoRA or other PEFT methods).

• Real-time inference experience with NVIDIA Triton and TensorRT, and close attention to latency, throughput, and cost.

• A microservices mindset, with services built in Python and FastAPI, containerised and orchestrated for reliability.

• Solid grounding in Docker and Kubernetes, and large-scale distributed systems on a major cloud.

• Right to work in the UK. We welcome applications from all backgrounds and are committed to equal opportunity.

Nice to have

• Experience with vLLM or other serving frameworks alongside Triton and TensorRT.

• Experience in security, networking, or other high-reliability domains.

• Big-data tooling (Spark, Databricks, Snowflake) and modern MLOps practice.

What you will get

• A genuine build mandate inside an established AI Core team, not a proof-of-concept that never ships.

• Fully remote working anywhere in the UK, built around delivery rather than presence.

• Competitive salary, strong benefits, and clear scope to grow into staff and principal-level influence.

• The backing of C-Serv throughout: a delivery partner that runs a real quality filter and looks after its people, end to end.

Benefits

• A clear path to grow into staff and principal-level technical influence.

• Full support from C-Serv across the hiring process and beyond, with full-cycle accountability.

• A values-led, woman-owned delivery partner built on empathy, integrity, collaboration, and growth.