This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior Support Engineer based in Spain.
As a Senior Support Engineer, you will operate at the core of complex, production-grade cloud environments supporting advanced AI and distributed computing workloads. This is a highly technical, hands-on role focused on diagnosing and resolving critical infrastructure issues across Linux, Kubernetes, networking, storage, and GPU-based systems. You will act as a senior escalation point for high-impact incidents, working directly with engineering teams and customers to restore service and identify root causes. The role goes beyond traditional support, combining deep debugging, systems thinking, and an automation mindset. You will contribute to improving observability, tooling, and operational maturity across large-scale cloud platforms. This position is ideal for someone who thrives in fast-moving environments and enjoys solving ambiguous, high-stakes technical problems.
Accountabilities
- Investigate, troubleshoot, and resolve complex production issues across cloud and customer environments with a strong focus on root cause analysis.
- Debug across Linux systems, Kubernetes clusters, networking layers, storage systems, and GPU-accelerated workloads.
- Act as a senior escalation point for critical incidents, ensuring fast and effective resolution of high-impact issues.
- Collaborate closely with engineering teams to reproduce issues, identify systemic causes, and drive long-term fixes.
- Support customers running containerized applications, distributed systems, AI/ML pipelines, and inference or training workloads.
- Develop and improve internal tools, scripts, and automation (Python, Bash, Go or similar) to enhance troubleshooting efficiency.
- Contribute to observability, monitoring, and operational process improvements to increase platform reliability and scalability.
- Communicate clearly and effectively with customers and internal stakeholders during active incidents and technical investigations.
- Participate in weekend on-call rotation and urgent incident response activities.
Requirements
- Strong hands-on experience with Linux system administration and troubleshooting in production environments.
- Solid expertise in Kubernetes and containerized application environments.
- Good understanding of cloud infrastructure platforms such as AWS, GCP, Azure, or OpenStack.
- Strong networking fundamentals, including troubleshooting of complex distributed systems.
- Ability to write scripts or small automation tools in Python, Bash, Go, or similar languages.
- Experience working on production incidents requiring structured debugging and cross-team collaboration.
- Strong analytical mindset with the ability to work independently in ambiguous or high-pressure situations.
- Excellent written and verbal communication skills, especially when explaining technical issues to non-technical stakeholders.
- Experience with GPU-based infrastructure, AI/ML workloads, or LLM pipelines is highly desirable.
- Familiarity with observability tooling, operational workflows, and infrastructure automation is a strong plus.
- Demonstrated ability to improve systems through automation, tooling, or internal engineering contributions is valued.
Benefits
- Competitive compensation package aligned with experience and expertise.
- Strong focus on career development, learning, and technical growth opportunities.
- Flexible working arrangements with high autonomy and ownership.
- Exposure to cutting-edge AI, cloud infrastructure, and large-scale distributed systems.
- Collaborative, international environment with highly skilled engineering teams.
- Opportunity to work on impactful production systems powering AI workloads at scale.
- Inclusive and innovation-driven culture focused on continuous improvement and engineering excellence.