Principal Platform Software Engineer
Oracle Cloud Infrastructure is seeking a Principal Engineer to join the OCI Developer Tools Team and help shape the next generation of AI-powered developer experiences.
The OCI Developer Tools organization builds products and platforms that improve how developers interact with OCI across the software development lifecycle. This includes tools and experiences for cloud application development, command-line workflows, SDKs, APIs, IDE integrations, CI/CD, infrastructure automation, diagnostics, documentation discovery, and operational troubleshooting.
In this role, you will design and build Agentic AI capabilities that help developers understand complex cloud systems, generate and improve code, troubleshoot failures, automate repetitive workflows, reason over logs and documentation, interact with OCI services, and safely execute multi-step development tasks. These systems will combine large language models, retrieval-augmented generation, tool use, workflow orchestration, code intelligence, cloud APIs, and enterprise-grade safety controls.
This is a senior individual contributor role for an engineer who can operate independently in ambiguous technical spaces, define architecture, influence product direction, mentor engineers, and deliver high-impact capabilities for OCI customers and internal engineering teams.
Key Responsibilities
- Design, build, and operate Agentic AI-powered developer tools for the OCI Developer Tools organization.
- Develop AI agents that assist with code authoring, debugging, test generation, build failure analysis, deployment guidance, infrastructure automation, cloud diagnostics, and developer workflow optimization.
- Build systems that combine LLMs, retrieval-augmented generation, tool calling, workflow orchestration, code intelligence, structured outputs, and OCI service APIs.
- Create agent workflows that can reason across source code, SDKs, APIs, CLI commands, documentation, build logs, telemetry, repositories, deployment artifacts, and cloud resource metadata.
- Design safe and reliable agent execution patterns, including human-in-the-loop approval, guardrails, access control, audit logging, tool-use constraints, error recovery, and policy-aware automation.
- Partner with product managers, UX designers, developer relations, cloud service teams, security, and infrastructure teams to translate developer pain points into scalable AI product capabilities.
- Build evaluation frameworks for developer-facing AI systems, including task success, code correctness, grounding quality, tool-call accuracy, hallucination detection, latency, cost, safety, and regression metrics.
- Contribute to platform architecture for AI-assisted development, including agent runtimes, context management, prompt orchestration, model routing, evaluation pipelines, telemetry, and feedback loops.
- Ensure AI-powered developer tools meet OCI standards for security, privacy, reliability, compliance, operational readiness, scalability, and enterprise-grade quality.
- Provide technical leadership through design documents, architecture reviews, code reviews, mentoring, prototyping, and cross-team technical alignment.
- Stay current with advances in Agentic AI, LLM application design, AI coding assistants, developer productivity tools, cloud-native development, and responsible AI, and apply them pragmatically to OCI products.
Required Qualifications
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Artificial Intelligence, Machine Learning, or a related technical field, with 10+ years of experience.
- Strong professional experience designing and building large-scale distributed systems, developer platforms, cloud services, or enterprise software products.
- Hands-on experience building applications using large language models, including prompt design, structured outputs, function calling, tool use, retrieval-augmented generation, or AI workflow orchestration.
- Practical understanding of Agentic AI patterns, including planning, reasoning loops, task decomposition, tool invocation, memory, context management, agent state, and autonomous or semi-autonomous execution.
- Strong programming experience in one or more languages such as Java, Python, Go, or similar.
- Experience building developer-facing tools such as CLIs, SDKs, APIs, IDE extensions, build systems, CI/CD platforms, testing frameworks, observability tools, infrastructure-as-code tooling, or cloud development platforms.
- Strong understanding of modern software development workflows, including source control, code review, testing, build automation, deployment pipelines, release management, and production operations.
- Experience with cloud-native architecture, including microservices, APIs, containers, distributed systems, asynchronous workflows, authentication, authorization, and service observability.
- Familiarity with AI/ML infrastructure components such as embedding models, vector databases, model serving, model evaluation, telemetry, and experimentation frameworks.
- Ability to reason about risks in AI-powered developer tools, including incorrect code generation, hallucinated APIs, prompt injection, unsafe tool execution, data leakage, permission misuse, and unreliable automation.
- Demonstrated ability to lead complex technical projects independently, influence architecture across teams, and deliver high-quality production systems.
- Strong written and verbal communication skills, with the ability to explain complex technical decisions to engineering, product, and leadership audiences.
Preferred Qualifications
- Experience building AI coding assistants, developer copilots, autonomous debugging agents, test generation systems, build failure analyzers, cloud troubleshooting agents, or AI-powered DevOps tools.
- Experience with agent frameworks or orchestration technologies such as LangChain, LangGraph, CrewAI, or custom agent runtimes.
- Experience with commercial or open-source LLM ecosystems such as OpenAI, Anthropic, Google Gemini, Cohere, Meta Llama, Mistral, or enterprise-hosted models.
- Experience designing systems that reason over large codebases, dependency graphs, APIs, SDKs, cloud service documentation, build artifacts, logs, metrics, traces, and runtime telemetry.
- Deep understanding of developer experience, developer productivity, software engineering workflows, and cloud application development.
- Experience with OCI, especially in areas such as developer tools, DevOps, cloud automation, identity, observability, infrastructure provisioning, networking, compute, storage, or managed AI services.
- Experience with Kubernetes, containers, Terraform, CI/CD systems, workflow engines, message queues, distributed job execution, or service deployment platforms.
- Knowledge of secure software supply chain practices, including artifact integrity, dependency scanning, secrets handling, policy enforcement, provenance, and deployment governance.
- Experience building RAG systems with semantic retrieval, hybrid search, reranking, chunking strategies, grounding validation, citation-aware responses, and access-controlled retrieval.
- Experience evaluating LLM and agentic systems using golden datasets, synthetic test generation, human review, automated scoring, red teaming, online experimentation, and regression testing.
- Experience optimizing AI-powered systems for latency, throughput, reliability, token efficiency, cost, model selection, and service availability.
- Track record of technical leadership through architecture ownership, patents, publications, open-source contributions, platform delivery, or high-impact developer tooling initiatives.
- Experience mentoring engineers and raising the engineering bar across a team, platform, or organization.
Career Level - IC4