Senior AI Platform Engineer

🇨🇦 RBC

Calgary, Alberta, Canada
Posted Apr 30, 2026, 12:00 AM
Apply by Fri, May 15, 2026
Full Time, Senior

Job Description

What's the opportunity?

We're looking for an experienced Senior AI Platform Engineer who will bring focus and subject-matter expertise to designing and implementing reliable, scalable AI service infrastructure and automation systems. This is a unique opportunity to grow in the world of AI operations and work with a team of passionate individuals committed to bringing enterprise-grade reliability to our production AI services.

At RBC Borealis, you'll be joining a team that works directly with leading researchers in machine learning, has access to rich and massive datasets, and offers the computational resources to support ongoing development in areas such as reinforcement learning, unsupervised learning and computer vision. You can find out more about our research areas at rbcborealis.com.

Your responsibilities include:

- Designing, building, and optimizing AI service reliability infrastructure and automation systems that operate the business's AI and ML applications
- Designing and implementing best practices and standards for reliability, observability, and incident response across AI systems and ML pipelines
- Collaborating with engineers and machine learning researchers to ensure continuous deployment, monitoring, and resilience of AI applications at scale
- Supporting AI applications and projects with infrastructure design decisions, capacity planning, and comprehensive observability
- Building highly scalable, resilient cloud and on-premise systems for hosting AI services using state-of-the-art technologies

You're our ideal candidate if you have:

- Strong and relevant experience designing and implementing distributed systems and reliability infrastructure for AI systems
- Proven expertise in Site Reliability Engineering practices, including observability, alerting, and incident management
- Experience building and maintaining CI/CD pipelines with tools such as Jenkins, GitHub Actions, or similar
- In-depth knowledge of Kubernetes and OpenShift Container Platform (OCP4) or similar container orchestration platforms
- Hands-on experience with observability and monitoring platforms such as Dynatrace, Datadog, or similar solutions
- Experience implementing logging and tracing solutions for distributed systems using platforms like Elasticsearch or similar tools
- Experience optimizing containerized workloads and managing cloud infrastructure across hybrid environments
- Hands-on experience building and deploying hybrid environments across on-prem and major cloud platforms, such as AWS and Azure
- Experience managing NoSQL databases such as MongoDB in production environments
- Familiarity with machine learning model deployment, serving, and operational requirements
- Experience or interest in exploring self-hosted machine learning model infrastructure and agentic workflow systems
- Familiarity with programming languages such as Python, Bash or JavaScript

What's in it for you?

Become part of a team that thinks progressively and works collaboratively. We care about seein

