Staff Machine Learning Operations Engineer (Global Security)
🇨🇦RBC
Job Description
What is the opportunity?
In this role, you will be part of the dynamic and critical function of Global IT Risk, where our role is to modernize our IT Risk & Control Assessment. These Assessments aim to provide a data-driven Inherent and Residual Risk Rating for every application in RBC across domains of Risk. These IT Risk Assessments drive tremendous impact and value. Today we can only service a subset of applications, and we create the overall risk posture manually. This solution will scale out our assessment capability across all appcodes and domains globally, and will direct leadership decisions on how to reduce the impact and likelihood of regulatory, financial or reputational risks to RBC.

As the Staff Machine Learning Operations Engineer (Global Security), you will support this AI modernization initiative by operationalizing advanced AI models that power our IT Risk & Control Assessment platform. Working closely with our Staff AI Engineer, you'll own the deployment, validation, and operational excellence of ML models, ensuring they transition smoothly from development environments to production, perform reliably, and continuously improve based on real-world feedback.

This is an ideal role for an engineer who understands both ML fundamentals and production systems: someone who can speak the language of data scientists while architecting robust deployment pipelines.

What will you do?
- Architect and maintain integration infrastructure across the GITR ecosystem that supports multiple models in production, including seamless integration with external model providers
- Deploy and operationalize AI models from development to production, designing validation frameworks to test model performance, accuracy, and reliability before and after deployment
- Manage the model lifecycle: versioning, monitoring, rollback procedures, and continuous optimization based on real-world performance
- Build data integration pipelines connecting models to control data sources, assessment engines, and downstream reporting systems
- Lead deployment and operationalization sub-projects/spikes, making informed recommendations on deployment strategies, architecture approaches, and integration methods
- Collaborate closely with the Principal AI Engineer to understand model requirements and ensure seamless, production-ready implementation
- Own platform reliability and performance: monitor, troubleshoot, and optimize the AI platform to ensure consistent availability and performance
- Act as GITR AI Champion: support training and adoption of AI into process re-designs, ensuring teams understand platform capabilities and best practices

What do you need to succeed?
Must-have
- Strong backend development experience with: Python (Flask, FastAPI)
- Deep expertise with: Redis, PostgreSQL with pgvector capabilities
- Container orchestration and deployment: Docker, Kubernetes, OpenShift
- ML workflow orchestration: Apache Airflow, Kubeflow
- Model training and model serving: Dagster, MLflow, or similar tool
- Cloud