Senior Site Reliability Engineer
🇨🇦RBC
Job Description
What is the opportunity?
This role will be responsible for the development, implementation, and support of Site Reliability Engineering (SRE) solutions for applications supported by the Digital Branch SRE organization. As the Engineering arm of the Digital Branch SRE organization, this team will work collaboratively with the Delivery arm of the same organization and any other IT partners required to succeed in its mandate. The incumbent will need intermediate knowledge and experience working in an application development and/or technology operations organization. The incumbent will also perform a production support role and partner with the SRE Delivery team in incident management and problem management.

What will you do?
- Participate in code and non-functional (performance, security, maintainability, compliance, change management) reviews of all production-bound SRE solutions
- Ensure problems are quickly identified and solved through review of Zeke / Splunk / Dynatrace / Salesforce monitoring, inbound calls, email or ServiceNow tickets while providing the highest possible level of production support
- Drive transformation by continuously looking for ways to automate existing processes
- Track, audit, monitor, and implement technical work streams
- Act as portfolio SME (Subject Matter Expert): understand and document common components, core functionalities, and infrastructure of supported applications
- Be an escalation point in the on-call rotation, and support our maintenance, scheduled work, support and release deployment requirements
- Drive incident management and problem management for applications in scope, and take ownership of fulfilling RCA action items
- Focus on continuous improvement and technical standards: drive improvements in productivity, monitoring, tooling, and best practices
- Manage technology currency (server patching, certificate renewal, compliance, etc.) with a keen eye on automation opportunities

In this role, you will communicate and interact frequently with RBC partners and/or employees located across Canada and/or worldwide.

What do you need to succeed?
Must have:
- 5 years of working experience in Site Reliability Engineering (SRE) and best practices for running and maintaining critical systems, including monitoring, alerting, and incident management
- Intermediate experience in a variety of environments (Cloud, Linux/Unix/Windows and services/APIs, databases)
- Working experience with scripting, ideally in Java/.NET and SQL
- Strong expertise in major incident handling and communication
- Issue investigation skills
- Effective negotiation skills, stakeholder management
- Ability to influence the squad at an SRE level
- Hands-on experience in a variety of SRE languages and tools (Ansible, Dynatrace Managed, Moog, PagerDuty, ServiceNow, GitHub, Slack, Elastic, Logstash, Kibana, Blue Prism, Catch Point)
- Ability to work in a 7x24x365 work environment

Nice-to-have:
- Knowledge of KAFKA, OCP, SCON infrastructure & processes
- Knowledge of cloud platform applications and