CTC007814 - Platform Support & Reliability Engineer - Wireline EMS

Industry Sector: Telecommunications
Job Type: Contract
Duration: One year
Work Mode: Mixed

Description

Experienced SRE engineers with support experience only.

Locations being considered - Flexible on locations where the client has a presence (Toronto, London, Montreal, Ottawa, Halifax, St. John's, etc.)

Bilingualism is not required; interviews will be conducted in English

Hybrid role - 3 days in office preferred but not mandatory (as this is a contractual position)


Typical day-to-day in this role:

  • The position involves leading delivery & support of a large-scale, Kubernetes-based IP Network Management platform.

  • The day-to-day responsibilities include

  1. collaborating with IP Network specialists/architects to troubleshoot and resolve issues,

  2. deploying automation & reliability initiatives on an infrastructure of 125+ servers for proactive monitoring/issue-detection/self-healing measures, leveraging the latest SRE toolkit/tech stack,

  3. working with the vendor to resolve platform-related issues, developing a roadmap for the platform life cycle, challenging the vendor to quickly mitigate platform risks, and deploying features/fixes based on network specialists’ needs.

  • Also includes participating in pager rotation for 24/7 support.


Top 3 skill sets and qualifications you want to see on a candidate’s resume -

  • Support of Kubernetes-based platforms with proven experience in mitigating critical issues.

  • Demonstrated experience with monitoring & observability (Zabbix/Dynatrace/Datadog for infrastructure monitoring and ELK stack for log aggregation + visualization + analysis).

  • Fundamental knowledge of TCP/IP Networks – ideally in a telco environment.


Interview process - 2 rounds of interviews (in-person preferred but not mandatory, depending on candidate location): an initial screening to gauge past experience and a second technical deep-dive


Projects you will be working on - 24/7 Support of an IP Network Management System + Platform lifecycle initiatives (new infra deployment and management, application patching & upgrades)


Potential for extension or full-time hire will be decided based on the calibre of the candidate and business needs over the duration of the contract.



The successful candidate will be accountable for the following:

  • Deeply understands business drivers and cross-departmental impacts

  • Develops business cases to justify application related capital investments

  • Translates business requirements into technical requirements.

  • Explains complicated technical issues in a simple way to all levels of the organization

  • Leads system requirements gathering for scalable, robust, and optimized designs

  • Provides input and direction to vendors to ensure optimal designs

  • Provides analysis and recommendations for new software / infrastructure

  • Evaluates test results to determine pass/fail status

  • Supports the project team with defect resolution during test activities

  • Develops Method of Procedure (MOP) documentation

  • Creates or provides input and approval of deployment plans

  • Ensures agile and risk-managed deployments of new code to production

  • Ensures technical documentation (architecture diagrams, as-built design information) is in place and kept up to date

  • Interprets application availability metrics, ensuring actions are proactively taken to achieve target results. Creates new metrics as required to ensure appropriate visibility of key performance indicators

  • Troubleshoots production issues as 3rd-level escalation contact by supporting Operational teams


Critical Skills / Competencies:

  • PROVEN EXPERIENCE WITH COMPLETE SRE STACK (DEPLOYMENT, OBSERVABILITY & SECURITY)

  • Strong interest in building bridges with technical & non-technical teams

  • Strong desire for continuous learning and a desire to mentor and be mentored

  • Strong technical writing skills (ability to write clearly and concisely)

  • Strong troubleshooting skills (ability to uncover root cause rapidly and provide resolution or workaround)

  • Proven ability to meet aggressive deadlines & work under pressure with competing priorities

  • Experience with leading-edge concepts and techniques such as multi-threading, high availability, virtualization, containerization and database performance analysis tools

  • Deep knowledge of DB systems

  • Familiar with Web Services concepts

  • Performance analysis and tuning experience


Preferred skills / Competencies:

  • Bachelor's degree in computer science, software engineering, IT or a related discipline

  • Operating Systems: Red Hat Enterprise Linux, Windows Server 2019-2022

  • Databases: Certifications and/or experience with Oracle, MSSQL, SQL, data modeling

  • DevOps tooling (GitLab CI/CD, Zabbix/Datadog/Dynatrace)

  • Experience with continuous integration/delivery pipelines and automation

  • A good understanding of application/platform security concepts and best practices

  • TCP/IP and Ethernet networks - deep knowledge (CCNA or equivalent preferred)

  • Experience or passion related to migration from on-premises deployments to cloud and transition to microservices

  • Experience with data warehousing, report generation and ETL tools

  • Develops proofs of concept (POCs) to verify that a design is scalable
