CTC - Career Opportunity Details

CTC007814 - Platform Support & Reliability Engineer - Wireline EMS

Industry sector: Telecommunications

Employment type: Contract

Duration: One year

Work mode: Hybrid

Description

Experienced SRE engineers with support experience only.

Locations being considered - Flexible, anywhere the client has a presence (Toronto, London, Montreal, Ottawa, Halifax, St. John's, etc.)

Bilingualism is not required; interviews will be conducted in English

Hybrid role - 3 days in office preferred but not mandatory (as this is a contract position)

Typical day-to-day in this role:

The position leads the delivery and support of a large-scale, Kubernetes-based IP Network Management platform.
The day-to-day responsibilities include:

collaborating with IP Network specialists/architects to troubleshoot and resolve issues,
deploying automation and reliability initiatives across an infrastructure of 125+ servers (proactive monitoring, issue detection, and self-healing measures) using the latest SRE toolkit/tech stack,
working with the vendor to resolve platform-related issues, developing a roadmap for the platform life cycle, and challenging the vendor to quickly mitigate platform risks and deploy features/fixes based on network specialists’ needs.

The role also includes participating in the pager rotation for 24/7 support.

Top 3 skill sets and qualifications you want to see on a candidate’s resume:

Support of Kubernetes-based platforms, with proven experience mitigating critical issues.
Demonstrated experience with monitoring and observability (Zabbix/Dynatrace/Datadog for infrastructure monitoring and the ELK stack for log aggregation, visualization, and analysis).
Fundamental knowledge of TCP/IP Networks – ideally in a telco environment.

Interview process - 2 rounds of interviews (in-person preferred but not mandatory, depending on candidate location): an initial screening to gauge past experience, followed by a technical deep-dive

Projects you will be working on - 24/7 Support of an IP Network Management System + Platform lifecycle initiatives (new infra deployment and management, application patching & upgrades)

Potential for extension or full-time hire will be decided based on the calibre of the candidate and business needs over the duration of the contract.

The successful candidate will be accountable for the following:

Deeply understands business drivers and cross-departmental impacts
Develops business cases to justify application related capital investments
Translates business requirements into technical requirements
Explains complicated technical issues in simple terms to all levels of the organization
Leads system requirements gathering for scalable, robust, and optimized designs
Provides input and direction to vendors to ensure optimal designs
Provides analysis and recommendations for new software / infrastructure
Evaluates test results to determine pass/fail status
Supports the project team with defect resolution during test activities
Develops Method of Procedure (MOP) documentation
Creates or provides input and approval of deployment plans
Ensures agile, risk-managed deployments of new code to production
Ensures technical documentation (architecture diagrams, as-built design information) is in place and kept up to date
Interprets application availability metrics, ensuring actions are proactively taken to achieve target results. Creates new metrics as required to ensure appropriate visibility of key performance indicators
Troubleshoots production issues as the 3rd-level escalation contact supporting Operational teams

Critical Skills / Competencies:

Proven experience with the complete SRE stack (deployment, observability & security)
Strong interest in building bridges with technical & non-technical teams
Strong desire for continuous learning and a desire to mentor and be mentored
Strong technical writing skills (ability to write clearly and concisely)
Strong troubleshooting skills (ability to uncover root cause rapidly and provide resolution or workaround)
Proven ability to meet aggressive deadlines & work under pressure with competing priorities
Experience with leading-edge concepts and techniques such as multi-threading, high availability, virtualization, containerization, and database performance analysis tools
Deep knowledge of DB systems
Familiar with Web Services concepts
Performance analysis and tuning experience

Preferred skills / Competencies: