In order to meet the growing needs of our customers, we are constantly searching for dynamic, qualified individuals to join the CTC resource team. Currently, there are several career opportunities available at CTC. If you feel you satisfy the qualifications for one of the positions listed on this page, please send your CV to
identifying the position(s) you are interested in. Qualified individuals will be contacted by our human resources department.
CTC005652 : SRE Cloud Developer
Position Type :
January 25, 2021
December 17, 2021
Resources Required :
Duration: 12 months, with the possibility of a permanent position. *The temporary-to-permanent option is important!
Bilingualism: Fluency in French is mandatory. English is an asset.
Teleworking: Until further notice. On-site presence may occasionally be required.
As a Cloud Reliability Engineer, you are the expert who helps the organization make effective use of the cloud. You do this by identifying opportunities to improve reliability and by applying software engineering best practices.
Get an overview:
You are a transformation leader in our DevOps revolution, focused on the resilience and reliability of our cloud systems; you use leading practices and state-of-the-art tools. You are practical and have an Agile mindset. You master a range of skills, from development to systems engineering, across all phases of the delivery and service lifecycles: design, development, testing, deployment, and ongoing technical support and maintenance. "You automate what you can, document what you can't, and have the wisdom to know the difference." (Google SRE)
You are a software engineer focused on operational excellence. You live by the Google SRE book, seek opportunities to eliminate toil, optimize infrastructure utilization and implement resilient architectures, and enjoy sharing your knowledge with the broader development community.
You actively participate in achieving our DevOps vision by defining and improving our cloud reliability and operational practices (e.g. availability, performance, scalability, resilience, SLI / SLO / SLA definition, incident response and problem management investigation), and you work with teams to adopt these practices. You improve the observability of IT systems while taking on and streamlining task-based operational work, in order to unblock teams whose operational needs are not yet covered by automated or self-service solutions.
You focus on building automation and documenting processes in ways that reduce the impact of errors, simplify systems and ensure the adoption of robust design patterns. As an SRE Cloud expert, you are a generalist in all aspects of DevOps and have extensive experience in software engineering practices.
Your main responsibilities
Refine the cloud operational management framework and identify automation opportunities and points of failure
Automate and optimize provisioning and configuration of cloud services
Enhance our cloud deployment practices for improved reliability, repeatability and security
Participate in incident management processes to accelerate return to service, identify root causes and develop tools to prevent recurrence of problems. Participate actively in the post-mortem and remediation process
Help develop chaos engineering practices to improve site reliability and internal incident response processes and tools
Design effective observability mechanisms to mitigate incidents, assist in root cause analysis and ensure appropriate alerting (including associated processes), and develop correlation and causality algorithms to improve system understanding and resilience
Help improve continuous deployment processes and tools
Help refine our security posture and automate security practices (security as code) in accordance with government and regulatory agency requirements
Promote a culture of DevOps and reliability by building relationships with technical and non-technical teams
Work with the broader software engineering community to design and implement highly resilient architectures and processes
You write code for all of the above, and if this is not possible, document it in an
University degree in related fields with 3-5 years of experience or equivalent experience
Mastery of Linux / Unix administration, including scripting; knowledge of Windows Server administration (asset)
Strong technical problem-solving skills and knowledge of resilient software design patterns and architectures
Superior knowledge of CLI / API, Terraform, Ansible, Python, Bash and Git
Good knowledge of container administration, including Docker, Kubernetes, Istio or other service mesh implementations
Networking experience, including VPC, SDN / VLAN, DNS, routers and firewalls
Experience with SQL, using RDS-PostgreSQL or other DBMS
Experience with monitoring and alerting tools such as Grafana, Prometheus, Sysdig, Datadog
Experience with log aggregation tools such as Fluentd, ELK, Splunk
Experience with processes and tools for managing vaults and secrets
Strong communication skills and the ability to explain technical concepts simply
© 2012 CT Consultants. All rights reserved