In order to meet the growing needs of our customers, we are constantly searching for dynamic, qualified individuals to join the CTC resource team. Currently, there are several career opportunities available at CTC. If you feel you satisfy the qualifications for one of the positions listed on this page, please send your CV to resources@ctcinc.ca identifying the position(s) you are interested in. Qualified individuals will be contacted by our human resources department.
CTC005652 : SRE Cloud Developer
Location : Montreal, Quebec
Field : Technical Specialist
Position Type : Contract
Starting : January 25, 2021
Ending : December 17, 2021
Resources Required : 1
Position Description
  • Duration: 12 months, with the possibility of a permanent position. Note: the temporary-to-permanent option is important.
  • Bilingualism: Fluency in French is mandatory. English is only an asset.
  • Open rate
  • Remote work: until further notice, with possible on-site presence.

Short description:

  • As a Cloud Reliability Engineer, you are the expert who helps the organization make effective use of the cloud. You do this by identifying opportunities to improve reliability and by applying software engineering best practices.

Get an overview:

  • You are a transformation leader in our DevOps revolution, focused on the resilience and reliability of our cloud systems; you use leading practices and state-of-the-art tools. You are practical and have an Agile mindset. You master a range of skills, from development to systems engineering, across all phases of the delivery and service lifecycles: design, development, testing, deployment, and ongoing technical support and maintenance. "You automate what you can, document what you can't, and have the wisdom to know the difference." (Google SRE)

Your role:

  • You are a software engineer focused on operational excellence. You live by the Google SRE book and seek opportunities to eliminate toil, optimize infrastructure utilization and implement resilient architectures, and you enjoy sharing your knowledge with the broader development community.
  • You actively participate in achieving our DevOps vision by defining and improving our cloud reliability and operational practices (e.g. availability, performance, scalability, resilience, SLI / SLO / SLA definition, incident response and problem management investigation), and you work with teams to adopt these practices. You improve the observability of IT systems and optimize task-based operational work to unblock teams whose operational needs are not yet covered by automated or self-service solutions.
  • You focus on building automation and documenting processes in order to reduce the impact of errors, simplify systems and ensure the adoption of robust design patterns. As a Cloud SRE expert, you are a generalist in all aspects of DevOps and have extensive experience in software engineering practices.

Your main responsibilities

  • Refine the cloud operational management framework and identify automation opportunities and points of failure
  • Automate and optimize provisioning and configuration of cloud services
  • Enhance our cloud deployment practices for improved reliability, repeatability and security
  • Participate in incident management processes to accelerate return to service, identify root causes and develop tools to prevent recurrence of problems. Participate actively in the post-mortem and remediation process
  • Help develop chaos engineering practices to improve site reliability and internal incident response processes and tools
  • Design effective observability mechanisms to mitigate incidents, assist in root cause analysis, ensure appropriate alerting (including associated processes) and develop correlation and causality algorithms to improve understanding and system resilience
  • Help improve continuous deployment processes and tools
  • Help refine our security posture and automate security practices (security as code) in accordance with government and regulatory agency requirements
  • Promote a culture of DevOps and reliability by building relationships with technical and non-technical teams
  • Work with the broader software engineering community to design and implement highly resilient architectures and processes
  • You write code for all of the above, and if this is not possible, document it in an

Technical skills:

  • University degree in a related field with 3-5 years of experience, or equivalent experience
  • Mastery of Linux / Unix administration, including scripting; knowledge of Windows Server administration (asset)
  • Strong technical problem-solving skills and knowledge of resilient software design patterns and architectures
  • Advanced knowledge of CLIs / APIs, Terraform, Ansible, Python, Bash and Git
  • Good knowledge of one or more programming languages such as Python, Go, Java, C++, Ruby, JavaScript
  • Good knowledge of container administration, including Docker, Kubernetes, Istio or other service mesh implementations
  • Networking experience, including VPC, SDN / VLAN, DNS, routers and firewalls
  • Experience with SQL, using RDS-PostgreSQL or other DBMS
  • Experience with monitoring/alert tools such as Grafana, Prometheus, Sysdig, DataDog
  • Experience with log aggregation tools such as FluentD, ELK, Splunk
  • Experience with processes and tools for managing vaults and secrets
  • Strong communication skills and the ability to simplify technical concepts