CTC006238 - Cloud & DevOps SRE

Industry sector: Banking
Job type: Contract
Duration: Seven months
Work mode: On site

Description

  • PROFILE: Looking for a SENIOR developer profile - someone who can be a COACH for the team
  • PERM: Strong possibility of conversion to a permanent position. The manager prefers a candidate who can be converted to permanent, but is open to a temporary resource.
  • BILINGUALISM: Bilingual candidates are preferred, but the manager is open to an English speaker who understands French very well. Must be able to understand a French-speaking team.
Short Description

  • As a Cloud Site Reliability Engineer (SRE), you are the expert in helping the organization operate the cloud effectively.
Team Mission

  • The squad has a four-pronged mission: Technology Architecture, Cloud Solution Architecture, GitOps-based Cloud Project Management, and Cloud Operational Governance.
Your role

  • You will actively participate in the realization of our DevOps vision by defining and improving our operational and cloud reliability practices and by working with the teams to adopt these practices. You will improve the observability of IT systems while carrying out and optimizing task-based operational work (toil) to unblock teams whose operational needs are not yet covered by automated or self-service solutions.
  • You focus on automating deployment pipelines, while documenting processes that reduce the impact of errors.
  • You simplify systems and ensure the adoption of robust design patterns.
  • As a Cloud SRE expert, you are a generalist in all things DevOps and have extensive experience with SDLC software engineering practices.
Your primary responsibilities will be to:

  • Promote a culture focused on DevOps and reliability by building strong relationships with technical and non-technical teams
  • Work with the broader software engineering community to design and implement highly resilient architectures and processes
  • Improve our cloud deployment practices for improved reliability, repeatability, and security
  • Identify automation opportunities and failure points in application pipelines
  • Automate and optimize provisioning and configuration of cloud services
  • Help develop chaos engineering practices to improve site reliability and internal incident response processes and tools
  • Help improve continuous deployment processes and tools
  • Help refine our security posture and automate security practices (security as code) in accordance with government and regulatory agency requirements
  • Write code for all of the above and, where that is not possible, document it in a wiki
Technical Skills

  • University degree in a related field with 5 years of experience, or equivalent experience
  • Official certifications required: AWS, CKA, CNCF
  • Proficiency in Linux / Unix administration, including scripting; knowledge of Windows Server administration (asset)
  • Strong technical problem-solving skills and knowledge of resilient software design patterns and architectures
  • Advanced knowledge of CLI/API, Terraform, Ansible, Python, Bash and Git
  • Good knowledge of one or more programming languages such as Python, Go, Java, Ruby, JavaScript
  • Good knowledge of container administration, including Docker, Kubernetes, Istio or other service mesh implementations
  • Experience with networking, including VPC, SDN / VLAN, DNS, routers and firewalls
  • Experience with SQL, using RDS-PostgreSQL or other DBMS
  • Experience with monitoring/alerting tools such as Grafana, Prometheus, Sysdig, DataDog
  • Experience with log aggregation tools such as FluentD, ELK, Splunk
  • Experience with vault and secret management processes and tools
  • Strong communication and technical simplification skills