CTC - Career Opportunity Details

CTC006238 - Cloud & DevOps SRE

Industry sector: Banking

Job type: Contract

Duration: Seven months

Work mode: On Site

Description

PROFILE: Looking for a SENIOR developer profile - someone who can be a COACH for the team

PERM: Strong possibility of conversion. The manager prefers a resource who can be converted to permanent, but is open to a temporary one.

BILINGUALISM: Bilingual candidates are preferred, but the team is open to an English speaker who understands French very well. Must be able to understand a French-speaking team.

Short Description

As a Cloud Site Reliability Engineer (SRE), you are the expert in helping the organization operate the cloud effectively.

Team Mission

The squad has a four-pronged mission: Technology Architecture, Cloud Solution Architecture, GitOps-based Cloud Project Management, and Cloud Operational Governance.

Your role

You will actively participate in realizing our DevOps vision by defining and improving our operational and cloud reliability practices and by working with teams to adopt them. You will improve the observability of IT systems while optimizing task-based operational work (toil) to unblock teams with operational needs where automated or self-service solutions do not yet exist.

You focus on automating deployment pipelines, while documenting processes that reduce the impact of errors.

You simplify systems and ensure the adoption of robust design patterns.

As a Cloud SRE expert, you are a generalist in all things DevOps and have extensive experience with SDLC software engineering practices.

Your primary responsibilities will be to:

Promote a culture focused on DevOps and reliability by building strong relationships with technical and non-technical teams

Work with the broader software engineering community to design and implement highly resilient architectures and processes

Improve our cloud deployment practices for greater reliability, repeatability, and security

Identify automation opportunities and failure points in application pipelines

Automate and optimize provisioning and configuration of cloud services

Help develop chaos engineering practices to improve site reliability and internal incident response processes and tools

Help improve continuous deployment processes and tools

Help refine our security posture and automate security practices (security as code) in accordance with government and regulatory agency requirements

You will write code for all of the above and, where that is not possible, document it in a wiki

Technical Skills

University degree in a related field with 5 years of experience, or equivalent experience

Official certifications required: AWS, CKA, CNCF

Proficiency in Linux / Unix administration, including scripting; knowledge of Windows Server administration (asset)

Strong technical problem-solving skills and knowledge of resilient software design patterns and architectures

Advanced knowledge of CLI/API, Terraform, Ansible, Python, Bash and Git

Good knowledge of one or more programming languages such as Python, Go, Java, Ruby, JavaScript

Good knowledge of container administration, including Docker, Kubernetes, Istio or other service mesh implementations

Experience with networking, including VPC, SDN / VLAN, DNS, routers and firewalls

Experience with SQL, using RDS-PostgreSQL or other DBMS

Experience with monitoring/alerting tools such as Grafana, Prometheus, Sysdig, DataDog

Experience with log aggregation tools such as FluentD, ELK, Splunk

Experience with vault and secret management processes and tools