Site Reliability Engineer
Ottawa, ON | EST | Work from Home
Who we are
Tehama is the fastest, simplest, and most flexible approach to delivering secure and integrated digital end-user computing (EUC) available today. Our service delivery platform enables organizations to easily onboard, manage, and scale their distributed workforce, by providing “ready to work” integrated and secured EUC environments with controlled access and monitoring—connecting new workers in minutes, not months. The result is decreased time to value, decreased operational costs, and a flexible work model that lets you move at the speed of today’s world.
As a Site Reliability Engineer, you'll be responsible for identifying, troubleshooting, and reporting platform problems to product engineers (or fixing the code yourself) to ensure that Tehama, our groundbreaking cloud-based product, provides a stable and reliable service. We're looking for people who are just as passionate about troubleshooting issues with distributed systems as they are about automating, coding, and collaborating to solve problems.
Who we are looking for:
- Experience with monitoring and metrics tools (e.g., New Relic, Sumo Logic, Grafana, Kibana)
- Knowledge of networking architecture (e.g., load balancing, TCP/IP, routing, Nginx)
- Experience with pipeline development for deploying SaaS within large-scale, cloud-based infrastructure, and familiarity with deployment best practices (e.g., Blue/Green, A/B Testing)
- Experience deploying infrastructure and applications on a public cloud (AWS, Azure, or GCP)
- Experience running Docker containers on AWS Elastic Container Service or Kubernetes
- Experience developing software to automate API-driven tasks at scale, using languages and frameworks such as Java and Python
- Self-motivated, with a strong work ethic and a willingness to go the extra mile when it's crunch time
- Experience working in a startup environment
What you will be working on:
- Reporting and solving problems within our cloud infrastructure services, and collaborating on issues with product engineers
- Participating in SRE software engineering: writing code to continually reduce human intervention in operational tasks and to automate processes
- Monitoring cloud infrastructure, responding to incidents, correcting and improving systems to prevent incidents, and planning capacity
- Measuring service KPIs, providing live data on availability, performance and system health
- Refining and evolving our monitoring and alerting infrastructure
- Planning, implementing, supporting, and continuously improving our cloud environment, with a focus on scalability, reliability, performance, and availability
- Automation: writing and maintaining automation scripts and leveraging configuration
What you get in return:
- Competitive total rewards package, including a self-directed Professional Development budget to invest in your skills growth
- Why Commute? Work remotely when you like
- Build a product that will enable the remote future of work & collaborate digitally with the industry’s top minds!
- Team socials every Friday & Monthly Town Halls to bring our global teams together
- For this job, an equivalent combination of education and experience that results in a demonstrated ability to apply the required skills will also be considered
- Tehama is an equal opportunity employer and welcomes applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the selection process.
- All applicants must meet the requirements for a background check
- No recruitment agencies, please