Checkly is looking for an experienced Site Reliability Engineer. This is a great opportunity to join an early-stage company, influence the product roadmap, and help us do what we love most: building the best monitoring platform for developers.
Make our reliability product more reliable
Checkly is, in essence, a reliability company. People trust our software to alert them when their software goes "poof". To make this happen, we use AWS Lambda/SQS/SNS/S3, Heroku, Postgres, Redis and soon ClickHouse, running from 20+ locations around the world.
Build and shape our SRE practices
You will play a key role in defining how to "do reliability". Together with your coworkers in the product engineering teams, you will be responsible for:
- Observability of our backend platform: identifying bottlenecks, tracking them and fixing them.
- Optimizing our performance and reducing error rates, from wild queries to slow queues to Heisenbugs.
- Streamlining our on-call process and optimizing our runbooks.
- Working with the product folks to bake reliability into everything we do: defining SLOs and SLAs and enforcing them.
Requirements
- You have deep experience operating and troubleshooting mission-critical SaaS environments as an SRE.
- You have deep working experience with AWS, SQL and OLAP databases, and Node.js.
- You like to work in a growing company with experienced founders.
- You know how to communicate with coworkers and customers in English.
- You are quick to pick up on new stuff and enjoy the process of learning new things.
- You love making software!
Bonus points
- Experience with building SaaS tools for developers.
- An obsession with browser automation in the cloud.
Benefits
- Competitive salary
- Working hours are flexible and we support families: you can pick up your kids without worrying about work.
- An open, healthy workplace where we can all enjoy ourselves and grow
- Work with the latest technologies
- Modern laptop and equipment provided