Do you want to be part of a team that helps over one million designers create amazing products every day? We're looking for a full-time Site Reliability Engineer to join us.
At Sketch we work with a unique technology blend: macOS and iOS applications and a cloud platform. As an SRE collaborating with Full Stack Engineers in the Infrastructure team, you will work on a diverse stack and a wide range of projects. These include developing a system to dynamically scale our Sketch document rendering services, achieving the lowest possible latency for our live collaboration feature, and tuning our databases so they can grow from thousands to millions of ingested Sketch Artboards a day.
In your day to day, you will focus on shaping our cloud infrastructure and making sure all the pieces work well together, from development environments to metrics processing and observability, including security policies, network design, deployment strategies and high availability.
Our stack
Our stack is currently based on a mix of serverless and traditional server applications, along with other cloud services. Most pieces are deployed on AWS and automated through Terraform.
Our backend APIs are deployed as a mix of Lambda services and containers running on EC2.
We also have some legacy services that we continuously migrate to AWS in order to have a consistent platform.
What you will be working on
- Automating our different product environments (development, production, etc.)
- Understanding and improving our platform behavior regarding scalability, observability and availability
- Developing key infrastructure pieces for new projects
- Debugging, finding root causes, and helping to fix problems in production
- Improving security by managing developer authentication and authorization, as well as general auditing
Experience
- Experience developing in a programming language such as Python or Go
- Professional Linux skills and experience managing Linux-based distributed cloud systems
- Experience with Infrastructure as Code tools such as Terraform
Key skills for the position
- Linux system administration
- Ability to script and use configuration management tools to automate manual operations
- Understanding of the HTTP protocol and the behavior of production web services
- Excellent communication skills and good written and spoken English
The ideal candidate
You are proactive and have a "get the job done" attitude. You are also not afraid of digging deeper and deeper to debug a problem, especially in production.
You like to back your decisions and proposals with arguments. As a part of a team with very skilled people, being an excellent team player is essential.
There are always many things to do at Sketch. You need to be an organized and communicative person.
You have experience with different stacks (mainly Linux-based), technologies and production models, and have actively participated in building important pieces of a cloud platform.
You can overlap at least 6 working hours with European time zones.
Even if you feel you are not 100% exactly the person described, we would still love to hear from you. We value anything that makes you different from the description.