Site Reliability Engineer

Provide and constantly improve site reliability and incident response

type of job

full-time

updated at

20 months ago

job details

Role Overview

Want to build Web3 with us? The next few years in crypto, NFTs and Web3 belong to builders and believers, not short-term speculators. At Rarible, we believe Web3 will only proliferate when teams build excellent infrastructure, fill gaps and deliver solutions that serve communities and create a better internet for everyone. If that sounds like music to your ears, we’re looking for you.

That's why we are looking for a Site Reliability Engineer.

You have experience with, and are culturally aligned with, small, fast-moving teams, and you have worked at globally distributed startups. You are self-driven, comfortable wearing many hats, and able to deliver swiftly when needed. You can identify company priorities, own them, and iterate quickly to ship the best solution.

Responsibilities

  • Work closely with engineering teams to ensure that Rarible's systems are well operated and monitored, and are designed and implemented with failure in mind.
  • Provide incident response and support for our production systems.
  • Continuously work with engineering teams to improve MTTR (Mean Time to Recovery).
  • Automate our operational processes as needed, accurately and in compliance with our security requirements.
  • Improve tools and advocate for operational excellence in continuous monitoring, self-healing systems and alert transparency.
  • Work on the tooling, documentation, playbooks and education engineering teams need to deliver and maintain reliable, observable and scalable systems on their own.
  • Make sure that reliability-related metrics are calculated, communicated and continuously improved.

Requirements

  • You have 5+ years of relevant experience in ensuring reliability and scalability of production systems.
  • You are proactive and communicate well.
  • Monitoring and observability are among your core skills, including tracing, RUM (real user monitoring) and advanced alerting.
  • You are proficient in programming languages such as TypeScript/JavaScript and Java/Kotlin/Scala.
  • You have worked closely with software engineers day to day to ensure the reliability of production systems, and have handled incident response at both the infrastructure and software levels.
  • You have experience with CI/CD, so you can improve the deployment process and reduce risk.
  • You have a deep understanding of, and hands-on experience with, Kubernetes and LXC (Linux Containers).
  • You have managed MongoDB, PostgreSQL, Elasticsearch, Kafka and JVM-based systems.

Benefits

  • Working for a rapidly expanding global organization
  • Mentorship, training and career progression plans with leadership focused on developing the teams
  • Team that cares about products and working conditions
  • Flexible hours
  • Full-time employment with paid vacations
  • Remote first with relocation packages available

About us

Rarible is a top multichain, community-centric NFT marketplace. It is underpinned by the Rarible Protocol, a community-governed NFT trading API that simplifies building community marketplaces and other groundbreaking NFT projects and integrations.

With over $300 million in trading volume to date, Rarible is one of the leading NFT brands, constantly innovating on decentralized solutions for the Web3 space.

We are growing and evolving non-stop, and are looking for a Site Reliability Engineer to join our dynamic and passionate Web3 team.