
Reliability Engineer

Operate a variety of off-chain services and tools for logging historical data, and improve their reliability over time

type of job

full-time

last signal

2 months ago

job details

Role Overview

Operating Pyth Network is a nontrivial challenge. Our price feeds run 24/7. DeFi applications depend on the accuracy and availability of these feeds; an inaccurate price or offline feed can cause serious financial losses. Each feed in turn depends on many different services, some of which are run by our data providers and some by us. It’s a complex system with many different failure modes, but it has to work correctly all the time.

We also run a variety of off-chain services, such as the backend for the pyth.network website and tools for logging historical data. These services run in a Kubernetes cluster that is managed using Terraform, and they too must be running and healthy at all times.

We’re looking for people to help us operate this system and improve its reliability over time. This job has many different aspects, including providing front-line support for incidents, developing automation to manage our infrastructure, and defining deployment plans for high availability.

Responsibilities

  • Provide front-line response to incidents and outages, such as unavailable price feeds or website downtime
  • Develop automation tools to provision and manage our infrastructure, including cloud services and Kubernetes clusters. We currently use Terraform to manage our infrastructure, but we’re not married to it and may use different tools in the future. Some of our tools are written in Python and others in Go
  • Design and implement operational plans to achieve high availability guarantees for our price feeds and web services. Build redundant service deployments, monitoring solutions, dashboards, and alerting tools to ensure that critical services are running continuously. Support services on development and production environments, from before launch through launch. Benchmark application resource consumption to allocate capacity
  • Measure and monitor application metrics (availability, latency, etc.) to understand the health of the system. Work with developers to add metrics and logging to their applications in order to facilitate Grafana dashboards and alerts. Develop logging practices and libraries to standardize metric reporting and alerting across multiple programming languages

Requirements

  • Comfortable developing software. Writing software is a big part of the job, as we write lots of tools to automate processes and monitor deployments
  • Solid understanding of Linux fundamentals, such as processes and permissions, along with an understanding of containers (Docker) and cloud deployments
  • Experience troubleshooting, monitoring, and debugging cloud-native applications and distributed systems
  • Ability to handle shared operational and periodic on-call duties
  • 1+ years of experience supporting critical production environments. Work in financial and crypto markets is a plus
  • Predictable and reliable availability

Culture & Perks

  • We are a small team, and about half the team is technical
  • We are mostly remote. Team members live across the world, in the US, Europe, and Asia. We do have offices in some locations (Porto, Chicago, London, Amsterdam, Singapore) for those who prefer in-office work
  • Our team communicates with each other and external developers in English. Strong spoken and written English skills are required
  • We operate like a startup in the rapidly growing and changing DeFi ecosystem. To be successful, we must adapt to meet the current needs of the market. Good candidates will help our organization adapt; they are flexible problem solvers who are willing and able to take on whatever the occasion demands
  • Our success depends on external developers using our protocol. Good candidates will be able to write code with external developers and answer deep technical questions about Pyth Network
  • We offer a competitive salary and generous benefits package. Furthermore, where applicable, employees may be eligible for token allocations as part of Pyth Network’s employee incentive program

About Us

Pyth connects high-fidelity market data from the world’s largest professional traders and exchanges to any smart contract, anywhere.

Pyth Network is a specialized oracle solution for latency-sensitive financial data that is typically kept behind the “walled gardens” of centralized institutions. Pyth Network is focused on finding a new and inexpensive way to bring this unique data on-chain and aggregating it securely.

Our unique competitive advantage is our network of market data providers. Our network includes over 50 of the biggest exchanges, traders, and market makers in both the crypto and traditional finance worlds. These data providers have agreed to publish their proprietary data on-chain, which allows Pyth Network to build the fastest and most reliable price feeds. It also allows us to access financial data that is not freely available from other sources, such as real-time US equity prices.
