Our vision is to bring more innovation, efficiency, and equality of opportunity to the world by creating an open financial system. Our first step on that journey is making digital currency accessible and approachable for everyone. To achieve that, it is critical to have timely and reliable access to all of our data, from user clicks on our website down to blockchain transactions.

As a Data Platform Engineer, you will build our next-generation data platform and its accompanying services. Our data pipelines are growing rapidly, currently processing several terabytes of data from production databases and external providers into our data warehouse. We build foundational self-service systems that let end users create ETL flows and consume data in both batch and streaming fashion for machine learning, fraud prevention, A/B testing, and analytics.

RESPONSIBILITIES
- Data ingestion pipeline: Build our next-generation streaming ingestion pipeline for scale (10x data), speed (<1 minute of lag), and ease of use (<1 hour to add a new source). Read from a variety of upstream systems (MongoDB, Postgres, DynamoDB, MySQL, external APIs) in both batch and streaming fashion, e.g. by tailing MongoDB's oplog and Postgres's WAL. Today we do this with Apache Airflow, Hadoop, Spark, and a pure Kotlin service.
- Self-service transformation engine: Build and maintain our self-service tooling that allows anybody at Coinbase to transform complex JSON and create dimensional models. Specific challenges include supporting type 2 slowly changing dimensions, end-to-end testability, validation/monitoring/alerting, and efficient execution. Today we do this with Apache Airflow.
- Anomaly detection: Build a comprehensive anomaly detection service that allows anybody at Coinbase to quickly set up notifications that catch broken processes.
- Security: Build a security layer that authorizes data access at the row and column level. Build a logging and auditing system to surface suspicious data access patterns.
- Exhibit our core cultural values: positive energy, clear communication, efficient execution, and continuous learning.
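The ingestion work above reads change streams from several very different sources. One common pattern for that kind of pipeline is normalizing each source's raw change events into a shared envelope before downstream consumers see them. A minimal Python sketch of the idea, assuming a hypothetical `ChangeEvent` schema (the field names are illustrative, not our actual schema); the MongoDB oplog keys used here (`ns`, `op`, `o`) are the standard oplog entry fields:

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class ChangeEvent:
    """Hypothetical unified change-event envelope shared by all sources."""
    source: str            # e.g. "mongodb", "postgres"
    table: str             # fully qualified table/collection name
    op: str                # "insert" | "update" | "delete"
    payload: Dict[str, Any]

# MongoDB oplog op codes mapped to generic operation names.
_MONGO_OPS = {"i": "insert", "u": "update", "d": "delete"}

def from_oplog(entry: Dict[str, Any]) -> ChangeEvent:
    """Translate a raw MongoDB oplog entry into the common envelope."""
    return ChangeEvent(
        source="mongodb",
        table=entry["ns"],           # namespace, e.g. "app.users"
        op=_MONGO_OPS[entry["op"]],  # normalize the single-letter op code
        payload=entry["o"],          # the changed document
    )
```

A matching `from_wal(...)` adapter for Postgres logical decoding would emit the same envelope, so everything downstream (transformation, anomaly detection, auditing) handles one shape.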
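Type 2 slowly changing dimensions, mentioned under the transformation engine, preserve history by closing out the current row and appending a new one rather than updating in place. A schematic in-memory sketch of the mechanic (real implementations would run as SQL inside the warehouse; the row fields here are illustrative):

```python
from datetime import date

def apply_scd2(rows, key, new_value, as_of):
    """Type 2 SCD update: close the current row for `key` and append a
    new current row, so every historical value remains queryable."""
    for row in rows:
        if row["key"] == key and row["is_current"]:
            if row["value"] == new_value:
                return rows          # no change: leave history untouched
            row["valid_to"] = as_of  # close out the old version
            row["is_current"] = False
    rows.append({
        "key": key, "value": new_value,
        "valid_from": as_of, "valid_to": None, "is_current": True,
    })
    return rows
```

After an update, a point-in-time query can filter on `valid_from`/`valid_to` to reconstruct the dimension as it looked on any given date.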
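The anomaly detection responsibility does not prescribe an algorithm; one simple baseline such a service might start from is a trailing-window z-score rule (an assumption for illustration, not our actual method):

```python
from statistics import mean, stdev

def is_anomalous(history, value, threshold=3.0):
    """Flag `value` if it deviates from the trailing window `history`
    by more than `threshold` standard deviations."""
    if len(history) < 2:
        return False                 # not enough data to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu           # constant history: any change is anomalous
    return abs(value - mu) / sigma > threshold
```

A notification hook on top of this check (paging, Slack, email) is what turns the rule into the self-service alerting the bullet describes.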
REQUIRED:
- Experience building data backend systems at scale with parallel/distributed compute
- Experience building microservices
- Experience with Python and/or Java/Scala
- Knowledge of SQL
- A data-oriented mindset
PREFERRED (NOT REQUIRED):
- Computer Science or related engineering degree
- Deep knowledge of Apache Airflow, Spark, Hadoop, Hive, Kafka/Kinesis
WHAT TO SEND
- A resume that describes scalable systems you’ve built
Role: Data Engineer
Location: San Francisco