Streaming Data into Delta Lake with Rust and Kafka - Databricks Data+AI Summit 2022

Published: 19 July 2022
Channel: Academia de Dados

Databricks Data + AI Summit 2022

Christian Williams
Software Engineer
Scribd

The future of Scribd’s data platform is trending towards real-time. A notable challenge has been streaming data into Delta Lake in a fast, reliable, and efficient manner. To help address this problem, we developed two foundational open source projects: delta-rs, which allows Rust to read and write Delta Lake tables, and kafka-delta-ingest, which quickly and cheaply ingests structured data from Kafka.

In this talk, we’ll review the architecture of kafka-delta-ingest and how it fits into a larger real-time data ecosystem at Scribd.

#AI #Data #Databricks #DeltaLake #Lakehouse #MLOps