Databricks Data + AI Summit 2022
Christian Williams
Software Engineer
Scribd
Scribd’s data platform is trending toward real-time. A notable challenge has been streaming data into Delta Lake in a fast, reliable, and efficient manner. To help address this problem, we developed two foundational open source projects: delta-rs, which allows Rust to read and write Delta Lake tables, and kafka-delta-ingest, which quickly and cheaply ingests structured data from Kafka.
In this talk, we’ll review the architecture of kafka-delta-ingest and how it fits into the larger real-time data ecosystem at Scribd.