Exploring New Frontiers: How Apache Flink, Apache Hudi and Presto Power...- Danny Chan & Sagar Sumit

Published: 15 June 2023
on channel: Presto Foundation

Exploring New Frontiers: How Apache Flink, Apache Hudi and Presto Power New Insights and Gold Nuggets at Scale - Danny Chan & Sagar Sumit, Onehouse

Exploring and efficiently processing streaming data at scale can be very challenging. The shape of streaming data commonly changes as patterns and trends evolve, and streaming communities like Flink's have a strong desire to incrementally process data and discover new insights and patterns on the fly at scale. While OLTP systems are good for update-heavy use cases and can handle high-volume transactional data, they are not optimized for read-heavy workloads. For instance, they may not support the more complex analytical functions required for data exploration, because they are not optimized for scanning the large amounts of data that ad hoc queries require. Ingesting data into lakehouses can help address data exploration needs, but presents challenges for stream processing because of minimal support for data mutability and fast updates.
Apache Hudi is a holistic lakehouse platform that is natively designed for handling mutable data, encompassing near real-time insertions and updates, along with a powerful incremental processing framework. Hudi’s out-of-the-box support for streaming data lays the groundwork for CDC ingestion from different sources and enables users to track record-level changes in their lakehouse tables. The Hudi-Flink integration extends this support to address the Flink community’s streaming needs. Furthermore, Hudi’s interoperable compute engine support enables interactive analysis via Presto to discover fresher insights on streaming data.
Hudi and Presto are a powerful duo that can be used together to incrementally process and explore Flink data streams. While Hudi efficiently supports row-level updates and incremental processing of Flink data streams on the data lake, Presto provides powerful ANSI SQL capabilities, including support for complex queries such as window functions, so that companies can quickly understand the trends and changes in their data streams without pre-planning queries.
From there, downstream consumers can ingest the changelogs from Presto to power their data applications. In this talk, attendees will walk away with an understanding of:
The current challenges of analytics on transactional data systems with data streams at scale
How Hudi unlocks incremental processing on the lake
How Presto allows ad-hoc queries that support data exploration on Flink data
How you can leverage Flink, Hudi and Presto to build incremental materialized views
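
To make the idea of an incremental materialized view concrete, here is a minimal sketch in plain Python. It does not use any Hudi, Flink, or Presto API; the changelog format, the `apply_changelog` helper, and the `region`/`amount` fields are all hypothetical and chosen only for illustration. It shows how row-level changes (inserts, updates, deletes) can be folded into an aggregate without rescanning the full table.

```python
from collections import defaultdict

def apply_changelog(view, rows_by_key, changelog):
    """Fold row-level changes into a per-region total (hypothetical schema)."""
    for op, key, row in changelog:
        # Retract the previous version of this record, if one exists.
        old = rows_by_key.pop(key, None)
        if old is not None:
            view[old["region"]] -= old["amount"]
        # Inserts and updates contribute the new version; deletes add nothing.
        if op in ("insert", "update"):
            rows_by_key[key] = row
            view[row["region"]] += row["amount"]
    return view

view = defaultdict(int)   # materialized view: total amount per region
rows_by_key = {}          # latest version of each record, by record key

# First batch of changes: two new records.
apply_changelog(view, rows_by_key, [
    ("insert", "o1", {"region": "EU", "amount": 10}),
    ("insert", "o2", {"region": "US", "amount": 5}),
])

# Second batch: only the changed records are processed.
apply_changelog(view, rows_by_key, [
    ("update", "o1", {"region": "EU", "amount": 25}),
    ("delete", "o2", None),
])
```

Each batch touches only the records that changed, which is the property that keeps a view fresh as a stream of updates arrives.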