Whether it’s Internet of Things (IoT), analysis of Financial Data, or Adtech, the arrival of events in time order requires tools and techniques that are noticeably missing from the Pandas and pySpark software stack.
In this talk, we’ll cover Two Sigma’s contribution to time series analysis for Spark, our work with Pandas, and propose a roadmap for to future-proof pySpark and establish Python as a first class language in the Spark Ecosystem.