The Berkeley Algorithms, Machines, and People Laboratory (AMPLab) is creating a new approach to data analytics. The lab is realizing its ideas through the development of a freely available open source software stack called BDAS: the Berkeley Data Analytics Stack. In the four years the lab has been in operation, we have released major components of BDAS. Several of these components have deeply influenced current Big Data practice: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Tachyon distributed storage system. BDAS features prominently in many industry discussions of the future of the Big Data analytics ecosystem, a rare degree of impact for an ongoing academic project. In this talk I will give an overview of BDAS with an emphasis on how we provide an integrated environment for SQL processing, graph analytics, streaming, and machine learning at scale. I will then describe our current and planned efforts for moving "up the stack", including new components such as the Velox and MLbase machine learning platforms and the SampleClean framework for hybrid human/computer data cleaning.
Michael Franklin will present an overview of the AMPLab, followed by Ali Ghodsi of Databricks, who will demonstrate how to use Spark and other BDAS components in the Databricks Cloud.