The Berkeley Data Analytics Stack: Present and Future

Thursday, May 7, 2015 - 2:00 pm

The Berkeley Algorithms, Machines, and People Laboratory (AMPLab) is creating a new approach to data analytics. The lab is realizing its ideas through the development of a freely available Open Source software stack called BDAS: the Berkeley Data Analytics Stack. In the four years the lab has been in operation, we've released major components of BDAS. Several of these components have deeply influenced current Big Data practice: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Tachyon distributed storage system. BDAS features prominently in many industry discussions of the future of the Big Data analytics ecosystem, a rare degree of impact for an ongoing academic project. In this talk I will give an overview of BDAS with an emphasis on how we provide an integrated environment for SQL processing, Graph analytics, Streaming, and Machine Learning at scale. I'll then describe our current and planned efforts for moving "up the stack", including new components such as the Velox and MLBase machine learning platforms and the SampleClean framework for hybrid human/computer data cleaning.

Michael Franklin will present an overview of the AMPLab and will be followed by Ali Ghodsi of Databricks, who will demonstrate how to use Spark and other BDAS components in the Databricks Cloud.

The Berkeley Data Analytics Stack: Present and Future - DataEDGE 2015

Thomas M. Siebel Professor of Computer Science and Chair of the Computer Science Division,
UC Berkeley

Michael Franklin is the Thomas M. Siebel Professor of Computer Science and Chair of the Computer Science Division at the University of California, Berkeley. Professor Franklin is also the Director of the Algorithms, Machines, and People Laboratory (AMPLab) at UC Berkeley. The AMPLab currently works with and is supported by 28 leading information technology companies, including founding sponsors Amazon Web Services, Google, and SAP. AMPLab is well known for creating a number of key systems in the Open Source Big Data ecosystem, including Spark, Mesos, GraphX, and MLlib, all parts of the Berkeley Data Analytics Stack (BDAS). Professor Franklin was Founder and CTO of Truviso, a real-time analytics company acquired by Cisco in 2012. He works with and advises numerous technology startups, including Berkeley spinouts Databricks and Tachyon Nexus.

Professor Franklin is a co-PI and Executive Committee member for the Berkeley Institute for Data Science, part of a multi-campus initiative to advance Data Science Environments. He is an ACM Fellow, a two-time winner of the ACM SIGMOD "Test of Time" award, has several recent "Best Paper" awards and two recent CACM Research Highlights selections, and is a recipient of the Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley.


Ali Ghodsi is a cofounder of Databricks and currently heads engineering and product management. Prior to that, he was an assistant professor at KTH/Sweden and, beginning in 2009, a visiting researcher at UC Berkeley. He holds a PhD in Computer Science from KTH/Sweden and an MBA from Mid-Sweden University.