Trainspotting and Predicting Train Delays

Thursday, May 5, 2016 - 3:10 pm to 4:10 pm

Tens of thousands of Bay Area residents commute on Caltrain every day. Unfortunately, the system is unreliable, and the reported delay predictions are often completely wrong. Silicon Valley Data Science has built an extensive data architecture that collects several types of data (video streams, audio streams, GPS data, and web data) to predict train arrival delays. As part of this larger data science project, we have performed train classification using video and audio streams, sentiment analysis using Twitter data, and train arrival delay prediction using various machine learning and statistical methods.

In this talk, we will focus on two aspects of the larger project: (1) train classification using video streams and (2) train arrival delay prediction using various machine learning and statistical methods.

Presented at DataEDGE 2016

Data Scientist
Silicon Valley Data Science

Chloe Mawer is a Data Scientist at Silicon Valley Data Science. She has worked on a wide variety of problems, ranging from developing a data strategy for a pharmaceutical company to devising a methodology for longitudinal consumer impact studies at a multinational retail company. She has also researched, written, and spoken about valuing data, both for monetization and for internal decision-making within an organization. Chloe holds a doctorate in Environmental Engineering from Stanford University and an undergraduate degree in Civil and Environmental Engineering from Duke University. Her doctoral research focused on developing methods for obtaining hydrologic information from subsurface electrical data to better inform groundwater management decisions.

Vice President of Data Science
Silicon Valley Data Science

Jeffrey is the Vice President of Data Science at Silicon Valley Data Science (SVDS), where he leads a team of Ph.D. data scientists. Before joining SVDS, Jeffrey was VP and Head of Risk Analytics and Quantitative Modeling & Research at the Charles Schwab Corporation; before that, he was a Director of Financial Risk Management Consulting at KPMG. His earlier experience also includes analytic roles at Moody’s Analytics, the World Bank, the Wharton School, and the University of Pennsylvania School of Medicine. Jeffrey holds a Ph.D. and an M.A. in Economics, with a focus on econometrics, from the University of Pennsylvania, and a B.S. in Mathematics and Economics from UCLA.

Data Scientist
Silicon Valley Data Science

Daniel is a Data Scientist at Silicon Valley Data Science. His Ph.D. research involved measuring structure on the largest scales in the universe and tracing its evolution from the earliest moments to today. To do this, he used observations from a multi-year telescope survey to construct the largest three-dimensional map of our universe to date. His work enables tests of Einstein’s theory of general relativity and helps physicists better understand the fundamental constituents of the universe, including the enigmatic dark energy that drives its accelerating expansion. In particular, he focused on developing and scaling algorithms for extracting signals from ever-increasing volumes of data. Before joining SVDS, Daniel was a fellow at Insight Data Science, where he developed a meal ingredient recommendation application that used a graph-based analysis of ingredient flavor compounds and of ingredient lists from publicly available recipe databases. Daniel holds a B.S. in physics from UCLA and a master's degree in physics from UC Irvine, where he will complete his Ph.D. in physics this year.