Big data and data science hold the promise of new understanding of our world. With the increased use of computational methods across the research landscape, questions have arisen regarding the reliability, verifiability, and reproducibility of findings. We will frame some of these concerns, and discuss policy reactions in the White House and on Capitol Hill in light of this framing.
Relying on Data Science: Reproducible Research and the Role of Policy
A former research editor and manager at Palo Alto’s Institute for the Future, Jess Hemerly is currently senior analyst on the public policy and government relations team at Google. As a freelance writer and cultural critic, Hemerly has written for MAKE, The Onion, and 7x7, as well as Boing Boing, AlterNet, and several Bay Area music blogs. In 2009, Hemerly was nominated for a Webby award in the “Website: Weird” category for a blog she co-created, Sad Guys on Trading Floors. In 2002, she served as an intern in President Clinton’s post-presidential office in Harlem. Hemerly received her B.A. in politics from NYU and her master’s in information management and systems from UC Berkeley’s School of Information. In 2011, she earned the I School’s James R. Chen award for outstanding final project in information research for her master’s thesis, “Making Metadata: The Case of MusicBrainz.”
Fernando Pérez is a research scientist at UC Berkeley’s Henry H. Wheeler Jr. Brain Imaging Center; he works at the interface between high-level scientific computing tools and the mathematical questions that arise in the analysis of neuroimaging data.
Pérez is committed to creating better tools for scientific computing based on the Python language. He created the IPython project while a graduate student in 2001 and continues to lead the project, now as a collaborative effort. He is also an active member of the community that creates freely available scientific computing tools around the SciPy stack, lectures regularly about scientific computing in Python, and is a founding board member of the NumFOCUS foundation. At UC Berkeley, Pérez is involved with a number of efforts to improve the quality of the computational practices of scientists and educators.
Philip B. Stark is a professor of statistics who has done research on the Big Bang, causal inference, the U.S. census, earthquake prediction, election auditing, food web models, the geomagnetic field, geriatric hearing loss, information retrieval, Internet content filters, nonparametrics, the seismic structure of the Sun and Earth, spectroscopy, spectrum estimation, and uncertainty quantification for computational models of complex systems.
Stark conducted the first risk-limiting post-election audits and the first scientific study of the effectiveness of Internet content filters. He developed UC Berkeley’s first official online course. He has testified to Congress regarding census adjustment and has served as an expert in litigation over the Child Online Protection Act, consumer protection, employment discrimination, environmental protection, equal protection, intellectual property, jury selection, import restrictions, insurance, natural resources, product liability, trade secrets, truth in advertising, and wage and hour issues. He received his A.B. in philosophy from Princeton University and his Ph.D. in earth sciences from UC San Diego.
Victoria Stodden is assistant professor of statistics at Columbia University and serves as a member of the National Science Foundation’s Advisory Committee on Cyberinfrastructure (ACCI) and on Columbia University’s Senate Information Technologies Committee. She is one of the creators of SparseLab, a collaborative platform for reproducible computational research, and has developed an award-winning licensing structure, the Reproducible Research Standard, to facilitate open and reproducible computational research. She is currently working on the NSF-funded project “Policy Design for Reproducibility and Data Sharing in Computational Science.”
Stodden co-chaired a working group on virtual organizations for the NSF’s Office of Cyberinfrastructure Task Force on Grand Challenge Communities in 2010. She is a Science Commons fellow and a nominated member of the Sigma Xi scientific research society. She also serves on the advisory board for hackNY.org, and on the joint advisory committee for the NSF’s EarthCube, the effort to build a geosciences-integrating cyberinfrastructure. She is an editorial board member for Open Research Computation and Open Network Biology. She completed her Ph.D. and law degrees at Stanford University.