Computationally Reproducible Experiments

Todd M. Gureckis

Guest and Rougier raise a timely issue regarding reproducible computational models. However, computational models are only as good as the empirical data they seek to explain. In psychology, and many other fields, there is currently a crisis of confidence in the quality of empirical research (Pashler & Wagenmakers, 2012). It appears that an unexpectedly large percentage of research studies do not obtain the same results when repeated by independent researchers (Open Science Collaboration, 2015).

Explanations of these failures often appeal to the difficulty of conducting experiments that are faithful to the original publication. For instance, a researcher might make an error in conducting the replication (e.g., altering the instructions). But even when a replication sticks closely to the text of a published method, there may be unmentioned, implicit knowledge about how to conduct the study that is specific to a particular lab (Mitchell, 2014) or context (Van Bavel et al., 2016).

The replication debate helps us refine the distinction between replicability and reproducibility. A reproducible experimental method is one that is, in principle, possible to follow exactly. In contrast, a replicable experiment is one that is reproducible and, when repeated, yields the same result; whether it does depends on issues of sample size, statistical power, and the variability of the phenomenon in question.

Many published experiments in psychology cannot be reproduced exactly from the methods section of the paper, limiting the feasibility of direct replication (Simons, 2014). There are simply too many variables that go unreported, from the mundane, like the room temperature, to the potentially crucial, like an experimenter's demeanor. We suggest that computationally reproducible experiments represent one solution to this problem. Computationally reproducible experiments are experiments in which every aspect of the interaction between the experimenter and the participant is controlled by a computer algorithm. While seemingly impractical (or even dystopian), this level of control is readily available in the form of web-based experiments.

Web-based experiments are psychological experiments that are conducted, most commonly over the Internet, using standard web technologies (e.g., JavaScript, HTML) that run within a browser (e.g., Google Chrome). When a participant completes a web-based experiment, every interaction must be programmatically scripted, from assignment to conditions and informed consent to task design and debriefing.
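To make this concrete, here is a minimal sketch, in TypeScript, of what such a fully scripted flow can look like. The helper names (showConsentForm, runTrial, showDebriefing, submitData) are hypothetical stand-ins for code that would render pages and collect responses in the browser; the sketch illustrates the general pattern rather than any particular library.

```typescript
// Hypothetical sketch of a fully scripted experiment flow. The helpers
// declared below stand in for code that would display HTML pages and
// record participant input; their names are illustrative only.

interface TrialResult {
  trial: number;
  stimulus: string;
  response: string;
  reactionTimeMs: number;
}

declare function showConsentForm(page: string): Promise<boolean>;
declare function runTrial(trial: number, stimulus: string): Promise<TrialResult>;
declare function showDebriefing(page: string): Promise<void>;
declare function submitData(payload: object): Promise<void>;

async function runExperiment(): Promise<void> {
  // Condition assignment is decided by the script, not by the experimenter.
  const conditions = ["control", "treatment"];
  const condition = conditions[Math.floor(Math.random() * conditions.length)];

  // Informed consent: the exact text and flow are fixed in the code.
  const consented = await showConsentForm("consent.html");
  if (!consented) return; // participant declined; the session ends here

  // The task itself: stimuli, ordering, and timing are specified programmatically.
  const stimuli = condition === "control" ? ["A", "B", "C"] : ["X", "Y", "Z"];
  const results: TrialResult[] = [];
  for (let t = 0; t < stimuli.length; t++) {
    results.push(await runTrial(t, stimuli[t]));
  }

  // Debriefing and data submission, again fully scripted.
  await showDebriefing("debrief.html");
  await submitData({ condition, results });
}
```

Because every branch of this flow is written out explicitly in the code, no step of the procedure is left to the experimenter's discretion or memory.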

Web-based experiments have become a popular tool for behavioral research, particularly due to the advent of crowd-sourcing systems like Amazon Mechanical Turk (Mason & Suri, 2012), which also help to standardize the method of subject recruitment and compensation. In an ideal example of this model, a researcher who ran an experiment on Mechanical Turk could pass the experiment script to another researcher who, simply by hosting the code on a website (and recruiting a new sample from Mechanical Turk), could perform an exact replication of the design. In this case, the only difference between the original study and the replication is the sample of participants drawn from Mechanical Turk (plus random noise). Other incidental variables (like air temperature) are explicitly left uncontrolled, but they should not differ systematically between the original study and the attempted replication.
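The reason only the sample differs is that every design decision lives in the shared script itself. The hypothetical TypeScript sketch below illustrates the idea: the design parameters and the assignment rule are frozen in code, so a replicating lab that reruns the identical file reproduces the method exactly. All names and values are illustrative assumptions, not part of any particular study.

```typescript
// Hypothetical sketch: the complete design is frozen in a shared script, so a
// replicating lab reruns the identical file and only the participants change.

const DESIGN = {
  conditions: ["control", "treatment"] as const,
  trialsPerParticipant: 60,
  rewardUsd: 1.5, // compensation is part of the scripted method, too
};

// Counterbalancing is a pure function of the running participant count, so the
// assignment sequence in a replication matches the original study exactly.
function assignCondition(participantsSoFar: number): string {
  return DESIGN.conditions[participantsSoFar % DESIGN.conditions.length];
}

// The first four participants receive the same condition sequence in any rerun.
for (let i = 0; i < 4; i++) {
  console.log(`participant ${i}: ${assignCondition(i)}`);
}
```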

Today, researchers share the code for experiments informally, often through email. However, we have created a centralized system for sharing computationally reproducible experiments, based on an open-source platform we developed called psiTurk (Gureckis et al., 2016). The psiTurk Experiment Exchange allows researchers to download an experiment from the site and, within minutes, collect a new sample of data while holding all elements of the method constant.

Of course, there will always be some studies that have to be performed in person, and in these cases, videotaped protocols can be used to make the experimental methods more explicit and reproducible (Adolph et al., 2012). But to the extent that studies can be conducted online using an algorithmically scripted framework like psiTurk, the “implicit” or “unmentioned” component of experimental methods is removed, aiding reproducibility.

References

  1. Pashler, H., & Wagenmakers, E. J. (2012). Editors’ Introduction to the Special Section on Replicability in Psychological Science: A Crisis of Confidence? Perspectives on Psychological Science. https://doi.org/10.1177/1745691612465253
  2. Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science. https://doi.org/10.1126/science.aac4716
  3. Mitchell, J. (2014). On the evidentiary emptiness of failed replications. http://jasonmitchell.fas.harvard.edu/Papers/Mitchell_failed_science_2014.pdf
  4. Van Bavel, J. J., Mende-Siedlecki, P., Brady, W. J., & Reinero, D. A. (2016). Contextual sensitivity in scientific reproducibility. Proceedings of the National Academy of Sciences, 113(23), 6454–6459. https://doi.org/10.1073/pnas.1521897113
  5. Simons, D. J. (2014). The Value of Direct Replication. Perspectives on Psychological Science, 9(1), 76–80. https://doi.org/10.1177/1745691613514755
  6. Mason, W., & Suri, S. (2012). Conducting behavioral research on Amazon’s Mechanical Turk. Behavior Research Methods. https://doi.org/10.3758/s13428-011-0124-6
  7. Gureckis, T. M., Martin, J., McDonnell, J., Rich, A. S., Markant, D., Coenen, A., Halpern, D., Hamrick, J. B., & Chan, P. (2016). psiTurk: An open-source framework for conducting replicable behavioral experiments online. Behavior Research Methods, 48(3), 829–842. https://doi.org/10.3758/s13428-015-0642-8
  8. Adolph, K. E., Gilmore, R. O., Freeman, C., Sanderson, P., & Millman, D. (2012). Toward Open Behavioral Science. Psychological Inquiry. https://doi.org/10.1080/1047840X.2012.705133