Computational reproducibility is defined as the degree to which it is possible to obtain consistent results using the same input data, computational methods, and conditions of analysis (National Academies of Sciences, Engineering, and Medicine 2019). In 2019, the American Economic Association (AEA) updated its Data and Code Availability Policy to require that the AEA Data Editor verify the reproducibility of all papers before they are accepted by an AEA journal. Similar policies have been adopted in political science, notably at the American Journal of Political Science. In addition to the requirements laid out in such policies, data editors of social science journals have produced specific recommendations to facilitate compliance. These policy changes are expected to improve the computational reproducibility of published research going forward. They follow several studies showing that rates of computational reproducibility in the social sciences range from somewhat low to alarmingly low (Galiani, Gertler, and Romero 2018; Chang and Li 2015; Kingi et al. 2018).
Replication, or the process by which a study’s hypotheses and findings are re-examined using different data or different methods (or both) (King 1995), is an essential part of the scientific process that allows science to be “self-correcting.” Computational reproducibility, or the ability to reproduce the results, tables, and figures of a paper using the available data, code, and materials, is a necessary condition for replication. Computational reproducibility is assessed through the process of reproduction. At the center of this process is the reproducer (you!), a party who is rarely involved in the production of the original paper. Reproductions sometimes involve the original author (whom we refer to as “the author”) when additional guidance or materials are needed to carry out the process.
This Guide is meant to be used in conjunction with the Social Science Reproduction Platform (SSRP), an open-source platform that crowdsources and catalogs attempts to assess and improve the computational reproducibility of published social science research. The current version of the Guide is principally intended for reproductions of published research in economics. It may nonetheless be used in other social science disciplines, and we welcome contributions that aim to “translate” any of its parts to those disciplines (learn how you can contribute here). The purpose of this document is to provide a common approach, terminology, and standards for conducting reproductions. More broadly, the goal of reproductions is to assess and improve the computational reproducibility of published research in ways that promote a better understanding of research and facilitate additional robustness checks, extensions, collaborations, and replications.
This Guide and the SSRP were developed as part of the Accelerating Computational Reproducibility in Economics (ACRE) project, which aims to assess, enable, and improve the computational reproducibility of published economics research. The ACRE project is led by the Berkeley Initiative for Transparency in the Social Sciences (BITSS)—an initiative of the Center for Effective Global Action (CEGA)—and Dr. Lars Vilhuber, Data Editor for the journals of the American Economic Association (AEA). This project is supported by the Laura and John Arnold Foundation.
Beyond binary judgments
Assessments of reproducibility can easily gravitate towards binary judgments that declare an entire paper “reproducible” or “non-reproducible.” These guidelines suggest a more nuanced approach by highlighting two realities that make binary judgments less relevant.
First, a paper may contain several scientific claims (or major hypotheses) that vary in their computational reproducibility. Each claim may be tested using one or more methodologies, with results presented in one or more display items (outputs such as tables and figures). Each display item, in turn, may contain several specifications. Figure 0.1 illustrates this idea.
Second, for any given specification there are several levels of reproducibility, ranging from the absence of any materials to complete reproducibility starting from raw data. Even for a single claim-specification pair, identifying the appropriate level can be far more constructive than simply labeling it as (ir)reproducible.
Note that the highest level of reproducibility, which requires complete reproducibility starting from raw data, is very demanding to achieve. It should not be expected of all published research, and especially not of research published before 2019. Instead, this level can serve as an aspiration for the field of economics at large as it seeks to improve the reproducibility of research and facilitate the transmission of knowledge throughout the scientific community.
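The following Python sketch makes the hierarchy concrete. It models a paper as a set of claims, each supported by display items, each of which contains specifications. Every specification carries its own assessed level. The class names, the 0-to-10 integer scale, and the `binary_verdict` helper are illustrative assumptions only; they are not part of the SSRP or of this Guide's actual level definitions. The point of the sketch is that a per-claim distribution of levels carries more information than a single yes/no label for the whole paper.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical ordinal scale (an assumption, not the Guide's official one):
# 0 = no materials available, MAX_LEVEL = fully reproducible from raw data.
MAX_LEVEL = 10


@dataclass
class Specification:
    name: str
    level: int  # assessed reproducibility level for this one specification


@dataclass
class DisplayItem:
    label: str  # e.g. "Table 2" or "Figure 1"
    specifications: list[Specification] = field(default_factory=list)


@dataclass
class Claim:
    summary: str
    display_items: list[DisplayItem] = field(default_factory=list)

    def levels(self) -> list[int]:
        """All specification-level assessments that support this claim."""
        return [
            spec.level
            for item in self.display_items
            for spec in item.specifications
        ]

    def level_counts(self) -> Counter:
        """How many supporting specifications sit at each level."""
        return Counter(self.levels())


def binary_verdict(claims: list[Claim]) -> bool:
    """The coarse paper-level judgment discussed above: True only if every
    specification behind every claim reaches the top level."""
    return all(level == MAX_LEVEL for claim in claims for level in claim.levels())


if __name__ == "__main__":
    # Hypothetical example paper with two claims.
    claims = [
        Claim("Claim 1", [DisplayItem("Table 2", [Specification("col 1", 10),
                                                  Specification("col 2", 6)])]),
        Claim("Claim 2", [DisplayItem("Figure 1", [Specification("panel A", 4)])]),
    ]
    for claim in claims:
        print(claim.summary, dict(sorted(claim.level_counts().items())))
    print("binary verdict:", binary_verdict(claims))
```

Running the example prints `{6: 1, 10: 1}` for Claim 1, `{4: 1}` for Claim 2, and `binary verdict: False`. The binary label discards the fact that one specification is fully reproducible while the others sit at different intermediate levels.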
Stages of the exercise
A reproduction attempt is divided into five stages, corresponding to the first five chapters of this Guide:
- Paper selection, where you (the reproducer) will select a candidate paper and verify the availability of a reproduction package. Depending on availability, you will either declare the paper and start the exercise or select a new candidate paper (after leaving a short record);
- Scoping, where you will define the scope of the exercise by recording which claims, display items, and specifications you will focus on for the remainder of the exercise;
- Assessment, where you will review and describe in detail the available reproduction package, and assess the current level of computational reproducibility of the selected display items;
- Improvement, where you will modify the content and/or the organization of the reproduction package to improve its reproducibility;
- Robustness checks, where you will identify feasible robustness checks and/or assess the reasonableness of specific variations in analytical choices.
These guidelines do not include a possible sixth stage of extension, in which you might extend the paper by incorporating new methodologies or data. If you were to apply the same methodology and research question to a different sample, that would bring you closer to a replication.
This process need not be chronologically linear. For example, you may realize that the scope of a reproduction is too ambitious and switch to a less intensive one. Later in the exercise, you can also begin testing different specifications for robustness while still assessing a paper’s level of reproducibility. The only stage that must come first, and that may not be edited once finished, is the scoping stage, since it defines the scope of the exercise.
The key unit of analysis in this Guide and on the SSRP changes as you progress through the stages. As Figure 0.3 shows, the scoping stage is centered on the key scientific claims selected for reproduction. Once those claims have been identified, the next two stages (assessment and improvement) guide you in assessing and improving the reproducibility of the display items that support them. In the final stage (robustness checks), the unit of analysis returns to the claim level.
Generally, a reproduction will begin with a thorough reading of the study being reproduced. The order of subsequent steps, however, may depend on the reproduction strategy. For example, a reproduction may closely follow the order of the steps outlined above. The reproducer would first choose a set of results whose production they want to assess or understand, then reproduce those results as completely as possible, and finally modify the reproduction package. In another strategy, the reproducer develops potential robustness checks or extensions while reading the study, and these then define the set of results to be assessed via reproduction. In yet another strategy, the reproducer seeks out a paper that uses a particular dataset they have access to or want to use. They reproduce the results that take that dataset as an input, and then probe the robustness of those results to various data-cleaning decisions.
The various uses of reproduction make the number of potential reproduction strategies quite large. When choosing or designing a reproduction strategy, it helps to identify the goal of the reproduction clearly. In each of the examples above, the order in which the steps of the exercise are taken is at least partly determined by what the reproducer hopes to get from it. The structure provided in these guidelines, together with a clear reproduction goal, can facilitate an efficient reproduction strategy.
Chang, Andrew, and Phillip Li. 2015. “Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say ‘Usually Not’.” Available at SSRN 2669564.
Galiani, Sebastian, Paul Gertler, and Mauricio Romero. 2018. “How to Make Replication the Norm.” Nature 554 (7693): 417–19.
King, Gary. 1995. “Replication, Replication.” PS: Political Science and Politics 28: 444–52.
Kingi, Hautahi, Lars Vilhuber, Sylverie Herbert, and Flavio Stanchi. 2018. “The Reproducibility of Economics Research: A Case Study.” Presented at the BITSS Annual Meeting 2018; available at the Open Science ….
National Academies of Sciences, Engineering, and Medicine. 2019. Reproducibility and Replicability in Science. National Academies Press. https://pubmed.ncbi.nlm.nih.gov/31596559/.