Chapter 4 Improvements

As you assess a paper, you can start proposing ways to improve its reproducibility. These improvements can apply to the paper as a whole or to a specific display item. The Social Science Reproduction Platform (SSRP) also allows you to record improvements that you have already implemented, or that you suggest future reproducers (including yourself) implement. Considering improvements is an opportunity to gain a deeper understanding of a paper’s methods, findings, and overall contributions. Each contribution you record can also be assessed and used by the wider SSRP community, including other students and researchers.

Some improvements might require you to engage with the original authors of the study you are reproducing. This stage will help you identify whether the authors have already been contacted with a similar request and, if not, how to approach them in order to have a constructive exchange.

As with the Assessment stage, we recommend that you first focus on one specific display item (e.g., “Table 1”). After making improvements to this first item, you will have a much easier time translating those improvements to other items.

4.1 Display item improvements

As part of your assessment of specific display items, you will identify potential issues with the original reproduction package (for any item scored below level 10). In addition to identifying these gaps, you are encouraged to implement specific improvements. In this section we suggest steps for adding missing materials (data or code) and for debugging analysis or cleaning code. Record these improvements in the “Display item improvements” section.

4.1.1 Adding raw data: missing files or metadata

Reproduction packages often do not include all original raw datasets. To obtain any missing raw data or information about them, follow these steps:

  1. Identify the missing file. During the Assessment stage, you identified all data sources from the paper’s body and appendices (Assessment step 1.1). However, one or more files from a data source (as collected by the original investigators) might be missing. You can sometimes find the names of those files by looking at the beginning of the cleaning code scripts. If you find the file name, record it in Assessment step 1.1, as above. If not, enter “Some/All” in the known_missing field for that data source.
  2. Verify whether this file (or files) can be easily obtained from the web.
    • 2.1 - If yes: obtain the missing files and add them to your revised reproduction package. Make sure you have permission from the owners of the data source to share these data publicly. See chapter 7 for more guidance.
    • 2.2 - If no: proceed to step 3.
  3. Eventually you will be able to use the SSRP to verify whether previous reproducers have contacted the authors regarding this paper and the specific missing files. For now, skip to the next step.
  4. Contact the original authors and politely request the original materials. Be mindful of their time, and remember that the paper you are trying to reproduce may have been published at a time when standards for computational reproducibility were different. See chapter 7 for sample language on how to approach the authors in this scenario.
  5. If the datasets are not available due to legal or ethical restrictions, you can still improve the reproduction package by providing detailed instructions that future researchers can follow to access these data, including contact information and the likely costs of obtaining the raw data (e.g., access fees, or how long it might take between requesting and receiving access).
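Step 1 can be partly automated. The Python sketch below scans a package's scripts for quoted file names with common data extensions and lists the ones that do not exist in the package. The extension list, the script suffixes, and the assumption that scripts write paths relative to the package's main folder are all illustrative choices, not SSRP requirements. Unquoted paths (common in Stata `use` commands) and paths assembled from macros or variables are not detected, so treat the output as a starting list rather than a complete inventory.

```python
import re
from pathlib import Path

# Assumed data-file extensions; extend this tuple for other formats.
DATA_EXTENSIONS = (".dta", ".csv", ".xlsx", ".xls", ".sav", ".rds", ".rda")

# A quoted string (single or double quotes) that ends in one of the extensions.
_EXT_PATTERN = "|".join(re.escape(ext) for ext in DATA_EXTENSIONS)
QUOTED_FILE = re.compile(
    r"""["']([^"']+?(?:""" + _EXT_PATTERN + r"""))["']""",
    re.IGNORECASE,
)


def referenced_data_files(script_text):
    """Return quoted data-file names in a script, in order of first appearance."""
    seen = []
    for name in QUOTED_FILE.findall(script_text):
        if name not in seen:
            seen.append(name)
    return seen


def missing_data_files(package_root, script_suffixes=(".do", ".R", ".py")):
    """Map each script to the referenced data files that are absent from the package.

    Assumes scripts use paths relative to package_root (the main folder).
    """
    root = Path(package_root)
    report = {}
    for script in sorted(root.rglob("*")):
        if script.suffix not in script_suffixes or not script.is_file():
            continue
        text = script.read_text(errors="ignore")
        missing = [f for f in referenced_data_files(text) if not (root / f).exists()]
        if missing:
            report[str(script.relative_to(root))] = missing
    return report
```

Calling `missing_data_files("path/to/package")` returns a dictionary keyed by script, which can be compared against the data sources recorded in Assessment step 1.1.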

4.1.3 Adding missing analysis code

Analysis code can be added when analytic data files are available, but some or all methodological steps are missing from the code. In this case, follow these steps:

  1. Identify the specific line or paragraph in the paper that describes the analytic step that is missing from the code (e.g., “We impute missing values to…” or “We estimate this regression using a bandwidth of…”).
  2. Identify the code file and the approximate line in the script where the analysis should be carried out. If you cannot find the relevant code file, identify where it should be located relative to the main folder, using the steps in the reproduction diagram.
  3. Eventually you will be able to use the SSRP to verify if previous attempts have been made to contact the authors about this issue. For now, skip to the next step.
  4. Contact the authors and request the specific code files.
  5. If step 4 is unsuccessful, we encourage you to recreate the analysis based on your own interpretation of the paper, making explicit any assumptions you use to fill in gaps.
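Step 5 can be illustrated with a hypothetical case in which a paper mentions imputing missing values but does not say how. In this Python sketch the reproducer chooses mean imputation; that choice is an assumption made for illustration, not a method taken from any paper. The assumption is stated once in the code's docstring and once in a plain list that can be copied into the package's documentation.

```python
from statistics import mean

# Every gap filled by the reproducer is written down here so it can be
# reported alongside the recreated code (e.g., in the package README).
ASSUMPTIONS = [
    "Missing values are imputed with the mean of the observed values; "
    "the paper does not state the imputation method.",
]


def impute_missing(values):
    """Replace None entries with the mean of the non-missing entries.

    Mean imputation is the reproducer's assumption (see ASSUMPTIONS),
    not a method confirmed by the original authors.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: all values are missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]
```

If the authors later describe the actual method, only `impute_missing` and the matching entry in `ASSUMPTIONS` need to change.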

4.1.4 Adding missing data cleaning code

Data cleaning (processing) code can be added when steps are missing in the creation or re-coding of variables, the merging or subsetting of datasets, or other data cleaning and processing tasks. Follow the same steps you used when adding missing analysis code (steps 1-5 above).

4.1.5 Debugging analysis code

Whenever code is available in the reproduction package, you can debug it. There are four types of debugging that can improve the reproduction package:

  • Code cleaning: Simplify the instructions (e.g., by wrapping repetitive steps in a function or a loop) or remove redundant code (e.g., old code that was commented out) while keeping the original output intact.
  • Performance improvement: Replace the original instructions with new ones that perform the same tasks but take less time (e.g., choose one numerical optimization algorithm over another while still obtaining the same results).
  • Environment setup: Modify the code to include correct file paths, specific software versions, and instructions for installing missing packages or libraries.
  • Correcting errors: A coding error occurs when a section of code in the reproduction package executes a procedure that directly contradicts the intended procedure described in the documentation (i.e., the paper or code comments). For example, if the paper specifies that the analysis is performed on a population of males but the code restricts the analysis to females, the code contains an error.
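A minimal sketch of code cleaning, written in Python for illustration (the same idea applies to Stata or R): a hypothetical original script repeats the same treated-minus-control comparison of means for three outcomes, and the cleaned version wraps that step in a function applied in a loop. The data and outcome names are invented. The final assertion checks that the cleaned code reproduces the original output exactly, which is the condition this type of debugging must meet.

```python
from statistics import mean

# Hypothetical analytic data: one record per respondent.
records = [
    {"treated": 1, "income": 120.0, "hours": 40.0, "savings": 15.0},
    {"treated": 0, "income": 100.0, "hours": 38.0, "savings": 10.0},
    {"treated": 1, "income": 130.0, "hours": 42.0, "savings": 20.0},
    {"treated": 0, "income": 90.0, "hours": 36.0, "savings": 5.0},
]


# Original style: the same computation copy-pasted for each outcome.
def original_differences(rows):
    diff_income = (mean([r["income"] for r in rows if r["treated"] == 1])
                   - mean([r["income"] for r in rows if r["treated"] == 0]))
    diff_hours = (mean([r["hours"] for r in rows if r["treated"] == 1])
                  - mean([r["hours"] for r in rows if r["treated"] == 0]))
    diff_savings = (mean([r["savings"] for r in rows if r["treated"] == 1])
                    - mean([r["savings"] for r in rows if r["treated"] == 0]))
    return {"income": diff_income, "hours": diff_hours, "savings": diff_savings}


# Cleaned style: the repeated step is a function applied to each outcome.
OUTCOMES = ("income", "hours", "savings")


def difference_in_means(rows, outcome):
    """Treated-minus-control difference in the mean of one outcome."""
    treated = [r[outcome] for r in rows if r["treated"] == 1]
    control = [r[outcome] for r in rows if r["treated"] == 0]
    return mean(treated) - mean(control)


def cleaned_differences(rows):
    return {outcome: difference_in_means(rows, outcome) for outcome in OUTCOMES}


# The cleaning is acceptable only if the output is unchanged.
assert cleaned_differences(records) == original_differences(records)
```

For performance improvements, where a different algorithm may change results only in the last digits, a tolerance-based comparison (e.g., `math.isclose`) is a more realistic check than exact equality.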

4.1.6 Debugging cleaning code

Follow the same steps that you did to debug the analysis code (above), but report them separately.

4.1.7 Adding information on how to access confidential/proprietary data

If the original authors are unable to share the raw or analytical data due to legal or ethical reasons, the reproduction package can still be improved by including information on how to access such data. The American Economic Association requires original authors to provide data availability statements (DAS) for such data sources. Use this form (.pdf, .md) to assess the completeness of any DAS.

4.2 Paper-level improvements

There are several measures you can take to improve a paper’s overall reproducibility. These additional improvements can be applied across all reproducibility levels (including level 10). Record these improvements in the “Paper-level improvements” section.

File documentation and organization:
1 - Set up the reproduction package using version control software, such as Git.
2 - Improve documentation by adding comments to the code.
3 - Re-organize the reproduction package into a set of folders and sub-folders that follow standardized best practices, and add a master script that executes all the code in order, with no further modifications. See AEA’s reproduction template.

4 - Integrate the documentation with the code by adapting the paper into a literate programming environment (e.g., using Jupyter notebooks, RMarkdown, or a Stata Dynamic Doc).
5 - If the code was written using proprietary statistical software (e.g., Stata or Matlab), rewrite some parts of it using open-source statistical software (e.g., R, Python, or Julia).
6 - Set up a computing capsule that executes the entire reproduction in a web browser without needing to install any software. For examples, see Binder and Code Ocean.
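The master script in item 3 can be sketched in Python, assuming a package in which every script uses paths relative to the package's main folder. The script names in `PIPELINE` are hypothetical, and only Python and R runners are configured: the command for running Stata or Matlab non-interactively depends on the local installation, so it is left for the reproducer to add to `RUNNERS`. The script runs each step from the main folder and stops at the first failure, so a broken step is not silently followed by later ones.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical pipeline order; replace with the package's actual scripts.
PIPELINE = [
    "code/01_clean.py",
    "code/02_analysis.R",
]

# Command used to run each type of script. Add entries (e.g., for Stata)
# using the command that works on your installation.
RUNNERS = {
    ".py": [sys.executable],
    ".R": ["Rscript"],
}


def run_pipeline(root, scripts):
    """Run each script in order from the package root; stop at the first failure."""
    for relative_path in scripts:
        script = Path(root) / relative_path
        if script.suffix not in RUNNERS:
            raise ValueError(f"No runner configured for {relative_path}")
        command = RUNNERS[script.suffix] + [str(script)]
        print(f"Running {relative_path} ...", flush=True)
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(command, cwd=root, check=True)
```

In the package, the master script would end with a call such as `run_pipeline(Path(__file__).resolve().parent, PIPELINE)`, so the whole reproduction runs from a single command without editing any paths.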

Please suggest other paper-level improvements by editing this guide or contacting .