Chapter 1 Selecting a paper

The goal of this stage is to help you define the scope of your exercise by declaring a paper and the specific output(s) on which you will focus. You might first consider multiple papers without analyzing them more closely (we refer to these as candidate papers) before moving forward with your declared paper.

The main difference between a candidate and a declared paper is the availability of a reproduction package. A reproduction package is the collection of materials that make it possible to reproduce a paper. This package may contain data, code, or documentation. If you are unable to independently locate the reproduction package for your paper, you can ask the paper’s author for it (find guidance on this in Chapter 5) or simply choose another candidate paper. If you still want to explore the reproducibility of a paper with no reproduction package, these guidelines provide instructions for requesting materials from authors to create a public reproduction package, or if this proves unsuccessful, for building your reproduction package from scratch.

To avoid duplicating the efforts of others who may be interested in reproducing one of your candidate papers, we ask that you record your candidate papers in the SSRP database (currently under development).

Note that in this stage, you are not expected to review the reproduction materials in detail, as you will dedicate most of your time to this in later stages of the exercise.

1.1 From candidate to declared paper

At this point of the exercise, you are only validating the availability of (at least) one reproduction package and not assessing the quality of its content. Follow the steps below to verify that a reproduction package is available, and stop whenever you find it (this may mean that you have found your declared paper).

  1. Check whether previous reproduction attempts have been recorded in the SSRP Database for the paper (more on the SSRP Database in the next section).
  2. Check the journal or publisher’s website, looking for materials named “Data and Materials,” “Supplemental Materials,” “Reproduction/Replication Package/Materials,” etc.
  3. Look for links in the paper (review the footnotes and appendices).
  4. Review the personal websites of the paper’s author(s).
  5. Contact the author(s) to request the reproduction package using this email template. In this and future interactions with authors, we encourage you to follow our guidance outlined in Chapter 5.
  6. Deposit the reproduction package in a trusted repository (e.g., Dataverse, Open ICPSR, Zenodo, or the Open Science Framework) under the name Original reproduction package for - Title of the paper. You will be asked to provide the URL of the repository in Survey 1.
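
The search in steps 1-5 is an ordered check that stops at the first success. A minimal sketch of that logic follows, assuming a hypothetical `check` callable that returns a package URL for a step or `None` if nothing was found. The step labels and function names are illustrative only and are not part of any SSRP tooling.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical labels for search steps 1-5 above. Step 6 (depositing the
# package in a trusted repository) only happens once a package is found.
SEARCH_STEPS: List[str] = [
    "SSRP database",
    "journal or publisher website",
    "links in the paper",
    "author websites",
    "email to authors",
]


def locate_package(
    check: Callable[[str], Optional[str]]
) -> Optional[Tuple[str, str]]:
    """Try each step in order and stop at the first one that yields a URL."""
    for step in SEARCH_STEPS:
        url = check(step)
        if url is not None:
            return step, url  # package found: the candidate can be declared
    return None  # no package found: record the candidate and move on


# Example with made-up findings: only the paper's footnotes link to a package.
findings = {"links in the paper": "https://example.org/package"}
print(locate_package(findings.get))
```

Returning the step along with the URL keeps a record of where the package was found, which may be useful when you describe the search later.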

In case you need to contact the authors, make sure to allocate sufficient time for this step (we suggest at least three weeks before the date you plan to start the reproduction). Instructors should also plan accordingly (e.g., if the ACRE exercise is expected to take place in the middle of the semester, students should review candidate papers and, if applicable, contact the authors in the first few weeks of the semester).
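
As a quick illustration of the three-week suggestion, the latest date to contact authors can be computed from the planned start date. This is a small sketch; the function name and the example date are hypothetical.

```python
from datetime import date, timedelta


def latest_contact_date(start: date, lead_weeks: int = 3) -> date:
    """Return the last day to email authors, lead_weeks before the start date."""
    return start - timedelta(weeks=lead_weeks)


# Example: a reproduction planned to start on 10 March 2025.
print(latest_contact_date(date(2025, 3, 10)))  # 2025-02-17
```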

Review the decision tree (Figure 1.1) below for a more detailed overview of this process. Remember, if at any step of the process you decide to abandon the paper, make sure to record the candidate paper in the SSRP database before moving on to another candidate paper. Once you have obtained the reproduction package, the candidate paper becomes your declared paper and you can move forward with the exercise! Do not invest time in doing a detailed read of any paper until you are sure that it is your declared paper.

1.1.1 Candidate paper entries in the SSRP Database

If the SSRP database contains previous reproduction attempts of the paper, you will see a report card with the following information:

Box 1: Summary Report Card for ACRE Paper Entry
Title: Sample Title
Authors: Jane Doe & John Doe
Original Reproduction Package Available: Yes (link)/No
[If “No”] Contacted Authors?: Yes/No
[If “Yes” (contacted)] Type of Response: Categories (6)
Additional Reproduction Packages: Number (e.g., 2)
Authors Available for Further Questions for ACRE Reproductions: Yes/No/Unknown

If after taking steps 1-5 above (or for some other reason) you are unable to locate the reproduction package, record your candidate paper (and if applicable, the outcome of your correspondence with the original authors) in the SSRP database following the example above.
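
To make the conditional fields of the report card concrete, here is one way such an entry could be represented as a record. This is a sketch under stated assumptions: the class and field names are hypothetical, the six response categories are not listed in this chapter and are therefore left as free text, and the actual SSRP database may structure entries differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReportCard:
    """Hypothetical record mirroring the fields of Box 1."""
    title: str
    authors: List[str]
    package_url: Optional[str] = None         # None: no original package available
    contacted_authors: Optional[bool] = None  # only relevant when package_url is None
    response_type: Optional[str] = None       # only relevant when authors were contacted
    additional_packages: int = 0
    authors_available: str = "Unknown"        # "Yes", "No", or "Unknown"

    def __post_init__(self) -> None:
        if self.authors_available not in ("Yes", "No", "Unknown"):
            raise ValueError("authors_available must be Yes, No, or Unknown")
        if self.response_type is not None and self.contacted_authors is not True:
            raise ValueError("response_type requires contacted_authors=True")


# Example entry for a candidate paper whose package could not be located.
card = ReportCard(
    title="Sample Title",
    authors=["Jane Doe", "John Doe"],
    contacted_authors=True,
    response_type="placeholder for one of the six categories",
)
print(card.package_url is None, card.authors_available)
```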


Figure 1.1: Decision tree to move from candidate to declared paper

1.2 Identify your relevant timeline

Before you begin working on the four main stages of the reproduction exercise (Scoping, Assessment, Improvement, and Robustness), it is important to manage your own expectations and those of your instructor or advisor. Be mindful of your time limitations when defining the scope of your reproduction activity. These will depend on the type of exercise chosen by your instructor or advisor and may range from a weeklong homework assignment, to a class project that takes about a month to complete, to a semester-long project (an undergraduate thesis, for example).

Table 1 shows an example distribution of time across three different reproduction formats. The Scoping and Assessment stages are expected to last roughly the same amount of time across all formats (lasting longer for the semester-long activities, and acknowledging that less experienced researchers, such as undergraduate students, may need more time). Differences emerge in the distribution of time for the last two main stages: Improvement and Robustness. For shorter exercises, we recommend avoiding any possible improvements to the raw data (or cleaning code). This will limit how many robustness checks are possible (for example, by limiting your ability to reconstruct variables according to slightly different definitions), but it should leave plenty of time for testing different specifications at the analysis level.

Stage         2 weeks (~10 days)          1 month (~20 days)          1 semester (~100 days)
              analysis data | raw data    analysis data | raw data    analysis data | raw data
Scoping       10% (1 day)                 5% (1 day)                  5% (5 days)
Assessment    35%                         25%                         15%
Improvement   25% | 0%                    40% | 20%                   30%
Robustness    25% | 5%                    25%                         25%
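
The day counts in the table follow from multiplying each stage's share by the total working days of the format. The short sketch below reproduces that arithmetic for the Scoping and Assessment rows only. The variable and function names are hypothetical, and the shares are copied from the table above.

```python
# Total working days per format, as given in the table header.
TOTAL_DAYS = {"2 weeks": 10, "1 month": 20, "1 semester": 100}

# Shares (in percent) for the two stages that have one value per format.
SCOPING = {"2 weeks": 10, "1 month": 5, "1 semester": 5}
ASSESSMENT = {"2 weeks": 35, "1 month": 25, "1 semester": 15}


def days_for(share_percent: float, total_days: int) -> float:
    """Convert a percentage share of the exercise into working days."""
    return total_days * share_percent / 100


for name, total in TOTAL_DAYS.items():
    scoping_days = days_for(SCOPING[name], total)
    assessment_days = days_for(ASSESSMENT[name], total)
    print(f"{name}: scoping {scoping_days} days, assessment {assessment_days} days")
```

For instance, 10% of a 10-day exercise is 1 day of scoping, which matches the "(1 day)" noted in the table, and 5% of a 100-day semester is 5 days.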