Chapter 7 Guidance for Instructors Supervising Reproduction Assignments

The Social Science Reproduction Platform (SSRP) is an openly-licensed platform that crowdsources and catalogs attempts to assess and improve the computational reproducibility of published social science research. Instructors can use the SSRP in combination with this Guide (called the “ACRe Guide”) to facilitate reproduction assignments (sometimes called “replications”) in applied social science courses at the graduate and undergraduate levels. Students can use these materials with little to no supervision, covering the following learning activities:

  • Assessing and improving the reproducibility of published work
  • Applying good coding and data management practices
  • Recording reproduction results and improvements on the SSRP
  • Engaging in constructive exchanges with authors
  • Developing a deep understanding of common methods and computational techniques
  • Creating real, citable scientific contributions.

In the process, students can reference the ACRe Guide as a step-by-step protocol for conducting reproductions and use the SSRP to record and share their work, and instructors can use it as a guide for supervising these assignments. Examples of completed reproductions are available here.

Reproductions typically include five distinct stages (elaborated on at length in the chapters of this Guide):
1. Paper selection, where students select a paper and try to locate its reproduction package;
2. Scoping, where students define the scope of the reproduction by recording the claims, display items, and specifications to focus on;
3. Assessment, where students review and describe in detail the reproduction package, if available, and assess the current level of computational reproducibility of the selected display items;
4. Improvement, where students modify the content and/or the organization of the reproduction package to improve its reproducibility; and
5. Robustness, where students identify feasible robustness checks and/or assess the reasonableness of variations in analytical choices.

This chapter includes tips and resources for instructors interested in using the SSRP to teach reproducibility. We first identify typical use cases and suggest timelines for planning assignments. We then provide sample grading strategies and guidance in instances where the students would like to remain anonymous but still share their work on the SSRP.

Get started by signing up for a free SSRP account and reviewing this chapter to find tips based on your use case below.

7.1 Common use cases

7.1.1 Graduate-level assignment

(Estimated duration: 2 weeks to one semester)

Advanced master’s or PhD-level courses often feature assignments where students reproduce published papers to gain familiarity with fundamental concepts, research methods, and applications. In these courses, instructors can recommend students choose papers from specific sub-fields (e.g., labor economics) or journals, or curate a list of possible papers.

The length of such assignments can vary from a couple of weeks to a semester. For shorter timelines, instructors can ask students to conduct reproductions using only analysis data, focusing on just the main results. For longer timelines, such as semester-long assignments, students might reproduce entire papers from raw data. See section 7.2 below for a suggested distribution of effort across different lengths of assignments.

Graduate students should expect to spend a significant time at the Robustness stage. Therefore, before beginning the assignment, we recommend that students focus either on expanding the set of feasible robustness checks (see Chapter 5) or on testing and defending the reasonableness of a specific robustness check (which may contain more variations in more than one analytical choice).

7.1.2 Undergraduate thesis

(Estimated duration: 2 months to 1 year)

Students can use the platform to carry out detailed reproductions as part of an undergraduate thesis (see an example here). This type of assignment allows students to gain direct research experience while making meaningful contributions to the field.

Depending on the paper’s scope and the assignment’s learning objectives, such projects could last anywhere from a couple of months to an entire year. Students should demonstrate an understanding of the paper by identifying its main claim(s) at the Scoping stage and conducting a detailed assessment of the reproducibility of their associated display items at the Assessment stage. At the Improvement stage, students should try to improve the reproducibility of at least one display item. Depending on the scope of the paper, examples within reach of an undergraduate thesis might include finding and cleaning the raw data, translating code scripts into a different programming language (ideally open source), using dynamic documents, or making other improvements suggested by the instructor. Finally, at the Robustness stage, students should be able to demonstrate a high-level understanding of the conditions under which the paper’s estimates are valid.

We recommend breaking down the assignment into a “proposal” stage, where students scope and assess their papers, and an “execution” stage, where students carry out improvements and robustness checks.

7.1.3 Undergraduate course

(Estimated duration: 1 month to one semester)

Students in undergraduate courses should primarily focus on verifying whether reproductions packages “run” and recording the outcome on the SSRP. At the Scoping stage, undergraduate students can demonstrate a high-level understanding of the paper’s main claims. Students can identify the code and files necessary to reproduce their select display items at the Assessment stage and try to actually run the code. Finally, at the Robustness stage students can inspect the code scripts and identify potential lines of code with analytical choices, increasing the feasible set of robustness checks.

7.2 Paper selection and assignment timelines

The SSRP allows for flexibility in selecting papers and determining the scope of reproduction assignments in terms of the number of claims, display items, or even the required reproduction stages (at minimum, a complete reproduction requires filling in Scoping and Assessment). For example, instructors can curate a list of target papers or allow students to choose from a distinct body of literature or a specific journal. For either strategy, a good starting point is first to reference the SSRP database of papers and reproductions, which allows users to search through submitted reproductions. This can help detect potential “no-go” papers for which other users were unable to find any reproduction materials (labeled as “abandoned papers”), though users are by no means barred from attempting to reproduce such cases. Reproducers can also look into the suggestions of future improvements (under the Improvements stage) and dedicate a reproduction exercise to a specific suggestion identified by in a previous reproduction.

If instructors are unable to curate a list of papers with readily available reproduction packages, we recommend allocating at least 3 additional weeks to try to locate the missing reproduction materials, potentially by contacting the original authors (see this chapter for guidance on constructive exchanges with original authors).

Instructors could select papers base on prior expectations of reproducibility, or based on proxies for impact papers on their field (e.g., citations). See the further guidance on paper selection in the Selecting a Paper chapter.

To plan the distribution of effort across different reproduction stages, we recommend consulting Table 1 below. The Scoping and Assessment stages will take approximately the same amount of time across timelines, but will depend on the students’ prior experience with reproductions and the SSRP. Larger differences emerge in the distribution of time for the last two main stages: Improvements and Robustness. For shorter exercises, we recommend avoiding any possible improvements to the raw data (or cleaning code), which may limit the number of possible robustness checks.

Table 1: Workload distribution across reproduction stages across different timelines
2 weeks
(~10 days)
1 month
(~20 days)
1 semester
(~100 days)
analysis data raw data analysis data raw data analysis data raw data
Scoping 10% (1 day) 5% (1 day) 5% (5 days)
Assessment 35% 25% 15%
Improvement 25% 0% 40% 30% 20%
Robustness 25% 5% 30% 30%

7.3 Grading strategy

Once students have completed their reproductions, they can generate and share with instructors links to view-only reports of their work. Instructors can use these reports to grade and provide feedback on students’ work. Below we suggest a strategy for grading based on the use cases described above. For assignments that don’t contain all reproduction stages, we recommend distributing credits proportionally to the stages that were filled out.

We suggest the following grading categories:

  • Submission: Full credit for timely submitted, non-blank submissions.
  • All key fields: Full credit for filling out details such as paper information, reproduction package links, claims summary for at least one claim, table with claims estimates, inputs description, display item assessments (particularly the subjective assessment section), and assessment of paper-level reproducibility.
  • Paper summary: Full credit for submitting a short summary of the paper that answers these questions.
  • Claims: Full credit for presenting an accurate and clear summary of the paper’s main claims (see guidance here). To properly score this section, the instructor needs to have previous knowledge of the paper.
  • Assessment: There are two kinds of contributions students can make to obtain full credit:
    • Assess multiple claims. Additional credit per each assessment could be assigned at a decreasing rate (e.g., in addition to the first assessment scored in “All key fields” give 50% of the credit for assessing a second claim, 30% for a third, and 20% for a fourth).
    • Conduct a thorough assessment of a single claim. This can be done by clearly identifying all of the inputs (data sources, analysis data, and code scripts) and providing detailed documentation of the reproduction attempt (e.g., duration of execution, errors detected, specific missing files, details about the required computational environment).
  • Improvement: Students can obtain full credit through two types of contributions:
    • Implement improvements of reproducibility of multiple claims with a similar credit structure as for the assessment of multiple claims (see point above).
    • Implement thorough improvements of a single claim. The Improvements chapter describes several types of possible improvements. A key requirement is that students clearly document their improvements (see this section for suggestions on how to do this).
  • Future suggestions for improvement: Instructors can also assign credit for specific suggestions for improvements recorded for future reproducers to attempt. Full credit should be assigned when the recommendations are concrete, feasible, and demonstrate that the student performed an in-depth review of the code and data.
  • Revised reproduction package: Full credit for students who post a revised reproduction package in a trusted repository. Instructors can verify its existence by clicking on the link provided on the reproduction summary page. If the student used version control software as part of the revised reproduction package, the instructor can easily verify the modifications relative to the original reproduction package by checking the commits history.
  • Robustness: Students can obtain full credit for one of two strategies (or a combination of them):
    • Identify a large number of original analytical choices in the code files. The definition of “large” is up to the Instructor, but we suggest at least 20 original analytical choices (see this section for a distinction between original and non-original analytical choices).
    • Conduct a robustness test by modifying one or more analytical choices identified in the paper. Full credit is assigned for a clear and accurate discussion of the results and their reasonableness.
Score
(example)
Weights
(suggested)
Submitted 100 40%
All important fields 80 10%
Summary 100 5%
Clear claims 90 5%
Assessment 100 10%
Improvements 50 10%
Revise reproduction package 100 5%
Specific suggestions for improvements 0 5%
Robustness 100 10%
Final Score 87.5

7.4 Facilitating class anonymous reproductions

The SSRP allows users to select from three distinct privacy models when submitting their work, including public-identified, temporary anonymous, and class anonymous (explained in more detail here). This section includes specific guidance for instructors facilitating reproductions that will be published using the class anonymous privacy model, which allows users to conceal their identities and free-form narrative answers (e.g., notes, descriptions, summaries, explanations) indefinitely. Instead, reproductions published under the class anonymous model contain only categorical responses and basic class identifiers, including course name, year, institution, instructor’s name, and instructor’s email address.

Reproducers looking to publish under the class anonymous model will be unable to maintain their anonymity if they share their revised reproduction packages as part of their reproductions. This is because trusted data repositories and GitHub require users to create accounts in order to post their content and keep records of updates. We advise instructors to create generic repositories for their class (e.g., “Reproductions for Econ 270 - Fall 2021”) and deposit each anonymous reproduction in self-contained folders.

For example, suppose a class has two students who choose the anonymous option. In that case, they should share (privately) their revised reproduction package with their instructor, and the instructor should post them using the following structure:

  Reproductions for [Course Name, Semester]
    └── Revised reproduction package for Smith et al. (2019)
    └── Revised reproduction package for Perez et al. (2019)

Note: When anonymizing revised reproduction packages, we advise that instructors and students avoid using version control software. Even if the current materials are correctly de-identified, the version history may still contain identifiers, such as GitHub usernames.