Chapter 7 Guidance for Instructors Seeking to Supervise Reproduction Exercises

We expect that a significant fraction of users of this Guide and the SSRP will be students conducting reproductions as part of an academic assignment. This chapter identifies possible types of assignments, suggests timelines and grading strategies, and guides instructors in managing cases where students would like to remain anonymous but still share their work on the SSRP.

7.1 Possible use cases for instructors

Graduate-level assignment
Advanced master's- or PhD-level courses often feature class assignments where students reproduce published papers to gain familiarity with fundamental concepts, research methods, and applications. In such instances, instructors can recommend that students choose papers from specific sub-fields (e.g., labor economics) or journals, or can curate a list of possible papers. The length of such assignments can vary from a couple of weeks to a full semester. For shorter timelines, instructors can ask students to conduct reproductions using only the analysis data and focusing on the main results. For longer timelines, such as semester-long assignments, students can be instructed to reproduce entire papers starting from the raw data. See section 1.2 for a suggested distribution of effort across different lengths of assignments.

Graduate students should expect to spend a significant amount of time at the Robustness stage. Therefore, before the assignment begins, we advise instructors to clearly communicate whether students should focus on expanding the set of feasible robustness checks by identifying analytical choices (see Chapter 5) or on testing and defending the reasonableness of a specific robustness check (which may involve variations in more than one analytical choice).

Undergraduate thesis
Students can use the platform to carry out a detailed reproduction as part of an undergraduate thesis (see an example here). This project could last anywhere from a few months to an entire year. At the undergraduate level, students should demonstrate an understanding of the paper by identifying its main claims and conducting a detailed assessment of the reproducibility of their associated display items. They should not be expected to have a thorough understanding of the methodology, but they should be able to demonstrate a high-level understanding by identifying the conditions under which the paper’s estimates are valid. In addition, students should dedicate part of the exercise to improving the reproducibility of one or more display items.

Depending on the scope of the paper, improvements within reach of an undergraduate thesis include finding and cleaning the raw data, translating code scripts into a different programming language (ideally open source), and using dynamic documents. We recommend breaking down the assignment into a proposal stage, where the student scopes and assesses a paper, and an execution stage, where the student carries out improvements and conducts some new robustness checks.

This type of assignment allows students to gain direct research experience and potentially make a meaningful contribution to the field.

Undergraduate course
A third possible format for supervised reproductions is an assignment within an undergraduate course. In this setting, the main goal should be for the student to verify that the reproduction package “runs” and, if not, up to which point it does. Here undergraduates can be expected to understand the claims of the paper at a high level and to attempt to find the estimates that support those claims in the reproduction package. Undergraduate students can also contribute by inspecting the code scripts and identifying lines of code where analytical choices are made (increasing the feasible set of robustness checks). This exercise gives students some exposure to what is under the hood of the research papers that later appear in their textbooks.

7.2 Paper selection and semester timeline

When setting up a reproduction exercise, instructors should keep one important timeline consideration in mind. Whenever a chosen paper does not have a reproduction package (or has important components missing), the reproducer can try contacting the original authors to obtain the missing materials (see Chapter 8). Reproducers should then have at least a couple of weeks to hear back from the authors. We therefore recommend that instructors assign this exercise at the beginning of the semester and require completion of the paper selection stage early in the academic cycle. This way reproducers will have the opportunity to add missing materials or move on to another paper if the authors do not respond or cannot provide the materials.

7.3 Grading Examples

Once students have completed a reproduction in the Social Science Reproduction Platform (SSRP), they will be able to generate a View-only version of their exercise. This report can be used by instructors to grade their work and provide feedback. Here we provide an example of how to grade a reproduction exercise, as used in previous classes.

The final score, set here on a 0-100 scale, depends on the scores of several sub-components, each also on a 0-100 scale:

  • Submission: full credit if the student submits a non-empty reproduction on time.
  • All key fields: full credit if none of the most important fields are missing. Important fields include, but are not limited to: paper information, reproduction package links, claims summary, claims estimates table, inputs description, display item assessments (particularly the subjective scale section), and paper-level assessments.
  • Paper summary: full credit if the student submits a short summary of the paper. See this section for recommendations on what the summary should focus on.
  • Claims: the student presents an accurate and clear summary of the claims to be assessed in one or two sentences (see Chapter 2). To properly score this section, the instructor needs to have previous knowledge of the paper.
  • Assessment: there are two possible contributions the student can make to obtain full credit:
    • The student assesses multiple claims. Additional credit per additional assessment could be assigned at a decreasing rate (e.g., 50% of the credit for the second assessment, 30% for the third and 20% for the fourth).
    • The student conducts a highly thorough assessment of a claim. This can be done by clearly identifying all the inputs (data sources, analysis data and code scripts) and providing detailed documentation of the reproduction attempt (e.g., how long it took, where it broke, and which specific pieces are missing).
  • Improvement: similar to the assessment scoring, the student can obtain full credit through two types of contributions:
    • The student provides multiple types of improvements with a similar credit structure as for multiple assessments.
    • The student conducts a highly thorough improvement of a claim. This means carrying out any of the improvements described in the Improvements chapter. A key requirement is that the student clearly documents these improvements (see this section for suggestions on how to do this).
  • Future suggestions: in addition to the specific improvements carried out by the student, credit can be assigned for specific suggestions that the student may leave for future reproducers. Full credit should be assigned when the recommendations are concrete and demonstrate that the student performed an in-depth review of the code and data.
  • Revise reproduction package: full credit if the student posts a revised reproduction package in a trusted repository. Instructors can verify its existence by clicking on the link provided in the reproduction. If the student used version control software as part of the revised reproduction package, the instructor can easily verify the modifications relative to the original reproduction package by checking the commit history.
  • Robustness: similar to assessment and improvement, here the student can obtain full credit through two types of strategies (or a combination of them):
    • The student identifies a large number of original analytical choices in the code files. The definition of “large” should be left to the instructor, but here we suggest at least 20 original analytical choices (see this section for a distinction between original and non-original analytical choices).
    • The student conducts a robustness test by modifying one or more of the analytical choices identified in the paper. Full credit is assigned for a clear discussion of the results and their reasonableness.

  Component                                Score (example)   Weight (suggested)
  Submitted                                100               40%
  All important fields                     80                10%
  Summary                                  100               5%
  Clear claims                             90                5%
  Assessment                               100               10%
  Improvements                             50                10%
  Revise reproduction package              100               5%
  Specific suggestions for improvements    0                 5%
  Robustness                               100               10%
  Final score                              87.5
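
The final score in this example is the weighted sum of the sub-component scores. The following is a minimal sketch of that arithmetic, using the example scores and suggested weights listed above; the names SCORES, WEIGHTS_PCT, and final_score are hypothetical and are not part of the SSRP.

```python
# Example sub-component scores (0-100), copied from the grading example above.
SCORES = {
    "Submitted": 100,
    "All important fields": 80,
    "Summary": 100,
    "Clear claims": 90,
    "Assessment": 100,
    "Improvements": 50,
    "Revise reproduction package": 100,
    "Specific suggestions for improvements": 0,
    "Robustness": 100,
}

# Suggested weights, in whole percentage points (they must add up to 100).
WEIGHTS_PCT = {
    "Submitted": 40,
    "All important fields": 10,
    "Summary": 5,
    "Clear claims": 5,
    "Assessment": 10,
    "Improvements": 10,
    "Revise reproduction package": 5,
    "Specific suggestions for improvements": 5,
    "Robustness": 10,
}


def final_score(scores, weights_pct):
    """Return the weighted final score on a 0-100 scale."""
    if sum(weights_pct.values()) != 100:
        raise ValueError("weights must add up to 100 percent")
    if set(scores) != set(weights_pct):
        raise ValueError("scores and weights must cover the same components")
    # Multiply each score by its weight, then divide by 100 once at the end.
    return sum(scores[name] * weights_pct[name] for name in weights_pct) / 100


print(final_score(SCORES, WEIGHTS_PCT))  # 87.5
```

Instructors who prefer a different emphasis can change the weights; the first check in final_score catches weights that no longer add up to 100%.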

7.4 Supporting Anonymous Reproduction Within a Course

When conducting a reproduction as part of a class exercise, some students may prefer to remain anonymous when posting their work on the SSRP. The platform allows this option when filling in the forms associated with each stage. It does not, however, provide a way for reproducers to post revised reproduction packages anonymously (all trusted repositories require an identifiable posting user). In this scenario, we advise instructors to create a generic repository for the class (e.g., “Reproductions for Econ 270 - Fall 2021”) and deposit each anonymous reproduction in a self-contained folder within this repository.

For example, if a class has two students who choose the anonymous option, they should privately share their revised reproduction packages with their instructor, who will post them using the following structure:

  Reproductions for [Course Name, Semester]
    ├── Revised reproduction package for Smith et al. (2019)
    └── Revised reproduction package for Perez et al. (2019)

Note: when making a revised reproduction package anonymous, we advise that instructors and students avoid using version control software. Even if the current materials are correctly de-identified, the version history may still contain identifiers.
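
If a student did use version control locally, one possible way for an instructor to assemble the class repository is to copy each package into its own folder while leaving out the history directories. The sketch below assumes the packages have been shared as ordinary folders; the function name deposit_anonymous_package, the VCS_DIRS list, and the example paths are hypothetical. It only drops the version history: the instructor still needs to check that the files themselves contain no identifiers.

```python
import shutil
from pathlib import Path

# Directory names that common version control tools use to store history.
# This list is an assumption; extend it if your class uses other tools.
VCS_DIRS = (".git", ".hg", ".svn")


def deposit_anonymous_package(package_dir, class_repo_dir, paper_label):
    """Copy one revised package into its own folder, without version history.

    Raises FileExistsError if a folder for this paper already exists.
    """
    target = Path(class_repo_dir) / f"Revised reproduction package for {paper_label}"
    # copytree creates any missing parent folders; the ignore callback
    # skips every directory or file whose name matches VCS_DIRS.
    shutil.copytree(package_dir, target, ignore=shutil.ignore_patterns(*VCS_DIRS))
    return target


# Example call (hypothetical paths):
# deposit_anonymous_package(
#     "submissions/student_a",
#     "Reproductions for Econ 270 - Fall 2021",
#     "Smith et al. (2019)",
# )
```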