Chapter 9 Tips and Resources for Reproducible Workflow

9.1 Reproducible workflows:

Below is a summary of Chapter 11 of Christensen, Freese, and Miguel (2019). If there is a book or chapter that you find particularly helpful, please write a brief summary and submit a contribution.

Folder organization

Basic file organization is a critical component of a reproducible workflow. The following structure is recommended, but can be adapted to accommodate different reproducers or types of research. The name of the master folder should be easy to read and meaningful to all collaborators on the project.

Create a master folder with a descriptive name for the project. It should contain:
- Separate folders for programming script files, raw data, edited data, output, and final paper or article text
- A README file: description of contents of each folder, as well as installation and operating instructions for reproducers
Keep raw data intact: Any edits or datasets generated using raw data should be stored in a “data” folder separate from the “raw data” folder.
When naming a directory or file, stick to lowercase letters with underscores (instead of spaces) to avoid cross-operating-system issues.
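
As a minimal sketch of the structure above, the following Python script creates a project skeleton. The subfolder names, the README.md file name and its placeholder text, and the helper names safe_name and create_project are illustrative assumptions, not prescriptions from Christensen, Freese, and Miguel (2019).

```python
import re
from pathlib import Path

# Hypothetical subfolder names; adapt them to the project.
SUBFOLDERS = ["scripts", "raw_data", "data", "output", "paper"]


def safe_name(name: str) -> str:
    """Lowercase a name and replace spaces and other unusual characters with underscores."""
    name = name.strip().lower()
    # Anything other than lowercase letters, digits, dots, and underscores becomes "_".
    name = re.sub(r"[^a-z0-9._]+", "_", name)
    return name.strip("_")


def create_project(parent: Path, project_name: str) -> Path:
    """Create the master folder, its subfolders, and a placeholder README."""
    root = parent / safe_name(project_name)
    for sub in SUBFOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text(
            "Describe the contents of each folder here, along with\n"
            "installation and operating instructions for reproducers.\n"
        )
    return root
```

Calling create_project(Path("."), "Minimum Wage Study 2019") would create a folder named minimum_wage_study_2019 in the current directory. Because existing folders and an existing README are left untouched, the function can be re-run safely.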

Efficient and readable programming

The core of programming for reproducibility is to write code wherever possible. Writing scripts leaves a record of any changes to data, which allows other researchers to reproduce work exactly. It is also helpful to comment your code to explain the reasoning behind changes, and to document any steps that had to be done with point-and-click methods.

Leave a record of any changes to the data: Write code in the programming environment, instead of modifying data by hand in a spreadsheet or relying on point-and-click options.
Include comments in code to explain changes, and save intermediate datasets used in analysis.
Give variables names that will be informative to reproducers.
Use relative directory paths, not absolute paths, so the work can be more easily reproduced from different computers.
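
The points above can be illustrated with a short Python sketch that cleans a raw file in code, explains the change in a comment, saves the result as an intermediate dataset, and builds every path relative to the project folder. The file names (wages.csv, wages_clean.csv), the column name hourly_wage, the folder names, and the assumption that scripts live in a scripts subfolder are all hypothetical.

```python
import csv
from pathlib import Path


def project_root(script_path: Path) -> Path:
    """Return the project folder, assuming the script lives in <project>/scripts/."""
    return script_path.resolve().parent.parent


def clean_wages(root: Path) -> Path:
    """Copy the raw wage data into the data folder, dropping rows with no wage."""
    raw_file = root / "raw_data" / "wages.csv"    # raw data: read, never modified
    out_file = root / "data" / "wages_clean.csv"  # intermediate dataset
    out_file.parent.mkdir(parents=True, exist_ok=True)

    with raw_file.open(newline="") as src, out_file.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Drop rows with a missing hourly wage: they cannot enter the analysis.
            if row["hourly_wage"].strip() == "":
                continue
            writer.writerow(row)
    return out_file
```

Inside a script stored in the scripts folder, clean_wages(project_root(Path(__file__))) would run the cleaning step. Since no path mentions a user name or drive letter, the same call should work unchanged on a collaborator's computer.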

Version control

Version control software is used to keep a record of changes to project files. Although it is possible to manually track changes in a central research log or as notes in individual script files, many social scientists recognize the benefits of a distributed version control system. Because each collaborator is able to have a local copy of the project’s entire work history, these systems are particularly suited to collaborative projects. Below are methods to manually track changes and a brief explanation of Git, a popular distributed version control system.

Maintain a written record of work.
- In a central research log: Log activities in a single central file whenever work is done on the project (keep track of “which team member writes what code, produces what output, edits which files, and when”).
- In individual script files: Record “who edited which part of which file when, and why.”
- With a version control system, such as Git: Git records changes made to files, by whom, and when.
A brief explanation of Git: Users add changed files to the staging area, then commit those changes to the repository (the project folder tracked by Git). File names stay the same; each commit records the new version of every staged file, and earlier versions remain in the project history.
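
For teams that track changes manually, the central research log described above could be maintained with a small helper like the one sketched below. The function name log_activity, the entry layout, and the log file name used in the usage note are assumptions made for illustration; a version control system such as Git records comparable information automatically.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_activity(log_file: Path, who: str, files: list[str], what: str) -> str:
    """Append one entry (when, who, which files, what was done) to the research log."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = f"{timestamp} | {who} | {', '.join(files)} | {what}\n"
    # Open in append mode so earlier entries are never overwritten.
    with log_file.open("a", encoding="utf-8") as log:
        log.write(entry)
    return entry
```

A call such as log_activity(Path("research_log.txt"), "A. Researcher", ["scripts/clean_wages.py"], "Dropped rows with missing wages") adds one line to the log. Timestamps are written in UTC so that entries from collaborators in different time zones sort consistently.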

9.2 Links to resources, organizations and people for reproducible work

Below we point to an ever-growing list of resources, organizations, and specific researchers doing empirical work with a strong orientation towards reproducibility. The lists are in alphabetical order, and contributions are welcome!

9.2.1 Resources

Dynamic documents in R, Python and Stata
Git resources:
IDB’s cheatsheet for transparency, reproducibility and ethics
Lars Vilhuber LDI’s Wiki for Reproducibility. Particularly this section.
Open Science Framework (OSF)
Project TIER
R for Stata users
World Bank DIME’s Wiki for transparent and reproducible research.

9.2.2 Organizations

9.2.3 People

(by last name)

References

Christensen, Garret, Jeremy Freese, and Edward Miguel. 2019. Transparent and Reproducible Social Science Research: How to Do Open Science. University of California Press.