Software Tools for Writing Reproducible Papers

This post is a longread mainly intended for graduate students and postdocs, but should hopefully be accessible more broadly. Reading the post should take about an hour, while following the instructions completely may take the better part of a day.

As an important caveat, much of what this post discusses is still experimental, so you may run into minor problems in following the steps given below. I apologize if this happens, and thank you for your patience.

In any case, if you find this post helpful, please cite it in papers that you write using these tools; doing so helps me out and makes it easier for me to write more such advice in the future.

Finally, I note that this post does not cover several very important tools, such as ReproZip. This post is already over 6,000 words long, so I did not attempt to show every possible tool. I encourage further exploration, rather than thinking of this post as definitive.

Thank you for reading!


In my previous post, I detailed some of the ways in which our software tools and social structures encourage some actions and discourage others. Especially when it comes to tasks such as writing reproducible papers, which promise to considerably improve research culture but are challenging in their own right, it is critical to ensure that we positively encourage doing things a bit better than we have done them before. That said, though my previous post spilled quite a few pixels on the what and the why of such encouragement, and on what support we need for reproducible research practices, I said little about how one could practically do better.

This post tries to improve on that by offering a concrete and specific workflow that makes it somewhat easier to write the best papers we can. Importantly, in doing so, I will focus on a paper-writing process that I have developed for my own use and that works well for me. Everyone approaches things differently, so you may disagree (perhaps even vehemently) with some of the choices I describe here. Even so, I hope that by providing a specific set of software tools that work well together to support reproducible research, I can at least move the discussion forward and make my little corner of academia ever so slightly better.

Having said what my goals are with this post, it is worth taking a moment to consider what technical goals we should aim for in developing and configuring software tools for use in our research. Above all, I have focused on tools that are cross-platform: it is neither my place nor my desire to mandate what operating system any particular researcher should use. Moreover, we often need to collaborate with people who make considerably different choices about their software environments. We should therefore be careful about what barriers to entry we establish when we use methodologies that do not port well to platforms other than our own.

Next, I have focused on tools which minimize the amount of closed-source software that is required to get research done. The conflict between closed-source software and reproducibility is obvious almost to the point of being self-evident. Thus, without being purists about the issue, it is still useful to reduce our reliance on closed-source gatekeepers as much as is reasonable given other constraints.

The last and perhaps least obvious goal that I will adopt in this post is that each tool we develop or adopt here should be useful for more than a single purpose. Installing software introduces a new cognitive load in understanding how it works, and adds to the general maintenance cost we pay in doing research. While this can be mitigated to some extent with appropriate use of package management, we should also be careful to justify each piece of our software infrastructure in terms of the benefits it provides to us. In this post, that means specifically that we will choose things that solve more than just the immediate problem at hand, and that support our research efforts more generally.

Without further ado, then, the rest of this post steps through one particular software stack for reproducible research, piece by piece. I have tried to keep this discussion detailed but not esoteric, in the hope of making an accessible description. In particular, I have not focused at all on how to develop scientific software or how to write reproducible code, but rather on how to integrate such code into a high-quality manuscript. My advice is thus necessarily specific to what I know, quantum information, but should be readily adapted to other fields.

Following that, I will detail these components of a software stack for writing reproducible research papers:

  • Command-line environment: PowerShell
  • TeX / LaTeX distribution: TeX Live and MiKTeX
  • Literate programming environment: Jupyter Notebook
  • Text editor: Visual Studio Code
  • LaTeX template: , , and
  • Project layout
  • Version control: Git
  • arXiv build management: PoShTeX

Command Line

Command-line interfaces and scripting languages provide a powerful way to automate the software tools we rely on. There are many to choose from, such as bash, tcsh, and zsh, as well as more recent tools such as fish and xonsh. For this post, however, we will describe how to use Microsoft's open-source PowerShell instead.

Microsoft provides easy-to-install PowerShell packages for Linux and macOS / OS X at their GitHub repository. Most Windows users do not need to install PowerShell, but will need to install a package manager to help install a few things later on. If you don't already have Chocolatey, go ahead and install it now, following their instructions.

Likewise, we will use the package manager Homebrew for macOS / OS X. The quickest way to install it is to run the following command in Terminal:

Also, be sure to restart your Terminal window after the installation. Then, we install PowerShell with the following two commands:

The first command installs the Homebrew Cask extension for programs distributed as binaries.

Aside: Why PowerShell?

As a brief aside, why PowerShell? Tools like bash have been ported to Windows and work well there, but they don't tend to work in a way that plays well with native tools. For instance, it is difficult to get Cygwin Bash to reliably interoperate with commonly used TeX distributions such as MiKTeX.

Many of these challenges arise because bash and other such tools work by manipulating strings, rather than providing a structured programming environment. Any cross-platform script then has to handle platform differences as string manipulation, such as / versus \ in file name paths, while leaving slashes unchanged in cases such as TeX source.

By contrast, PowerShell can be used as a command-line REPL (read-evaluate-print loop) interface to the more structured .NET programming environment. That way, OS-specific differences such as / versus \ can be handled as an API, rather than relying on string parsing for everything. Moreover, PowerShell comes pre-installed on most recent versions of Windows, making it easier to deal with the comparative lack of package management on most Windows installations. (PowerShell also addresses this by providing some very nice package management features, which we will use in subsequent sections.)

Since PowerShell has been open-sourced, we can readily rely on it for our purposes here.
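To make the distinction concrete, here is a minimal sketch of the same idea in Python rather than PowerShell (a substitution made purely for illustration; the post itself relies on PowerShell and .NET). It contrasts a path API, which knows which characters are separators, with naive string replacement, which also rewrites a slash inside TeX source. The file name is hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical project-relative path, written with forward slashes.
relative = PurePosixPath("figures/plot.pdf")

# A path API treats the path as a sequence of parts, so switching
# platform conventions is a structured operation, not a text edit.
windows_path = PureWindowsPath(*relative.parts)
print(str(windows_path))        # figures\plot.pdf
print(windows_path.as_posix())  # figures/plot.pdf

# Naive string replacement cannot tell a path separator from any other
# slash. Here it rewrites the slash inside TeX source, where TeX would
# then read "\plot" as a control sequence.
tex_line = r"\includegraphics{figures/plot.pdf}"
naive = tex_line.replace("/", "\\")
print(naive)                    # \includegraphics{figures\plot.pdf}
```

The structured version leaves the decision about separators to the library, which is the role the .NET path APIs play when scripting in PowerShell.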

For writing a reproducible scientific paper, there is still really no substitute for TeX. Thus, if you don't have TeX installed already, let's go ahead and install that now.
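If you are not sure whether TeX (or the PowerShell set up above) is already available, a quick check of your PATH can save a redundant install. The following is a minimal sketch in Python, not part of the original workflow. The executable names are assumptions: `pwsh` is the usual name for open-source PowerShell, `powershell` for the version built into Windows, and `pdflatex` stands in for a working TeX distribution.

```python
import shutil

# Assumed executable names for each tool; adjust for your system.
TOOLS = {
    "PowerShell": ["pwsh", "powershell"],
    "TeX": ["pdflatex"],
}

def find_tools(tools=TOOLS, which=shutil.which):
    """Map each tool name to the first candidate found on PATH, or None."""
    found = {}
    for name, candidates in tools.items():
        paths = (which(candidate) for candidate in candidates)
        found[name] = next((path for path in paths if path), None)
    return found

for name, path in find_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

Any tool reported as not found can then be installed using the platform-specific steps in this post.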

(Linux only) TeX Live

We can use Ubuntu's package manager to easily install TeX Live:

The process will be slightly different on other variants of Linux.

(Windows only) MiKTeX

Since we installed Chocolatey earlier, it is quite straightforward to install MiKTeX. From an Administrator session of PowerShell (right-click on PowerShell in the Start menu, and press Run as administrator), run the following command:

(macOS / OS X only) MacTeX

Installing MacTeX is similarly straightforward using Homebrew Cask (which we should have installed earlier):

Moving on, let's take a few seconds to get Jupyter up and running. Put succinctly, Jupyter is a powerful framework for scientific programming in many different languages. Indeed, even the name hints at the range of tools supported, since it comes from a portmanteau of Julia, Python and R. Jupyter goes well beyond these three examples, however, and supports a language-agnostic interface for programming in JavaScript, F#, and even MATLAB.

Of particular interest to us is the Jupyter Notebook functionality, formerly called IPython Notebook. This tool allows us to write literate documents that intersperse source code, explanations, math, figures and plots. As a result, Jupyter Notebook is well suited to providing lucid and readable explanations of numerical and experimental results, offering a way to clearly explain a reproducible project.
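To see what "interspersing" means in practice, it can help to look at how a notebook is stored. The following is a minimal sketch that builds a two-cell notebook by hand using only the Python standard library; in practice you would create notebooks through the Jupyter interface. The cell contents are hypothetical, and the field names follow version 4 of the notebook file format as I understand it.

```python
import json

def markdown_cell(text):
    """A cell holding prose and math, rendered as Markdown."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}

def code_cell(source):
    """A cell holding source code that has not yet been executed."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": source,
    }

# Hypothetical content: an explanation followed by the code it describes.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        markdown_cell("# Example analysis\nWe compute the mean of $x_i$."),
        code_cell("xs = [1.0, 2.0, 3.0]\nsum(xs) / len(xs)"),
    ],
}

# A notebook file is this JSON document saved with an .ipynb extension.
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Because explanations and code sit side by side in one plain JSON file, a notebook can be placed under version control and shared alongside a manuscript.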