Reproducibility is a big issue in the (computational) world of science. Code that runs today might not run tomorrow because packages are updated, functions are deprecated or removed, and whole programming languages change. In the case of R, there exists a great variety of packages to ensure that code written today also runs tomorrow (and hopefully also in a few years). This includes packages such as renv, groundhog, miniCRAN, and Require.
But reproducibility has not always received as much attention as it does today, and older code in particular was not necessarily written to be future-proof. Reproducing the results of five-year-old code is hence not as straightforward as simply executing the script.
Enter the new package rang.
The goal of rang[1] is to obtain the dependency graph of R packages at a specific point in time. It can technically be used for similar purposes to renv, groundhog and others, but its main use case is as an “Rchaeological” tool: reconstructing historical R computational environments that were not completely declared at the time.
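To illustrate the core idea of asking for "the latest version of a package at a given date" (which is what resolve() reports for each package), here is a toy sketch in base R. The release history is made up and the helper latest_version_at() is hypothetical; rang itself looks up real package metadata instead of a hard-coded table.

```r
# Hypothetical release history for one package (made-up versions and dates,
# for illustration only).
releases <- data.frame(
  version = c("1.0.0", "1.1.0", "2.0.0"),
  date    = as.Date(c("2018-03-01", "2019-06-15", "2021-02-10")),
  stringsAsFactors = FALSE
)

# Return the most recent version released on or before the snapshot date.
latest_version_at <- function(releases, snapshot_date) {
  snapshot_date <- as.Date(snapshot_date)
  available <- releases[releases$date <= snapshot_date, , drop = FALSE]
  if (nrow(available) == 0) {
    return(NA_character_)  # the package did not exist yet at that date
  }
  available$version[which.max(available$date)]
}

latest_version_at(releases, "2020-01-16")
#> [1] "1.1.0"
```

Resolving a full dependency graph means repeating this lookup for every package and each of its (recursive) dependencies, all against the same snapshot date.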
You can install the development version of rang like so:
remotes::install_github("chainsawriot/rang")
The package was submitted to CRAN on 15/02/2023 and will hopefully soon be available via
install.packages("rang")
library(rang)
The function resolve() can be used to obtain the dependency graph of R packages. Currently, the package supports both CRAN and GitHub packages.
# CRAN and GitHub packages can be mixed
x <- resolve(pkgs = c("sna", "schochastics/rtoot"), snapshot_date = "2022-11-30")

graph <- resolve(pkgs = c("openNLP", "LDAvis", "topicmodels", "quanteda"),
                 snapshot_date = "2020-01-16")
graph
## resolved: 4 package(s). Unresolved package(s): 0
## $`cran::openNLP`
## The latest version of `openNLP` [cran] at 2020-01-16 was 0.2-7, which has 3 unique dependencies (2 with no dependencies.)
##
## $`cran::LDAvis`
## The latest version of `LDAvis` [cran] at 2020-01-16 was 0.3.2, which has 2 unique dependencies (2 with no dependencies.)
##
## $`cran::topicmodels`
## The latest version of `topicmodels` [cran] at 2020-01-16 was 0.2-9, which has 7 unique dependencies (5 with no dependencies.)
##
## $`cran::quanteda`
## The latest version of `quanteda` [cran] at 2020-01-16 was 1.5.2, which has 63 unique dependencies (33 with no dependencies.)

# system requirements
graph$sysreqs
## [1] "apt-get install -y default-jdk" "apt-get install -y libxml2-dev"
## [3] "apt-get install -y make"        "apt-get install -y zlib1g-dev"
## [5] "apt-get install -y libpng-dev"  "apt-get install -y libgsl0-dev"
## [7] "apt-get install -y libicu-dev"  "apt-get install -y python3"

# R version
graph$r_version
## [1] "3.6.2"
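The entries of graph$sysreqs are plain shell commands for apt-based systems. If you want to install them on your own Debian/Ubuntu machine rather than inside Docker, a minimal sketch (assuming every entry has the form "apt-get install -y <package>", as in the output above) merges them into a single command:

```r
# System requirements exactly as printed by graph$sysreqs above.
sysreqs <- c("apt-get install -y default-jdk", "apt-get install -y libxml2-dev",
             "apt-get install -y make", "apt-get install -y zlib1g-dev",
             "apt-get install -y libpng-dev", "apt-get install -y libgsl0-dev",
             "apt-get install -y libicu-dev", "apt-get install -y python3")

# Strip the shared prefix to get the bare package names
# (assumes every entry follows the "apt-get install -y <pkg>" pattern).
pkgs <- sub("^apt-get install -y ", "", sysreqs)

# Merge everything into one apt-get call that can be run in a root shell.
cmd <- paste("apt-get update && apt-get install -y", paste(pkgs, collapse = " "))
cat(cmd, "\n")
```

Some of these package names may have been renamed in newer distribution releases, so the combined command is not guaranteed to succeed on a current system.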
The resolved result is an S3 object of class rang, which can be exported as an installation script. This script can be executed on a vanilla R installation.
export_rang(graph, "rang.R")
Executing the installation script today, however, often fails because of missing system dependencies and incompatible R versions. For older code, the approach outlined below should therefore be used.
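Before running rang.R on a vanilla R installation, it can help to check at least the R version mismatch up front. The sketch below uses a hypothetical helper, check_r_version(), which is not part of rang; it only compares the running R version with the version a rang object reports (graph$r_version, "3.6.2" in the example above) using base R's package_version().

```r
# Hypothetical helper (not part of rang): compare the running R version with
# the R version recorded in a rang object.
check_r_version <- function(required, current = as.character(getRversion())) {
  required <- package_version(required)
  current  <- package_version(current)
  if (current == required) {
    "match"
  } else if (current > required) {
    "newer"   # historical package versions may fail to build on a newer R
  } else {
    "older"
  }
}

check_r_version("3.6.2", current = "4.2.2")
#> [1] "newer"
```

Anything other than "match" is a hint that the Docker route is the safer choice.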
A rang object can be used to recreate the computational environment via Rocker. Note that the oldest R version one can get from Rocker is R 3.1.0.
dockerize(graph, "~/rocker_test")
Now, you can build and run the Docker container.
cd ~/rocker_test
docker build -t rang .
docker run --rm --name "rangtest" -ti rang
The folder “rocker_test” includes a README with more details on how to use Docker, in case you are unfamiliar with it.
More information can also be obtained from the GitHub README and from the FAQ vignette.
vignette("faq", package = "rang")
If you want to include additional resources (e.g. analysis scripts), you can set the parameter material_dir to the path of the material. This will then be copied into output_dir and in turn also into the Docker container.
Above I mentioned that Rocker only supports R versions from 3.1.0 onward. But rang can still deal with older versions of R (back to 2.1.0) by generating the Docker image differently. In this case, R is compiled from source and the generated Dockerfile is based on Debian Woody (3.0). This makes it possible to reproduce any (well, at least most) code dating back to 2005. A solution for code dating back to R 1.0.0 is still being worked on.
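The version thresholds described above can be summarised as a simple decision rule. The helper docker_strategy() is hypothetical and only mirrors the text; it is not how rang is implemented internally.

```r
# Hypothetical helper mirroring the thresholds in the text (not rang's code).
docker_strategy <- function(r_version) {
  v <- package_version(r_version)
  if (v >= "3.1.0") {
    "rocker"          # a prebuilt Rocker image is available
  } else if (v >= "2.1.0") {
    "debian-source"   # compile R from source on an old Debian base image
  } else {
    "unsupported"     # R < 2.1.0: still being worked on
  }
}

# Named character vector with one strategy per R version
vapply(c("3.6.2", "2.15.3", "1.9.1"), docker_strategy, character(1))
```

Using package_version() instead of plain string comparison matters here: as strings, "2.15.3" would sort before "2.1.0"'s successor versions incorrectly, whereas version objects compare component by component.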
If you are interested in more details on how to run old versions of R, I suggest this blog post by my colleague Chung-hong Chan, who is also the main developer of rang.
In terms of reproducibility in R, I really enjoyed reading these two posts by Bruno Rodrigues: