Introduction
Today I am going to make a short post on the R package {box}
which was showcased to me quite nicely by Michael Miles. It was informative and I was able to immediately see the usefulness of the {box}
library.
So what is ‘box’? Well here is the description straight from their site:
‘box’ allows organising R code in a more modular way, via two mechanisms:
- It enables writing modular code by treating files and folders of R code as independent (potentially nested) modules, without requiring the user to wrap reusable code into packages.
- It provides a new syntax to import reusable code (both from packages and from modules) which is more powerful and less error-prone than library or require, by limiting the number of names that are made available.
So let’s see how it all works.
Function
The main portion of the script looks like this:
# Main script
# Script setup --------------------------------------
# Load box modules
box::use(. / box / global_options / global_options)
box::use(. / box / io / imports)
box::use(. / box / io / exports)
box::use(. / box / mod / mod)
# Load global options
global_options$set_global_options()
# Main script ---------------------------------------
# Load data, process it, and export results
all_data <- getOption('data_dir') |>
# Load all data
imports$load_all() |>
# Modify dataset
mod$modify_data() |>
# Export data
exports$export_data()
So what does this do? Well it is grabbing data from a predefined location, modifying it and then re-exporting it. Now let’s look at all the code that is behind it, which allows us to do these things and then you will see the power of using box
Example
Let’s take a look at the global options settings.
# Set global options
#' @export
set_global_options <- function() {
options(
look_ups = 'look-ups/',
data_dir = 'data/input/'
)
}
Ok 6 lines, boxed down to one.
Now the import function.
# Function for importing data
#' @export
load_all <- function(file_path) {
box::use(purrr)
box::use(vroom)
file_path |>
# Get all csv files from folder
list.files(full.names = TRUE) |>
# Set list names
purrr$set_names(\(file) basename(file)) |>
# Load all csvs into list
purrr$map(\(file) vroom$vroom(file))
}
Now the modify_data
function.
# Function for modifying data
#' @export
modify_data <- function(df_list) {
box::use(dplyr)
box::use(purrr)
map_fun <- function(df) {
df |>
dplyr$select(name:mass) |>
dplyr$mutate(lol = height * mass) |>
dplyr$filter(lol > 1500)
}
# Apply mapping function to list
purrr$map(df_list, map_fun)
}
Ok again, a big savings here, instead of the above we simply call mod$modify_data()
which makes things clearner and also modular in that we can go to a very specific spot in our proejct to fix an error or add/subtract functionality.
Lastly the export.
# Function for exporting data
#' @export
export_data <- function(df_list) {
box::use(vroom)
box::use(purrr)
# Export data
purrr$map2(.x = df_list,
.y = names(df_list),
~vroom$vroom_write(x = .x,
file = paste0('data/output/',
.y),
delim = ','))
}
Voila! I think to even a fresh user, the power of boxing your functions is fairly apparent and to the advanced user, eyes are most likely glowing!
Continue reading:
An example of using {box}