In this post I describe a useful programming pattern that I implemented, and hopefully provide a gentle introduction to the idea of monads.
The motivation for all of this was that I had a {dplyr} pipeline as part of a {shiny} app that queries a database and I wanted to “record” what steps were in that pipeline so that I could offer them as a way to ‘reproduce’ the query. Some of the steps might be user-defined via the UI, so it was a little more complicated than just a hardcoded query.
One quick-and-dirty solution that might come to mind would be to make a `with_logging()` function that takes an expression, writes a text representation of it to a file or a global, then evaluates the expression. This would probably work, but it means that every step of the pipeline needs to be wrapped in it. Not the worst, but I had a feeling I knew of something more suitable. I’ve been trying to learn Haskell this year, and so far it’s going sort of okay, but I’m taking a detour through Elm, which has most of the same syntax but fewer of the hardcore ‘maths’ constructs.
Returning readers may have seen me use the term ‘monadic’ in the context of APL where it means that a function ‘takes one argument’ (as compared to ‘dyadic’ which takes two) and I believe this definition predates the mathematical one I’m going to use for the rest of this post.
‘Monad’ is a term often best avoided in conversation, and is often described in overly mathematical terms, the “meme” definition being the category theory version which states
“a monad is just a monoid in the category of endofunctors”
which is mostly true, but also unnecessary. Nonetheless, it’s an extremely useful pattern that comes up a lot in functional programming.
This blog post does a great job of walking through the more practical definition, and it has “translations” into several programming languages including JavaScript and Python.
Basically, `map` applies some function to some values. `flatMap` does the same, but first “reaches inside” a context to extract some inner values, and after applying the function, re-wraps the result in the original context.
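To make the distinction concrete for plain R lists, here is a small base-R sketch. The names `map_list()`, `flat_map()` and `pair_up()` are hypothetical and do not come from any package. It uses the common list reading of `flatMap`, in which the applied function itself returns a list and the per-element results are concatenated into one flat list.

```r
# map: apply fn to each element, keeping one (possibly nested) result per element
map_list <- function(xs, fn) lapply(xs, fn)

# flatMap: apply fn (which returns a list) to each element, then
# concatenate the per-element lists into a single flat list
flat_map <- function(xs, fn) do.call(c, lapply(xs, fn))

# a function that returns its result already wrapped in a list
pair_up <- function(x) list(x, x * 10)

nested <- map_list(list(1, 2), pair_up)  # list(list(1, 10), list(2, 20))
flat <- flat_map(list(1, 2), pair_up)    # list(1, 10, 2, 20)
```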
One big advantage to this is that the ‘purity’ of the function remains: you always get the same output for the same input. At the same time, you can also request that some input/output operation be performed. This is how ‘pure’ languages still manage to communicate with the outside world rather than just heating up the CPU for no reason.
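As a toy illustration of ‘requesting’ rather than performing I/O, a pure function can return its result alongside a description of the output it would like produced. The caller then decides whether and when to print it. This is a simplification, not how Haskell’s `IO` actually works, and `halve_noted()` is a hypothetical name.

```r
# pure: nothing is printed here; the message is just returned as data
halve_noted <- function(x) {
  list(value = x / 2, output = sprintf("halved %s to get %s", x, x / 2))
}

step <- halve_noted(10)
# same input, same output, every time
identical(step, halve_noted(10))
# the caller performs the I/O, if and when it chooses to
cat(step$output, "\n")
```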
The enlightening example for me is a `List`: if we have some values and want to apply some function to them, we can do that with, e.g.
```r
f <- function(x) x^2
Map(f, c(2, 4, 6))
## [[1]]
## [1] 4
## 
## [[2]]
## [1] 16
## 
## [[3]]
## [1] 36
```
and if we have a ‘flat’ list, this still works
```r
Map(f, list(2, 4, 6))
## [[1]]
## [1] 4
## 
## [[2]]
## [1] 16
## 
## [[3]]
## [1] 36
```
but what if we have an ‘outer context’ list?
```r
Map(f, list(c(2, 3), c(4, 5, 6)))
## [[1]]
## [1] 4 9
## 
## [[2]]
## [1] 16 25 36
```
In this case, because `f` is vectorised, `Map` sends each vector to `f` and gets a result for each list element. What if we have a list in the inner context?
```r
Map(f, list(list(2, 3), list(4, 5, 6)))
## Error in x^2: non-numeric argument to binary operator
```
This fails because `f(list(2, 3))` fails (it doesn’t know how to deal with an argument which is a list).
Instead, we can use a version of ‘map’ that first reaches inside the outer `list` context, concatenates what’s inside, applies the function, then re-wraps the result in a new, flat `list`:
```r
fmap <- function(x, f) {
  list(f(unlist(x)))
}
fmap(list(list(2, 3), list(4, 5, 6)), f)
## [[1]]
## [1]  4  9 16 25 36
```
This is the essence of a monad: something that supports such an `fmap` operation, which performs the mapping inside the context (and potentially some other operations, which we’ll get to). There are various patterns which benefit from such a context, and this post describes an implementation of several of these via the {monads} package.
The `fmap` operation is so common that it’s typical to find it presented as an infix function, similar to how pipes work in R:
```r
list(list(2, 3), list(4, 5, 6)) |> fmap(f)
## [[1]]
## [1]  4  9 16 25 36
```
and we can go one step further by defining a new pipe which is just a different syntax for this
```r
# these two are equivalent
x |> fmap(f)
x %>>=% f
```
This infix function borrows from Haskell’s `>>=` (pronounced “bind”), which is so fundamental that it forms part of the language’s logo.
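In R, any function whose name is wrapped in `%` can be called infix. Below is a minimal sketch of such an operator for the nested-list example above. It simply reuses the unwrap/apply/re-wrap logic of `fmap()`. It is named `%bind_list%` (a hypothetical name) so that it doesn’t mask the package’s own `%>>=%`.

```r
# unwrap the outer list, apply the function, re-wrap in a new flat list
`%bind_list%` <- function(x, fn) list(fn(unlist(x)))

sq <- function(v) v^2
nested_in <- list(list(2, 3), list(4, 5, 6))
bound <- nested_in %bind_list% sq
bound
## [[1]]
## [1]  4  9 16 25 36
```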
With all that in mind, here’s how it looks in my (perhaps simplistic) implementation which you can get from GitHub here
```r
library(monads)
```
Additionally, some toy helper functions are defined in this package for demonstrating application of functions, e.g.
```r
timestwo(4)
## [1] 8
square(5)
## [1] 25
add_n(3, 4)
## [1] 7
```
As per the example above, the `List` monad wraps values (which may be additional `list`s), and when `flatMap`ed the results are ‘flattened’ into a single `List`.
```r
# identical to a regular Map
x <- listM(1, 2, 3) %>>=% timestwo()
x
## [[1]]
## [1] 2 4 6

# only possible with the flatMap approach
y <- listM(list(1, 2), list(3, 4, 5)) %>>=% timestwo()
y
## [[1]]
## [1]  2  4  6  8 10
```
Note that while `x` and `y` print as regular lists, they remain `List` monads; a `print` method is defined which essentially extracts `value(x)`.
As I alluded to earlier, additional operations can happen while the context is unwrapped, including IO. What if I just kept a log of the operations and appended each step to it? The wrapping context can include additional components, and a stored ‘log’ of the expressions used at each step is entirely possible.
All that is required is to wrap the value at the start of the pipeline in a `Logger` context, for which there is a constructor helper, `loggerM()`:
```r
library(dplyr, warn.conflicts = FALSE)

result <- loggerM(mtcars) %>>=%
  filter(mpg > 10) %>>=%
  select(mpg, cyl, disp) %>>=%
  arrange(desc(mpg)) %>>=%
  head()
```
This result is still a `Logger` instance, not a value. To extract the value from it we can use `value()`. To extract the log of each step, use `logger_log()` (named that way to avoid a conflict with `base::log`).
```r
value(result)
##                 mpg cyl  disp
## Toyota Corolla 33.9   4  71.1
## Fiat 128       32.4   4  78.7
## Honda Civic    30.4   4  75.7
## Lotus Europa   30.4   4  95.1
## Fiat X1-9      27.3   4  79.0
## Porsche 914-2  26.0   4 120.3

logger_log(result)
## ✔ Log of 4 operations:
## 
## mtcars %>%
##   filter(mpg > 10) %>%
##   select(mpg, cyl, disp) %>%
##   arrange(desc(mpg)) %>%
##   head()
```
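The package’s `Logger` is an {R6} class, but the core idea can be sketched in base R. The operator captures the unevaluated right-hand call with `substitute()`, injects the current value as that call’s first argument, and evaluates it. It then appends the deparsed call to the log. All names here (`logger_sketch()`, `%log>%`, `log_text()`) are hypothetical. This is not the package’s implementation, and it ignores errors and many edge cases.

```r
# wrap a value, recording the expression it came from as the first log entry
logger_sketch <- function(value) {
  list(value = value, log = deparse(substitute(value)))
}

# bind-like infix: evaluate `rhs` with the current value injected as its
# first argument, then return a new wrapper with the call appended to the log
`%log>%` <- function(m, rhs) {
  call <- substitute(rhs)
  full <- as.call(c(list(call[[1]], quote(.value)), as.list(call)[-1]))
  new_value <- eval(full, list(.value = m$value), parent.frame())
  list(value = new_value, log = c(m$log, deparse(call)))
}

# render the log as a magrittr-style pipeline
log_text <- function(m) paste(m$log, collapse = " %>% ")

logged <- logger_sketch(c(4, 9, 16)) %log>% sqrt() %log>% sum()
logged$value      # 9
log_text(logged)  # "c(4, 9, 16) %>% sqrt() %>% sum()"
```

Because each step returns a fresh wrapper, extra arguments pass straight through. For example, `logger_sketch(1:3) %log>% head(2)` evaluates `head(1:3, 2)`.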
This works with any data value, so we could just as easily use an in-memory SQLite database (or an external one):
```r
mem <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
dplyr::copy_to(mem, mtcars)

res <- loggerM(mem) %>>=%
  tbl("mtcars") %>>=%
  filter(mpg > 10) %>>=%
  select(mpg, cyl, disp) %>>=%
  arrange(desc(mpg)) %>>=%
  head()
```
Again, extracting the components from this
```r
value(res)
## # Source:     SQL [6 x 3]
## # Database:   sqlite 3.46.0 [:memory:]
## # Ordered by: desc(mpg)
##     mpg   cyl  disp
##   <dbl> <dbl> <dbl>
## 1  33.9     4  71.1
## 2  32.4     4  78.7
## 3  30.4     4  75.7
## 4  30.4     4  95.1
## 5  27.3     4  79  
## 6  26       4 120.

logger_log(res)
## ✔ Log of 5 operations:
## 
## mem %>%
##   tbl("mtcars") %>%
##   filter(mpg > 10) %>%
##   select(mpg, cyl, disp) %>%
##   arrange(desc(mpg)) %>%
##   head()
```
Since the log captures what operations were performed, we could re-run this expression, and a helper is available for that
```r
rerun(res)
## # Source:     SQL [6 x 3]
## # Database:   sqlite 3.46.0 [:memory:]
## # Ordered by: desc(mpg)
##     mpg   cyl  disp
##   <dbl> <dbl> <dbl>
## 1  33.9     4  71.1
## 2  32.4     4  78.7
## 3  30.4     4  75.7
## 4  30.4     4  95.1
## 5  27.3     4  79  
## 6  26       4 120.
```
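Re-running from a text log can be sketched in base R. Join the recorded steps with the native pipe (R 4.1 or later), parse the result, and evaluate it. This standalone illustration uses a hand-written log vector, and `steps` and `rerun_sketch()` are hypothetical names; it is not how the package’s `rerun()` is implemented. Evaluating parsed text runs arbitrary code, so this should only be used on logs you trust.

```r
# re-run a pipeline recorded as a character vector of steps,
# where the first entry is the starting expression
rerun_sketch <- function(steps, envir = parent.frame()) {
  code <- paste(steps, collapse = " |> ")
  eval(parse(text = code), envir)
}

steps <- c("c(4, 9, 16)", "sqrt()", "sum()")
rerun_sketch(steps)  # evaluates c(4, 9, 16) |> sqrt() |> sum(), giving 9
```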
Some similar functionality is present in the {magrittr} package, which provides the ‘classic’ R pipe `%>%`. A ‘functional sequence’ starts with a `.` and similarly tracks which functions are to be applied to an arbitrary input once evaluated; in this way, it is similar to defining a new function.
```r
library(magrittr)

# define a functional sequence
fs <- . %>% tbl("mtcars") %>% select(cyl, mpg)

# evaluate the functional sequence with some input data
fs(mem)
## # Source:   SQL [?? x 2]
## # Database: sqlite 3.46.0 [:memory:]
##      cyl   mpg
##    <dbl> <dbl>
##  1     6  21  
##  2     6  21  
##  3     4  22.8
##  4     6  21.4
##  5     8  18.7
##  6     6  18.1
##  7     8  14.3
##  8     4  24.4
##  9     4  22.8
## 10     6  19.2
## # ℹ more rows

# identify the function calls at each step of the pipeline
magrittr::functions(fs)
## [[1]]
## function (.) 
## tbl(., "mtcars")
## 
## [[2]]
## function (.) 
## select(., cyl, mpg)
```
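Since a functional sequence is essentially a composed function, the same idea can be sketched without {magrittr} by folding a list of one-argument functions with `Reduce()`. `compose_steps()` is a hypothetical helper. Unlike `magrittr::functions()`, it keeps no record of the calls beyond the list of functions itself.

```r
# compose one-argument functions into a single function, applied left to right
compose_steps <- function(...) {
  steps <- list(...)
  function(input) Reduce(function(acc, step) step(acc), steps, input)
}

# nothing is evaluated until the composed function is called
root_plus_one <- compose_steps(sqrt, function(v) v + 1)
root_plus_one(c(1, 4, 9))  # 2 3 4
```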
Since the functional sequence is unevaluated, errors can be present and not triggered until it is called:
```r
errfs <- . %>% sqrt() %>% stop("oops") %>% add_n(3)
x <- 1:10
errfs(x)
## Error in function_list[[i]](value): 11.41421356237311.7320508075688822.236067977499792.449489742783182.645751311064592.8284271247461933.16227766016838oops

magrittr::functions(errfs)
## [[1]]
## function (.) 
## sqrt(.)
## 
## [[2]]
## function (.) 
## stop(., "oops")
## 
## [[3]]
## function (.) 
## add_n(., 3)
```
In the monad context, steps which do raise an error nullify the value, and a signifier (`[E]`) is added to the log for the failing step and every step after it, to prevent re-running the error:
```r
resx <- loggerM(x) %>>=% sqrt() %>>=% add_n(4)
value(resx)
##  [1] 5.000000 5.414214 5.732051 6.000000 6.236068 6.449490 6.645751 6.828427
##  [9] 7.000000 7.162278
logger_log(resx)
## ✔ Log of 2 operations:
## 
## x %>%
##   sqrt() %>%
##   add_n(4)

err <- loggerM(x) %>>=% sqrt() %>>=% stop("oops") %>>=% add_n(3)
value(err)
## NULL
logger_log(err)
## ✖ Log of 3 operations: [ERROR]
## 
## x %>%
##   sqrt() %>%
## [E] stop("oops") %>%
## [E] add_n(3)
```
Aside from an error destroying the value, returning a `NULL` result will also produce this effect:
```r
nullify <- loggerM(x) %>>=% sqrt() %>>=% ret_null() %>>=% add_n(7)
value(nullify)
## NULL
logger_log(nullify)
## ✖ Log of 3 operations: [ERROR]
## 
## x %>%
##   sqrt() %>%
## [E] ret_null() %>%
## [E] add_n(7)
```
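The nullify-and-flag behaviour for both errors and `NULL` results can be sketched as a single step function in base R. The wrapper here is a plain list with `value` and `log` fields. `safe_step()` is a hypothetical helper that takes the function and its log label explicitly rather than capturing a call, and it is not the package’s implementation.

```r
# apply `fn` to the wrapped value unless a previous step already failed;
# errors and NULL results both nullify the value and flag the step with [E]
safe_step <- function(m, fn, label) {
  if (is.null(m$value)) {
    # an earlier step failed: skip evaluation, but still flag this step
    return(list(value = NULL, log = c(m$log, paste("[E]", label))))
  }
  new_value <- tryCatch(fn(m$value), error = function(e) NULL)
  flag <- if (is.null(new_value)) "[E] " else ""
  list(value = new_value, log = c(m$log, paste0(flag, label)))
}

initial <- list(value = 1:4, log = "1:4")
ok <- safe_step(initial, sqrt, "sqrt()")
failed <- safe_step(ok, function(v) stop("oops"), 'stop("oops")')
after <- safe_step(failed, function(v) v + 3, "add_n(3)")
after$log
```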
One downside to the functional sequence approach is chaining them: since the first term must be `.`, that is always the first entry, and chaining multiple sequences is not clean.
```r
a <- . %>% sqrt()
a
## Functional sequence with the following components:
## 
##  1. sqrt(.)
## 
## Use 'functions' to extract the individual functions.

b <- . %>% a %>% add_n(1)
b
## Functional sequence with the following components:
## 
##  1. a(.)
##  2. add_n(., 1)
## 
## Use 'functions' to extract the individual functions.

b(x)
##  [1] 2.000000 2.414214 2.732051 3.000000 3.236068 3.449490 3.645751 3.828427
##  [9] 4.000000 4.162278
```
Because the monad context is recreated at every step, chaining these is not a problem
```r
a <- loggerM(x) %>>=% sqrt()
value(a)
##  [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
##  [9] 3.000000 3.162278
logger_log(a)
## ✔ Log of 1 operations:
## 
## x %>%
##   sqrt()

b <- a %>>=% add_n(1)
value(b)
##  [1] 2.000000 2.414214 2.732051 3.000000 3.236068 3.449490 3.645751 3.828427
##  [9] 4.000000 4.162278
logger_log(b)
## ✔ Log of 2 operations:
## 
## x %>%
##   sqrt() %>%
##   add_n(1)
```
This achieves what I wanted in terms of ‘recording’ the steps of the pipeline, and it only requires wrapping the initial value and using a different pipe.
But there are other monads I could also implement… so I did.
In addition to capturing the expressions in a log, the `Timer` monad also captures the evaluation time for each step, storing these alongside the expressions themselves in a `data.frame`:
```r
x <- timerM(5) %>>=% sleep_for(3) %>>=% timestwo() %>>=% sleep_for(1.3)
value(x)
## [1] 10
times(x)
##             expr  time
## 1              5 0.000
## 2   sleep_for(3) 3.014
## 3     timestwo() 0.000
## 4 sleep_for(1.3) 1.306

y <- timerM(5) %>>=% sleep_for(2) %>>=% ret_null() %>>=% sleep_for(0.3)
value(y)
## NULL
times(y)
##             expr  time
## 1              5 0.000
## 2   sleep_for(2) 2.002
## 3     ret_null() 0.000
## 4 sleep_for(0.3) 0.302
```
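Per-step timing can be sketched in base R by wrapping each evaluation in `system.time()` and appending a row to a `data.frame`. `timed_step()` is a hypothetical helper taking an explicit function and label. This is not the package’s implementation.

```r
# apply `fn` to the wrapped value, recording the elapsed seconds for the step
timed_step <- function(m, fn, label) {
  # system.time() evaluates the assignment here, in timed_step's own frame
  elapsed <- system.time(new_value <- fn(m$value))[["elapsed"]]
  list(value = new_value,
       times = rbind(m$times, data.frame(expr = label, time = elapsed)))
}

t0 <- list(value = 5, times = data.frame(expr = "5", time = 0))
t1 <- timed_step(t0, function(v) { Sys.sleep(0.2); v }, "sleep_for(0.2)")
t2 <- timed_step(t1, function(v) v * 2, "timestwo()")
t2$value  # 10
t2$times  # one row per step; the sleep takes roughly 0.2 seconds
```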
In some languages it is preferable to return something rather than raising an error, particularly if you want to ensure that errors are handled. The `Maybe` pattern consists of either a `Nothing` (which is empty) or a `Just` containing some value; the result of applying any function to a `Maybe` will be one of these. For testing the result, the helpers `is_nothing()` and `is_just()` are defined.
```r
x <- maybeM(9) %>>=% sqrt() %>>=% timestwo()
value(x)
## Just:
## [1] 6
is_just(x)
## [1] TRUE
is_nothing(x)
## [1] FALSE

y <- maybeM(Nothing()) %>>=% sqrt()
value(y)
## Nothing
is_just(y)
## [1] FALSE
is_nothing(y)
## [1] TRUE

z <- maybeM(10) %>>=% timestwo() %>>=% add_n(Nothing())
value(z)
## Nothing
is_just(z)
## [1] FALSE
is_nothing(z)
## [1] TRUE
```
For what is likely a much more robust implementation, see {maybe}.
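For comparison, here is a minimal base-R sketch of the `Maybe` pattern, using the hypothetical names `just()`, `nothing()`, `maybe_bind()` and `safe_sqrt()`. The bound function itself returns a `Maybe`, which is the usual monadic form. `maybe_bind()` short-circuits as soon as it sees a `Nothing`.

```r
just <- function(x) list(kind = "Just", value = x)
nothing <- function() list(kind = "Nothing")

# short-circuit on Nothing; otherwise hand the inner value to `fn`,
# which must itself return a Just or a Nothing
maybe_bind <- function(m, fn) {
  if (identical(m$kind, "Nothing")) return(m)
  fn(m$value)
}

# a function that can fail without raising an error
safe_sqrt <- function(x) if (x < 0) nothing() else just(sqrt(x))

maybe_bind(just(9), safe_sqrt)$value                          # 3
maybe_bind(just(-1), safe_sqrt)$kind                          # "Nothing"
maybe_bind(maybe_bind(just(-1), safe_sqrt), safe_sqrt)$kind   # still "Nothing"
```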
Similar to a `Maybe`, a `Result` can contain either a successful `Ok`-wrapped value or an `Err`-wrapped message, and it will always be exactly one of these. This pattern resembles (and internally, uses) the `tryCatch()` approach: the evaluation will not fail, but what is produced must be tested to determine success, for which `is_ok()` and `is_err()` are defined.
```r
x <- resultM(9) %>>=% sqrt() %>>=% timestwo()
value(x)
## OK:
## [1] 6
is_err(x)
## [1] FALSE
is_ok(x)
## [1] TRUE
```
When the evaluation fails, the error is reported, along with the value prior to the error
```r
y <- resultM(9) %>>=% sqrt() %>>=% ret_err("this threw an error")
value(y)
## Error:
## [1] "this threw an error; previously: 3"
is_err(y)
## [1] TRUE
is_ok(y)
## [1] FALSE

z <- resultM(10) %>>=% timestwo() %>>=% add_n("banana")
value(z)
## Error:
## [1] "n should be numeric; previously: 20"
is_err(z)
## [1] TRUE
is_ok(z)
## [1] FALSE
```
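The `Result` behaviour described above can be sketched in base R with `tryCatch()`: a step produces either a new `Ok` value or an `Err` carrying the error message and the previous value. `ok_result()` and `result_bind()` are hypothetical names, and the message format simply mimics the output shown above.

```r
ok_result <- function(x) list(ok = TRUE, value = x)

# apply `fn` to an Ok value; an error becomes an Err that records the
# message and the value prior to the failing step
result_bind <- function(r, fn) {
  if (!r$ok) return(r)  # an earlier Err passes straight through
  tryCatch(
    ok_result(fn(r$value)),
    error = function(e) {
      list(ok = FALSE,
           value = paste0(conditionMessage(e), "; previously: ", format(r$value)))
    }
  )
}

good <- result_bind(result_bind(ok_result(9), sqrt), function(v) v * 2)
bad <- result_bind(result_bind(ok_result(9), sqrt),
                   function(v) stop("this threw an error"))
good$value  # 6
bad$value   # "this threw an error; previously: 3"
```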
The `flatMap`/“bind” operator defined here as `%>>=%` is applicable to any monad which has a `bind()` method defined. The monads defined in this package are all `R6Class` objects exposing such a method of the form `m$bind(.call, .quo)`, which expects a function and a quosure. You can add your own extensions to these by defining such a class (and probably a constructor helper and a `print()` method):
```r
# a Reporter monad which reports unpiped function calls
Reporter <- R6::R6Class(
  c("ReporterMonad"),
  public = list(
    value = NULL,
    initialize = function(value) {
      if (rlang::is_quosure(value)) {
        self$value <- rlang::eval_tidy(value)
      } else {
        self$value <- value
      }
    },
    bind = function(f, expr) {
      ## 'undo' the pipe and inject the lhs as an argument
      result <- unlist(lapply(unlist(self$value), f))
      args <- as.list(c(self$value, rlang::call_args(expr)))
      fnew <- rlang::call2(rlang::call_name(expr), !!!args)
      cat(" ** Calculating:", rlang::quo_text(fnew), "=", result, "\n")
      Reporter$new(result)
    }
  )
)

reporterM <- function(value) {
  v <- rlang::enquo(value)
  Reporter$new(v)
}

# R6 objects carry the class name given to R6Class() ("ReporterMonad"),
# so the S3 print method needs that name in order to be dispatched
print.ReporterMonad <- function(x, ...) {
  print(value(x))
}

x <- reporterM(17) %>>=% timestwo() %>>=% square() %>>=% add_n(2) %>>=% `/`(8)
##  ** Calculating: timestwo(17) = 34 
##  ** Calculating: square(34) = 1156 
##  ** Calculating: add_n(1156, 2) = 1158 
##  ** Calculating: 1158/8 = 144.75 
value(x)
## [1] 144.75
```
This is just a toy example; attempting to `cat()` a `data.frame` result would not go well.
There are other patterns that I haven’t implemented. One that would have been interesting is `Promise`. I had a ‘mind-blown’ moment reading this post about some Roc syntax with the throw-away line

> Tasks can be chained together using the `Task.await` function, similarly to how JavaScript Promises can be chained together using a Promise’s `then()` method. (You might also know functions in other languages similar to `Task.await` which go by names like `andThen`, `flatMap`, or `bind`.)

because I had never made the connection between monads and async/await, but it’s a lot clearer now. I did try implementing `Promise` in {monads} using {future}, but I couldn’t quite get the unevaluated promise object to pipe correctly.
There are a handful of existing implementations, most of which are more fleshed out than mine.
- {monads} - a sketched-out implementation that relies on dispatch for `flatMap` operations. I’m using the same name as this package, but that one hasn’t been touched in quite a while.
- {rmonad} - archived on CRAN, but offers a sophisticated ‘funnel’ mechanism and various ways to capture steps of a pipeline.
- {maybe} - a more detailed implementation of `Maybe`.
- {chronicler} - a way to post-process the result at each step and capture information, such as the runtime (see `Timer`) or dimensions. Requires an explicit `bind()` at each step. Associated blog post.
I also found this post about implementing a `Maybe` monad, and this one comparing the {foreach} package’s `%do%` to Haskell.
I’m somewhat surprised that none of the above examples seem to use the Haskell ‘bind’ format of a pipe (`>>=`, or as a valid R infix special, `%>>=%`), but at least I’m not stepping on other packages’ toes there. One particular benefit of this one is that by deleting the two outermost characters inside the special you get the {magrittr} pipe `%>%`.
If nothing else, I found it really useful to go through the process of defining these myself - I learned a lot about {R6} classes and quosures in the process, too.
My package comes with no guarantees - it works for the examples I’ve tried, but it’s possible (if not likely) that I’ve not thought of all the edge cases. I’ve certainly relied on R’s vectorisation (rather than explicitly re-mapping individual values) and my quosure skills are somewhat underdeveloped.
If you do take it for a spin I’d love to hear your thoughts on it. As always, I can be found on Mastodon and the comment section below.
```r
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.3.3 (2024-02-29)
##  os       Pop!_OS 22.04 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_AU.UTF-8
##  ctype    en_AU.UTF-8
##  tz       Australia/Adelaide
##  date     2024-10-18
##  pandoc   3.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version    date (UTC) lib source
##  bit           4.0.4      2020-08-04 [3] CRAN (R 4.0.2)
##  bit64         4.0.5      2020-08-30 [3] CRAN (R 4.2.0)
##  blob          1.2.4      2023-03-17 [3] CRAN (R 4.2.3)
##  blogdown      1.19       2024-02-01 [1] CRAN (R 4.3.3)
##  bookdown      0.36       2023-10-16 [1] CRAN (R 4.3.2)
##  bslib         0.8.0      2024-07-29 [1] CRAN (R 4.3.3)
##  cachem        1.1.0      2024-05-16 [1] CRAN (R 4.3.3)
##  callr         3.7.3      2022-11-02 [3] CRAN (R 4.2.2)
##  cli           3.6.1      2023-03-23 [1] CRAN (R 4.3.3)
##  crayon        1.5.2      2022-09-29 [3] CRAN (R 4.2.1)
##  DBI           1.2.1      2024-01-12 [3] CRAN (R 4.3.2)
##  dbplyr        2.4.0      2023-10-26 [3] CRAN (R 4.3.2)
##  devtools      2.4.5      2022-10-11 [1] CRAN (R 4.3.2)
##  digest        0.6.37     2024-08-19 [1] CRAN (R 4.3.3)
##  dplyr       * 1.1.4      2023-11-17 [3] CRAN (R 4.3.2)
##  ellipsis      0.3.2      2021-04-29 [3] CRAN (R 4.1.1)
##  evaluate      0.24.0     2024-06-10 [1] CRAN (R 4.3.3)
##  fansi         1.0.6      2023-12-08 [1] CRAN (R 4.3.3)
##  fastmap       1.2.0      2024-05-15 [1] CRAN (R 4.3.3)
##  fs            1.6.4      2024-04-25 [1] CRAN (R 4.3.3)
##  generics      0.1.3      2022-07-05 [1] CRAN (R 4.3.3)
##  glue          1.7.0      2024-01-09 [1] CRAN (R 4.3.3)
##  htmltools     0.5.8.1    2024-04-04 [1] CRAN (R 4.3.3)
##  htmlwidgets   1.6.2      2023-03-17 [1] CRAN (R 4.3.2)
##  httpuv        1.6.12     2023-10-23 [1] CRAN (R 4.3.2)
##  icecream      0.2.1      2023-09-27 [1] CRAN (R 4.3.2)
##  jquerylib     0.1.4      2021-04-26 [1] CRAN (R 4.3.3)
##  jsonlite      1.8.8      2023-12-04 [1] CRAN (R 4.3.3)
##  knitr         1.48       2024-07-07 [1] CRAN (R 4.3.3)
##  later         1.3.1      2023-05-02 [1] CRAN (R 4.3.2)
##  lifecycle     1.0.4      2023-11-07 [1] CRAN (R 4.3.3)
##  magrittr    * 2.0.3      2022-03-30 [1] CRAN (R 4.3.3)
##  memoise       2.0.1      2021-11-26 [1] CRAN (R 4.3.3)
##  mime          0.12       2021-09-28 [1] CRAN (R 4.3.3)
##  miniUI        0.1.1.1    2018-05-18 [1] CRAN (R 4.3.2)
##  monads      * 0.1.0.9000 2024-10-14 [1] local
##  pillar        1.9.0      2023-03-22 [1] CRAN (R 4.3.3)
##  pkgbuild      1.4.2      2023-06-26 [1] CRAN (R 4.3.2)
##  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.3.3)
##  pkgload       1.3.3      2023-09-22 [1] CRAN (R 4.3.2)
##  prettyunits   1.2.0      2023-09-24 [3] CRAN (R 4.3.1)
##  processx      3.8.3      2023-12-10 [3] CRAN (R 4.3.2)
##  profvis       0.3.8      2023-05-02 [1] CRAN (R 4.3.2)
##  promises      1.2.1      2023-08-10 [1] CRAN (R 4.3.2)
##  ps            1.7.6      2024-01-18 [3] CRAN (R 4.3.2)
##  purrr         1.0.2      2023-08-10 [3] CRAN (R 4.3.1)
##  R6            2.5.1      2021-08-19 [1] CRAN (R 4.3.3)
##  Rcpp          1.0.11     2023-07-06 [1] CRAN (R 4.3.2)
##  remotes       2.4.2.1    2023-07-18 [1] CRAN (R 4.3.2)
##  rlang         1.1.4      2024-06-04 [1] CRAN (R 4.3.3)
##  rmarkdown     2.28       2024-08-17 [1] CRAN (R 4.3.3)
##  RSQLite       2.3.7      2024-05-27 [1] CRAN (R 4.3.3)
##  rstudioapi    0.15.0     2023-07-07 [3] CRAN (R 4.3.1)
##  sass          0.4.9      2024-03-15 [1] CRAN (R 4.3.3)
##  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.3.2)
##  shiny         1.7.5.1    2023-10-14 [1] CRAN (R 4.3.2)
##  stringi       1.8.4      2024-05-06 [1] CRAN (R 4.3.3)
##  stringr       1.5.1      2023-11-14 [1] CRAN (R 4.3.3)
##  tibble        3.2.1      2023-03-20 [1] CRAN (R 4.3.3)
##  tidyselect    1.2.0      2022-10-10 [3] CRAN (R 4.2.1)
##  urlchecker    1.0.1      2021-11-30 [1] CRAN (R 4.3.2)
##  usethis       3.0.0      2024-07-29 [1] CRAN (R 4.3.3)
##  utf8          1.2.4      2023-10-22 [1] CRAN (R 4.3.3)
##  vctrs         0.6.5      2023-12-01 [1] CRAN (R 4.3.3)
##  withr         3.0.0      2024-01-16 [1] CRAN (R 4.3.3)
##  xfun          0.47       2024-08-17 [1] CRAN (R 4.3.3)
##  xtable        1.8-4      2019-04-21 [1] CRAN (R 4.3.2)
##  yaml          2.3.10     2024-07-26 [1] CRAN (R 4.3.3)
## 
##  [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.3
##  [2] /usr/local/lib/R/site-library
##  [3] /usr/lib/R/site-library
##  [4] /usr/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────
```