
    Reducing my for loop usage with purrr::reduce()

    Maëlle's R blog on Maëlle Salmon's personal website, published on 2023-07-26
    [This article was first published on Maëlle's R blog on Maëlle Salmon's personal website, and kindly contributed to R-bloggers.]

    I (only! but luckily!) recently got introduced to the magic of purrr::reduce(). Thank you, Tobias! I was told about it right as I was unhappily using many for loops in a package [1], for lack of a better idea. In this post I’ll explain how purrr::reduce() helped me reduce my for loop usage. I also hope that if I’m doing something wrong, someone will come forward and tell me!

    Before: many for, much sadness

    I was starting from a thing that could be a list, or even a data.frame. Then, for each of a bunch of variables, I tweaked the thing. My initial coding pattern was therefore:

    for (var in variables_vector) {
      thing <- do_something(thing, var, other_argument = other_argument)
    }

    I was iteratively changing the thing, along a variables_vector, or sometimes a variables_list.

    Silly example

    Ugh, finding an example is hard. This one feels very contrived, but I promise my real-life adoption of purrr::reduce() was life-changing!

    # Some basic movie information
    movies <- tibble::tribble(
      ~title, ~color, ~elements,
      "Barbie", "pink", "shoes",
      "Oppenheimer", "red", "history"
    )
    
    # More information to add to movies
    info_list <- list(
      list(title = "Barbie", info = list(element = "sparkles")),
      list(title = "Barbie", info = list(element = "feminism")),
      list(title = "Oppenheimer", info = list(element = "fire"))
    )
    
    # Don't tell me this is weirdly formatted data,
    # who never obtains weirdly formatted data?!
    info_list
    #> [[1]]
    #> [[1]]$title
    #> [1] "Barbie"
    #> 
    #> [[1]]$info
    #> [[1]]$info$element
    #> [1] "sparkles"
    #> 
    #> 
    #> 
    #> [[2]]
    #> [[2]]$title
    #> [1] "Barbie"
    #> 
    #> [[2]]$info
    #> [[2]]$info$element
    #> [1] "feminism"
    #> 
    #> 
    #> 
    #> [[3]]
    #> [[3]]$title
    #> [1] "Oppenheimer"
    #> 
    #> [[3]]$info
    #> [[3]]$info$element
    #> [1] "fire"
    
    add_element <- function(movies, info) {
      movies[movies[["title"]] == info[["title"]],][["elements"]] <-
        toString(c(
          movies[movies[["title"]] == info[["title"]],][["elements"]],
          info[["info"]][[1]]
        ))
      movies
    }

    Now how do I add each element of the list to the original table? I could type something like:

    for (info in info_list) {
      movies <- add_element(movies, info)
    }
    movies
    #> # A tibble: 2 × 3
    #>   title       color elements                 
    #>   <chr>       <chr> <chr>                    
    #> 1 Barbie      pink  shoes, sparkles, feminism
    #> 2 Oppenheimer red   history, fire
    

    It’s not too bad, really. But since there’s another way, we can change it.

    After

    With purrr::reduce()

    for (var in variables_vector) {
      thing <- do_something(thing, var)
    }

    can become

    thing <- purrr::reduce(variables_vector, do_something, .init = thing)
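
    To make the equivalence concrete, here is a minimal sketch with a made-up do_something() that just pastes strings together (a hypothetical stand-in, not code from the package mentioned above). purrr::reduce() calls the function on .init and the first element, then on that result and the second element, and so on, which amounts to nested calls.

```r
# Hypothetical stand-in for do_something(): appends a variable name to a string
do_something <- function(thing, var) paste0(thing, "+", var)

variables_vector <- c("a", "b", "c")
thing <- "start"

# The for loop version
looped <- thing
for (var in variables_vector) {
  looped <- do_something(looped, var)
}

# The reduce() version
reduced <- purrr::reduce(variables_vector, do_something, .init = thing)

# What both amount to: nested calls, innermost first
nested <- do_something(do_something(do_something(thing, "a"), "b"), "c")

looped
#> [1] "start+a+b+c"
identical(looped, reduced)
#> [1] TRUE
identical(looped, nested)
#> [1] TRUE
```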

    And (notice the other argument),

    for (var in variables_vector) {
      thing <- do_something(thing, var, other_argument = other_argument)
    }

    can become

    thing <- purrr::reduce(
      variables_vector, 
      \(thing, x) do_something(thing, x, other_argument = other_argument), 
      .init = thing
    )
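
    For comparison, base R's Reduce() expresses the same pattern without purrr. The sketch below again assumes a hypothetical do_something() and other_argument, made up for illustration. Note that the argument order differs: Reduce() takes the function first, then the vector, then the initial value.

```r
# Hypothetical stand-in: appends var to thing using a configurable separator
do_something <- function(thing, var, other_argument) {
  paste(thing, var, sep = other_argument)
}

variables_vector <- c("a", "b", "c")
thing <- "start"
other_argument <- "/"

with_purrr <- purrr::reduce(
  variables_vector,
  \(thing, x) do_something(thing, x, other_argument = other_argument),
  .init = thing
)

# Base R equivalent: Reduce(f, x, init)
with_base <- Reduce(
  \(thing, x) do_something(thing, x, other_argument = other_argument),
  variables_vector,
  thing
)

with_purrr
#> [1] "start/a/b/c"
identical(with_purrr, with_base)
#> [1] TRUE
```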

    I haven’t completely internalized the pattern above, but the documentation of purrr::reduce() states:

    “We now generally recommend against using ... to pass additional (constant) arguments to .f. Instead use a shorthand anonymous function:

        # Instead of
        x |> map(f, 1, 2, collapse = ",")
        # do:
        x |> map(\(x) f(x, 1, 2, collapse = ","))

    This makes it easier to understand which arguments belong to which function and will tend to yield better error messages.”

    It might remind you of how things work for dplyr::across() these days.
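
    A small sketch of that dplyr::across() parallel, using a made-up data frame (the column names and values are assumptions for illustration). The extra argument na.rm = TRUE sits inside the anonymous function instead of being passed through across()'s ... argument, which is deprecated as of dplyr 1.1.0.

```r
# Made-up data with some missing values
df <- tibble::tibble(a = c(1, NA, 3), b = c(4, 5, NA))

# The constant argument (na.rm = TRUE) lives inside the anonymous function,
# so it is clear that it belongs to mean() and not to across()
means <- dplyr::summarise(
  df,
  dplyr::across(c(a, b), \(x) mean(x, na.rm = TRUE))
)

means[["a"]]
#> [1] 2
means[["b"]]
#> [1] 4.5
```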

    Back to our silly example!

    # Some basic movie information
    movies <- tibble::tribble(
      ~title, ~color, ~elements,
      "Barbie", "pink", "shoes",
      "Oppenheimer", "red", "history"
    )
    
    # More information to add to movies
    info_list <- list(
      list(title = "Barbie", info = list(element = "sparkles")),
      list(title = "Barbie", info = list(element = "feminism")),
      list(title = "Oppenheimer", info = list(element = "fire"))
    )
    
    add_element <- function(movies, info) {
      movies[movies[["title"]] == info[["title"]],][["elements"]] <-
        toString(c(
          movies[movies[["title"]] == info[["title"]],][["elements"]],
          info[["info"]][[1]]
        ))
      movies
    }
    
    purrr::reduce(info_list, add_element, .init = movies)
    #> # A tibble: 2 × 3
    #>   title       color elements                 
    #>   <chr>       <chr> <chr>                    
    #> 1 Barbie      pink  shoes, sparkles, feminism
    #> 2 Oppenheimer red   history, fire
    

    If we tweak the add_element() function to add a separator argument to it,

    add_element <- function(movies, info, separator) {
      movies[movies[["title"]] == info[["title"]],][["elements"]] <-
        paste(c(
          movies[movies[["title"]] == info[["title"]],][["elements"]],
          info[["info"]][[1]]
        ), collapse = separator)
      movies
    }
    
    purrr::reduce(
      info_list, 
      \(movies, x) add_element(movies, x, separator = " - "), 
      .init = movies
    )
    #> # A tibble: 2 × 3
    #>   title       color elements                   
    #>   <chr>       <chr> <chr>                      
    #> 1 Barbie      pink  shoes - sparkles - feminism
    #> 2 Oppenheimer red   history - fire
    
    purrr::reduce(
      info_list, 
      \(movies, x) add_element(movies, x, separator = " PLUS "), 
      .init = movies
    )
    #> # A tibble: 2 × 3
    #>   title       color elements                         
    #>   <chr>       <chr> <chr>                            
    #> 1 Barbie      pink  shoes PLUS sparkles PLUS feminism
    #> 2 Oppenheimer red   history PLUS fire
    

    And voilà!
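
    As a sanity check, the sketch below runs the for loop and purrr::reduce() side by side with the same separator and compares the results. The names looped and reduced are introduced here for illustration only. It also shows why the reduce() calls above can be repeated with different separators: nothing is assigned back to movies, so the original table stays as it was.

```r
movies <- tibble::tribble(
  ~title, ~color, ~elements,
  "Barbie", "pink", "shoes",
  "Oppenheimer", "red", "history"
)

info_list <- list(
  list(title = "Barbie", info = list(element = "sparkles")),
  list(title = "Barbie", info = list(element = "feminism")),
  list(title = "Oppenheimer", info = list(element = "fire"))
)

add_element <- function(movies, info, separator) {
  movies[movies[["title"]] == info[["title"]],][["elements"]] <-
    paste(c(
      movies[movies[["title"]] == info[["title"]],][["elements"]],
      info[["info"]][[1]]
    ), collapse = separator)
  movies
}

# for loop, working on a copy so that movies itself is not overwritten
looped <- movies
for (info in info_list) {
  looped <- add_element(looped, info, separator = " - ")
}

# reduce(), starting from the untouched movies
reduced <- purrr::reduce(
  info_list,
  \(movies, x) add_element(movies, x, separator = " - "),
  .init = movies
)

# Both approaches give the same elements column
identical(looped[["elements"]], reduced[["elements"]])
#> [1] TRUE

# The original table still has one element per movie
identical(movies[["elements"]], c("shoes", "history"))
#> [1] TRUE
```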

    Conclusion

    In this post I presented my approximate understanding of purrr::reduce(), which helped me avoid writing some for loops and write more elegant code instead… or at least helped me understand a pattern that I could use elegantly in the future. I can only hope I purrr::accumulate() more experience, as I still very much feel like a newbie.
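
    Pun aside, purrr::accumulate() is the sibling of purrr::reduce() that keeps every intermediate result instead of only the last one. A minimal sketch, using plain strings rather than the movies table (the steps and final names are made up for illustration):

```r
# Each step appends one element; accumulate() keeps all intermediate states
steps <- purrr::accumulate(
  c("sparkles", "feminism"),
  \(elements, x) toString(c(elements, x)),
  .init = "shoes"
)

# With .init, there is one more result than there are inputs
length(steps)
#> [1] 3

# reduce() returns only the last of those states
final <- purrr::reduce(
  c("sparkles", "feminism"),
  \(elements, x) toString(c(elements, x)),
  .init = "shoes"
)
final
#> [1] "shoes, sparkles, feminism"
identical(steps[[length(steps)]], final)
#> [1] TRUE
```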

    For more information, I’d recommend reading the documentation of purrr::reduce() to be aware of its other features, the content on the reduce family in Advanced R by Hadley Wickham… and release-watching the purrr repo to keep up to date with the latest recommendations. You can also use GitHub Advanced Search to find examples of usage of the function in, say, CRAN packages.

    Edit: For another take on, and use case of, purrr::reduce(), June Choe wrote a nice detailed tutorial, “Collapse repetitive piping with reduce()”.


    1. The package is glitter, where we store query objects as a list.
