Last week after my useR! talk, someone I had met at the R-Ladies dinner asked me for a list of all the links in my slides. I said I’d prepare it, not because I’m a nice person, but because I knew it’d be an use case where the great tinkr package would shine!
tinkr is an R package I created, and that its current maintainer Zhian Kamvar took much further that I’d ever would have. tinkr can transform Markdown into XML and back.
Under the hood, tinkr uses
Anyway, enough said, let’s go back to today’s use case.
index.qmd
With tinkr I can use XPath, the query language for XML or HTML, to extract links from my slidedeck source. Then I will format them as a list.
First, I create a yarn object from my slidedeck source.
talk_yarn <- tinkr::yarn$new("/home/maelle/Documents/conferences/user2024/index.qmd")
talk_yarn
#> <yarn>
#> Public:
#> add_md: function (md, where = 0L)
#> body: xml_document, xml_node
#> clone: function (deep = FALSE)
#> get_protected: function (type = NULL)
#> head: function (n = 6L, stylesheet_path = stylesheet())
#> initialize: function (path = NULL, encoding = "UTF-8", sourcepos = FALSE,
#> md_vec: function (xpath = NULL, stylesheet_path = stylesheet())
#> ns: http://commonmark.org/xml/1.0
#> path: /home/maelle/Documents/conferences/user2024/index.qmd
#> protect_curly: function ()
#> protect_math: function ()
#> protect_unescaped: function ()
#> reset: function ()
#> show: function (lines = TRUE, stylesheet_path = stylesheet())
#> tail: function (n = 6L, stylesheet_path = stylesheet())
#> write: function (path = NULL, stylesheet_path = stylesheet())
#> yaml: --- format: revealjs: highlight-style: a11y ...
#> Private:
#> encoding: UTF-8
#> md_lines: function (path = NULL, stylesheet = NULL)
#> sourcepos: FALSE
Then I extract all links.
links <- xml2::xml_find_all(
talk_yarn$body,
xpath = ".//md:link",
ns = talk_yarn$ns
)
head(links)
#> {xml_nodeset (6)}
#> [1] <link destination="https://user-maelle.netlify.app" title="">\n <text xm ...
#> [2] <link destination="https://www.pexels.com/photo/old-cargo-ship-on-sea-207 ...
#> [3] <link destination="https://www.pexels.com/photo/the-word-louise-is-spelle ...
#> [4] <link destination="https://www.pexels.com/photo/gray-rotary-telephone-on- ...
#> [5] <link destination="https://www.pexels.com/photo/close-up-photography-of-y ...
#> [6] <link destination="https://www.r-consortium.org/all-projects/call-for-pro ...
I then throw away the links to the great website Pexels, because these are image credits rather than information useful to do R stuff.
links <- purrr::discard(
links,
\(x) startsWith(xml2::xml_attr(x, "destination"), "https://www.pexels")
)
head(links)
#> {xml_nodeset (6)}
#> [1] <link destination="https://user-maelle.netlify.app" title="">\n <text xm ...
#> [2] <link destination="https://www.r-consortium.org/all-projects/call-for-pro ...
#> [3] <link destination="https://www.r-consortium.org/all-projects/call-for-pro ...
#> [4] <link destination="https://www.heltweg.org/posts/who-wrote-this-shit/" ti ...
#> [5] <link destination="https://fosstodon.org/@hadleywickham/11202130903588421 ...
#> [6] <link destination="https://nostarch.com/kill-it-fire" title="">\n <text ...
After that I can format the links and display them here using an “asis” chunk. Yes, my slidedeck uses Quarto but this blog is still powered by R Markdown, hugodown to be precise.
I’m using the formatting as an opportunity to only keep distinct links: sometimes I had very similar slides in a row, with repeated information.
format_link <- function(link) {
url <- xml2::xml_attr(link, "destination")
text <- xml2::xml_text(link)
sprintf("* [%s](%s)", text, url)
}
formatted_links <- purrr::map_chr(links, format_link)
formatted_links <- unique(formatted_links)
formatted_links |>
paste(collapse = "\n") |>
cat()
Using tinkr, XPath and sprintf()
, I was able to create a list of all the links shared in my useR! slidedeck. Some of them have no text, meaning that the URL is used as text for the link; or text that only makes sense in the context of the paragraph they were a part of; others are slightly more informative; but at least none of them is a “click here” link.