As part of our multilingual publishing project, and with funding from the R Consortium, we’ve worked on the R package babeldown for translating Markdown-based content using the DeepL API. In this tech note, we’ll show how you can use babeldown to translate a Hugo blog post!
Translating a Markdown blog post from your R console is not only more comfortable (when you’ve already written said blog post in R), but also less frustrating. With babeldown, compared to copy-pasting the content of a blog post into some translation service, the Markdown syntax won’t be broken[1], and code chunks won’t be translated. This works because, under the hood, babeldown uses tinkr to produce XML which it then sends to the DeepL API, flagging some tags as not to be translated. It then converts the XML translated by DeepL back into Markdown.
Now, as you might expect, this machine-translated content isn’t perfect yet! You will still need a human or two to review and amend the translation. Why not have the humans translate the post from scratch then? We have observed that editing an automatic translation is faster than translating the whole post, and that it frees up mental space for focusing on implementing translation rules such as gender-neutral phrasing.
babeldown::deepl_translate_hugo() assumes the Hugo website uses
- leaf bundles, with each post in its own folder (content/path-to-leaf-bundle/index.md);
- multilingual content stored by file name, so that a post in (for instance) Spanish lives in content/path-to-leaf-bundle/index.es.md.
babeldown could be extended to work with other Hugo multilingual setups. If you’d be interested in using babeldown with a different setup, please open an issue in the babeldown repository!
Note that babeldown won’t be able to determine the default language of your website[2], so even if your website’s default language is English, babeldown will place an English translation in a file called “.en.md”, not “.md”. Hugo will recognize the new file all the same (at least in our setup).
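The file naming described above can be sketched in base R. Note that translated_path() is a hypothetical helper written for this illustration, not a babeldown function, and the way it shortens a code such as "EN-US" to "en" is an assumption based on the file names used in this post.

```r
# Hypothetical helper (not part of babeldown): derive the path of the
# translated file from the source post path and a target language code.
translated_path <- function(post_path, target_lang) {
  # Assumption: only the language part of codes such as "EN-US" is kept,
  # in lower case, as in the index.en.md file name used in this post.
  lang <- tolower(sub("-.*$", "", target_lang))
  file.path(dirname(post_path), sprintf("index.%s.md", lang))
}

# A Spanish post translated to American English lands next to the original.
translated_path(
  file.path("content", "blog", "my-post", "index.es.md"),
  target_lang = "EN-US"
)
```

The helper only builds a path; it does not touch the file system or call the DeepL API.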
First check that your desired source and target languages are supported by the DeepL API! Look up the docs of the source_lang and target_lang API parameters for a full list.
Once you know you’ll be able to take advantage of the DeepL API, you’ll need to create an account for DeepL’s translation service API. Note that even getting a free account requires registering a payment method with them.
You’ll need to install babeldown from rOpenSci R-universe:
install.packages('babeldown', repos = c('https://ropensci.r-universe.dev', 'https://cloud.r-project.org'))
Then, in each R session, set your DeepL API key via the environment variable DEEPL_API_KEY. You could store it once and for all with the keyring package and retrieve it in your scripts like so:
Sys.setenv(DEEPL_API_KEY = keyring::key_get("deepl"))
Lastly, the DeepL API URL depends on your API plan. babeldown uses the DeepL free API URL by default. If you use a Pro plan, set the API URL in each R session/script via
Sys.setenv("DEEPL_API_URL" = "https://api.deepl.com")
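Since both settings live in environment variables, a quick check at the top of a script can catch a forgotten key before any translation is attempted. This is a minimal sketch in base R; check_deepl_env() is a hypothetical helper, not a babeldown function.

```r
# Hypothetical pre-flight check (not a babeldown function): stop early if the
# API key is missing, and report whether a custom API URL is configured.
check_deepl_env <- function() {
  if (!nzchar(Sys.getenv("DEEPL_API_KEY"))) {
    stop("DEEPL_API_KEY is not set for this R session.", call. = FALSE)
  }
  api_url <- Sys.getenv("DEEPL_API_URL")
  if (nzchar(api_url)) {
    message("Using the API URL from DEEPL_API_URL: ", api_url)
  } else {
    message("DEEPL_API_URL is not set, so babeldown's default URL applies.")
  }
  invisible(TRUE)
}
```

You would call check_deepl_env() once after the Sys.setenv() lines, before translating anything.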
You could run the code below
babeldown::deepl_translate_hugo(
  post_path = <path-to-post>,
  source_lang = "EN",
  target_lang = "ES",
  formality = "less" # that's how we roll here!
)
but we’d recommend a tad more work for your own good.
If you use version control, having the translation as a diff is very handy!
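In short:
1. In the branch of your post (“new-post” here), create a placeholder: save the original blog post (index.es.md) under the target blog post name (index.en.md) and commit it, then push.
2. Create a new branch for the translation.
3. Run babeldown::deepl_translate_hugo() with force = TRUE.
4. Commit and push the result, then open a PR to the branch of your post.
Now let’s go over this again, but with a coding workflow. Here, we’ll use fs and gert (but you do you!), and we’ll assume your current directory is the root of the website folder, and also the root of the git repository.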
1. Save the original blog post (index.es.md) under the target blog post name (index.en.md) and commit it, then push.
fs::file_copy(
  file.path("content", "blog", "2023-10-01-r-universe-interviews", "index.es.md"),
  file.path("content", "blog", "2023-10-01-r-universe-interviews", "index.en.md")
)
gert::git_add(file.path("content", "blog", "2023-10-01-r-universe-interviews", "index.en.md"))
gert::git_commit("Add translation placeholder")
gert::git_push()
2. Create a new branch for the translation.
gert::git_branch_create("translation-tech-note")
3. Run babeldown::deepl_translate_hugo() with force = TRUE.
babeldown::deepl_translate_hugo(
  post_path = file.path("content", "blog", "2023-10-01-r-universe-interviews", "index.es.md"),
  force = TRUE,
  yaml_fields = c("title", "description", "tags"),
  source_lang = "ES",
  target_lang = "EN-US"
)
You can also omit the post_path argument if you’re running the code from the RStudio IDE and the open and focused file (the one you see above your console) is the post to be translated.
babeldown::deepl_translate_hugo(
  force = TRUE,
  yaml_fields = c("title", "description", "tags"),
  source_lang = "ES",
  target_lang = "EN-US"
)
4. Commit and push the result.
gert::git_add(file.path("content", "blog", "2023-10-01-r-universe-interviews", "index.en.md"))
gert::git_commit("Add translation")
gert::git_push()
5. Open a PR from the “translation-tech-note” branch to the “new-post” branch. The only difference between the two branches is the automatic translation of "content/blog/2023-10-01-r-universe-interviews/index.en.md".
The human translators can then open a second PR to the translation branch with their edits! Or they can add their edits as PR suggestions.
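In the end there should be two to three branches:
- the post branch (“new-post”), with the original post and the placeholder;
- the translation branch (“translation-tech-note”), with the automatic translation;
- optionally, a branch with the human translators’ edits, if they open a second PR rather than using PR suggestions.
The PRs are merged in this order:
- the PR with the human edits (if any) into the translation branch;
- the PR from the translation branch into the post branch.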
Yanina tweaked the automatic translation by suggesting changes on the PR, then accepting them.
By default babeldown translates the YAML fields “title” and “description”. If you have text in more of them, use the yaml_fields argument of babeldown::deepl_translate_hugo(). Note that if babeldown translates the title, it updates the slug.
Imagine you have a few preferences for some words – something you’ll build up over time.
readr::read_csv(
  system.file("example-es-en.csv", package = "babeldown"),
  show_col_types = FALSE
)
## # A tibble: 2 × 2
##   Spanish     English
##   <chr>       <chr>
## 1 paquete     package
## 2 repositorio repository
You can record these preferred translations in a glossary in your DeepL account:
babeldown::deepl_upsert_glossary(
  <path-to-csv-file>,
  glossary_name = "rstats-glosario",
  target_lang = "Spanish",
  source_lang = "English"
)
You’d use the exact same code to update the glossary, hence the name “upsert” for the function. You need one glossary per source language / target language pair: the English-Spanish glossary can’t be used for Spanish to English, for instance.
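If you don’t have a glossary file yet, one way to start is to write a small two-column CSV from R. The sketch below uses base R only and mirrors the layout of the example file shipped with babeldown (one column per language, named after the language); the file name and the temporary directory are arbitrary choices for this illustration.

```r
# One row per preferred translation, one column per language.
# Assumption: column names follow the example file shown above.
glossary <- data.frame(
  Spanish = c("paquete", "repositorio"),
  English = c("package", "repository")
)

# Write the glossary to a CSV file (here in the session's temporary directory).
glossary_path <- file.path(tempdir(), "glosario-es-en.csv")
write.csv(glossary, glossary_path, row.names = FALSE)

# Read it back to check the structure before uploading it.
read.csv(glossary_path)
```

The resulting path is what you would pass as the first argument of the glossary upload call.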
In your babeldown::deepl_translate_hugo() call you then use the glossary name (here “rstats-glosario”) for the glossary argument.
deepl_translate_hugo() has a formality argument. Now, the DeepL API only supports this for some languages, as explained in the documentation of the formality API parameter:
Sets whether the translated text should lean towards formal or informal language. This feature currently only works for target languages DE (German), FR (French), IT (Italian), ES (Spanish), NL (Dutch), PL (Polish), PT-BR and PT-PT (Portuguese), JA (Japanese), and RU (Russian). (…) Setting this parameter with a target language that does not support formality will fail, unless one of the prefer_… options are used.
Therefore, to be sure a translation will work, instead of writing formality = "less" you can write formality = "prefer_less", which will only use formality if available.
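If you translate into several target languages from the same script, a tiny wrapper can make the safe variant the default. This is only a sketch: safe_formality() is a hypothetical helper, not part of babeldown, and it simply builds the "prefer_less" and "prefer_more" values of the DeepL formality parameter.

```r
# Hypothetical helper (not part of babeldown): turn "less" or "more" into the
# "prefer_" variant, which is ignored when the target language has no
# formality support instead of making the request fail.
safe_formality <- function(formality = c("less", "more")) {
  formality <- match.arg(formality)
  paste0("prefer_", formality)
}

safe_formality("less")
```

You would then pass formality = safe_formality("less") in the translation call.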
In this post we explained how to translate a Hugo blog post using babeldown.
The gist is to use one call to babeldown::deepl_translate_hugo(), ideally paired with the version control workflow described above. babeldown also has functions for translating Quarto book chapters, any Markdown file, and any Markdown string, with similar arguments and recommended usage, so explore its reference!
We’d be happy to hear about your use cases.
[1] But you should refer to the tinkr docs to see what might change in the Markdown syntax style.
[2] Adding code to handle Hugo’s “bewildering array of possible config locations” and two possible formats (YAML and TOML) is out of scope for babeldown at this point.