    Use an LLM to translate help documentation on-the-fly

    Posted by Stephen Turner on 2024-12-16 13:43:00
    This article was first published on Getting Genetics Done and kindly contributed to R-bloggers.

    Reposted from Paired Ends at https://blog.stephenturner.us/p/llm-translate-documentation.

    —

    The lang package overrides the ? and help() functions in your R session. The translated help page will appear in the help pane in RStudio or Positron. It can also translate your Roxygen documentation.

    —

    Using LLMs in R

    Most of the developer tooling for AI/LLM training and evaluation is Python-centric, but just over the past few months we’ve seen a surge of new tooling for AI/LLM applications for the R ecosystem.

    • ollamar and rollama provide wrappers around the Ollama API allowing you to run LLMs locally on your machine. I recently wrote a few posts, one demonstrating how to use ollamar, and another demonstrating a package that uses ollama internally.

      Use R to prompt a local LLM with ollamar (Stephen Turner, Aug 14)
      biorecap: an R package for summarizing bioRxiv preprints with a local LLM (Stephen Turner, Aug 24)

    • elmer is a new package in the tidyverse that allows you to interact with many different LLMs via R (Claude, ChatGPT, Gemini, and Ollama too).

    • mall is an interesting one that provides an easy way to run multiple LLM predictions against a data frame (sentiment analysis, summarization, classification, extraction, translation, etc).

    • pal provides LLM assistants that act on highlighted code, for example to draft roxygen documentation or testthat tests.

    • Shiny Assistant explains how things work in Shiny, and can help build Shiny apps for you (in either R or Python).

    The lang package

    The lang package (source, documentation) is an interesting new addition to the mlverse in R. From the documentation:

    lang overrides the ? and help() functions in your R session. If you are using RStudio or Positron, the translated help page will appear in the usual help pane.

    If you are a package developer, lang helps you translate your documentation, and to include it as part of your package. lang will use the same ? override to display your translated help documents.

    Let’s look at an example. I recently invited my colleague and co-author VP Nagraj to write about the rplanes package we published and released on CRAN for plausibility analysis in epidemiological forecasting.

    PLANES: Plausibility Analysis of Epidemiological Signals (Stephen Turner, Sep 3)

    One of the first functions you might use from this package is read_forecast(), which reads a probabilistic quantile forecast CSV file for downstream plausibility analysis. Let’s look at the help for this function.

    library(rplanes)
    ?read_forecast

    En Español

    Now let’s get help in Spanish. Load the lang package and tell it that we’re using llama3.2. We’ll set the system language to Spanish, then ask for help again.

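    library(lang)
    llm_use("ollama", "llama3.2")  # assumes a local Ollama server with llama3.2 already pulled
    Sys.setenv(LANGUAGE="spanish")
    ?read_forecast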

    My fluency in Spanish is limited to general conversation and travel needs so I can’t easily verify the accuracy of the translation of this technical language, but when I ran some of this back through Google Translate it seemed to be mostly faithful. Notice how things that shouldn’t be translated aren’t — function names, arguments, columns in the returned output, code in the examples.
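
    Because Sys.setenv() changes LANGUAGE for the rest of the session, it can be handy to remember the previous value and put it back when you are done. The helpers below are a minimal base-R sketch; set_help_language() and restore_help_language() are hypothetical names invented for this example and are not part of lang.

```r
# Hypothetical helpers (not part of lang): switch LANGUAGE and restore it later.
set_help_language <- function(language) {
  # Remember the current value; NA means the variable was not set at all
  previous <- Sys.getenv("LANGUAGE", unset = NA)
  Sys.setenv(LANGUAGE = language)
  invisible(previous)
}

restore_help_language <- function(previous) {
  if (is.na(previous)) {
    Sys.unsetenv("LANGUAGE")          # it was unset before, so unset it again
  } else {
    Sys.setenv(LANGUAGE = previous)   # otherwise put the old value back
  }
  invisible(NULL)
}

previous <- set_help_language("spanish")
Sys.getenv("LANGUAGE")   # now "spanish"; ask for help here, e.g. ?read_forecast
restore_help_language(previous)
```

    With this pattern you can try several languages in one session without leaving LANGUAGE pointing at the last one you tested.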

    हिंदी में … … باللغة العربية

    What about non-Western languages?

    Let’s try Hindi!

    Sys.setenv(LANGUAGE="hindi")
    ?read_forecast

    I can’t verify the accuracy of this translation beyond running some of the text back through Google Translate, but on that basis the translation looks reasonable at first glance.

    What about Arabic?

    Sys.setenv(LANGUAGE="arabic")
    ?read_forecast

    If you’re a native speaker of any of these, I’d love to know what you think. Chat with me on Bluesky (@stephenturner.us).

    Translating your package’s Roxygen docs

    The lang documentation has a great section on using lang as a package developer. You can translate all of your Roxygen documentation into the desired language, then edit those translations by hand as needed. A helper function then re-roxygenizes your docs, placing them in a special inst/man-lang folder. The lang docs explain how this all works. Once you do this, a user who has the lang package loaded gets your pre-computed and optionally edited translations instead of having to wait for the LLM to translate the help.
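
    As a rough sketch of that developer workflow, the calls might look like the following. This assumes lang exports translate_roxygen() and process_roxygen() as helpers for the two steps described above, that the language can be given as "spanish", and that llama3.2 is available through a local Ollama server. Treat the function names and arguments as assumptions and check the lang reference before relying on them.

```r
# Sketch only: run from the root of your package source directory.
# Assumption: lang provides translate_roxygen() and process_roxygen();
# verify names and arguments against the lang documentation.
library(lang)

# Register the model that will do the translating (llama3.2 via Ollama assumed)
llm_use("ollama", "llama3.2")

# Step 1: have the LLM write Spanish copies of your roxygen comments
translate_roxygen("spanish")

# Step 2: after reviewing and hand-editing the translations,
# build the translated help files that ship with the package
process_roxygen()
```

    The hand-editing step between the two calls is where a fluent speaker can correct anything the model got wrong before the translations are shipped.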

    Demo

    Here’s a demo using a very small package I wrote for something completely different. Don’t worry about all the Docker stuff described here. There’s one single function, missyelliot(), that simply reverse complements a DNA sequence (“take that flip it and reverse it”). That is, it’ll convert GATTACA to its reverse complement TGTAATC.
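
    To make the reverse complement concrete, here is a small base-R illustration of the operation. It is not the rpdd implementation of missyelliot(); reverse_complement() is a hypothetical name used only for this sketch.

```r
# Illustrative base-R helper; not how rpdd::missyelliot() is necessarily written.
reverse_complement <- function(dna) {
  # Swap each base for its complement (A<->T, C<->G), preserving case
  complemented <- chartr("ACGTacgt", "TGCAtgca", dna)
  # Split each sequence into single characters, reverse them, and glue back together
  bases <- strsplit(complemented, "")
  vapply(bases, function(b) paste(rev(b), collapse = ""), character(1))
}

reverse_complement("GATTACA")
# "TGTAATC"
```

    The complement of GATTACA is CTAATGT, and reading that backwards gives TGTAATC, which matches the example above.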

    Restart your R environment, and install the package using devtools/remotes. Load both rpdd and lang.

    devtools::install_github("stephenturner/rpdd")
    library(rpdd)
    library(lang)

    Now get some help for missyelliot(). If your language environment variable is English, you’ll get the English help.

    ?missyelliot

    Now, change your system language to Spanish, and try it again. Notice how the translated help is instantaneous — you’re relying on the pre-translated and possibly hand-edited translations that come with the package rather than asking an LLM to translate the help for you on the fly.

    Sys.setenv(LANGUAGE = "spanish")
    ?missyelliot

    If your language is set to something without a pre-populated translation, you’ll have to register a model through Ollama and translate in real time.

    llm_use("ollama", "llama3.2")  # register a local Ollama model (llama3.2 assumed, as earlier)
    Sys.setenv(LANGUAGE = "russian")
    ?missyelliot

    I think this might be one of the most impactful applications of LLMs inside a developer environment since the rise and rapid adoption of Copilots. The ability to instantly access documentation in multiple languages through lang represents a significant step forward in making data science more accessible and inclusive for the global R community, breaking down language barriers that have historically made it challenging for non-English speakers to fully engage with R’s rich ecosystem of tools and packages.




    Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.