    Multi-language pipelines with rixpress

    Econometrics and Free Software, published on 2025-05-13 00:00:00
    [This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers].

    A 2-minute video introduction to {rixpress} accompanies the original post.

    In August last year, I explored how Nix could be used as a build automation tool for data science pipelines. In March this year, I started working on an R package that makes setting up such pipelines easy, which I discussed in my previous post.

    After some weeks of work, I think that {rixpress} is at a stage where it can already be quite useful to a lot of people. {rixpress} helps you set up your projects as a pipeline of completely reproducible steps. {rixpress} is a sister package to {rix}, and together they make true computational reproducibility easier to achieve. {rix} makes it easy to capture and rebuild the exact computational environment in which the code was executed, and {rixpress} helps you move away from script-based workflows that can be difficult to execute and may require manual intervention.

    When I first introduced {rixpress}, it was essentially a proof of concept. It could manage some basic R and Python interplay, but it was clearly in its early stages. Since then, I’ve added features that I think really show why using Nix as the underlying build engine is a good idea.

    Just like for its sister package {rix}, I’ve taken the step to submit {rixpress} for peer review by rOpenSci. {rix} really benefitted from rOpenSci’s peer review and I believe that it’ll be the same for {rixpress}.

    Current Capabilities of {rixpress}

    Here are the features currently available in {rixpress}:

    • A key motivation was to simplify building pipelines where different steps might require different language environments. With {rixpress}, this is a central feature:

    • Define steps in R (rxp_r(), rxp_r_file()) or Python (rxp_py(), rxp_py_file()).

    • Importantly, each step can be configured to run in its own Nix-defined environment (for example, use nix_env = "my-python-env.nix" for a Python step, or nix_env = "my-r-env.nix" for an R step). These environments can be generated using my other package, {rix}.

    • Pass data between R and Python steps. {rixpress} manages the serialization, using reticulate by default for R/Python object conversion, and also allows custom functions for other formats like JSON or model-specific files.

    • Build Quarto (or R Markdown) documents using rxp_quarto() (and rxp_rmd()). These documents can access any artifact (rxp_read("my_artifact")) from preceding steps, regardless of the language used to generate it. Quarto rendering can also occur within its own dedicated Nix environment.

    • Every step in a {rixpress} pipeline is treated as a Nix derivation. This means hermetic builds, sandboxed execution, and content-addressable caching, leading to a high degree of reproducibility (as expected with Nix).

    • As pipelines grow, visualization is helpful. rxp_ggdag() (using {ggdag}) and rxp_visnetwork() (using {visNetwork}) provide a visual overview of dependencies. dag_for_ci() exports the DAG to a dot file (via {igraph}), which can then be used for text-based visualisation on CI.

    • For CI, rxp_ga() can generate a GitHub Actions workflow to run the pipeline on each push. This workflow includes caching of Nix store paths between runs (using export_nix_archive() and import_nix_archive()) to avoid unnecessary rebuilds.

    • There is ample documentation, including a vignette detailing how to use {cmdstanr} within a {rixpress} pipeline. {cmdstanr} compiles Stan models to C++, so both model compilation and sampling have to be managed carefully within the Nix sandbox. The vignette demonstrates that complex tools can be integrated.

    • It is possible to retrieve outputs from previous pipeline executions. {rixpress} maintains timestamped build logs. Functions like rxp_list_logs(), rxp_inspect(which_log = "..."), and rxp_read("derivation_name", which_log = "...") allow you to access the history of your pipeline’s execution and retrieve specific artifacts.
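
    To make the caching idea from the list above a bit more concrete, here is a minimal Python sketch. It is not {rixpress}'s API and not Nix's actual hashing scheme: the functions step_key() and plan(), the step names, and the code strings are all hypothetical, chosen only for illustration. The sketch assumes that each step's cache key is a hash of its own definition plus the keys of its upstream steps, so that editing one step invalidates only that step and whatever depends on it.

```python
import hashlib
import json


def step_key(name, code, upstream_keys):
    """Hash a step's definition together with the keys of its inputs."""
    payload = json.dumps(
        {"name": name, "code": code, "inputs": sorted(upstream_keys)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan(pipeline, store):
    """Return (keys, steps_to_build) for a pipeline given in dependency order.

    pipeline: list of (name, code, dependency_names) tuples (hypothetical format).
    store: set of keys that were already built in earlier runs.
    """
    keys = {}
    to_build = []
    for name, code, deps in pipeline:
        keys[name] = step_key(name, code, [keys[d] for d in deps])
        if keys[name] not in store:
            to_build.append(name)
    return keys, to_build


# A toy pipeline; the "code" strings are placeholders, not real calls.
pipeline = [
    ("raw_data", "read_csv('data.csv')", []),
    ("clean_data", "drop_missing(raw_data)", ["raw_data"]),
    ("model", "fit(clean_data)", ["clean_data"]),
    ("summary_table", "summarise(raw_data)", ["raw_data"]),
]

# First run: the store is empty, so every step has to be built.
keys, first_build = plan(pipeline, store=set())
store = set(keys.values())

# Second run without changes: every key is already in the store.
_, second_build = plan(pipeline, store)

# Third run after editing clean_data: only it and its dependant are rebuilt.
edited = list(pipeline)
edited[1] = ("clean_data", "drop_missing(raw_data, strict=True)", ["raw_data"])
_, third_build = plan(edited, store)

print(first_build)   # all four steps
print(second_build)  # []
print(third_build)   # ['clean_data', 'model']
```

    In this toy model, summary_table is left alone in the third run because it only depends on raw_data, whose key did not change. Nix applies the same principle far more rigorously, since the full build environment of each derivation is part of what gets hashed.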

    An Invitation for Feedback

    Considerable effort has gone into making {rixpress} robust and useful. A collection of examples is available at the rixpress_demos GitHub repository to illustrate various use cases (R-only, Python-only, R/Python, Quarto, {cmdstanr}, and an XGBoost example).

    I’m now looking for feedback from users:

    • I encourage you to try it out. I recommend watching this tutorial video to get started quickly.
    • Install it, explore the examples, and perhaps apply it to one of your projects.
    • Any observations on what works well, what might be confusing, or any issues encountered would be helpful.
    • Your feedback would be very valuable. Please feel free to open an issue on the {rixpress} GitHub repository with bug reports, feature suggestions, or questions.

    Why use {rixpress} instead of {targets}?

    {targets} is a fantastic package, and the main source of inspiration for {rixpress}. If you have no need for multi-language pipelines, then running {targets} inside of a Nix environment, as described here, is perfectly valid. But I think that {rixpress} has its place if:

    • you need to use multiple languages, as you don’t need to adapt Python code to work with {reticulate},
    • you’re already convinced by Nix and use {rix},
    • you want a simple pipeline tool with a smaller scope.