
    Git Tricks for Working with Large Repositories

    Published by rOpenSci - open tools for open science on 2024-08-06 00:00:00
    [This article was first published on rOpenSci - open tools for open science, and kindly contributed to R-bloggers].

    Recently Yanina Bellini Saibene reminded us to update our Slack profile:

    Friendly reminder: Let’s increase the value of our rOpenSci Slack community. Please add details to your profile, e.g., your photo, your favorite social media handle, what you do, your pronouns, and how to pronounce your name.

    After doing that, I went on to update my profile photos on the rOpenSci website, which ended up teaching me a few Git tricks I would like to share here. Thanks to Maëlle Salmon for the encouragement, and to Steffi LaZerte for reviewing this post.

    Cloning as usual

    When I tried to clone the source code of rOpenSci’s website, I realized the repo was large and cloning it would take several minutes.

    git clone https://github.com/ropensci/roweb3.git
    

    I stopped the process and researched how to pull just the latest version of the specific files I needed.

    Pulling the latest version of specific files

    First I forked the rOpenSci website repository (roweb3). I used the gh CLI from the terminal, but I could also have forked it manually on GitHub.

    # if not using `gh`, fork ropensci/roweb3 from GitHub
    gh repo fork ropensci/roweb3
    

    Then I created a local empty roweb3 directory and linked it to the fork.

    git init roweb3
    cd roweb3
    git remote add origin git@github.com:maurolepore/roweb3.git
    

    Now for the tricks! I avoided downloading the whole repository by first finding the specific files I needed with GitHub’s “Go to file” box, then:

    • Trick 1: Configured a sparse checkout matching just those files.
    git config core.sparseCheckout true
    echo "themes/ropensci/static/img/team/mauro*" >> .git/info/sparse-checkout
    
    • Trick 2: Pulled with --depth 1 to get only the latest version of those files.
    git pull --depth=1 origin main
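
To try both tricks without touching the network, here is a minimal sketch that repeats the same steps against a throwaway local repository. The temporary directories, the file names under `img/team` and `content`, and the `demo` identity are all hypothetical stand-ins, not part of roweb3. The `file://` URL is used so that `--depth` is honored as it would be for a real remote.

```shell
set -eu

# Hypothetical stand-in for the large remote: a throwaway local repository
# with two commits, so there is some history to leave behind.
upstream="$(mktemp -d)"
git init --quiet "$upstream"
git -C "$upstream" symbolic-ref HEAD refs/heads/main
mkdir -p "$upstream/img/team" "$upstream/content"
echo "photo" > "$upstream/img/team/mauro-lepore.jpg"
echo "post" > "$upstream/content/post.md"
git -C "$upstream" add .
git -C "$upstream" -c user.name=demo -c user.email=demo@example.com commit --quiet -m "first"
echo "new photo" > "$upstream/img/team/mauro-lepore.jpg"
git -C "$upstream" -c user.name=demo -c user.email=demo@example.com commit --quiet -am "second"

# Same steps as in the post, pointed at the local stand-in.
work="$(mktemp -d)"
git init --quiet "$work"
cd "$work"
git symbolic-ref HEAD refs/heads/main
git remote add origin "file://$upstream"

# Trick 1: sparse checkout limited to the files of interest.
git config core.sparseCheckout true
mkdir -p .git/info
echo "img/team/mauro*" >> .git/info/sparse-checkout

# Trick 2: shallow pull of the latest commit only.
git pull --quiet --depth=1 origin main

# Inspect the result: files in the working tree and commits in the history.
find . -type f ! -path "./.git/*"
git rev-list --count HEAD
```

In this sketch, only the file matching the sparse pattern should appear in the working tree, and the history should contain just the most recent commit.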
    

    I explored the result with tree and it was just what I needed to modify:

    tree
    .
    └── themes
        └── ropensci
            └── static
                └── img
                    └── team
                        ├── mauro-lepore.jpg
                        └── mauro-lepore-mentor.jpg
    

    But how large is it?

    While those tricks were useful, I was still curious about the size of the repo, so I did clone it all and explored disk usage with du:

    du --human-readable --max-depth=1 .
    219M ./themes
    164K ./.Rproj.user
    56K ./archetypes
    628K ./resources
    168K ./data
    376M ./.git
    20K ./static
    12K ./.github
    40K ./scripts
    161M ./content
    24K ./layouts
    475M ./public
    1.3G .
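
With many directories, it can help to sort that listing so the largest entries come first. The sketch below assumes GNU coreutils (for the long `du` options and for `sort --human-numeric-sort`). It builds a hypothetical demo directory with one small and one large file, so it can be run anywhere. In a real repository only the `du ... | sort ...` line would be needed.

```shell
set -eu

# Hypothetical demo directory; random bytes are used so the sizes are not
# shrunk by filesystem compression.
demo="$(mktemp -d)"
mkdir -p "$demo/small" "$demo/big"
head -c 4096 /dev/urandom > "$demo/small/file"
head -c 4194304 /dev/urandom > "$demo/big/file"

# Same du call as above, piped through sort so the largest entries come first.
sizes="$(cd "$demo" && du --human-readable --max-depth=1 . | sort --human-numeric-sort --reverse)"
printf '%s\n' "$sizes"
```

Here `./big` should be listed before `./small`. Applied to the listing above, the same pipe would put `./public` and `./.git` near the top.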
    

    Indeed this is much larger than the source code I typically handle. But now I know a few more Git tricks (and even more about blogging on rOpenSci 🙂 ).

    Conclusion

    If all you have is a hammer, everything looks like a nail. — Abraham Maslow

    Sometimes git clone is not the right tool for the job. A sparse checkout and a shallow pull can help you get just what you need.

    If you enjoy learning from videos you may search “git” on my YouTube channel or explore the playlists git, git-from-the-terminal, and git-con-la-terminal (in Spanish).

    What are your favorite Git tricks? How about blogging about them?


