Since 2012, LLVM has relied on its self-hosted Phabricator instance(on Google Cloud Platform) for code review, but now it's making atransition to GitHub pull requests. In this post, I'll share myperspective on this switch, highlighting GitHub offers significantbenefits in some areas while having major drawbacks in the reviewprocess.
I may update this article as the process stabilizes further.
The move to GitHub pull requests has been a topic of discussion overthe past few years. Several lengthy threads on the subject haveemerged:
This transition could very well be the most contentiousinfrastructure change in LLVM's history. If you have the patience todelve into the discussions within the mentioned threads, you'll comeacross numerousremarks critiquingGitHubpull requests as being subpar. I'd like to express my gratitude toJoerg Sonnenberger for sharing a post by Gregory Szorc titled Problemswith Pull Requests and How to Fix Them, which made a great analysiscomparing several code review tools.
Nevertheless, a decision has been made, and the targeted transitiondate was set for September 1, 2023. The negotiation during thedecision-making process could have been handled more strategically to mitigatesome confusion.
On September 1, 2023 (or 2nd on some parts of the world), the pull requestlockdown was completely removed.
In general, I believe that the majority of contributors considerGitHub to offer better accessibility. However, I have heard quite a fewopinions suggesting that GitHub's code review capability issignificantly worse than that of Phabricator. I will delve into thistopic further later in this post.
Having contributed patches to more than 200 projects, many of whichare one-off and even trivial, I genuinely appreciate it when a projectuses GitHub pull requests. This is because I am already familiar withthe system. On the other hand, if a project relies on a self-hosted codereview website, I might find it less convenient as I'm not particularlykeen on registering a username on a website I may never visit again.Even worse, I might need to invest time in getting acquainted with thesystem if I haven't encountered similar instances before.
The same argument applies to LLVM's self-hosted Phabricator instance.Many contributors have not used Phabricator before and would considerboth the website and the command-line tool (https://github.com/phacility/arcanist), to bechallenging to use. In general, I believe that GitHub is morecontributor-friendly but perhaps not as reviewer-friendly.
GitHub provides GitHub Apps and GitHub Actions to extend itsfunctionality. With these, we can automate pull request labelling,testing, code analysis, code coverage, and potentially even a mergequeue in the future. Some fantastic tooling is available for freeand widely used by other open source projects, making it easy to emulatehow other projects have set up automation.
Phabricator can also handle automation, but there are far fewerresources available for it. LLVM's self-hosted Phabricator instance, forinstance, relies on https://github.com/google/llvm-premerge-checks andBuildkite.
The llvm-project repository is vast. With a code frequency of 100+commits every day, it's practically impossible for anyone to monitorevery new commit. Nonetheless, many people wish to stay informed aboutchanges to specific components, making patch subscription essential.
One way to achieve this is through mailing lists, such as llvm-commits.This list contains emails about new pull requests, edits, GitHubactions, labelling, resolved issues, and more, making it quitenoisy.
The other method is to utilize the code review tool, formerlyPhabricator and now GitHub. With Phabricator, users can set up fairlycomplex subscription rules known as Herald. When a patch title,description, affected files, or the acting user matches certaincriteria, you can take actions like adding yourself as areviewer/subscriber or sending a one-off email.
GitHub, however, is less flexible in this regard. Individual userscan choose to watchall pull requests, but they can't do so selectively. Interestingly,users can watchissues with a specific label but not pull requests with a specificlabel.
To enable component-based subscription, the llvm organization onGitHub has created multiple pr-subscribers-*
teams, whichusers can freely join them. ( Note:A maintainer is supposed to accept every join request. Unfortunately,GitHub only displays the "pending request" information on thehttps://github.com/orgs/llvm/teams/pr-subscribers-*
pageand not on any other page. It's not reasonable to expect a maintainer toroutinely check thehttps://github.com/orgs/llvm/teams/pr-subscribers-*
pagesfor pending join requests. So if they miss the email notification thatsays would like to join "LLVM"
, the request may remainpending indefinitely. )
Then we use label-basedsubscription. A GitHub action is set up to label every pullrequest, and these labels are used to notifypr-subscribers-*
teams. For example, a pull requestaffecting clang/lib/Driver/XX
will receive the labelsclang
and clang:driver
, and thepr-subscribers-clang
andpr-subscribers-clang:driver
teams will be notified.
.github/CODEOWNERS
Previously, weadded pr-subscribers-*
teams to.github/CODEOWNERS
. Due to GitHub's CODEOWNERSmechanism, the pr-subscribers-clang
team would be added asa reviewer when a pull request affecting clang/xx
wascreated. pr-subscribers-clang
members would receive anemail notification about the pull request with a full diff.
However, a complication arose when a member of thepr-subscribers-*
team approved a change. It resulted in amessage saying$user approved these changes on behalf of llvm/pr-subscribers-xx
,which could be misleading if the user did not wish to assume suchauthority. In addition, the team was automatically removoed as a teamreviewer, adding to the confusion. This use case wasn't in line withGitHub's intended functionality, and there was a risk that GitHub'sfuture changes might disrupt our workflow.
Filtering is another crucial aspect of managing notifications. GitHubsupports numerousnotification reasons.
I am still in the process to make a curated list of Gmail filters.Here are some filters I currently use:
from:(notifications@github.com) to:(mention@noreply.github.com)
from:(notifications@github.com) to:(team_mention@noreply.github.com)
to:(review_requested@noreply.github.com)
to:(push@noreply.github.com)
from:(notifications@github.com) subject:(llvm-project "Issue #")
In Phabricator, the body a differential (Phabricator's term for apatch) contains is a patch file. The patch file is based on a specificcommit, but Phabricator is not required to know the base commit. Astable identifier,Differential Revision: https://reviews.llvm.org/Dxxxxx
, inthe commit message connects a patch file to a differential. When youamend a patch, Phabricator recognizes that the differential has evolvedfrom patch X to patch Y. The user interface allows for comparisonsbetween any two revisions associated with a differential. Additionally,review comments are confidently associated with the source line.
On the other hand, GitHub structures the concept of pull requestsaround branches and enforces a branch-centric workflow. A pull requestcenters on the difference (commits) between the base branch and thefeature branch. GitHub does not employ a stable identifier for committracking. If commits are rebased, reordered, or combined, GitHub caneasily become confused.
When you force-push a branch after a rebase, the user interface displaysa line such as "force-pushed the BB branch from X to Y". Clickingthe "compare" button in GitHub presents X..Y
, whichincludes unrelated commits. Ideally, GitHub would show the differencebetween the two patch files, as Phabricator does, but it only displaysthe difference between the two head commits. These unrelated in-betweencommits might be acceptable for projects with lower commit frequency butcan be challenging for a project with a code frequency of 100+ commitsevery day.
The fidelity of preserving inline comments after a force push hasalways been a weakness. In the past, there was a notorious "lost inlinecomment" problem. Nowadays, the situation has improved, but some usersstill report that inline comments may occasionally become misplaced.
Due to the difficulties in comparing revisions and the lack ofconfidence in preserving inline comments, some recommendations suggestadopting less flexibleand less preferred workflows, which involve only appending newcommits and discouraging rebases. This approach sometimes results in acluttered commit history, with commit messages like "fix typo," "addtest," and "fix ci."
In a large repository, avoiding rebases may not be realistic due toother commits frequently modifying nearly identical lines. When workingwith both the latest main branch and the pull request branch, switchingbetween branches results in numerous rebuilds.
GitHub's repository setting allows three options for pull requests:"Allow merge commits", "Allow squash merging", and "Allow rebasemerging".
The default effect is quite impactful.
In 2022, GitHub finally introducedan option called "Pull request title and description" for the "Allowsquash merging" option. This new option mitigates some problems.
I am not well-versed in reviewing patch series on GitHub, but this isa widely acknowledged pain point. Numerous projects are exploring waysto alleviate this issue.
Pros
git fetch origin pull/xx/head:local_branch
. Phabricatorallows the less-known method:curl -L 'https://reviews.llvm.org/Dxxxxx?download=1' | patch -p1
@mention
applies to every user on GitHub, which isconvenient when you need to seek for a user's opinions. You cannotexpect every user to have an account on a self-hosted instance.gh
is more powerful than arcanist
.Cons
I've noticed that my review productivity has decreased, a sentimentshared by many others. It's disheartening that alternative solutionshaven't been thoroughly considered. However, we must move forward andadapt to the new workflow while striving to mitigate the loss inproductivity.
I hope that future iterations of GitHub will incorporate some ideasfrom PullRequest feature requests for GitHub #56635.
I've voiced numerous concerns regarding GitHub pull requests, and forthat, I apologize. It's essential to acknowledge that GitHub contributessignificantly to open source projects in many positive ways. Myintention in sharing these concerns is to express a genuine hope thatGitHub pull requests can be enhanced to better support largeprojects.
I would also like to express my gratitude to Phorge, a community-driven fork ofPhabricator, for their dedication and contributions, even as LLVMdecided to migrate to the free but proprietary solution provided byGitHub. Phorge's commitment to providing alternatives and nurturing anopen-source community for organizations that favor self-hosted solutionsis truly commendable.
On September 5, 2023, I added a red banner to reviews.llvm.org todiscourage new patches.
Transitioning existing differentials to GitHub pull requests couldpotentially cause disruption. Resurrectingold patches is a practice that people regularly engage in. As perthe current schedule, new differentials will be disallowed on October 1,2023.
It is anticipated that at some point next year reviews.llvm.orgwill become a read-only website. To preserve /Dxxxxx
pages (referenced by many commits), we can utilize phab-archivefrom Mercurial.
As activities on Phabricator wind down, maintenance should becomemore lightweight.
In the past two weeks, there have been different IP addressescrawling /source/llvm-github/
pages. It very like a botnetas adjacent files are visited by IP addresses from very differentautonomous systems. Such a visit will cause Phabricator to spawn aprocess likegit log --skip=0 -n 30 --pretty=format:%H:%P 988a16af929ece9453622ea256911cdfdf079d47 -- llvm/lib/Demangle/ItaniumDemangle.cpp
that takes a few seconds (llvm-project is huge). I have redirected somepages to https://github.com/llvm/llvm-project/.
On September 8, 2023, I resized the database disk to 850GB to fixa full disk issue. Hopefully, we won't need to resize the diskagain!