Since 2012, LLVM has relied on its self-hosted Phabricator instanceon Google Cloud Platform for code review, but now it's making atransition to GitHub pull requests. In this post, I'll share myperspective on this switch, highlighting GitHub offers significantbenefits in some areas while having major drawbacks in the reviewprocess.
I may update this article as the process stabilizes further.
The move to GitHub pull requests has been a topic of discussion overthe past few years. Several lengthy threads on the subject haveemerged:
This transition could very well be the most contentiousinfrastructure change in LLVM's history. If you have the patience todelve into the discussions within the mentioned threads, you'll comeacross numerousremarks critiquingGitHubpull requests as being subpar. I'd like to express my gratitude toJoerg Sonnenberger for sharing a post by Gregory Szorc titled Problemswith Pull Requests and How to Fix Them, which made a great analysiscomparing several code review tools.
Nevertheless, a decision has been made, and the targeted transitiondate was set for September 1, 2023. The negotiation during thedecision-making process could have been handled more strategically to mitigatesome confusion.
On September 1, 2023 (or 2nd on some parts of the world), the pull requestlockdown was completely removed.
In general, I believe that the majority of contributors considerGitHub to offer better accessibility. However, I have heard quite a fewopinions suggesting that GitHub's code review capability issignificantly worse than that of Phabricator. I will delve into thistopic further later in this post.
Having contributed patches to more than 200 projects, many of whichare one-off and even trivial, I genuinely appreciate it when a projectuses GitHub pull requests. This is because I am already familiar withthe system. On the other hand, if a project relies on a self-hosted codereview website, I might find it less convenient as I'm not particularlykeen on registering a username on a website I may never visit again.Even worse, I might need to invest time in getting acquainted with thesystem if I haven't encountered similar instances before.
The same argument applies to LLVM's self-hosted Phabricator instance.Many contributors have not used Phabricator before and would considerboth the website and the command-line tool (https://github.com/phacility/arcanist), to bechallenging to use. In general, I believe that GitHub is morecontributor-friendly but perhaps not as reviewer-friendly. Within theLLVM community, reviewer resources are extremely limited. Consequently,being reviewer-friendly is likely of critical importance.
GitHub provides GitHub Apps and GitHub Actions to extend itsfunctionality. With these, we can automate pull request labelling,testing, code analysis, code coverage, and potentially even a mergequeue in the future. Some fantastic tooling is available for freeand widely used by other open source projects, making it easy to emulatehow other projects have set up automation.
Phabricator can also handle automation, but there are far fewerresources available for it. LLVM's self-hosted Phabricator instance, forinstance, relies on https://github.com/google/llvm-premerge-checks andBuildkite.
The llvm-project repository is vast. With a code frequency of 100+commits every day, it's practically impossible for anyone to monitorevery new commit. Nonetheless, many people wish to stay informed aboutchanges to specific components, making patch subscription essential.
One way to achieve this is through mailing lists, such as llvm-commits.This list contains emails about new pull requests, edits, GitHubactions, labelling, resolved issues, and more, making it quitenoisy.
The other method is to utilize the code review tool, formerlyPhabricator and now GitHub. With Phabricator, users can set up fairlycomplex subscription rules known as Herald. When a patch title,description, affected files, or the acting user matches certaincriteria, you can take actions like adding yourself as areviewer/subscriber or sending a one-off email.
GitHub, however, is less flexible in this regard. Individual userscan choose to watchall pull requests, but they can't do so selectively. Interestingly,users can watchissues with a specific label but not pull requests with a specificlabel.
To enable component-based subscription, the llvm organization onGitHub has created multiple pr-subscribers-*
teams, whichusers can freely join them. ( Note:A maintainer is supposed to accept every join request. Unfortunately,GitHub only displays the "pending request" information on thehttps://github.com/orgs/llvm/teams/pr-subscribers-*
pageand not on any other page. It's not reasonable to expect a maintainer toroutinely check thehttps://github.com/orgs/llvm/teams/pr-subscribers-*
pagesfor pending join requests. So if they miss the email notification thatsays would like to join "LLVM"
, the request may remainpending indefinitely. )
Then we use label-basedsubscription. A GitHub action is set up to label every pullrequest, and these labels are used to notifypr-subscribers-*
teams. For example, a pull requestaffecting clang/lib/Driver/XX
will receive the labelsclang
and clang:driver
, and thepr-subscribers-clang
andpr-subscribers-clang:driver
teams will be notified.
On https://github.com/notifications, the "Team mentioned"tab lists these patches.
.github/CODEOWNERS
Previously, weadded pr-subscribers-*
teams to.github/CODEOWNERS
. Due to GitHub's CODEOWNERSmechanism, the pr-subscribers-clang
team would be added asa reviewer when a pull request affecting clang/xx
wascreated. pr-subscribers-clang
members would receive anemail notification about the pull request with a full diff.
However, a complication arose when a member of thepr-subscribers-*
team approved a change. It resulted in amessage saying$user approved these changes on behalf of llvm/pr-subscribers-xx
,which could be misleading if the user did not wish to assume suchauthority. In addition, the team was automatically removoed as a teamreviewer, adding to the confusion. This use case wasn't in line withGitHub's intended functionality, and there was a risk that GitHub'sfuture changes might disrupt our workflow.
Filtering is another crucial aspect of managing notifications. GitHubsupports numerousnotification reasons.
I have joined several pr-subscribers-*
teams that I aminterested in as replacements for my Herald rules on Phabricator. I amstill in the process to make a curated list of Gmail filters. Here aresome filters I currently use:
from:(notifications@github.com) to:(mention@noreply.github.com)
from:(notifications@github.com) to:(team_mention@noreply.github.com)
to:(review_requested@noreply.github.com)
to:(push@noreply.github.com)
from:(notifications@github.com) subject:(llvm-project "Issue #")
from:(notifications@github.com) (-cc:(review_requested@noreply.github.com OR mention@noreply.github.com OR team_mention@noreply.github.com) "this pull request" KEYWORDS_I_WANT_TO_IGNORE)
However, it's worth noting that these notifications also appear on https://github.com/notifications. I haven't thought hardabout the issue of receiving double notifications. Perhaps https://github.com/notifications notifications can beused as a reminders and mute them in batch.
Another option is to switch to a pull-based workflow if LLVM providesa public-inboxinstance. https://github.com/pulls provides a dashboard where onecan list review requests.
Code review is the top reason we pick a code review tool. Let'sassess how GitHub pull requests fare in addressing this challenge.
In Phabricator, the body a differential (Phabricator's term for apatch) contains is a patch file. The patch file is based on a specificcommit, but Phabricator is not required to know the base commit. Astable identifier,Differential Revision: https://reviews.llvm.org/Dxxxxx
, inthe commit message connects a patch file to a differential. When youamend a patch, Phabricator recognizes that the differential has evolvedfrom patch X to patch Y. The user interface allows for comparisonsbetween any two revisions associated with a differential. Additionally,review comments are confidently associated with the source line.
On the other hand, GitHub structures the concept of pull requestsaround branches and enforces a branch-centric workflow. A pull requestcenters on the difference (commits) between the base branch and thefeature branch. GitHub does not employ a stable identifier for committracking. If commits are rebased, reordered, or combined, GitHub caneasily become confused.
When you force-push a branch after a rebase, the user interface displaysa line such as "force-pushed the BB branch from X to Y". Clickingthe "compare" button in GitHub presents something likegit diff X..Y
, which includes unrelated commits. Ideally,GitHub would show the difference between the two patch files, asPhabricator does, but it only displays the difference between the twohead commits. These unrelated in-between commits might be acceptable forprojects with lower commit frequency but can be challenging for aproject with a code frequency of 100+ commits every day.
The fidelity of preserving inline comments after a force push hasalways been a weakness. The comments may be presented as "outdated". Inthe past, there was a notorious "lost inline comment" problem. Nowadays,the situation has improved, but some users still report that inlinecomments may occasionally become misplaced.
Due to the difficulties in comparing revisions and the lack ofconfidence in preserving inline comments, some recommendations suggestadopting less flexibleand less preferred workflows, which involve only appending newcommits and discouraging rebases. This approach sometimes results in acluttered commit history, with commit messages like "fix typo," "addtest," and "fix ci".
In a large repository, avoiding pulling upstream commits may not berealistic due to other commits frequently modifying nearby lines. Somepeople use a remote branch to save their work. Having to worry aboutwhether a rebase could cause spam makes the branch more difficult touse. When working with both the latest main branch and the pull requestbranch, switching between branches results in numerous rebuilds.Rebasing follow-up commits could lead to merge conflicts and more pain.In addition, a popular convention among many LLVM contributors is tocommit tests before landing the functional change, which also mandatesforce-push. (The GitHub UI will show a merge conflict.) Sometimes, onlythrough rebasing can one notice that the patch needs adjustments to morecode or tests due to its interaction with another landed patch:
There are two workflows you can do. One is the merge-based workflowrecommended by https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork:
1 | git switch pr1 |
You can invoke git log --first-parent
to list the fixupcommits.
Personally I prefer the rebase-based alternative: perform a rebaseand force push, then make a functional change and either append a newcommit or amend the old commit.
When the functional change is made, leave a comment so that reviewerscan locate the useful "compare" button and disregard the "compare"button associated with the rebase push.
GitHub's repository setting allows three options for pull requests:"Allow merge commits", "Allow squash merging", and "Allow rebasemerging".
The default effect is quite impactful.
In 2022, GitHub finally introducedan option called "Pull request title and description" for the "Allowsquash merging" option. This new option mitigates some problems.
If you land a patch in a manual way, it is easy for the pull requestto end up the "Closed" status (red), even if the commit has the#xxxxx
identifier. I am not sure whether this status mightdiscourage people from manually landing patches, which I believe shouldbe fully supported and not discouraged. [FeatureRequest] Mark PR as Merged from Pull Request API
I am not well-versed in reviewing patch series on GitHub, but this isa widely acknowledged pain point. Numerous projects are exploring waysto alleviate this issue.
In Phabricator, since a differential consists of a patch file anddoesn't have a fixed base branch, we can freely reorder twodifferentials.
The issue Write a guidefor doing "Stacked Reviews" with GitHub Pull Requests tracks thedocumentation requirement.
User branches to the llvm-project
repository are allowedsince 2023-11-01 to enable stacked pull requests. After a month it wasreportedthat some user branches were misused.
Pros
git fetch origin pull/xx/head:local_branch
. Phabricatorallows arc patch Dxxxxx
and the less-known method:curl -L 'https://reviews.llvm.org/Dxxxxx?download=1' | patch -p1
@mention
applies to every user on GitHub, which isconvenient when you need to seek for a user's opinions. You cannotexpect every user to have an account on a self-hosted instance.gh
is more powerful than arcanist
.Cons
Note: the button below the "Files changed" tab allows switchingbetween unified and split diff views. Users who are used toPhabricator's side-by-side diff view may want to adjust thissetting.
I've noticed that my review productivity has decreased, a sentimentshared by many others. It's disheartening that alternativecode review solutions haven't been thoroughly considered. However,we must move forward and adapt to the new workflow while striving tomitigate the loss in productivity.
I hope that future iterations of GitHub will incorporate some ideasfrom PullRequest feature requests for GitHub #56635.
I've voiced numerous concerns regarding GitHub pull requests, and forthat, I apologize. It's essential to acknowledge that GitHub contributessignificantly to open source projects in many positive ways. Myintention in sharing these concerns is to express a genuine hope thatGitHub pull requests can be enhanced to better support largeprojects.
I would also like to express my gratitude to Phorge, a community-driven fork ofPhabricator, for their dedication and contributions, even as LLVMdecided to migrate to the free but proprietary solution provided byGitHub. Phorge's commitment to providing alternatives and nurturing anopen-source community for organizations that favor self-hosted solutionsis truly commendable.
The thread "How’s it going with pull requests?" has some niceanalysis.
I have enabled a browser extension RefinedGitHub and adopted octo.nvim.
I am now trying spr: 1
2
3
4
5githubRemoteName = origin
githubRepository = llvm/llvm-project
githubMasterBranch = main
branchPrefix = users/MaskRay/spr/
requireTestPlan = false
On September 5, 2023, I added a red banner to reviews.llvm.org todiscourage new patches.
Transitioning existing differentials to GitHub pull requests couldpotentially cause disruption. Resurrectingold patches is a practice that people regularly engage in. As perthe current schedule, new differentials will be disallowed on October 1,2023.
It is anticipated that at some point next year reviews.llvm.orgwill become a read-only website. To preserve /Dxxxxx
pages (referenced by many commits), we can utilize phab-archivefrom Mercurial.
As activities on Phabricator wind down, maintenance should becomemore lightweight.
In the past two weeks, there have been different IP addressescrawling /source/llvm-github/
pages. It looks very like abotnet as adjacent files are visited by IP addresses from very differentautonomous systems. Such a visit will cause Phabricator to spawn aprocess likegit log --skip=0 -n 30 --pretty=format:%H:%P 988a16af929ece9453622ea256911cdfdf079d47 -- llvm/lib/Demangle/ItaniumDemangle.cpp
that takes a few seconds (llvm-project is huge). I have redirected somepages to https://github.com/llvm/llvm-project/.
On September 8, 2023, I resized the database disk to 850GB to fixa full disk issue. Hopefully, we won't need to resize the diskagain!
On September 18, 2023, I noticed an aggressive spider that made thewebsite slow. I blocked the user agents for some aggresstivespiders.
On October 19, 2023, we received an attack with malicious using fakebut strange user agents. I blocked two related IP ranges and updated/etc/php/fpm/pool.d/www.conf
(seemed un-configured before)to reserve more processes.
On November 23, 2023, SendGrid used by the website stoppedsending email notifications. This was related to unidentifiedspammers.
I am adapting Mercurial's phab-archive project to mirrorreviews.llvm.org/Dxxxx
pages. I just need to make a fewadjustments (python,js,css,html).
The "History" button functions properly, although there isn't a"Compare" button.