Website League Stewardship - Technology Working Group

Sun 13 Oct 2024 1:36AM

Defining the Git workflow for maintaining patchsets of forked software

ruby

Introduction

We've got a couple of forks of ActivityPub software (notably GoToSocial and Akkoma) that we're making modifications to. These modifications are intended to bring the software in line with the League vision, in places where that functionality exists. As a working group, we'll have a few contributors to these forks, so agreement on how we develop and maintain these forks is important. We also want to contribute our changes back to upstream where possible, for a few reasons: we want our changes to be as accessible as possible to anyone who may want to use them, within or outside the League, and we want to keep our changes as close to upstream as possible to reduce the maintenance burden on our end.

Questions

To achieve this, I think we should define and agree upon a set of procedures for maintaining our forks. In this thread, I'd like to hear input from the Tech WG on matters relating to what our process should look like for things such as:

- What standards do we hold our changes to?
  - How do we review each other's work?
- How do we structure/organize commits within our patchset?
  - What merge strategies do we use when merging pull requests?
- What is our relationship to upstream?
  - What versions do we base on?
  - How often do we update?
  - How do we contribute our changes back to upstream?

Goals

Out of this discussion, we should be aiming to have a well-defined process describing the management and structure of our software forks, as well as a defined relationship to their upstream maintainers. This process should be synthesized from discussions that we have amongst ourselves, and a final shape should be agreed upon through our consensus processes.

ruby Sun 13 Oct 2024 1:39AM

Unless anyone objects, I'm happy to act as a facilitator for this discussion, and draft up a proposal for consensus based on the discussions that we have here. If anyone else wants to take up that role, feel free to nominate yourself as well.

Tenna Sun 13 Oct 2024 1:54AM

I think, if anything, there should be at least one other pair of eyes looking at changes before they get merged.

As far as putting together proposals and whatnot, I've definitely got no qualms with you being the facilitator/drafting a proposal ^w^

kouhai Sun 13 Oct 2024 4:40AM

How do we structure/organize commits within our patchset?
What merge strategies do we use when merging pull requests?

i moved treehouse from merge to rebase (via a painful manual process) because the merge was gnarly

What is our relationship to upstream?
What versions do we base on?
How often do we update?
How do we contribute our changes back to upstream?

the most successful mastodon fork, glitch, is manually updated against mainline with very high frequency.

in my experience as a glitch downstream, there's:
- changes that affect the files that you're patching
- free rebases
- security updates

if treehouse were directly branched off of vanilla, i'd honestly consider "any tagged version" rebase policy, with an expectation of "bite the bullet asap" or potentially even "intermediate rebases"

as for versioning: definitely do something like gts v0.12.3-rc4.wl.5. from painful experience, your version number needs to strictly comply with semver. unfortunately, people parse your version numbers using semver-validating libraries. you will get breakage reports otherwise.
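
A minimal sketch of the kind of check a semver-validating library performs, assuming the regular expression suggested on semver.org and assuming a leading "v" is treated as a tag prefix rather than part of the version. The helper name `is_semver` is hypothetical, and the four-part version below is only an illustrative counterexample.

```python
import re

# Regular expression suggested by semver.org for SemVer 2.0.0,
# restricted to ASCII digits.
SEMVER_RE = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?",
    re.ASCII,
)


def is_semver(version: str) -> bool:
    """Return True if `version` (minus an optional leading "v") is valid semver."""
    if version.startswith("v"):
        version = version[1:]  # assumption: "v" is a tag prefix, not part of the version
    return SEMVER_RE.fullmatch(version) is not None


print(is_semver("v0.12.3-rc4.wl.5"))  # dot-separated prerelease identifiers
print(is_semver("0.12.3.1"))          # four numeric parts are not semver
```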

ruby Sun 13 Oct 2024 4:52AM

@kouhai Agreed on the rebase point - merge commits get real hairy real fast when we're regularly reapplying on top of upstream.

On the versioning end, who do you anticipate to be parsing our version numbers? I don't have much of an objection to complying with semver, but I'm wondering what issues not going with semver might cause.

kouhai Sun 13 Oct 2024 5:12AM

@srxl clients will parse version numbers for… reasons. also, other instances, maybe.

and this will cause hard-break in at least one (misbehaving) client. there's a treehouse bug somewhere for this.

more practically, we want our version number to be distinct from upstream's, which means we ideally have to consistently use a prerelease label, and unfortunately there's not really a great way to handle downstream distributions if you don't have access to the prerelease fields

so maybe it ends up being v0.12.3+wl.v5.abcdef0123456789deadbeef

this is very bikeshed, though
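
To make the prerelease versus build metadata trade-off concrete, here is a rough sketch of semver precedence as the spec describes it: a prerelease label sorts below the plain release, while build metadata (everything after "+") is ignored for ordering. The function name `precedence_key` is hypothetical, and the sketch assumes its inputs are already valid semver.

```python
def precedence_key(version: str):
    """Build a sort key following semver precedence rules (input assumed valid)."""
    if version.startswith("v"):
        version = version[1:]              # assumption: "v" is only a tag prefix
    version = version.split("+", 1)[0]     # build metadata does not affect precedence
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A release sorts above any prerelease of the same core version.
        return (major, minor, patch, 1, ())
    identifiers = tuple(
        # Numeric identifiers sort below alphanumeric ones and compare as numbers.
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)


upstream = precedence_key("0.12.3")
print(precedence_key("0.12.3-rc4.wl.5") < upstream)                         # True
print(precedence_key("0.12.3+wl.v5.abcdef0123456789deadbeef") == upstream)  # True
```

Under these rules the "+wl..." form is distinct as a string but has the same precedence as the upstream release, while the "-...wl.5" form is ordered below it.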

kouhai Mon 14 Oct 2024 1:59AM

http://softwareswirl.blogspot.com/2013/05/git-imerge-practical-introduction.html

Discovered this today

ruby Tue 15 Oct 2024 10:54PM

When it comes to the structure of the patchset, the overall vision I have is "1 commit = 1 PR to upstream". My take is that this makes upstreaming changes really easy, since we just need to cherry-pick one commit onto upstream HEAD, resolve any conflicts there once, and that's a PR ready to go. This would necessarily inform a few things on the workflow:

- Merging PRs is always done with a squash and fast-forward. No merge commits - every PR is one commit added right on top of our patched stable.
  - Commits should ideally avoid depending on each other - sometimes this is unavoidable though. Some way to track which commits depend on others might be useful, so we can tell if something is ready to upstream. Taking suggestions here.
- Each PR should be one upstreamable feature. This encourages us to ensure our patchsets are well tested, well written, and in a condition where they'd likely be good to go upstream - all things we want in our software.
- Bumping to a new stable release should just be a matter of `git checkout patch-stable; git rebase upstream/tagged-version`, and resolving any conflicts there. Maybe that's a job for a nominated release manager or something?
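
One possible way to track which commits depend on others, sketched under a loud assumption: a `Depends-on:` trailer in commit messages. That trailer is a hypothetical convention invented for this example, not something Git or either upstream defines, and the patch names `feature-a` and `feature-b` are placeholders.

```python
import re

# Hypothetical convention: a commit message line such as "Depends-on: feature-a".
DEPENDS_RE = re.compile(r"^Depends-on:\s*(\S+)\s*$", re.MULTILINE | re.IGNORECASE)


def dependencies(message: str) -> set:
    """Return the patch names a commit message declares as dependencies."""
    return set(DEPENDS_RE.findall(message))


def ready_to_upstream(commits: dict, upstreamed=frozenset()) -> list:
    """List patches whose declared dependencies have all been upstreamed.

    `commits` maps a patch name to its commit message.
    """
    done = set(upstreamed)
    return sorted(
        name
        for name, message in commits.items()
        if name not in done and dependencies(message) <= done
    )


patchset = {
    "feature-a": "patch: add feature A\n",
    "feature-b": "patch: add feature B\n\nDepends-on: feature-a\n",
}
print(ready_to_upstream(patchset))                 # ['feature-a']
print(ready_to_upstream(patchset, {"feature-a"}))  # ['feature-b']
```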

viviridian Tue 15 Oct 2024 10:57PM

@ruby sounds fine to me

WholeWheatBagels Tue 15 Oct 2024 11:37PM

@ruby this mostly makes sense; this is roughly what we've been doing anyway, at least for patch: commits. I've been labelling commits that aren't meant to be upstreamed as ci: for all the changes to get builds / related working.

WRT one-commit-one-change: I think if we have larger patches in the future, fitting it all into one commit might be too much (but maybe that's a sign it should be split up anyway. :shrug:).

Definitely agree with no merge commits. That makes rebases easier and the end history cleaner.

I'd be fine with taking the release manager position and wrangling out rebasing stuff.

WholeWheatBagels Tue 15 Oct 2024 11:55PM

Our patch versions have so far been `UPSTREAM-wl#`, with our RCs as `UPSTREAM-wl#rc#`.

The `-wl#` suffix does bump versions as we expect, but `-wl#rc#` sorts above non-rc. If we add a tilde (`-wl#~rc#`), that sorts correctly, as a tilde suffix sorts lower than no suffix (following systemd's version format spec and the DEB and RPM package specs). This isn't strictly semver-compatible, but semver doesn't account for multiple suffixes in the version string like this.
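
The tilde behaviour can be illustrated with a simplified sketch of dpkg-style version comparison. This is an approximation written for this thread, not dpkg's or systemd's actual code: it ignores epochs and the upstream/revision split, and the version strings below are made-up examples of the `UPSTREAM-wl#` scheme.

```python
def _is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def _order(ch: str) -> int:
    """Rank a character: '~' sorts before end of string, letters before other symbols."""
    if ch == "" or _is_digit(ch):
        return 0
    if ch == "~":
        return -1
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as `a` sorts before, equal to, or after `b`."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit run one character at a time.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            oa = _order(a[i] if i < len(a) else "")
            ob = _order(b[j] if j < len(b) else "")
            if oa != ob:
                return -1 if oa < ob else 1
            i += 1
            j += 1
        # Compare the following digit run numerically.
        start_a, start_b = i, j
        while i < len(a) and _is_digit(a[i]):
            i += 1
        while j < len(b) and _is_digit(b[j]):
            j += 1
        na, nb = int(a[start_a:i] or "0"), int(b[start_b:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


print(compare_versions("0.12.3-wl5rc1", "0.12.3-wl5"))   # 1: rc sorts above the release
print(compare_versions("0.12.3-wl5~rc1", "0.12.3-wl5"))  # -1: tilde sorts below it
```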
