Monorepo: Squash Merges

The way we merge Pull Requests into main on the monorepo is soon going to change.

We will be disabling the "merge commit" and "rebase merging" options and standardizing on "squash merging". This will make the meaningful units of change on the monorepo easier to see, and it will enable tooling for upcoming capabilities such as automatic per-project release note generation.

As with any process change, there are trade-offs, and we understand that not everyone will be 100% happy with this one. To help build empathy for why we're making it, we'd like to dig into those trade-offs and the challenges we face as a growing organization all working in the same monorepo.

A screenshot of GitHub's "merge button" settings.

Let's talk about `git` commits

In a typical single-project repository, every git commit can represent a meaningful unit of change. A team can look at a series of well-crafted commits and gain insight into how a codebase is evolving, what changes were made, and why.

When git commits exist in this context, a team can use tools like git bisect, which binary-searches the history by checking out commits and testing them, to identify the specific molecule of change that produced a bug. This can be very valuable for especially tricky bugs that aren't obvious on their face.
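To make the bisect idea concrete, here is a minimal sketch in a throwaway repository, assuming bash and git are available. The file value.txt and the BROKEN marker are hypothetical stand-ins for a real test. `git bisect run` treats exit code 0 as "good", 125 as "skip", and other codes from 1 to 127 as "bad", and binary-searches between the commits you mark.

```shell
#!/usr/bin/env bash
# Throwaway-repo sketch of `git bisect run`.
set -euo pipefail
repo=$(mktemp -d); cd "$repo"
git init -q; git symbolic-ref HEAD refs/heads/main
git config user.name Demo; git config user.email demo@example.com

# Ten commits; from "change 7" onward, value.txt contains the hypothetical bug marker.
for i in {1..10}; do
  if [ "$i" -ge 7 ]; then echo "BROKEN $i" > value.txt; else echo "ok $i" > value.txt; fi
  git add value.txt
  git commit -q -m "change $i"
done

# bad = HEAD, good = the root commit; git checks out midpoints between them.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null
# The test exits 0 ("good") when the marker is absent and 1 ("bad") when present.
git bisect run sh -c '! grep -q BROKEN value.txt' >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "First bad commit: $first_bad"
```

With ten commits, bisect needs only a handful of test runs rather than ten, which is where its value on a history of small, meaningful commits comes from.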

One positive side effect of well-crafted commits, such as those produced by following conventions like "Conventional Commits", is that a team can write tooling to scrape git log messages and auto-generate a project's CHANGELOG. An auto-generated CHANGELOG can be an immensely valuable communication tool, helping the broader organization stay aware of how its systems are evolving.
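A sketch of what such tooling might look like, assuming commit subjects follow the Conventional Commits `<type>[(scope)]: <description>` form. It builds a hypothetical repository with four commits and groups the `feat` and `fix` subjects into CHANGELOG sections using only `git log`, `grep`, and `sed`. Real generators also handle breaking changes, versions, and dates, which this deliberately leaves out.

```shell
#!/usr/bin/env bash
# Throwaway-repo sketch: build a CHANGELOG from Conventional Commits subjects.
set -euo pipefail
repo=$(mktemp -d); cd "$repo"
git init -q; git symbolic-ref HEAD refs/heads/main
git config user.name Demo; git config user.email demo@example.com

# Hypothetical history following the "<type>(<scope>): <description>" convention.
for msg in "feat(player): add waveform preview" \
           "fix(api): handle empty search results" \
           "chore: bump lint config" \
           "feat: support dark mode"; do
  git commit -q --allow-empty -m "$msg"
done

# section <heading> <type>: list every subject of that type, newest first.
section() {
  echo "## $1"
  git log --format=%s | grep -E "^$2"'(\([^)]*\))?!?: ' | sed 's/^/- /' || true
}

changelog=$(section "Features" feat; section "Bug Fixes" fix)
echo "$changelog"
```

The `chore` commit is dropped because only `feat` and `fix` sections are requested, which is the kind of filtering that only works when commit messages are consistently structured.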

Taking all of the above into account, it becomes clear that, although making a commit is often a rote task undertaken just to get work done, at its core a commit is a tool for communicating with your team about your work.

Look at these nice commits!

Large Teams, Monorepos, and the Commit Firehose

In organizations like Splice, the idyllic world described above is, for many reasons, not the reality.

We organize a large number of our projects not in single-purpose repositories, but in an ever-growing multi-project monorepo, and we merge a firehose of commits into main for a wide range of projects. To put numbers on it, in just the past week 40 people merged 157 pull requests consisting of 289 commits.

As branches are merged, commits do not retain the coherent sequence we see in our Pull Requests; instead, they are interleaved according to when they were created, regardless of the originating branch or author. The result is a sequence of commits on main that is difficult to read as a coherent story of impact and intent.

We are also a large and diverse team of contributors with a broad range of experience using tools like git to their fullest advantage. In truth, "crafting commits" as a communication tool is not high on this organization's list of priorities, and justifiably so. It takes a lot of experience, effort, and time to break our work up into commits that are meaningful units of change. Writing code is a process of trial and error, so commits are more accurately described as "save points along the journey" than as "meaningful units of change". They can only be turned into units of change by rewriting git history (a craft in itself) once a solution is found.

In this context, and with our current processes, it's next to impossible to get the benefits described earlier in this post from our commit history. As a consumer of git commit history, I gain little to no value from a "fix test" commit, nor is it easy to connect that commit to the context of its creation, where impact and intent are more obvious. The meaningful unit of change in our monorepo is NOT the commit: it is the Pull Request.

Fortunately, we have a pragmatic solution to try out.

Reframing the Meaningful Unit

As it stands today, we cannot commit directly to main; we must open a Pull Request, and getting a Pull Request merged involves process and tooling that ensure the Pull Request is a suitable addition of value:

Peer review and approval are required, unlike for individual commits
Layers of automation are guaranteed to run, unlike for individual commits
They are usually tied directly to Jira tickets, unlike commits

We already treat Pull Requests as the meaningful unit of change; our commit history on main just doesn't reflect that process.

When we switch to squash merges as the standard, we will regain much of the value we'd otherwise seek from more cumbersome processes like "Conventional Commits", in a way that works well with the size and makeup of our team, the organization of our code, and the velocity of our changes. It will also enable tooling that uses the history of merged Pull Requests on main as a data source for automated, per-project tagged releases with auto-generated release notes.
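Once every commit on main is one squash-merged Pull Request, per-project release notes can, in principle, come straight from `git log` limited to a project's directory. This sketch assumes a hypothetical layout (projects/web, projects/api), a hypothetical per-project tag (web-v1.0.0), and commit titles ending in a PR number such as "(#3)", mimicking the suffix GitHub adds to squash commit titles by default. It is not the planned tooling, only an illustration of the data source.

```shell
#!/usr/bin/env bash
# Throwaway-repo sketch: per-project release notes from squash commits on main.
set -euo pipefail
repo=$(mktemp -d); cd "$repo"
git init -q; git symbolic-ref HEAD refs/heads/main
git config user.name Demo; git config user.email demo@example.com

# Hypothetical monorepo layout: two projects sharing one main branch.
mkdir -p projects/web projects/api

# squash_merge <project-dir> <title>: stand-in for one squash-merged PR.
squash_merge() {
  echo "$2" >> "$1/CHANGES"
  git add "$1/CHANGES"
  git commit -q -m "$2"
}

squash_merge projects/web "Initial web app (#1)"
squash_merge projects/api "Initial api (#2)"
git tag web-v1.0.0   # hypothetical per-project tag naming scheme

squash_merge projects/web "Add waveform preview to player (#3)"
squash_merge projects/api "Paginate search endpoint (#4)"
squash_merge projects/web "Fix dark mode contrast (#5)"

# Release notes for "web": one line per squash commit since its last tag
# that touched files under projects/web.
notes=$(git log --format='- %s' web-v1.0.0..HEAD -- projects/web)
echo "$notes"
```

Because each line maps to exactly one PR, the notes stay readable without anyone having to curate individual commits; the api PR (#4) is excluded by the path filter.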

Squash Merges and Commit Messages on `main`

When we squash merge via GitHub's "Confirm squash and merge" button, GitHub helpfully pre-fills the squash merge commit message text field with the messages of all the commits you're squashing. Wow, thanks GitHub!

Well... this isn't actually helpful in most cases, unless we're already stellar at creating a semantic, meaningful commit history in our Pull Requests. In the vast majority of cases, the pre-filled message is a mess of "fix bug", "oops", and similar messages that together do not AT ALL tell the story of what this unit of change does. But there is a place where this information SHOULD already live: your Pull Request's Description (you're writing meaningful Pull Request Descriptions, right?).

As a standard practice, we should copy what we wrote in the Pull Request's Description into the squash merge commit message text field. By doing this, we retain all of the context about this meaningful unit of change, and it becomes much easier to look at the commit history on main and understand what has changed, why, and when.
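GitHub performs the squash on its side, but the same idea can be reproduced locally with `git merge --squash` followed by `git commit -F`, which reads the commit message from a file. In this minimal sketch, the branch name feature and the description text are hypothetical; the point is that main gains exactly one commit whose message is the PR description, not "wip", "fix bug", and "oops".

```shell
#!/usr/bin/env bash
# Throwaway-repo sketch: a local equivalent of "squash and merge" that uses the
# PR description, rather than the branch's commit messages, as the message.
set -euo pipefail
repo=$(mktemp -d); cd "$repo"
git init -q; git symbolic-ref HEAD refs/heads/main
git config user.name Demo; git config user.email demo@example.com
echo "v1" > app.txt; git add app.txt; git commit -q -m "Initial commit"

# The PR branch: "save point" commits with unhelpful messages.
git checkout -q -b feature
for msg in "wip" "fix bug" "oops"; do
  echo "$msg" >> app.txt
  git commit -q -am "$msg"
done

# Hypothetical PR description, reused as the squash commit message.
desc=$(mktemp)
cat > "$desc" <<'EOF'
Add search result caching (#42)

Cache search responses for 60 seconds to reduce load on the search API.
EOF

git checkout -q main
git merge -q --squash feature   # stage the PR's whole change set without committing
git commit -q -F "$desc"        # one commit on main, message taken from the description
git log --oneline main
```

Afterwards, main's tree is identical to the tip of the feature branch, yet the new commit's SHA appears nowhere in the feature branch's history, which is the situation described in the first edge case under "What to Expect and When".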

What to Expect and When

We will be turning on Squash Merge as the only merge option on Splice/Platform on Friday evening. This change is easy to revert, so if we run into any unanticipated problems, we will do so.

This change will have very little effect on most people's workflows, but as always, there are edge cases. For example:

Going forward, each commit on main will consist of the entire change set implemented by its source Pull Request, and that commit's SHA will not exist at all in the originating PR. The individual commits that make up the change will continue to exist in the source Pull Request, which is linked from the squash commit on main.
As a consequence, git bisect on main will step from one complete Pull Request to the next. What if you've traced a bug to a specific PR, but it's hard to pinpoint within that PR's changes? You can still bisect the original commits; it's just more involved. For instance, you could:
1. check out the SHA just before the PR's squash commit on main as a new debugging branch
2. cherry-pick the commits from the source PR onto the debugging branch
3. run git bisect on the debugging branch
If you have multiple daisy-chained Pull Requests, you'll have to rebase each subsequent PR in the chain as earlier PRs are merged. This is because the commit SHAs of the Pull Requests you've branched off of never end up in main, even though their code changes do, and this results in rebase conflicts. There are a couple of options for this:
1. Use the --onto flag of git rebase. This works, but the commands are a bit tedious to remember.
2. Run git rebase --interactive origin/main on the subsequent branch and delete the commits that came from the source branch. This works because the source branch's changes already exist in main, so we can safely remove them from our subsequent branch's history and base our branch directly on main instead of on the now-merged source branch.
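The debugging steps and the `--onto` option above can both be sketched in one throwaway repository. This is a minimal demo, not tooling: the branch names feature-a, feature-b, and debug, the status.txt file, and the BROKEN marker standing in for a real test are all hypothetical, and a local `git merge --squash` stands in for GitHub's button. Part 1 follows the three debugging steps; Part 2 uses `git rebase --onto main feature-a feature-b`, which replays only the commits reachable from feature-b but not from feature-a onto main.

```shell
#!/usr/bin/env bash
# Throwaway-repo sketch of two workflows after a squash merge:
#   Part 1: bisect a squashed PR's original commits on a debugging branch.
#   Part 2: rebase a daisy-chained PR onto main with `git rebase --onto`.
set -euo pipefail
repo=$(mktemp -d); cd "$repo"
git init -q; git symbolic-ref HEAD refs/heads/main
git config user.name Demo; git config user.email demo@example.com
echo ok > status.txt; git add status.txt; git commit -q -m "Initial commit"

# PR A (feature-a): three commits; "a: step 2" introduces the hypothetical bug.
git checkout -q -b feature-a
echo a1 > a.txt; git add a.txt; git commit -q -m "a: step 1"
echo a2 >> a.txt; echo BROKEN > status.txt; git add -A; git commit -q -m "a: step 2"
echo a3 >> a.txt; git add a.txt; git commit -q -m "a: step 3"

# PR B (feature-b) is daisy-chained: it branches off feature-a, not main.
git checkout -q -b feature-b
echo b1 > b.txt; git add b.txt; git commit -q -m "b: step 1"
echo b2 >> b.txt; git add b.txt; git commit -q -m "b: step 2"

# Squash-merge PR A into main, as GitHub's button would.
git checkout -q main
git merge -q --squash feature-a
git commit -q -m "Feature A (#1)"
squash=$(git rev-parse HEAD)

# Part 1, step 1: debugging branch at the commit just before PR A's squash commit.
git checkout -q -b debug "$squash^"
# Step 2: cherry-pick PR A's original commits onto it, oldest first.
git cherry-pick "$squash^..feature-a" >/dev/null
# Step 3: bisect only those commits (bad = debug tip, good = the pre-PR commit).
git bisect start debug "$squash^" >/dev/null
git bisect run sh -c '! grep -q BROKEN status.txt' >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "First bad commit in PR A: $first_bad"

# Part 2: replay only PR B's own commits (in feature-b but not in feature-a)
# onto main, dropping the pre-squash copies of PR A's commits.
git rebase -q --onto main feature-a feature-b
git log --oneline main..feature-b
```

If the earlier PR's branch was deleted after merging, its old tip SHA (listed on the closed PR) can be passed in place of feature-a.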

This Makes Me Unhappy!

We are truly sorry if this change is annoying for you. We are balancing many trade-offs, and at present this one makes sense for the broader organization.

The background on this decision is that it was proposed in an ADR (Adopt Squash Commits on Splice/Platform), then discussed and accepted in one of the .

The #guild-admins Slack channel is a good place to provide feedback. And of course, if for any reason you don't feel comfortable raising it there, please reach out to your manager, who can bring it to the group.

Our processes will continue to change as this organization evolves. If this change turns out to create more problems than it solves, we will happily revert it in the best interest of the organization.

Additional Reading

An excerpt of commits from a 6-hour slog trying to get Jenkins to do what we needed. The branch's history was eventually rewritten, but that took another 30 minutes and a lot of fiddling.

Let's talk about git commits