Avoid breaking the main branch

I’m calling the default branch main in this article; see GitHub/renaming for the reason why.

Why?

Most teams I’ve worked on require a green main branch to deploy changes to production. When something breaks the main branch, it means:

We cannot merge changes to deliver something to our customers.
We cannot fix an issue that is preventing a customer from getting the value they need.
Other members of the team might stop what they are doing to try to fix the issue, which means they have to switch context and then find their way back to what they were doing.

We reduce the impact on the team by having one person take the issue and solve it, and this is a good culture to have. But you’re still affecting whoever looks at the issue, as well as everyone who is waiting for the fix before they can merge their changes.

How did it happen?

There are many reasons why something would pass on a branch and fail once the changes are merged.

Time differences

My build passed on day X but when it was merged on day Y it failed or today was a .

This happens because your tests are time dependent. You should try to avoid it by mocking what “now” is for your tests.
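As a minimal sketch of what mocking “now” can look like, assuming Python and a hypothetical `is_weekend` function (neither is from the project described in this post), the code under test accepts the current time as a parameter so that a test can pin it:

```python
from datetime import datetime, timezone
from typing import Optional


def is_weekend(now: Optional[datetime] = None) -> bool:
    """Hypothetical time-dependent function: True on Saturday or Sunday."""
    # Production callers omit `now`; tests pass a fixed value instead.
    current = now if now is not None else datetime.now(timezone.utc)
    return current.weekday() >= 5  # Monday is 0, so 5 and 6 are the weekend


# Pinned dates: the outcome no longer depends on the day the build runs.
FIXED_SATURDAY = datetime(2024, 1, 6, 12, 0, tzinfo=timezone.utc)
FIXED_MONDAY = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)

assert is_weekend(FIXED_SATURDAY) is True
assert is_weekend(FIXED_MONDAY) is False
```

Passing the clock in is only one option; patching the clock with your test framework's mocking tools achieves the same thing when you can't change the signature.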

Pollution

This only happens if your tests run in a random order every time (which they should). The build fails intermittently because an earlier test set some configuration in the system that it needed in order to run and pass, but did not clean up the state afterwards.

You should not ignore these failures, because while they exist you can’t rely on your test suite covering your scenarios properly. More often than not you should retry the build so that no one else is blocked, but make sure you fix the root cause. For that you can run something called “Bisect”.
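To make the pollution concrete, here is a small sketch in Python. The `APP_TIMEZONE` variable and both test functions are hypothetical; the point is only the difference between a test that leaves state behind and one that restores it.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_env(name: str, value: str) -> Iterator[None]:
    """Set an environment variable and restore the previous state on exit."""
    missing = object()  # sentinel: the variable did not exist before
    previous = os.environ.get(name, missing)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is missing:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


def polluting_test() -> None:
    # Changes shared state and never cleans up: later tests inherit it.
    os.environ["APP_TIMEZONE"] = "Asia/Tokyo"
    assert os.environ["APP_TIMEZONE"] == "Asia/Tokyo"


def clean_test() -> None:
    # Same setup, but the state is restored even if the assertion fails.
    with temporary_env("APP_TIMEZONE", "Asia/Tokyo"):
        assert os.environ["APP_TIMEZONE"] == "Asia/Tokyo"
```

Whether `polluting_test` hurts anyone depends on which tests happen to run after it, which is why the failure looks random.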

Bisect

Bisect is where we do a binary search over a test order (identified by its seed) that is guaranteed to make a test fail because some other test that ran before it altered the state of the application in a way that is not normal, e.g. it changed the timezone, changed some feature flag, etc.

To run this we just need to know the order in which the tests ran (the seed) and which test fails. Then we repeatedly re-run the failing test with different subsets of the tests that ran before it, until we find the minimal argument list that makes the test fail.

# This is probably easy to debug but if we imagine a real case it would be hard to do it manually 
test Test1, Test2, Test3, FailingTest

# A minimal argument list might look like this
test Test2, FailingTest
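The search can be sketched as follows. This is an illustrative Python version under stated assumptions, not how any particular framework implements it: `fails` is a hypothetical callback that runs the given tests followed by the failing test and reports whether the failure still reproduces.

```python
from typing import Callable, List, Sequence


def bisect_polluters(
    before: Sequence[str],
    fails: Callable[[List[str]], bool],
) -> List[str]:
    """Shrink the tests that ran before the failing one to a minimal list."""
    candidates = list(before)

    # Halve the list while one half alone still reproduces the failure.
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left, right = candidates[:mid], candidates[mid:]
        if fails(left):
            candidates = left
        elif fails(right):
            candidates = right
        else:
            break  # the failure needs tests from both halves

    # Drop tests one at a time as long as the failure still reproduces.
    i = 0
    while i < len(candidates):
        trial = candidates[:i] + candidates[i + 1:]
        if fails(trial):
            candidates = trial
        else:
            i += 1
    return candidates


# Stand-in for a real test run: the failure reproduces whenever Test2 ran first.
culprits = bisect_polluters(
    ["Test1", "Test2", "Test3"],
    lambda tests: "Test2" in tests,
)
assert culprits == ["Test2"]
```

Each call to `fails` is a real test run, so halving first keeps the number of runs low when a single earlier test is the culprit.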

Some testing frameworks, like RSpec, include this in the framework itself. See Bisect.

Git Logical Conflict

This is a term that I use for when two people working on related code each change something that affects the other person’s work, even though git itself reports no conflict.

Logical conflicts are very hard to catch, as you can’t predict what other people will change in another PR. In small teams this is easier to catch because you might be aware of everything going on, but as soon as the services grow this issue becomes very common.
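A tiny, made-up illustration in Python (all function names here are hypothetical): branch A changes a function’s signature, while branch B adds a new caller written against the old one. Each branch is green on its own, and the combination fails.

```python
from typing import Callable, List


def price_with_tax_main(amount: float) -> float:
    """Version on main: the tax rate is built in."""
    return amount * 1.2


def price_with_tax_branch_a(amount: float, rate: float) -> float:
    """Branch A: the rate becomes a required argument."""
    return amount * (1 + rate)


def invoice_total_branch_b(
    price_with_tax: Callable[..., float], amounts: List[float]
) -> float:
    """Branch B: a new caller, written against the signature on main."""
    return sum(price_with_tax(amount) for amount in amounts)


# Branch B's own build runs against main's function and passes.
assert round(invoice_total_branch_b(price_with_tax_main, [10.0, 20.0]), 2) == 36.0

# After both branches merge, B's caller meets A's signature. The branches
# touched different lines, so git merges them cleanly, yet the call fails.
try:
    invoice_total_branch_b(price_with_tax_branch_a, [10.0, 20.0])
    merged_build_passes = True
except TypeError:
    merged_build_passes = False

assert merged_build_passes is False
```

Had branch B been updated from main after A was merged, its own build would have caught the problem before it reached main.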

You should solve this by requiring branches to be up to date with the main branch before they are merged. If this is not done carefully, though, it will hurt the performance of your team.

Logical Conflicts

The solution is simple in principle: you just need to enforce that branches are up to date. But believe me, this is not easy to afford. It can make your team grumpy and very slow, which is exactly what you want to avoid, as it might be worse than having the main branch red sometimes.

History Time

A few years ago I was working on a project at Intuit: the Global Payroll Platform. It was a very nice monolithic application. The code was modularized into several services, and each team knew exactly who owned what. As you might be aware, people do not like being paid less than they should be, and companies do not want to get into trouble with their local tax authorities. For this reason we had LOTS of tests covering all the scenarios, both compliance scenarios and implementation-detail scenarios. For example, “an internal service needs to call something else and do something with the result” versus “HMRC says that an employee with salary X should be paid Y”.

Because of the number of tests and the complexity of payroll and of the service, our CI pipeline was very slow (55 minutes, to be precise). So when we enabled the GitHub check that requires branches to be up to date, people got creative in solving their problems.

People set alarms on their phones to check their PRs after 54 minutes so that they could click the lovely green “Merge commits” button. You could hear everyone’s alarms going off at random times. This was funny but definitely not great, as everyone was competing with each other to merge their changes.

Sometimes people combined their changes so that only one pull request had to be merged. Although ingenious, this makes the change doubly dangerous: there are more things that can go wrong, and it is harder to debug what affected our customers.

CIManager

CIManager was an internal tool written by our DevOps team at the time (René, Juan Carlos and Israel) that integrated our CI (Jenkins) with GitHub Enterprise. Keep in mind that this was 5 years ago, when there was nothing like GitHub Actions or GitHub Apps.

This app would trigger builds and send the results back to GitHub for every commit, initiate deployments, inform everyone about the status of things, close JIRAs when a PR was merged, and so on. It was great and I miss some of the things it did, although it was very specific to our team and the way we ran things.

Ship It

:shipit: was our solution to the productivity issues caused by requiring all PRs to be up to date with the main branch before merging.

The process is very simple:

1. The PR goes through the review cycle until everyone agrees it’s good to be merged (this was signalled by approving the PR).
2. An engineer tells CIManager to ship the PR with :shipit:.
3. CIManager pulls from the default branch if needed and attempts to merge the PR if all the required checks are green (5 years ago this was done with a hardcoded list of the required checks; GitHub has come a long way since then!).
4. When the CI step finishes, CIManager attempts step three again, looping until the PR gets merged.

Stop conditions: if at any time a required step fails or there’s a git conflict, CIManager stops and calls a human for help.
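The loop and its stop conditions can be sketched as a single decision function. This is an illustration in Python under my own assumptions, not CIManager’s actual code: the `PullRequest` snapshot, the `Checks` and `Action` enums, and `next_action` are all hypothetical names, and the PR is assumed to be approved and already told to ship.

```python
from dataclasses import dataclass
from enum import Enum


class Checks(Enum):
    PENDING = "pending"
    GREEN = "green"
    FAILED = "failed"


class Action(Enum):
    CALL_HUMAN = "call_human"
    WAIT_FOR_CI = "wait_for_ci"
    UPDATE_BRANCH = "update_branch"
    MERGE = "merge"


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical snapshot of a PR that has been told to ship."""
    base_sha: str  # commit of main that the branch currently includes
    checks: Checks  # combined state of the required checks
    has_conflict: bool = False


def next_action(pr: PullRequest, main_sha: str) -> Action:
    """Decide what to do each time the PR or the main branch changes."""
    # Stop conditions: a failed required check or a git conflict.
    if pr.has_conflict or pr.checks is Checks.FAILED:
        return Action.CALL_HUMAN
    # CI is still running: evaluate again once it finishes.
    if pr.checks is Checks.PENDING:
        return Action.WAIT_FOR_CI
    # Green but behind main: pull from main, which triggers CI again.
    if pr.base_sha != main_sha:
        return Action.UPDATE_BRANCH
    # Green and up to date: safe to merge.
    return Action.MERGE


assert next_action(PullRequest("abc", Checks.GREEN), "abc") is Action.MERGE
assert next_action(PullRequest("abc", Checks.GREEN), "def") is Action.UPDATE_BRANCH
assert next_action(PullRequest("abc", Checks.FAILED), "abc") is Action.CALL_HUMAN
```

Calling the function again after every CI run gives the loop described in step four: a PR keeps being updated and rebuilt until it is both green and up to date, or until a human is needed.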

Issues

One of the biggest issues with the shipit approach, and the thing some members of the group complained about at the time, was the git history. They were writing very nice commits with great messages and insightful descriptions, and rebasing to keep everything organised. So they did not like having the “Update branch from default branch” commits that CIManager was generating.

How did we solve this? We talked and agreed that the best way to let the whole team merge their changes quickly and safely was to keep ship it but merge the PRs using the squash method. The commit message would be the PR title, and the description of the PR would become the description of the commit (the long message).

The rationale behind this was that it was more important to have the main branch green than to have the git history look the way some people like it. I personally prefer squash commits because I consume the git history through GitHub.

Results

The results of this are very interesting.

Our team got a lot more productive (around 3x more productive), and the main branch was never broken again unless one of those pesky tests failed randomly. Everyone also agreed that they were more focused.

Ship It Nowadays

As I’ve mentioned throughout this post, this was done a long time ago. Today it is a lot easier to approach: you might not need custom servers or custom logic for your CI.

These days you can just:

Enable the GitHub check with all the custom requirements you have (e.g. closed comments, signed commits, etc.).
Enable Allow auto-merge so that GitHub merges the PR automatically as soon as all the checks are green.
Use a GitHub Action to update the branch whenever there’s a new commit on the main branch. I have one that I’ve forked so that I can keep it up to date, although I’m hoping it will soon be replaced by GitHub’s own Merge Queue.

name: Auto Update PRs
on:
  push:
    branches:
      - master # the default branch of this repository (called main elsewhere in this article)
jobs:
  shipit:
    name: Run Shipit
    runs-on: ubuntu-latest
    steps:
      - name: Ship It
        uses: kevinrobayna/shipit@v2.0.0
        id: shipit
        env:
          GITHUB_TOKEN: '${{ secrets.GITHUB_TOKEN }}'
          PR_FILTER: 'auto_merge' # Only monitor PRs that have 'auto merge' enabled
          PR_READY_STATE: 'ready_for_review' # Only monitor PRs that are not currently in the draft state.
          EXCLUDED_LABELS: "dependencies" # Ignore PRs raised by dependabot as it should update itself
      - run: echo 'Merge conflicts found!'
        if: ${{ steps.shipit.outputs.conflicted }}

      - run: echo 'No merge conflicts'
        if: ${{ !steps.shipit.outputs.conflicted }}

Conclusion

I’m a firm believer that enabling “Require branches to be up to date before merging” is something that everyone should do, and that it should even be a default config for organisations.

I don’t believe that solutions such as alerting when the main branch is broken before everyone leaves for the afternoon are right. The solution is to prevent the main branch from being broken in the first place.

If you are working on a team who struggles with these “logical conflicts” please consider the shipit method! And if you have people who think I’m terribly wrong, please send them my way! 😂
