Platform Engineering

Sohrab Hosseini

Semantic Versioning with Conventional Commits

Posted by Sohrab Hosseini on 11 June 2020

CICD, DevOps, Technology, developer, semver, git, scm, versioning, branching, GitLab, release

Photo from Zach Reiner - Unsplash

Versioning is important. I do not have to tell you this. Yet, I see it done poorly over and over again. 

The deficiencies I usually encounter are a lack of unambiguous developer and release processes and poor tooling support. If your developers sit there wondering how to do certain tasks, then the process is broken. They should know how to release a new version or how to hotfix production while the trunk has moved on. And your CI/CD processes should support these scenarios too!

Many rely on their CI/CD tool to determine the next version of their application. I am guilty of doing this too but at some point, I realised this is a decision that only the developer can make. Our tools are not yet smart enough to look at code changes and tell me if it's a feature, a fix or something else altogether.

This gave me the burning desire to change the way we do things. My goals were to have an unambiguous process that covers all those use cases, is developer-friendly and is supported in the variety of CI/CD tools we tend to use across projects.

This post chronicles this approach.

Semantic Versioning

We are going to use Semantic Versioning (semver) here. This is possibly the most prominent versioning scheme used in software today. 

Most of the applications and libraries we build tend to expose an API, be it a REST API, an interface, etc. Semver is all about versioning this API.

The syntax is universally known:

<MAJOR>.<MINOR>.<PATCH>

Part  | When to bump                                       | Example
MAJOR | Introduce a new backward-incompatible change       | 1.0.0 → 2.0.0
MINOR | Introduce a new backward-compatible change         | 1.0.0 → 1.1.0
PATCH | Fix a bug while maintaining backward-compatibility | 1.0.0 → 1.0.1

As the table indicates, backward-compatibility is a big differentiator when it comes to version bumps.
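To make the bump rules concrete, here is a minimal Python sketch. The function name bump is purely illustrative and it only handles plain MAJOR.MINOR.PATCH strings, not pre-release or build suffixes.

```python
def bump(version: str, part: str) -> str:
    """Return `version` with the given part ("major", "minor" or "patch") bumped."""
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        # a breaking change resets both minor and patch
        return f"{major + 1}.0.0"
    if part == "minor":
        # a backward-compatible feature resets patch only
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part}")


print(bump("1.0.0", "major"))  # 2.0.0
print(bump("1.4.2", "minor"))  # 1.5.0
print(bump("1.4.2", "patch"))  # 1.4.3
```

Note how the lower-order parts reset to zero whenever a higher-order part is bumped.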

Some people still get this wrong so here I am going to use a JSON REST API to illustrate what constitutes a patch, a feature or a breaking change.

Let's assume a competent developer has implemented this REST API and it successfully follows Postel's law:

Major
  • Remove an operation, i.e. remove an HTTP verb/path combination
  • Add a mandatory field to a request payload
  • Add a field to, or remove a field from, a response payload
    • Adding a field may not be a breaking change if all consumers are deemed sufficiently tolerant
Minor
  • Add a new operation, i.e. a new HTTP verb/path combination
  • Add an optional field to a request payload
Patch
  • No interface changes
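The caveat about tolerant consumers can be illustrated with a small Python sketch. The payloads and the read_user function are made up for this example: a consumer that reads only the fields it needs survives an added response field, but not a removed one.

```python
import json


def read_user(payload: str) -> dict:
    """Pick out only the fields this consumer needs and ignore everything else."""
    data = json.loads(payload)
    return {"id": data["id"], "name": data["name"]}


original_response = '{"id": 7, "name": "Ada"}'
field_added = '{"id": 7, "name": "Ada", "nickname": "ada"}'
field_removed = '{"id": 7}'

# An added field is invisible to a tolerant consumer.
assert read_user(original_response) == read_user(field_added)

# A removed field breaks the consumer, which is why it warrants a major bump.
try:
    read_user(field_removed)
except KeyError as missing:
    print(f"consumer broke, missing field: {missing}")
```

A consumer that validates responses against a strict schema would fail on the added field too, which is why adding a response field sits under Major unless you know your consumers are tolerant.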

Conventional Commits

As good as tools are these days, they cannot identify the nature of a code change yet. The day will come when this is possible through machine learning, but for now we have to rely on good old-fashioned human intelligence.

Conventional commits provide the mechanism to communicate the nature of changes in a commit between the developers and the CI/CD tools.

In a nutshell, the developer provides a commit message that unambiguously identifies the nature of the change. Then a CI/CD tool can scan all the commit messages since the last version and determine how to bump the version.

In addition to this automation, this approach provides clear communication of changes to other team members and even lets us automatically generate release notes and changelogs.

The official summary and examples for Conventional Commits are concise enough that there is no point for me to repeat them here. Go ahead and have a look. I will wait for you here.

Here are some sample commits for our JSON REST API:

Major
  • feat: disable deletion of records

    BREAKING CHANGE: removed an endpoint

  • feat: expiry date must be provided by the user

    BREAKING CHANGE: new mandatory field in create operation
Minor
  • feat(my-operation): allow users to provide an optional name to override the default
  • feat: add operation to retrieve sub-records
Patch
  • test: refactor user management test cases
  • ci: point to the new registry
  • docs: add missing method documentation in create operation
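Here is a rough Python sketch of how a tool could turn such messages into a version bump. It follows the grouping used in the samples above (a breaking change is major, feat is minor, anything else is patch). The names bump_for and next_bump are illustrative, and real tools such as semantic-release decide this through configurable release rules, so check which commit types actually trigger a release in your setup.

```python
import re

# header shape: type, optional (scope), optional "!" marker, then ": description"
HEADER = re.compile(r"^(?P<type>\w+)(?:\((?P<scope>[^)]+)\))?(?P<bang>!)?: .+")
ORDER = ["patch", "minor", "major"]  # lowest to highest


def bump_for(message: str) -> str:
    """Classify a single commit message as a patch, minor or major change."""
    header = message.split("\n", 1)[0]
    match = HEADER.match(header)
    if match is None:
        raise ValueError(f"not a conventional commit: {header!r}")
    if match["bang"] or re.search(r"^BREAKING CHANGE: ", message, re.MULTILINE):
        return "major"
    return "minor" if match["type"] == "feat" else "patch"


def next_bump(messages) -> str:
    """The highest-ranked change since the last release decides the bump."""
    return max((bump_for(m) for m in messages), key=ORDER.index)


print(next_bump([
    "test: refactor user management test cases",
    "feat: add operation to retrieve sub-records",
]))  # minor
print(next_bump([
    "docs: add missing method documentation in create operation",
    "feat: disable deletion of records\n\nBREAKING CHANGE: removed an endpoint",
]))  # major
```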

If your team is a bit more "fun", you can always give these alternate signals a shot: ✨ (feat), 🐛 (fix), 📚 (docs), 💎 (style), ♻️ (refactor), 🚀 (perf), ✅ (test), 📦 (build), 👷 (ci), 🔧 (chore).

What are you building?

If you have spent any time in software development, you already know that people who advocate for "one size fits all" need to be shown the door. So, in this post, I would like to examine two vastly different approaches to software release and how this technique can be applied to both.

Release Every Merge
  Description
    • Each commit on the trunk branch is a release with a new semantic version
    • A new version is not a big deal and versions are not treated like holy, finite resources
    • A single application may go through many versions each day
    • Changes go through quality assurance gates, such as unit testing, peer reviews, pull request builds, and even review apps
  Benefits
    • No need to pre-plan a release train. Choose versions you want to release when you need to.
    • Shorter feedback loop to testers and end-users
  Encountered in
    • Projects delivered by vendors/consultants
    • Projects where there are multiple non-production environments available to deploy and test changes rapidly
Pre-release then Release
  Description
    • Each release is preceded by one or more pre-releases in the form of alpha and beta versions
    • Version bumps are treated preciously
    • Need to retest when cutting a release from pre-releases
  Benefits
    • Gain full confidence by releasing pre-releases before introducing the stable version
    • Your consumers do not have to wonder what happened to intervening versions, as they would in the other approach
  Encountered in
    • Libraries, software products and open-source projects
    • Projects with slower stable release cadence

I have seen many variations of these two approaches in the wild, so what we will discuss here should be applicable to those as well. Though I should mention that I tend to avoid styles that break version immutability, such as SNAPSHOT versions in Maven or @next distribution channels in npm.

Pull Requests

Regardless of how you do your releases, I am hoping you are introducing new features via pull requests (sometimes called merge requests). If you are not, we have bigger problems than just versioning.

Each pull request should ideally contain a single feature or fix. As part of the pull request review, the developer may need to commit more changes to address review comments. But these additional commits are not new features or fixes to code on the trunk.

My solution is to use a squash merge strategy. This way, the developers can do whatever they like with their commit messages on the feature/fix branch. Those commits will all disappear and the developer can provide a conventional commit message for the entire pull request at the point of merge.

Pull Request Commit Graph

Most decent Git hosting platforms also let you use your pull request name as your squash commit message. This is nice if you like to see consistent pull request names from your developers, and it even lets reviewers review the squash commit message before approving.

Branching Strategy for Release-Every-Merge

Regardless of the release style, I tend to lean towards trunk-based (mainline) branching. I avoid Gitflow because I care enough about my developers so as not to make them spend every day resolving merge conflicts. Not to mention, rebuilding the same version of an application just because you merged from develop to master flies in the face of the "build once, deploy many times" CI/CD practice.

Now that I have reached my per-post quota for ranting about Gitflow, let's talk about how we do branching when we release on every merge.

This is quite straightforward: create a feature/fix branch and follow the pull request process above. 

As a CI/CD process designer, one of your primary goals should be: whatever developers do most often should be easiest to do. I feel the above meets this criterion.

The less common scenarios are not that much more difficult either. Let's say the developers are building a new major version of the application but there has been a production defect for the previous major version that needs to be hot-fixed. This is the playbook to do this hotfix:

  1. Find out the minor version in production. Let's say v1.3.
  2. Create v1.3.x branch from the latest patch version for that minor version, e.g. v1.3.6.
  3. Create a fix branch and pull request the fix back into v1.3.x
  4. Build, deploy and promote to production from v1.3.x branch
  5. Port the fix to mainline by merging the v1.3.x branch into master

Release-Every-Merge Hotfix Commit Graph
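Steps 1 to 3 of the playbook are mechanical enough to script. The Python sketch below derives the maintenance branch name from a release tag and prints the Git commands to run. The function names, the remote name origin and the fix branch name fix/null-expiry are all assumptions made for illustration.

```python
import re


def maintenance_branch(tag: str) -> str:
    """Map a release tag such as v1.3.6 to its maintenance branch, v1.3.x."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    major, minor, _patch = match.groups()
    return f"v{major}.{minor}.x"


def hotfix_commands(tag: str, fix_branch: str) -> list:
    """List the Git commands that set up a hotfix on top of the given tag."""
    branch = maintenance_branch(tag)
    return [
        f"git checkout -b {branch} {tag}",            # branch off the latest patch tag
        f"git push --set-upstream origin {branch}",   # share the maintenance branch
        f"git checkout -b {fix_branch} {branch}",     # work on the fix, then open a pull request
    ]


for command in hotfix_commands("v1.3.6", "fix/null-expiry"):
    print(command)
```

The pull request from the fix branch then targets v1.3.x rather than master, and the usual squash merge applies.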

Branching Strategy for PreRelease-then-Release

In this method, you introduce the breaking changes on a pre-release branch. You can use whatever zany name you like for these but I will stick to the traditional alpha/beta terminology.

Essentially, you work on one or more pre-release branches until you are ready to release the new version to your consumers; at which point, you simply merge the pre-release branch into the trunk.

This diagram demonstrates this by releasing 2 alpha and 1 beta versions prior to the next canonical version bump.

Pre-Release-then-Release Commit Graph

The diagram above also demonstrates a hotfix to production during all this pre-release work.

It goes without saying, you can have fewer or more pre-release branches and even merge back-and-forth between them as you desire. It is really up to your personal release style.
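If you are wondering how alpha and beta versions sort relative to the stable release, the semver specification defines a precedence order. Here is a simplified Python sketch of it; the function name precedence_key is illustrative, and the input is assumed to be a well-formed semver string.

```python
def precedence_key(version: str):
    """Sort key approximating semver precedence, including pre-release identifiers."""
    version = version.split("+", 1)[0]  # build metadata does not affect precedence
    core, _, pre = version.partition("-")
    numbers = tuple(int(n) for n in core.split("."))
    if not pre:
        # a stable release ranks above all of its own pre-releases
        return (numbers, 1, ())
    identifiers = tuple(
        # numeric identifiers compare as numbers and rank below alphanumeric ones
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in pre.split(".")
    )
    return (numbers, 0, identifiers)


versions = ["2.0.0", "2.0.0-beta.1", "1.3.7", "2.0.0-alpha.2", "2.0.0-alpha.1"]
print(sorted(versions, key=precedence_key))
# ['1.3.7', '2.0.0-alpha.1', '2.0.0-alpha.2', '2.0.0-beta.1', '2.0.0']
```

This is why a hotfix such as 1.3.7 can ship while 2.0.0 pre-releases are in flight without confusing anyone about which version is newer.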

Implementation in Tools

Let's have a look at how our approach to versioning and branching can be applied to various source code management and CI/CD platforms.

While you can leave it up to the developers to observe the Conventional Commits conventions, that would require a monk-like level of discipline that I hardly see in our profession. So it is much wiser to enforce the commit message format on the relevant branches. Most SCMs provide this feature as a server-side hook.

For some, realising their commit messages are incorrect on the server is too late. Fortunately, there are tools, such as Commitizen, commitlint or even a simple pre-commit hook, that can warn users as early as possible.

Though if you follow our squash merge strategy, you are absolved from caring about your local commit messages.

We, however, are not absolved from caring about the commits on the long-living branches. So, it is always recommended to ensure the build process validates the commit history on those branches since the last build to ensure all commit messages are compliant.
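As a rough idea of what such a validation involves, here is a minimal Python sketch that flags non-compliant commit subjects. The list of allowed types mirrors the ones mentioned earlier in this post; the name non_compliant is illustrative, and a dedicated tool such as commitlint will be far more thorough.

```python
import re

TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test", "build", "ci", "chore")

# type, optional (scope), optional "!" marker, then ": " and a non-empty description
SUBJECT = re.compile(rf"^({'|'.join(TYPES)})(\([^)]+\))?!?: \S.*$")


def non_compliant(subjects):
    """Return the commit subjects that do not follow the conventional format."""
    return [subject for subject in subjects if not SUBJECT.match(subject)]


history = [
    "feat: add operation to retrieve sub-records",
    "fixed stuff",
    "ci: point to the new registry",
]
print(non_compliant(history))  # ['fixed stuff']
```

In a build, the subjects could come from something like git log --format=%s over the commits since the last release tag, and the build would fail if the returned list is not empty.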

Typically, we have at least the following in place in our CI/CD processes:

  • On each pull request build, validate all commit messages since the last build to ensure they follow conventional commits
  • On a trunk build, in addition to usual testing:
    1. Validate the commit messages
    2. Determine the next version by analysing commit messages and previous tags
    3. Tag the current commit with the new version
    4. Optionally generate release notes
    5. Publish the artifacts

Tagging a commit to identify it as the source for an application version is a valuable practice. It makes the Git repository a self-contained source-of-truth and removes some of the over-reliance on the CI/CD tool. More than once I have seen teams having to restart their versioning because they were using CI/CD as the source-of-truth and then had to migrate to a different tool, or the tool suffered an irrecoverable failure.
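With tags as the source-of-truth, finding the previous version is a matter of reading the tag list. Here is a minimal Python sketch, assuming release tags look like v1.3.6 and that anything else (pre-release tags, ad hoc tags) should be ignored; the name latest_release is illustrative.

```python
import re

RELEASE_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def latest_release(tags):
    """Return the highest stable release tag, or None if there is none yet."""
    versions = []
    for tag in tags:
        match = RELEASE_TAG.match(tag)
        if match:
            # compare numerically so that v1.10.0 ranks above v1.9.4
            versions.append(tuple(int(part) for part in match.groups()))
    if not versions:
        return None
    return "v%d.%d.%d" % max(versions)


print(latest_release(["v1.3.6", "v1.10.0", "v1.9.4", "nightly", "v2.0.0-alpha.1"]))  # v1.10.0
```

The tag list itself could come from git tag --list. Comparing the parts as numbers matters: a plain string sort would wrongly place v1.9.4 above v1.10.0.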

Now that we know what needs to be done, we need to figure out how to do it. This seems like a lot of functionality to implement. Fortunately, there are tools out there that already do all of this for us. My current preferred tool is semantic-release, which provides all of the above and more.

semantic-release

semantic-release provides a command-line interface (CLI) that can be invoked by any CI/CD tool. The prerequisites are Node.js and Git.

One of the strengths of semantic-release is that it can be extended using plugins. The official plugins let you create release notes and changelog files, publish releases to GitHub and GitLab, publish packages to npm and apm, etc. There are plenty of community plugins as well.

I will use GitLab to illustrate how to configure and use this tool, but this approach can just as easily be translated to other CI/CD tools.

I am going to use GitLab's Docker executor to run my builds in containers so first, I need to create a Docker image that has semantic-release and all the plugins I need.

FROM node:14.3.0
LABEL maintainer="sohrab"
RUN npm install --global \
 semantic-release@17.0.8 \
 @semantic-release/exec@5.0.0 \
 @semantic-release/gitlab@6.0.4

(Yes, I version-pin everything. Like a pro. "Repeatable, reliable builds" is another CI/CD principle. If you are re-using this, please check npmjs.com for the latest versions.)

Next, I need to configure semantic-release for my repository. There are a few ways to do this but here I drop a .releaserc.json file at the root of my repository with the following content:

{
  "plugins": [
    "@semantic-release/commit-analyzer",
    "@semantic-release/release-notes-generator",
    "@semantic-release/gitlab",
    [
      "@semantic-release/exec", {
        "successCmd": "echo \"VERSION=${nextRelease.version}\" >> vars.env"
      }
    ]
  ]
}

This will enable only the plugins that I want to use, in this case:

  • commit-analyzer determines the next version by analysing the commit history of the repo
  • release-notes-generator generates release notes in the conventional-changelog format
  • gitlab publishes the release notes as a GitLab Release
  • exec writes the release version to a dot-env file so it is available in the subsequent stages

Finally, we need to configure the CI/CD itself. Here it is shown in GitLab YAML. Even if you have never used GitLab before, this should be fairly self-explanatory and translatable into other CI/CD tools:

stages:
  - version
  - build

version:
  stage: version
  image: semantic-release
  script:
    - semantic-release
  artifacts:
    reports:
      # expose the variables in vars.env to the subsequent stages
      dotenv: vars.env
  rules:
    # only version commits that land on the trunk
    - if: '$CI_COMMIT_BRANCH == "master"'
      when: on_success

build:
  stage: build
  image: ...
  script:
    # run all tests, build, package and publish the artifact
    - ...
  rules:
    - if: '$VERSION'
      when: on_success

The VERSION environment variable in vars.env file, produced by the version stage, can then be used by the build stage to version the artifact. It is worthwhile to note that we skip build and publish if no new version has been produced.

The last bit of configuration is to ensure that semantic-release can push tags into your repository. For this, you need to provide the tool with the Git authentication details. In my use case, it is a matter of setting the GITLAB_TOKEN environment variable.

Tip: If you are using a self-hosted GitLab instance, you also need to configure GITLAB_URL to point to your instance. This is not required if you are using gitlab.com.

I should note that in case of a build failure, the version stage should not be run again since it has already tagged the commit with the version. So the above sequence is suitable if your CI/CD tool lets you resume failed pipelines from the build stage. If this support doesn't exist, then you need to either:

  • manually clean-up the tags before re-running the pipeline, or
  • change the pipeline to run semantic-release with the --dry-run flag to get the new version, run the build and finally run semantic-release for real.

That's it!

We have used this approach, especially the release-every-merge style, on projects with relative success. 

I have to be honest with you: if you don't have good tooling, you are going to need good developer discipline. If you have neither, then this may not be for you. But if you can use these techniques, then you will never have to give versioning much thought past what you are delivering in your commits and pull requests.
