Introduction
Over the last four or so years I’ve been working with systems composed of many small to medium sized components. These components are either combined as dependencies or integrated as services to each other. Each component has required its own build and release procedure and technology suite; and though the technology has generally been the same, it’s been my experience that each and every build is its own individual beast.
From the experience I’ve gained in our efforts to improve the build structure I have quite a good understanding of what a bad and a good build system look like. Over the next few weeks I’m going to write down the conclusions we have come to and what I think is a solid build system which makes good use of the tools we have in the industry at present. In this I’m hoping to showcase the strengths of Ivy and Maven, and how their different philosophies might be stronger when used together.
The tools I will be showcasing will be Apache Ant, Apache Ivy and Apache Maven.
The Problem
The problems we faced repeatedly were:
- Each build was created independently of the others, meaning there was no consistency to the patterns used. There was a learning curve every time you moved to a different component.
- Each component’s build file was generally a copy of another component’s build script with masses of subtle ‘tweaks’ to make it fit. Often this was done by a developer who didn’t really understand the original build all that well, causing subtle changes in the meaning of some of the targets and other properties. When a build looks the same but really isn’t, mistakes are always made when you’re updating it.
- There was no dependency management. Build files referenced a committed project dedicated to storing 3rd party jars, and builds of components. This project was more than 4GB by the time we got rid of it. Checking that out from CVS would take hours and without it no builds (scripted or IDE based) worked.
- Many of the component projects had interdependencies in the IDE Workspace (we use Eclipse), which was a heavy load for any developer’s computer to process. Opening a workspace meant the computer was frequently unusable during the day while an IDE build was taking place.
- Getting a workspace set-up for a new starter on the team usually took someone on the team plus the new starter a full day.
These problems added up to considerable frustration and lost time, as you might imagine.
Build and Release Principles
The principles that I follow for a build and release process are below. I don’t pretend to have come up with these; they are commonly accepted industry practice and I only reiterate them.
- Every build is a candidate for release. There is no separation between an ‘integration’ build and a ‘release’ build and I believe the terms should be dropped altogether. To qualify this I should say ‘any build done on the build server is a candidate for release’. In following this principle you remove the need to have a ‘special’ build with differences from the build you use every other time. Using the same build implementation in all cases makes it far easier to make it reusable, and it means you don’t need a second build configuration on your build server, which would again be a source of differences. Note however that I don’t suggest you treat builds done on a developer’s machine as release candidates: there is too much room for unknowns to enter the build, and a developer build is rarely if ever repeatable. That’s not to say developers should use a different build, however. Separating a local build from a build server build is a problem for your dependency management system, not the build itself.
- Every build has a unique and permanent build number; i.e. avoid update-able artefacts. To support having every build as a candidate for release it’s necessary for every build to be uniquely identifiable and unchanging. It should not be possible to re-use a build number for another build of the same artefact.
- You must be able to retrieve the exact code that went into any numbered build; i.e. tag every successful build. If any build is a candidate for release then it may actually make it all the way to Production. For Production you need to be able to pull up exactly the code that went into it for debugging and support. This means every build should be traceable back to the code, hence the code should be tagged for every build. Tagging should be an integral and automated part of the build. In CVS you have to generate an explicit tag; in Subversion you could probably get away with recording the revision number of the build in the output and/or build job or similar, though it’s probably still better to tag explicitly. You might think ‘wow, that’s a lot of tags’, but it’s not really unless you’re smashing a CI server with build-able commits every few minutes; and even then, if your tagging convention is intelligent it’s easy to skim the tags to the one you want. How often do you really need to pull up a tag? Usually it’s a branch already or simply Trunk (HEAD in CVS) anyway.
- Builds must be repeatable. They should be scripted and tagged at the very least; exporting a jar from an IDE is not a repeatable build. Tagging is not necessary in a local machine build where it is just for some developer testing but the build itself should still be done by the same script as you would use for a QA or Production grade build.
- A dedicated build server is important. To satisfy the requirement to have repeatable builds it’s important to have an environment that only changes in a controlled fashion. Developers’ machines are not a controlled environment. This doesn’t have to be a Continuous Integration (CI) server (though it is recommended); a dedicated build machine under someone’s desk would be sufficient initially (for a small team anyway). It shouldn’t be an install of an IDE that is doing the build on the dedicated machine either; IDEs can do all sorts of unknowable and unrepeatable ‘magic’ under the covers for a build.
- Dependency management makes your life much easier. Maven or Ivy fill this role admirably.
- Unit test execution must be a part of the build. Just get it going: it’s surprisingly easy to embed it in your build script, and automating it encourages developers to write more and better unit tests.
- It’s got to be easily reusable. If a completely new build is used for each and every project (even if it’s copy/paste) then every project will ultimately be built differently. While each of these builds may satisfy all of these principles, it’s going to be difficult for developers to transfer between projects. The build structure or style should be second nature to every developer in the team; struggling to understand how the build works for a project that would otherwise be structured the same is not a good place to be when there is business functionality to write! Note also that nothing can be considered ‘reusable’ until it’s actually been reused successfully.
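To make the build-number and tagging principles a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `component_version_bN` naming convention, the `tag_for_build` function and the in-memory `BuildRegistry` are hypothetical, and a real build would take its number from the build server and create the tag in CVS or Subversion itself. The only outside fact relied on is that CVS tag names must start with a letter and may contain only letters, digits, hyphens and underscores, which is why the dots in the version are replaced.

```python
import re


def tag_for_build(component: str, version: str, build_number: int) -> str:
    """Derive a predictable tag name for one build (hypothetical convention)."""
    if build_number < 1:
        raise ValueError("build numbers start at 1")
    raw = f"{component}_{version}_b{build_number}"
    # CVS tags allow only letters, digits, '-' and '_', so swap anything else.
    tag = re.sub(r"[^A-Za-z0-9_-]", "_", raw)
    if not tag[0].isalpha():
        raise ValueError(f"tag must start with a letter: {tag!r}")
    return tag


class BuildRegistry:
    """Illustrative stand-in for whatever records builds on the build server."""

    def __init__(self) -> None:
        # Maps (component, build_number) to the tag created for that build.
        self._tags: dict[tuple[str, int], str] = {}

    def record(self, component: str, version: str, build_number: int) -> str:
        key = (component, build_number)
        if key in self._tags:
            # A build number is permanent: it can never be handed out twice.
            raise ValueError(f"build {build_number} of {component} already exists")
        tag = tag_for_build(component, version, build_number)
        self._tags[key] = tag
        return tag
```

With a convention like this, the tag for build 128 of a component called `billing-core` at version 1.4.0 would be `billing-core_1_4_0_b128`, which is easy to find again when skimming a long list of tags.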
Some principles which are not directly related to a build implementation, but do overlap with it, are:
- A ‘sign-off’ process for deploying artefacts through to UAT and Production should be in place.
- Artefacts should be promotable; i.e. don’t embed environment information in your artefacts. You should be able to promote an artefact from System Test, to UAT to Production without changing it in any way. This means all environment specific information should be externalised. In larger organisations the environment information would normally be inaccessible to developers from UAT upwards (to Production); there would normally be an Environments or Operations team with the responsibility of maintaining UAT and Production environments.
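As a rough sketch of what ‘externalised’ environment information can look like from the artefact’s side, the Python below reads its settings from a file whose location is supplied by the environment it is deployed into. The `APP_CONFIG_FILE` variable name, the simple `key=value` file format and both function names are assumptions for illustration only; the point is that the same unchanged artefact can run in System Test, UAT and Production because nothing environment specific is packaged inside it.

```python
import os
from pathlib import Path

# Hypothetical name of the variable that each environment sets for the artefact.
CONFIG_ENV_VAR = "APP_CONFIG_FILE"


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {line!r}")
        settings[key.strip()] = value.strip()
    return settings


def load_settings(environ=os.environ) -> dict[str, str]:
    """Load settings from the file the current environment points at."""
    try:
        path = environ[CONFIG_ENV_VAR]
    except KeyError:
        # Fail loudly rather than fall back to values baked into the artefact.
        raise RuntimeError(f"{CONFIG_ENV_VAR} is not set") from None
    return parse_properties(Path(path).read_text(encoding="utf-8"))
```

An Environments or Operations team could then own the file for UAT and Production without developers ever needing to see its contents.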
What’s Next
In the next post I’m going to cover the structure and technology of the build we are using now and describe why we think it’s far better than what we’ve had before.