More powerful Docker builds

It’s surprising how quickly you can run into the limitations of Docker, especially when you try to use Docker for building your applications from source code. One of the most frustrating limitations becomes apparent when you try to use Docker to deal with a universal problem in building applications: dependency management. You have to deal with dependencies no matter what your language of choice. And Docker makes it difficult in almost all cases. Let me give you a few examples to illustrate.

Docker world problems for the Java aficionado

Let’s start with the woes of a Java developer. Java application dependencies typically come from a Maven repository. When Java developers start using Docker to build applications, they quickly discover that a simple mvn install can be quite an annoyance. Maven installs can be time-consuming since artifacts are pulled down from remote repositories, and you generally don’t want the installs to be repeated unless there are changes in your dependencies.

You might think that if you add the install as a regular RUN instruction in your Dockerfile, the install will add a layer to your image, and that layer can be retrieved from the image cache. And that is in fact the case. However, in a typical Dockerfile for building Java applications, you will usually have an ADD or COPY instruction that adds your project source files to the Docker image before the RUN instruction for the mvn install. The problem is that if any of your source files change, the result is a cache miss not just for the ADD or COPY instruction, but also for the RUN instruction that appears after it. That means the Maven install has to occur all over again any time you change your source code. This makes it incredibly frustrating to use Docker to build your Java application.
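
To make the problem concrete, here is a minimal sketch of the kind of Dockerfile described above. The maven:3 base image and the /app path are assumptions chosen for illustration, not taken from any particular project.

```dockerfile
# Hypothetical Dockerfile illustrating the cache problem.
# The base image and paths are assumptions for this sketch.
FROM maven:3
WORKDIR /app

# Every project file, source code included, goes into this layer.
# Editing any one of them changes the layer and invalidates its cache entry.
COPY . /app

# This instruction comes after the COPY, so it also misses the cache
# and downloads the dependencies again whenever the source changes.
RUN mvn install
```

Build the image once, edit a single source file, and build again: the second build repeats the full mvn install even though pom.xml is unchanged.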

Same story for Node.js

Package management is also where Docker fails Node.js developers. It’s the same story. Any non-trivial Node.js application has package dependencies, and you’ll need to perform an npm install. Again, installing package dependencies is time-consuming, and you don’t want it to happen every single time you build your application. But for the same reasons described above for Maven installs, you are likely to see that your npm install has to be re-run any time you change your source code. Again, this makes it frustrating to build Node.js applications using Docker.
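
The Node.js version of the problem looks much the same. This sketch reuses the node:8-alpine image that appears in the workflow later in this post; the /app path and the start script are assumptions for illustration.

```dockerfile
# Hypothetical Dockerfile illustrating the same cache problem for Node.js.
# The /app path and the "start" script are assumptions for this sketch.
FROM node:8-alpine
WORKDIR /app

# The whole project is copied in one layer, so any source edit
# invalidates the cache for this instruction...
COPY . /app

# ...and for this one too, which means every package is reinstalled
# even when package.json has not changed.
RUN npm install

CMD ["npm", "run", "start"]
```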

Resorting to clever tricks

You might think the way to solve at least the Maven case is to add the local Maven repository as a volume. But you will quickly discover that Docker doesn’t allow you to add volumes during a build. Several issues have been filed on the Moby project about this. In them you will find plenty of users pleading for the ability to do this, along with some clever but ultimately frustrating workarounds. For npm, here’s one such hack that someone devised:

# install npm ( in separate dir due to docker cache)
ADD package.json /tmp/npm_inst/package.json
RUN cd /tmp/npm_inst &&\
    npm install &&\
    mkdir -p /tmp/app &&\
    mv /tmp/npm_inst/node_modules /tmp/app/

Improving Docker builds

So how do you create a build using Docker that runs these dependency and package management tasks only when needed? I am going to show you how you can achieve this using a Docker-based workflow, and run it using a tool called Sandbox. Now before you run away at the thought of a new tool, you should know that if you have a working knowledge of Docker, the effort of learning Sandbox should be almost trivial. Let me illustrate with example workflows that demonstrate how to deal with the problems I’ve described above.

A workflow for Java applications

First, let’s take a look at an example workflow that installs Maven dependencies, and then starts the application:

steps:
  - image: 'openjdk:latest'
    name: Install dependencies
    cache: true
    sourceIncludes:
      - pom.xml
    script: |-
      cd /app
      mvn install
  - image: Install dependencies
    imageSource: step
    script: |-
      cd /app
      mvn spring-boot:run

A Docker-based workflow works just like you imagine it would – it’s a sequence of steps where each step runs a Docker container. Here, you can see from the first step’s script that it performs a mvn install in the /app directory, within a container based on the openjdk:latest image. Why the /app directory? By default, all project source files (workflows always run in the context of a project) are added there within the Docker container for a step.

Notice that this step specifies a sourceIncludes property – this is used to explicitly define the project source files to be added to the step container. By default, all project source files are added. Because this property is defined, it overrides that behavior and only the pom.xml file is added. The reason we do this is related to the cache property. Because the cache property is set to true, the final state of this container is cached as an image, much like Docker caches each image layer during a build. The hash for the image (the key into the cache) is based on the source files added in the step, and thus we want to ensure that only the pom.xml file is included for this step. This ensures that only changes to the pom.xml file will invalidate the cache and cause a re-run of the mvn install.

Once the first step runs, the container will have pulled down all of the project’s Maven dependencies. That final container state with all dependencies in the local Maven repository will be cached as the image for this first step.

You can see that the second step references the previous step as its image, and specifies step as its imageSource. That means that the base image for the second step is the final state of the first step (i.e., one with all of the Maven dependencies in the local Maven repository). The second step then performs a mvn spring-boot:run in the /app directory. Again, it’s important to note that all the project source files are added by default into /app. Because this step doesn’t specify a sourceIncludes property, all project source files are added. Note that these source files are added in a layer above the base layer, which contains the local repository initialized in the previous step.

And that’s it! We have the behavior we want: since the first step is cached, it will only run when the pom.xml file changes. And the second step launches the application, and can do so because all the required Maven dependencies are already in the local Maven repository.

A workflow for Node.js applications

The workflow for doing this with npm is almost identical:

steps:
  - image: 'node:8-alpine'
    name: Install dependencies
    cache: true
    sourceIncludes:
      - package.json
    script: |-
      cd /app
      npm install
  - image: Install dependencies
    imageSource: step
    script: |-
      cd /app
      npm run start

Elegant, and more powerful than Docker!

Now that I’ve shown you how to perform more powerful Docker builds with workflows, let me talk about how easy it is to run workflows. As you saw, workflows are just YAML files, and they can be run with a simple command: sbox run workflow-name. Getting Sandbox in order to run workflows is also really easy – there’s no installation! Sandbox ships as a tiny (<100KB) zip file that you extract into your project, and you’re instantly ready to run workflows using sbox run.

If you are looking to do more advanced, and more elegant, Docker builds, get started today with Docker-based workflows.

4 thoughts on “More powerful Docker builds”

  1. Best practices for writing Dockerfiles (https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/) has suggestions to deal with this issue more elegantly than introducing yet another tool in the workflow. In particular, read the “Use a .dockerignore file”, “Use multi-stage builds”, and “ADD or COPY” sections of the document that will not only help solve this cache invalidation issue, but also give more insights on how to make clean and minimal final images with small number of layers.
