It’s surprising how quickly you can run into the limitations of Docker, especially when you try to use Docker for building your applications from source code. One of the most frustrating limitations becomes apparent when you try to use Docker to deal with a universal problem in building applications: dependency management. You have to deal with dependencies no matter what your language of choice. And Docker makes it difficult in almost all cases. Let me give you a few examples to illustrate.
Docker world problems for the Java aficionado
Let’s start with the woes of a Java developer. Java application dependencies typically come from a Maven repository. When Java developers start using Docker to build applications, they quickly discover that a simple mvn install can be quite an annoyance. Maven installs can be time-consuming since artifacts are pulled down from remote repositories, and you generally don’t want the installs to be repeated unless there are changes in your dependencies.
You might think that if you add the install as a regular RUN instruction in your Dockerfile, the install will add a layer to your image, and that layer can be retrieved from the image cache. And that is in fact the case. However, in a typical Dockerfile for building Java applications, you will usually have a COPY instruction to add your project source files to the Docker image before the RUN instruction for the mvn install. The problem then becomes that if any of your source files change, the result is a cache miss for not just the COPY instruction, but also the RUN instruction that appears after it. That means the Maven install has to occur all over again any time you change your source code. This makes it incredibly frustrating to use Docker to build your Java application.
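To make the problem concrete, here is a minimal sketch of the kind of Dockerfile described above. It is only an illustration: the base image tag and the /app path are assumptions, not taken from any particular project.

```dockerfile
# Illustrative Dockerfile; the base image tag and the /app path are assumptions.
FROM maven:3-jdk-8

WORKDIR /app

# Copies the whole project, pom.xml and source files alike. Docker compares
# checksums of the copied files, so editing any one of them is a cache miss here.
COPY . /app

# Every instruction after a cache miss is rebuilt as well, so the dependencies
# are downloaded again even when pom.xml itself has not changed.
RUN mvn install
```

With this ordering, a one-line change to a Java source file is enough to trigger a full Maven install on the next build.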
Same story for Node.js
Package management is also where Docker fails Node.js developers. It’s the same story. Any non-trivial Node.js application has package dependencies, and you’ll need to perform an npm install. Again, the install of package dependencies is time-consuming and you don’t want it to happen every single time you build your application. But for the same reasons described above for Maven installs, you are likely to see that your NPM install has to be re-run any time you change your source code. Again, this makes it frustrating to build Node.js applications using Docker.
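The Node.js version of the problem might look something like the sketch below. The /app working directory and the start command are assumptions made for illustration.

```dockerfile
# Illustrative Dockerfile; the /app path and the start command are assumptions.
FROM node:8-alpine

WORKDIR /app

# package.json and the application source are copied together, so a change
# to any source file invalidates the cache for this layer...
COPY . /app

# ...and for this one, which reinstalls every package from scratch.
RUN npm install

CMD ["npm", "run", "start"]
```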
Resorting to clever tricks
You might think the way to solve at least the Maven case is to add the local Maven repository as a volume. But you will quickly discover that Docker doesn’t allow you to add volumes during a build. Several issues have been filed on the Moby project about this. In them you will find plenty of users pleading for the ability to do this, along with a number of workarounds. For NPM, here’s one that someone devised that is clever, but ultimately a frustrating hack:
    # install npm (in separate dir due to docker cache)
    ADD package.json /tmp/npm_inst/package.json
    RUN cd /tmp/npm_inst && \
        npm install && \
        mkdir -p /tmp/app && \
        mv /tmp/npm_inst/node_modules /tmp/app/
Improving Docker builds
So how do you create a build using Docker that runs these dependency/package management tasks only when needed? I am going to show you how you can achieve this using a Docker-based workflow, and run it using a tool called Sandbox. Now before you run away at the thought of a new tool, you should know that if you have a working knowledge of Docker, the effort for learning Sandbox should be almost trivial. Let me illustrate with example workflows that demonstrate how to deal with the problems I’ve described above.
A workflow for Java applications
First, let’s take a look at an example workflow that installs Maven dependencies, and then starts the application:
    steps:
      - image: 'openjdk:latest'
        name: Install dependencies
        cache: true
        sourceIncludes:
          - pom.xml
        script: |-
          cd /app
          mvn install
      - image: Install dependencies
        imageSource: step
        script: |-
          cd /app
          mvn spring-boot:run
A Docker-based workflow works just like you imagine it would: it’s a sequence of steps where each step runs a Docker container. Here, you can see from the first step’s script that it performs a mvn install in the /app directory, within a container based on the openjdk:latest image. Why the /app directory? By default, all project source files (workflows always run in the context of a project) are added there within the Docker container for a step.
Notice that this step specifies a sourceIncludes property, which explicitly defines the project source files to be added to the step container. By default, all project source files are added. Because this property is defined, it overrides that behavior and only the pom.xml file is added. The reason we do this is related to the cache property. Because the cache property is set to true, the final state of this container is cached as an image, much like Docker caches each image layer during a build. The hash for the image (the key into the cache) is based on the source files added in the step, and thus we want to ensure that only the pom.xml file is included for this step. This ensures that only changes to the pom.xml file will invalidate the cache and cause a re-run of the Maven install.
Once the first step runs, the container will have pulled down all of the project’s Maven dependencies. That final container state with all dependencies in the local Maven repository will be cached as the image for this first step.
You can see that the second step references the previous step as its image, and specifies step as its imageSource. That means that the base image for the second step is the final state of the first step (i.e., one with all of the Maven dependencies in the local Maven repository). The second step then performs a mvn spring-boot:run in the /app directory. Again, it’s important to note that all the project source files are added by default into /app. Because this step doesn’t specify a sourceIncludes property, all project source files are added. Note that these source files are added in a layer above the base layer, which contains the local repository initialized in the previous step.
And that’s it! We have the behavior we want: since the first step is cached, it will only run when the pom.xml file changes. The second step then launches the application, and can do so because all the required Maven dependencies are already in the local Maven repository.
A workflow for Node.js applications
The workflow for doing this with NPM is almost identical:
    steps:
      - image: 'node:8-alpine'
        name: Install dependencies
        cache: true
        sourceIncludes:
          - package.json
        script: |-
          cd /app
          npm install
      - image: Install dependencies
        imageSource: step
        script: |-
          cd /app
          npm run start
Elegant, and more powerful than Docker!
Now that I’ve shown you how to perform more powerful Docker builds with workflows, let me talk about how easy it is to run workflows. As you saw, workflows are just YAML files, and they can be run with a simple command: sbox run workflow-name. Getting Sandbox in order to run workflows is also really easy, because there’s no installation! Sandbox ships as a tiny (<100KB) zip file that you extract into your project, and you’re instantly ready to run workflows.
If you are looking to do more advanced, and more elegant, Docker builds, get started today with Docker-based workflows.