Containers have evolved very quickly from being an obscure technology, to being confused with virtual machines, to being mainstream production drivers today. Docker itself has been around for 7 years, but many developers from the old world still do not understand it well. This leads to sub-optimal implementations, because people who are forced to use it take the easiest possible approach. In this post, we’ll explore how to optimise Docker images in the context of Java, using a Spring Boot example.
Let’s start by generating a sample app from start.spring.io. We’ll pick the following options to create a bare bones web app:
- Maven Project
- Spring Boot 2.2.6
- Java 14
- Dependencies: Spring Web
Build the project using `mvn clean package` and you will find a fat JAR file weighing about 17 megabytes. This standalone, self-executable format was made popular by Spring Boot since it’s convenient to just deploy a single file and ignore app server dependencies. Developers hate sharing app servers for good reason.
Within the project, create the simplest Dockerfile:
```dockerfile
FROM openjdk:14-slim
COPY ./target/*.jar app.jar
ENTRYPOINT [ "java", "-jar", "./app.jar" ]
```
Build the docker image and push it to your image registry.
```shell
docker build -t your-registry/demo .
docker push your-registry/demo
```
You’ll notice that it pushes 5 layers ranging from a minuscule 3 kilobytes to a much larger 336 megabytes. The first layer should have a size similar to your fat JAR.
```
ea2b82629d91: Pushing 0kB / 17.6MB
64dd9292f295: Pushing 0kB / 335.9MB
38ae49ae5249: Pushing 0kB / 3.6kB
2eaf0d58a380: Pushing 0kB / 8.8MB
c2adabaecedb: Pushing 0kB / 69.2MB
```
Docker works by caching layer data. If the registry already has an identical layer, the client skips pushing that layer. Thus, building images that reuse layers as much as possible reduces the net storage and bandwidth usage. If you don’t care about that, it also makes your pushes and pulls faster, which directly translates to app startup time in the Kubernetes world.
Let’s make one tiny change to our code, maybe add a log statement, then rebuild the project and docker image again and push it.
```
aac3c35cde67: Pushing 0kB / 17.6MB
64dd9292f295: Layer already exists
38ae49ae5249: Layer already exists
2eaf0d58a380: Layer already exists
c2adabaecedb: Layer already exists
```
Hold on... why did I have to spend time and another 17.6 megabytes of bandwidth on a single changed line of code? Because fat JAR. While this number doesn’t seem very large by modern standards, it grows with every dependency your project pulls in. Imagine the cost of changing a single variable on a gigabyte-sized layer. There are people for whom this scenario needs no imagination.
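As a rough mental model of the push behaviour above (a sketch, not how the Docker client is actually implemented), a push is a set difference between the image’s layer digests and the digests the registry already holds. The digests are the ones from the two push logs; the function name `layers_to_push` is made up for illustration.

```shell
#!/bin/sh
# Toy model of a registry push: a layer is uploaded only if its digest
# is not already in the registry. Digests are taken from the logs above.

# Layers the registry holds after the first push.
registry_layers="ea2b82629d91 64dd9292f295 38ae49ae5249 2eaf0d58a380 c2adabaecedb"

# Layers of the rebuilt image: only the fat JAR layer has a new digest.
rebuilt_layers="aac3c35cde67 64dd9292f295 38ae49ae5249 2eaf0d58a380 c2adabaecedb"

# Print the layers in $1 that are missing from $2, one per line.
layers_to_push() {
    for layer in $1; do
        case " $2 " in
            *" $layer "*) ;;              # registry already has it: skip
            *) printf '%s\n' "$layer" ;;  # new digest: must be uploaded
        esac
    done
}

layers_to_push "$rebuilt_layers" "$registry_layers"   # prints aac3c35cde67
```

The only digest that differs is the 17.6 MB fat JAR layer, so the whole JAR travels again no matter how small the code change was.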
If you run `docker images | grep demo`, you’ll find the image you just built weighs a hefty 432 megabytes while your fat JAR only contributed 17. The rest of that bulk comes from the `openjdk:14-slim` base image (slim being... relative).
When the Java world finally moved on from Java 8, the concept of Java modules was introduced. It’s essentially an abstraction above packages, so a module represents a bunch of packages. Nothing fancy. What’s interesting is the introduction of the `jlink` tool, which lets you generate your own JVM based on the modules you need. This strips out all the Java APIs you do not use, resulting in a truly slim build. This raises the question of “How do I know which modules I use?”, which is answered by the `jdeps` utility, which can analyse your compiled code.
Now that we have the background sorted, let’s work out what needs to be done:
- Write your code and build the fat JAR as usual
- Use `jdeps` to analyse what modules your app requires
- Build a two-stage Dockerfile
  - In Stage 1:
    - Use the standard JDK base image
    - Use `jlink` to build an optimised JVM
    - Unpack the fat JAR
    - Organise the unpacked artifacts into dependencies and app code
  - In Stage 2:
    - Use a tiny base image
    - Move the optimised JVM over
    - Move the unpacked dependencies over
    - Move the app code over
    - Launch your main class using the optimised JVM
After building your fat JAR, unpack it and run `jdeps` in recursive mode against the dependencies and the JAR itself. Some dependencies like `log4j-api` are multi-release, so you might need to specify a release number based on your JDK. We’re using 14 here.
```
$ mkdir analysis && cd $_
$ cp ../target/*.jar x.jar && jar -xf x.jar
$ jdeps -s --multi-release=14 --recursive -cp BOOT-INF/lib/* x.jar
classmate-1.5.1.jar -> java.base
com.fasterxml.jackson.annotation -> java.base
com.fasterxml.jackson.core -> java.base
com.fasterxml.jackson.databind -> com.fasterxml.jackson.annotation
...
```
This gives you a giant map of which dependency needs which module, which is more granular than we need. What we want is a comma-separated list of module names to pass to the `jlink` command later. The `--print-module-deps` argument does that. We will also ignore split package warnings by `grep`ing them out.
```
$ jdeps --ignore-missing-deps --print-module-deps --multi-release=14 --recursive -cp BOOT-INF/lib/* x.jar | grep -v Warning:
java.base,java.desktop,java.instrument,java.logging,java.management,java.management.rmi,java.naming,java.prefs,java.rmi,java.scripting,java.security.jgss,java.sql,java.xml,jdk8internals,jdk.httpserver,jdk.unsupported
```
Much better. We’ll save that list for later.
Using the same JDK base image for stage 1, we run `jlink`, feeding in the modules list from the previous step and specifying `/jvm` as the output directory. We then copy the fat JAR over, unpack it and create an `/app` directory to group metadata (`META-INF`) and app code (`BOOT-INF/classes`). Dependencies remain in `BOOT-INF/lib`.
In stage 2, we use a tiny base image, `debian:stretch-slim`, and use 3 `COPY` statements to divide these parts into separate layers: the JVM, the dependencies and the app code. We then launch the app using the JVM, including the dependencies in the classpath.
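```dockerfile
# Stage 1: build a trimmed JVM and unpack the fat JAR
FROM openjdk:14-slim AS build
# The module list must not contain spaces, so its continuation lines start at column 0
RUN jlink --output /jvm --no-header-files --no-man-pages --compress=2 \
    --strip-debug --add-modules java.base,java.desktop,jdk8internals,\
java.instrument,java.logging,java.management,java.management.rmi,\
java.naming,java.prefs,java.rmi,java.scripting,java.security.jgss,\
java.sql,java.xml,jdk.httpserver,jdk.unsupported
WORKDIR /build
COPY ./target/*.jar app.jar
RUN jar -xf app.jar
# Group metadata and app code; dependencies stay in BOOT-INF/lib
RUN mkdir /app && cp -r META-INF /app && cp -r BOOT-INF/classes/* /app

# Stage 2: one layer each for the JVM, the dependencies and the app code
FROM debian:stretch-slim
COPY --from=build /jvm /jvm
COPY --from=build /build/BOOT-INF/lib /lib
COPY --from=build /app .
ENTRYPOINT [ "/jvm/bin/java", "-cp", ".:/lib/*", "com.example.demo.DemoApplication" ]
```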
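To see what the stage 1 `mkdir` and `cp` commands do to the unpacked JAR, here is a small sketch that runs the same grouping against a throwaway directory. The file names are placeholders standing in for real build output, and the paths are relative to a temp directory rather than `/build` and `/app`.

```shell
#!/bin/sh
# Build a throwaway directory that mimics an unpacked Spring Boot fat JAR.
# All file names below are placeholders, not real build output.
work=$(mktemp -d)
mkdir -p "$work/build/META-INF" \
         "$work/build/BOOT-INF/classes/com/example/demo" \
         "$work/build/BOOT-INF/lib"
touch "$work/build/META-INF/MANIFEST.MF" \
      "$work/build/BOOT-INF/classes/com/example/demo/DemoApplication.class" \
      "$work/build/BOOT-INF/lib/some-dependency.jar"

# Same grouping as stage 1: metadata and app code go to app/,
# dependencies stay behind in BOOT-INF/lib.
cd "$work/build"
mkdir ../app && cp -r META-INF ../app && cp -r BOOT-INF/classes/* ../app

# List what ends up in the small, frequently changing app layer.
app_files=$(cd ../app && find . -type f | LC_ALL=C sort)
printf '%s\n' "$app_files"
```

Only the manifest and the compiled classes land in `app/`, which is why that layer stays tiny while the dependency JARs sit in their own rarely changing layer.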
At this point, you might encounter some failures running `docker build` due to the `jlink` command. There are two common things that can happen. The first is that `jdeps` doesn’t give you a definitive set of modules, so you might see a module not found error like the one below. Simply remove the offending module from the `--add-modules` list.
```
Error: Module jdk8internals not found
```
The second is that you picked a stage 1 image that doesn’t have `objcopy` installed. To fix it, swap out the `--strip-debug` argument for an alternative that does not need `objcopy`, such as `--strip-java-debug-attributes`.
```
Error: java.io.IOException: Cannot run program "objcopy": error=2, No such file or directory
```
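If you script this step, dropping a module from the comma-separated list can be done with a small helper instead of by hand. This is only a sketch: `drop_module` is a hypothetical name, and the shortened module list is sample input rather than the full list from earlier.

```shell
#!/bin/sh
# Remove one module name ($1) from a comma-separated module list ($2).
# -x matches whole lines and -F treats the dots in the name literally.
drop_module() {
    printf '%s\n' "$2" | tr ',' '\n' | grep -vxF -- "$1" | paste -sd, -
}

# Shortened sample list; the real one comes from jdeps --print-module-deps.
modules="java.base,java.desktop,jdk8internals,java.sql"
drop_module jdk8internals "$modules"   # prints java.base,java.desktop,java.sql
```

The result can be passed straight to `--add-modules`.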
In the change-a-variable test, only 12 kilobytes needed to move. That’s a reduction of over 99% in bandwidth, and a negligible amount of time needed to push the image.
First push of the optimised image:
```
7c76a9957d46: Pushing 0kB / 12.29kB
cfa161abfacf: Pushing 0kB / 17.51MB
27eda33d0eae: Pushing 0kB / 50.26MB
cde96efde55e: Pushing 0kB / 55.32MB
```
Push after the one-line code change:
```
e7c728ed739b: Pushing 0kB / 12.29kB
cfa161abfacf: Layer already exists
27eda33d0eae: Layer already exists
cde96efde55e: Layer already exists
```
Overall, the total image size shrank by about 71%, from 432 megabytes to 123 megabytes:
```
$ docker images | grep demo
registry/demo   before   a90ae7133e69   2 hours ago     432MB
registry/demo   after    5165c278494d   2 minutes ago   123MB
```
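The two percentages quoted above can be checked with a couple of lines of `awk`. This sketch assumes Docker’s sizes are decimal (17.6 MB treated as 17600 kB); `pct_saved` is a made-up helper name.

```shell
#!/bin/sh
# Percentage saved when `after` replaces `before` (same unit for both).
pct_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Per-change push in kB: 17.6 MB fat JAR layer vs 12.29 kB app layer.
pct_saved 17600 12.29   # prints 99.9

# Total image size in MB: 432 before, 123 after.
pct_saved 432 123       # prints 71.5
```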
Way, way faster.
Since I have a vested interest in running this on my Pi cluster, I’ve spent some time looking for compatible base images as well. This is where it gets confusing: there are multiple architecture names that will run on the Pi, but it will refuse to pull any image not marked for `linux/arm`. I’ve also found that even though I’m running the 64-bit kernel, images using the `linux/arm64` arch will not run. So I’ve scoped compatible images down to those using `arm/v7`. AdoptOpenJDK seems to have pretty good base images for stage 1, while arm32v7’s Debian images work as a good stage 2 base.
```dockerfile
FROM adoptopenjdk/openjdk14:armv7l-debian-jdk-14.0.1_7
...
FROM arm32v7/debian:stretch-slim
...
```
As you’ve noticed, `jdeps` is not fool-proof. I’ve experienced a couple of scenarios where the build process succeeds, only to fail at runtime due to a missing module. Hence, you will need to test your app end-to-end to ensure that your optimised image is complete.
These are some common modules to add if your app does the following:
- `jdk.crypto.ec`: If your app calls third-party REST APIs (that might use elliptic curve cryptography in their TLS certificates)
- `jdk.naming.dns`: If your app connects to MongoDB using a DNS-based connection string (for example the `mongodb+srv://` scheme)
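One way to catch a missing module before it bites at runtime is to compare what the custom JVM actually contains against the modules you expect. `java --list-modules` prints one `name@version` entry per line, so the check is plain text processing. This is a sketch: `missing_modules` is a hypothetical helper, and the sample listing stands in for the real output of `/jvm/bin/java --list-modules`.

```shell
#!/bin/sh
# Print each expected module ($@) that is absent from the
# `java --list-modules` output supplied on stdin.
missing_modules() {
    present=$(sed 's/@.*//')   # strip the @version suffix from each line
    for module in "$@"; do
        printf '%s\n' "$present" | grep -qxF -- "$module" \
            || printf '%s\n' "$module"
    done
}

# Made-up sample standing in for: /jvm/bin/java --list-modules
sample_listing='java.base@14.0.1
java.logging@14.0.1
java.naming@14.0.1'

# Prints jdk.crypto.ec and jdk.naming.dns, one per line.
printf '%s\n' "$sample_listing" | missing_modules java.naming jdk.crypto.ec jdk.naming.dns
```

Against the real image, something like `docker run --rm --entrypoint /jvm/bin/java your-registry/demo --list-modules | missing_modules jdk.crypto.ec jdk.naming.dns` should work, assuming the JVM lives at `/jvm` as in the Dockerfile above. It complements end-to-end testing rather than replacing it.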
How long this technique will stay relevant is hard to tell. Container technology moves so fast that it might become obsolete pretty quickly. The dream is that it’s something that happens automagically in a stable form. Did I miss out on any tips? Do drop me an @mention or DM.