Optimising Java Docker Images
May 2nd 2020, 5:24pm
Containers have evolved very quickly from being an obscure technology, to being confused with virtual machines, to being mainstream production drivers today. Docker itself has been around for 7 years, but many developers of the old world still do not understand it well. This leads to sub-optimal implementations, because people who are forced to use it take the easiest possible approach. In this post, we’ll explore how to optimise Docker images in the context of Java, using a Spring Boot example.
Build the app
Let’s start by generating a sample app from start.spring.io. We’ll pick the following options to create a bare bones web app:
- Maven Project
- Spring Boot 2.2.6
- Java 14
- Dependencies: Spring Web
Build the project using mvn clean package and you will find a fat JAR file weighing about 17 megabytes. This standalone, self-executable format was made popular by Spring Boot since it’s convenient to just deploy a single file and ignore app server dependencies. Developers hate sharing app servers for good reason.
Create a simple Dockerfile
Within the project, create the simplest possible Dockerfile:
FROM openjdk:14-slim
COPY ./target/*.jar app.jar
ENTRYPOINT [ "java", "-jar", "./app.jar" ]
Build the docker image and push it to your image registry.
docker build -t your-registry/demo .
docker push your-registry/demo
You’ll notice that it pushes 5 layers ranging from a minuscule 3 kilobytes to a much larger 336 megabytes. That first layer should have a size similar to your fat JAR.
ea2b82629d91: Pushing 0kB / 17.6MB
64dd9292f295: Pushing 0kB / 335.9MB
38ae49ae5249: Pushing 0kB / 3.6kB
2eaf0d58a380: Pushing 0kB / 8.8MB
c2adabaecedb: Pushing 0kB / 69.2MB
Docker layers
Docker images are built from layers, and layers are cached. If the registry already has an identical layer, the client skips pushing that layer. Thus, building images that reuse layers as much as possible reduces the net storage and bandwidth usage. Even if you don’t care about that, it also makes your pushes and pulls faster, which directly translates to app startup time in the Kubernetes world.
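To see how much of a push actually reused cached layers, one rough option is to tally the status lines that docker push prints. This is only a sketch: it assumes each line looks like "<layer id>: <status>", as in the output above, and summarise_push is a hypothetical helper name, not a Docker command.

```shell
# Tally `docker push` output read from stdin.
# Assumes lines of the form "<layer id>: <status>"; anything else is ignored.
summarise_push() {
  awk -F': ' '
    $2 ~ /^(Pushing|Pushed)/     { pushed[$1] = 1 }   # layer had to be uploaded
    $2 ~ /^Layer already exists/ { reused[$1] = 1 }   # registry already had it
    END {
      p = 0; for (id in pushed) p++
      r = 0; for (id in reused) r++
      printf "pushed=%d reused=%d\n", p, r
    }
  '
}
```

Usage could be as simple as docker push your-registry/demo | summarise_push. Layers are counted by id, so repeated progress lines for the same layer are only counted once.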
Why fat JARs are evil
Let’s make one tiny change to our code, maybe add a log statement, then rebuild the project and docker image again and push it.
aac3c35cde67: Pushing 0kB/ 17.6MB
64dd9292f295: Layer already exists
38ae49ae5249: Layer already exists
2eaf0d58a380: Layer already exists
c2adabaecedb: Layer already exists
Hold on.. why did I have to spend time and use another 17.6 megabytes of bandwidth just for a single line of code that changed? Because fat JAR. Your code and all of its dependencies live in one file, so any change invalidates the whole layer. While this number doesn’t seem very large by modern standards, it grows with the number of dependencies your project requires. Imagine the cost of changing a single variable on a gigabyte-sized layer. There are people for whom this scenario needs no imagination.
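To see how lopsided a fat JAR is, you can compare the bytes taken by bundled dependencies with the bytes taken by your own classes. A minimal sketch, assuming Info-ZIP's unzip is installed and that the JAR follows Spring Boot's BOOT-INF/lib and BOOT-INF/classes layout; jar_breakdown is a made-up helper name.

```shell
# Sum uncompressed sizes from `unzip -l` output (read on stdin).
# Column 1 is the entry length and column 4 is the entry name.
# Assumes entry names contain no spaces.
jar_breakdown() {
  awk '
    $4 ~ /^BOOT-INF\/lib\/./     { lib += $1 }   # bundled dependency JARs
    $4 ~ /^BOOT-INF\/classes\/./ { app += $1 }   # your own code and resources
    END { printf "dependencies=%d app=%d\n", lib, app }
  '
}
```

Something like unzip -l target/*.jar | jar_breakdown prints both totals in bytes. The smaller the app share, the more of each push is spent re-uploading dependencies that did not change.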
Why standard base images are evil
If you do a docker images|grep demo, you’ll find the image you just built weighs a hefty 432 megabytes while your fat JAR only contributed 17. The rest of that bulk is from the openjdk:14-slim base image (slim being.. relative).
When the Java world finally moved on from Java 8, the concept of Java modules was introduced (in Java 9). It’s essentially an abstraction above packages, so a module represents a bunch of packages. Nothing fancy. What’s interesting is the introduction of the jlink utility, which lets you generate your own JVM based on what modules you need. This leaves out all the Java APIs you do not use, resulting in a truly slim build. This raises the question of “How do I know which modules I use?”, which is answered by the jdeps utility, which can analyse your compiled code.
Optimal strategy
Now that we have the background sorted, let’s work out what needs to be done:
- Write your code and build the fat JAR as usual
- Use jdeps to analyse what modules your app requires
- Build a two-stage Dockerfile
- In Stage 1:
  - Use the standard JDK base image
  - Run jlink to build an optimised JVM
  - Unpack the fat JAR
  - Organise the unpacked artifacts between dependencies and app code
- In Stage 2:
  - Use a tiny base image
  - Move the optimised JVM over
  - Move the unpacked dependencies over
  - Move the app code over
  - Launch your main class using the optimised JVM
In practice: module dependency analysis
After building your fat JAR, unpack it and run jdeps in recursive mode against the dependencies and the JAR itself. Some dependencies like log4j-api are multi-release, so you might need to specify a release number based on your JDK. We’re using 14 here.
$ mkdir analysis && cd $_
$ cp ../target/*.jar x.jar && jar -xf x.jar
$ jdeps -s --multi-release=14 --recursive -cp BOOT-INF/lib/* x.jar
classmate-1.5.1.jar -> java.base
com.fasterxml.jackson.annotation -> java.base
com.fasterxml.jackson.core -> java.base
com.fasterxml.jackson.databind -> com.fasterxml.jackson.annotation
...
This gives you a giant map of which dependency needs which module, which are granular details we’re not interested in. What we need is a comma-separated list of module names to feed our jlink command later. The --print-module-deps argument does that. We will also ignore split package warnings by grepping them out.
$ jdeps --ignore-missing-deps --print-module-deps --multi-release=14 --recursive -cp BOOT-INF/lib/* x.jar|grep -v Warning:
java.base,java.desktop,java.instrument,java.logging,java.management,java.management.rmi,java.naming,java.prefs,java.rmi,java.scripting,java.security.jgss,java.sql,java.xml,jdk8internals,jdk.httpserver,jdk.unsupported
Much better. We’ll save that list for later.
In practice: multi-stage Docker build
Using the same JDK base image for stage 1, we run jlink, feeding in the modules list from the previous step and specifying an output directory in /jvm. We then move the fat JAR over, unpack it and create a /app directory to group metadata and app code (META-INF and BOOT-INF/classes). Dependencies remain in BOOT-INF/lib.
In stage 2, we use a tiny base image debian:stretch-slim and use 3 COPY statements to divide these parts into separate layers: the JVM, the dependencies and the app code. We then launch the app using the JVM, including the dependencies in the classpath.
# Stage 1: build a trimmed JVM and unpack the fat JAR
FROM openjdk:14-slim AS build
RUN jlink --output /jvm --no-header-files --no-man-pages --compress=2 \
--strip-debug --add-modules java.base,java.desktop,jdk8internals,\
java.instrument,java.logging,java.management,java.management.rmi,\
java.naming,java.prefs,java.rmi,java.scripting,java.security.jgss,\
java.sql,java.xml,jdk.httpserver,jdk.unsupported
WORKDIR /build
COPY ./target/*.jar app.jar
RUN jar -xf app.jar
RUN mkdir /app && cp -r META-INF /app && cp -r BOOT-INF/classes/* /app
# Stage 2: copy the JVM, dependencies and app code from stage 1 as three layers
FROM debian:stretch-slim
COPY --from=build /jvm /jvm
COPY --from=build /build/BOOT-INF/lib /lib
COPY --from=build /app .
ENTRYPOINT [ "/jvm/bin/java", "-cp", ".:/lib/*", "com.example.demo.DemoApplication" ]
At this point, you might encounter some failures running docker build due to the jlink command. There are two common things that can happen. The first is that jdeps doesn’t give you a definitive set of modules, so you might see a not found error like the one below. Simply remove the offending module from the --add-modules list.
Error: Module jdk8internals not found
The second is picking a stage 1 image that doesn’t have objcopy installed. To fix this, swap the --strip-debug argument for the alternative --strip-java-debug-attributes.
Error: java.io.IOException: Cannot run program "objcopy": error=2, No such file or directory
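For the first failure, instead of removing modules by hand, one option is to filter the jdeps list against the modules the stage 1 JDK actually ships. The sketch below assumes the output format of java --list-modules (one name@version per line); filter_available is a hypothetical helper, not part of the JDK.

```shell
# Keep only the modules from a comma-separated list (argument 1) that also
# appear in `java --list-modules` output (read on stdin).
filter_available() {
  wanted=$1
  # Strip the "@version" suffix so names can be compared exactly.
  available=$(sed 's/@.*//')
  printf '%s\n' "$wanted" | tr ',' '\n' |
    while IFS= read -r m; do
      # -F: literal match (module names contain dots), -x: whole line only
      if printf '%s\n' "$available" | grep -Fqx -- "$m"; then
        printf '%s\n' "$m"
      fi
    done | paste -sd, -
}
```

Run inside the stage 1 image, java --list-modules | filter_available "java.base,jdk8internals,java.sql" should drop jdk8internals and keep the rest in order. It only checks that a module exists, so it does not replace testing the final image.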
What has changed?
In the change-a-variable test, only 12 kilobytes needed to move instead of 17.6 megabytes. That’s a reduction of more than 99% in bandwidth and a negligible amount of time needed to push the image.
7c76a9957d46: Pushing 0kB / 12.29kB
cfa161abfacf: Pushing 0kB / 17.51MB
27eda33d0eae: Pushing 0kB / 50.26MB
cde96efde55e: Pushing 0kB / 55.32MB
e7c728ed739b: Pushing 0kB / 12.29kB
cfa161abfacf: Layer already exists
27eda33d0eae: Layer already exists
cde96efde55e: Layer already exists
As a whole, the total image size dropped by 71% compared to the baseline:
$ docker images|grep demo
registry/demo before a90ae7133e69 2 hours ago 432MB
registry/demo after 5165c278494d 2 minutes ago 123MB
Way, way faster.
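The percentages quoted here are plain arithmetic on the sizes shown above. As a quick check, here is a small sketch using awk for the floating point maths; pct_reduction is a made-up helper name, and both arguments must be in the same unit.

```shell
# Print the percentage reduction from a "before" size to an "after" size.
pct_reduction() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

pct_reduction 432 123      # total image size in MB, prints 71.5
pct_reduction 17600 12.29  # changed layer in kB (17.6MB vs 12.29kB), prints 99.9
```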
ARM builds
Since I have a vested interest in running this on my Pi cluster, I’ve spent some time looking for compatible base images as well. This is where it gets confusing, as there are multiple architecture names that will run on the Pi, but it will refuse to pull any images not marked for linux/arm. I’ve also found that even though I’m running the 64-bit kernel, images using the linux/arm64 arch will not run. So I’ve scoped compatible images down to those using arm/v7. adoptopenjdk seems to have pretty good base images for stage 1, while arm32v7’s debian images work as a good stage 2 base.
FROM adoptopenjdk/openjdk14:armv7l-debian-jdk-14.0.1_7
...
FROM arm32v7/debian:stretch-slim
...
Caveats
As you’ve noticed, jdeps is not fool-proof. I’ve experienced a couple of scenarios where the build process succeeds, only to fail at runtime due to a missing module. Hence, you will need to test your app end-to-end to ensure that your optimised image is complete. These are some common modules to add if your app does the following:
- jdk.crypto.ec: If your app calls third-party REST APIs (that might use elliptic curve cryptography in their TLS certificates)
- jdk.naming.dns: If your app connects to mongodb using mongodb+srv://
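If you keep the jdeps output in a variable, extra modules like these can be appended without creating duplicates. This is a rough sketch; add_modules is a hypothetical helper name and the module list in the usage note is shortened for illustration.

```shell
# Append modules to a comma-separated list, skipping any already present.
# Usage: add_modules LIST MODULE...
add_modules() {
  list=$1
  shift
  for m in "$@"; do
    case ",$list," in
      *",$m,"*) ;;                          # already in the list, skip it
      *) list="${list:+$list,}$m" ;;        # append, adding a comma if needed
    esac
  done
  printf '%s\n' "$list"
}
```

For example, add_modules "java.base,java.sql" jdk.crypto.ec jdk.naming.dns yields a list ready to pass to --add-modules. Which extra modules you actually need still has to come from end-to-end testing.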
The future..
is hard to tell. Container technology moves so fast that this technique might become obsolete pretty quickly. The dream is that it’s something that happens automagically in a stable form. Did I miss out on any tips? Do drop me an @mention or DM.