Container Image Size: Why It Matters and How to Optimize It

Vishwanath Dubey
Jul 14, 2021

When designing container-based applications, i.e. pure Docker-based or Kubernetes/OpenShift platform-based applications, it is very important to design the Dockerfile so that it creates a lightweight image with optimized image layers.

Docker Image Size — How Does It Matter?

It is obvious that smaller Docker images take up less disk space, but image size and disk consumption are not necessarily directly correlated.

When all the layers in an image are generated by that image's own build process (generated meaning a new layer is created rather than reused), the image size represents the amount of disk space occupied by that image. But this is very uncommon, and not good practice, when containerized deployments are considered.

Consider the following examples:

1. There is an image of 500 MB consisting of two layers: one taking 20 MB and the other 480 MB.

2. The same Dockerfile is used to generate different images for different deployments, where the 20 MB layer is a distinct layer generated per image, while the 480 MB layer is a common (shared) layer across the docker build processes. If there are 10 images for different deployments, the total disk consumption is: 480 + 10×20 = 480 + 200 = 680 MB.

3. Vice versa, if the 480 MB layer is distinct per image and the 20 MB layer is shared, the total disk consumption for 10 images is: 20 + 10×480 = 20 + 4800 = 4820 MB.

Practically, the ‘shared’ layer in these examples represents the image layers that seldom change: the base image plus the other layers low in the stack. The layer that differs per image represents the frequently changing layers, the ones at the top of the stack that change in each build.

Hence, for disk consumption, the total image size or the base image size matters less than the size of the frequently changing layers.
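To keep those frequently changing layers small, order the Dockerfile so that rarely changing content comes first and frequently changing content comes last. A minimal sketch (the Node.js base tag matches the one used later in this article; the file names are illustrative assumptions):

```dockerfile
# Rarely changing layers first: base image and dependencies.
FROM node:12.16.3
WORKDIR /usr/local/app

# Dependency manifests change far less often than source code,
# so install dependencies in their own layer; it stays shared
# (cached) across rebuilds and deployments.
COPY package.json package-lock.json ./
RUN npm install

# Frequently changing layer last: the application source.
# Only this small layer differs between builds.
COPY . .
```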

While uploading (pushing) or downloading (pulling) a Docker image, each layer of the image is transferred individually to the destination, and layers that already exist on the destination are not transferred. Hence the amount of data to be transferred is not correlated with the image size: it is the sum of the sizes of the layers that either do not exist on the destination or have changed. Only the first transfer takes the full hit; in subsequent transfers of the image, only the new (frequently changing) layers need to be transferred.

Hence, especially in CI/CD solutions, make sure the shared layers are cached. This speeds up the pull-build-push cycle.
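A sketch of what this can look like in a CI pipeline (the registry URL and tag are hypothetical); `docker build --cache-from` lets a previously pushed image seed the layer cache on a fresh build machine:

```sh
# Pull the last published image so its layers can seed the build cache
# ("|| true" keeps the very first build from failing when no image exists yet).
docker pull registry.example.com/myapp:latest || true

# Reuse any unchanged layers from the pulled image.
docker build \
  --cache-from registry.example.com/myapp:latest \
  -t registry.example.com/myapp:latest .

# Only the layers that changed are actually uploaded.
docker push registry.example.com/myapp:latest
```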

So when designing for disk consumption or for the speed of transferring images, the base image size and the overall image size do not matter much; only the size of the frequently changing layers matters. Hence make sure the frequently changing layers are small in size.

Then why is having a smaller Docker image generally desirable?

A smaller Docker image is still desirable, since a smaller container image ships with fewer libraries inside the container. This gives it a smaller attack surface compared with bigger containers.

E.g., in Container Registry’s built-in vulnerability scanning, the smaller container showed only three “medium” vulnerabilities, compared with 16 critical and over 300 other vulnerabilities in the larger container.

Also, since fewer dependencies and libraries are installed, managing those libraries and keeping them up to date with operating system patches becomes simpler.

Making container images small may not always be possible, but building the final application image on a small base image like Alpine (about 5 MB) greatly helps in reducing security risk and easing overall maintenance.
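As a rough sketch of this approach (the application and package names are illustrative assumptions), a small Python service built directly on Alpine could look like:

```dockerfile
FROM alpine:3.13

# --no-cache installs packages without storing the apk index,
# keeping this layer as small as possible.
RUN apk add --no-cache python3 py3-pip

COPY app.py /app/app.py
CMD ["python3", "/app/app.py"]
```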

Use multi-stage builds for leaner and more secure Docker images

[Multi-stage builds were introduced in Docker Engine 17.05.]

The multi-stage builds feature in Dockerfiles enables you to create smaller container images with better caching and a smaller security footprint.

The release of Docker CE 17.05 (EE 17.06) introduced a feature that helps create thin (smaller) Docker images by making it possible to divide the image build process into multiple stages, allowing artifacts produced in one stage to be reused by another stage.

The Dockerfile below illustrates a multi-stage build in two stages: a builder image and the container (final) image. The builder image uses node:12.16.3 as its base image.
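A minimal sketch of the two stages (the exact instructions, file names, and the nginx tag are assumptions reconstructed from the description below, which specifies only the base image, the build output path, and the resulting image sizes):

```dockerfile
# Stage 1: builder image, aliased as "build".
FROM node:12.16.3 AS build
WORKDIR /usr/local/app
COPY package.json package-lock.json ./
RUN npm install
COPY . .
# The Angular build output lands in /usr/local/app/dist/angular-app.
RUN npm run build

# Stage 2: final image, containing only the runtime artifacts.
FROM nginx:alpine
# Replace the default nginx contents with the build output from stage 1.
COPY --from=build /usr/local/app/dist/angular-app /usr/share/nginx/html
```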

In the builder image stage, aliased as build, the Angular build is performed; the output is generated in /usr/local/app/dist/angular-app [here angular-app is the name of the Angular application]. The image at this stage is about 1.5 GB, since it includes all the node_modules as well as the libraries needed for the npm build. The image generated from this stage is used for building the final image in stage 2.

In stage 2, i.e. the container stage, the build output generated in stage 1 is copied (--from=build) to replace the default nginx contents. The final generated image is about 80 MB, which is very small because it includes only the runtime libraries.

Option when the multi-stage builds feature is not available

Multi-stage builds are not available in Docker 1.13 (a pre-17.05 release), which is what ships with OpenShift 3.11. [When creating container images as part of DevOps on the OCP 3.11 platform, multi-stage builds are therefore not possible.]

In this case two Dockerfiles are required: one to build the application, and a second to produce the slimmed-down container image containing only the application's executable binaries.

Both Dockerfiles are managed by a shell script: it builds the first image, creates a container from it to copy the artifact out, then builds the second image. Both images take up room on your system.
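A sketch of such a script (image names, Dockerfile names, and the artifact path are hypothetical):

```sh
#!/bin/sh
set -e

# 1. Build the builder image from the first Dockerfile.
docker build -t myapp-builder -f Dockerfile.build .

# 2. Create a (stopped) container from it and copy the artifact out.
docker create --name extract myapp-builder
docker cp extract:/usr/local/app/dist/angular-app ./dist
docker rm extract

# 3. Build the slim runtime image from the second Dockerfile,
#    which only copies ./dist onto a small base image.
docker build -t myapp -f Dockerfile.run .
```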

Alpine has been shown to be a great way to get a small container image. But an Alpine Linux distribution-based image is not always the right choice.

Selecting a base image is quite similar to selecting a Linux distribution. Since container images are typically focused on running an application, you need to think about the programming languages and interpreters (such as PHP, Ruby, Python, Bash, and even Node.js) as well as the required dependencies of those languages (such as glibc, libseccomp, zlib, openssl, libsasl, and tzdata). Even if your application is based on an interpreted language, it still relies on operating system libraries: interpreted languages often rely on external implementations for things that are difficult to write or maintain, like encryption, database drivers, or other complex algorithms.

The underlying libc system library used by Alpine is musl, while Debian-based distributions (e.g. Ubuntu) use glibc. The libc implementation affects build and test performance; for example, JVM-based builds and test runs take more time (10–20%) on Alpine than on Debian-based images.

Based on the benchmark available at etlabs, musl is slower in memory allocation.

So, when a container image has to run process-intensive workloads with many memory allocations, e.g. Maven builds and test runs (which internally use the JVM), there will be performance differences between images based on different Linux distributions.
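For example (the image tag here is an illustrative assumption), an allocation-heavy Maven build stage can simply prefer a Debian/glibc-based image over an Alpine/musl one:

```dockerfile
# Debian/glibc-based Maven image: typically faster for JVM-heavy
# builds and test runs than a musl/Alpine-based equivalent.
FROM maven:3.6-jdk-11 AS build
WORKDIR /app

# Resolve dependencies in their own, rarely changing layer.
COPY pom.xml .
RUN mvn -B dependency:go-offline

COPY src ./src
RUN mvn -B package
```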
