#docker (2023-09)
All things docker
Archive: https://archive.sweetops.com/docker/
2023-09-24
How can I reduce the size of the image in this Dockerfile? It was generated by fly.io. The image size is 3.4 GB.
Dockerfile
# syntax = docker/dockerfile:1
# Make sure RUBY_VERSION matches the Ruby version in .ruby-version and Gemfile
ARG RUBY_VERSION=3.2.0
FROM ruby:$RUBY_VERSION-slim as base
LABEL fly_launch_runtime="rails"
# Rails app lives here
WORKDIR /rails
ARG RAILS_MASTER_KEY
# Set production environment
ENV RAILS_ENV="production" \
BUNDLE_WITHOUT="development:test" \
BUNDLE_DEPLOYMENT="1" \
RAILS_MASTER_KEY=${RAILS_MASTER_KEY}
# Update gems and bundler
RUN gem update --system --no-document && \
gem install -N bundler
# Install packages needed to install nodejs
RUN apt-get update -qq && \
apt-get install --no-install-recommends -y curl && \
rm -rf /var/lib/apt/lists /var/cache/apt/archives
# Install Node.js
ARG NODE_VERSION=14.21.1
ENV PATH=/usr/local/node/bin:$PATH
RUN curl -sL https://github.com/nodenv/node-build/archive/master.tar.gz | tar xz -C /tmp/ && \
/tmp/node-build-master/bin/node-build "${NODE_VERSION}" /usr/local/node && \
rm -rf /tmp/node-build-master
# Throw-away build stage to reduce size of final image
FROM base as build
# Install packages needed to build gems and node modules
RUN apt-get update -qq && \
apt-get install --no-install-recommends -y build-essential git libpq-dev libvips node-gyp pkg-config python-is-python3
# Install yarn
ARG YARN_VERSION=1.22.19
# node modules are installed in root of the image
ENV NODE_ENV="production" \
PREFIX=/usr/local \
PATH=/usr/local/node/bin:$PATH
RUN npm install -g yarn@$YARN_VERSION && \
rm -rf /usr/local/node/lib/node_modules/npm && \
yarn --version && \
apt-get remove -y node-gyp pkg-config && \
apt-get autoremove -y && \
rm -rf /tmp/npm* /tmp/gyp /tmp/node-* /tmp/node_modules/npm* /var/lib/apt/lists/* \
/usr/local/node/lib/node_modules/npm
# Build options
ENV PATH="/usr/local/node/bin:$PATH"
# Install application gems
COPY --link Gemfile Gemfile.lock ./
RUN bundle install && \
bundle exec bootsnap precompile --gemfile && \
rm -rf ~/.bundle/ $BUNDLE_PATH/ruby/*/cache $BUNDLE_PATH/ruby/*/bundler/gems/*/.git
# Install node modules
COPY --link package.json yarn.lock ./
RUN yarn install --frozen-lockfile
# Copy application code
COPY --link . .
# Precompile bootsnap code for faster boot times
RUN bundle exec bootsnap precompile app/ lib/
# Precompiling assets for production without requiring secret RAILS_MASTER_KEY
RUN SECRET_KEY_BASE=DUMMY ./bin/rails assets:precompile
# Final stage for app image
FROM base
# Install packages needed for deployment
RUN apt-get update -qq && \
apt-get install --no-install-recommends -y curl imagemagick libvips postgresql-client mtr iputils-ping && \
rm -rf /var/lib/apt/lists /var/cache/apt/archives
# Copy built artifacts: gems, application
COPY --from=build /usr/local/bundle /usr/local/bundle
COPY --from=build /rails /rails
# Run and own only the runtime files as a non-root user for security
RUN useradd rails --create-home --shell /bin/bash && \
chown -R rails:rails db log tmp
USER rails:rails
# Deployment options
ENV RAILS_LOG_TO_STDOUT="1" \
RAILS_SERVE_STATIC_FILES="true"
# Entrypoint prepares the database.
ENTRYPOINT ["/rails/bin/docker-entrypoint"]
# Start the server by default, this can be overwritten at runtime
EXPOSE 3000
.dockerignore
# See https://docs.docker.com/engine/reference/builder/#dockerignore-file for more about ignoring files.
# Ignore git directory.
/.git/
# Ignore bundler config.
/.bundle
# Ignore all default key files.
/config/master.key
/config/credentials/*.key
# Ignore all environment files.
/.env*
!/.env.example
# Ignore all logfiles and tempfiles.
/log/*
/tmp/*
!/log/.keep
!/tmp/.keep
# Ignore all cache files.
/tmp/cache/*
# Ignore pidfiles, but keep the directory.
/tmp/pids/*
!/tmp/pids/
!/tmp/pids/.keep
# Ignore storage (uploaded files in development and any SQLite databases).
/storage/*
!/storage/.keep
/tmp/storage/*
!/tmp/storage/
!/tmp/storage/.keep
# Ignore assets.
/node_modules/
/app/assets/builds/*
!/app/assets/builds/.keep
/public/assets
# Ignore data_files/
/data_files/*
!/data_files/.keep
# Ignore coverage
/coverage/*
# Ignore spec files
/spec/*
!/spec/.keep
What is the layer size? docker history <IMAGE_NAME>
I got it down to 1.2GB. /tmp/storage was being ignored in the ignore file.
➜ wevote git:(fly-io-config) ✗ docker history wevote
IMAGE CREATED CREATED BY SIZE COMMENT
f311122cf1d3 18 seconds ago EXPOSE map[3000/tcp:{}] 0B buildkit.dockerfile.v0
<missing> 18 seconds ago ENTRYPOINT ["/rails/bin/docker-entrypoint"] 0B buildkit.dockerfile.v0
<missing> 18 seconds ago ENV RAILS_LOG_TO_STDOUT=1 RAILS_SERVE_STATIC… 0B buildkit.dockerfile.v0
<missing> 18 seconds ago USER rails:rails 0B buildkit.dockerfile.v0
<missing> 18 seconds ago RUN /bin/sh -c useradd rails --create-home -… 59.5MB buildkit.dockerfile.v0
<missing> 32 seconds ago COPY /rails /rails # buildkit 673MB buildkit.dockerfile.v0
<missing> 39 seconds ago COPY /usr/local/bundle /usr/local/bundle # b… 7.9MB buildkit.dockerfile.v0
<missing> 11 minutes ago RUN /bin/sh -c apt-get update -qq && apt… 170MB buildkit.dockerfile.v0
cool, looks like there is nowhere to shrink
A tool for exploring each layer in a docker image
Containers usually break up the application logic. Your container is so massive because it’s doing more than one thing. You shouldn’t have ruby, node, and python all in a single container. Those should be different containers, with different CMDs or ENTRYPOINTs, and they should all be tied together via a docker-compose file.
This will help reduce the overall size of the image because you’ll be pulling in a cached version of the ruby/python/etc image, instead of building the entire image from scratch.
I highly recommend reading more on the best practices surrounding Docker. I used to shoot myself in the foot in similar ways before I got used to how Docker intended things to be done. https://docs.docker.com/develop/dev-best-practices/
You make good points, and it made me look at the Dockerfile again. It’s a multi-stage Docker image, but the final stage is derived from base, where node and some ruby gems are being installed. I’d do two things: combine the base and build stages into one, and then for the final stage use FROM ruby:$RUBY_VERSION-slim instead of FROM base. You may need to tweak that final image depending on what you need installed from the base stage, but I think that’d get you to a better spot.
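A rough sketch of that restructuring, assuming the node/yarn install and asset precompile all move into a single build stage (package lists and steps abbreviated, so treat this as a shape to follow rather than a drop-in replacement):
# syntax = docker/dockerfile:1
ARG RUBY_VERSION=3.2.0

# Build stage: ruby, node, and all build tooling live here and only here
FROM ruby:$RUBY_VERSION-slim as build
WORKDIR /rails
ENV RAILS_ENV="production" \
    BUNDLE_WITHOUT="development:test" \
    BUNDLE_DEPLOYMENT="1"
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y build-essential curl git libpq-dev libvips node-gyp pkg-config python-is-python3
# (install node and yarn here exactly as the current base/build stages do, then:)
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . .
RUN SECRET_KEY_BASE=DUMMY ./bin/rails assets:precompile

# Final stage: based directly on ruby-slim, so none of the build tooling ships
FROM ruby:$RUBY_VERSION-slim
WORKDIR /rails
ENV RAILS_ENV="production" \
    BUNDLE_WITHOUT="development:test" \
    BUNDLE_DEPLOYMENT="1"
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y curl imagemagick libvips postgresql-client && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives
# Only runtime artifacts come across from the build stage
COPY --from=build /usr/local/bundle /usr/local/bundle
COPY --from=build /rails /rails
ENTRYPOINT ["/rails/bin/docker-entrypoint"]
EXPOSE 3000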
Yea, if he’s in a rush to get this out the door, swapping to that and then adding in more COPY commands isn’t the worst idea.
Another thing to look at is making sure that your Docker layers actually are working with the Docker cache. https://docs.docker.com/build/cache/
That may or may not help with the overall size, but it will help with the build time.
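Purely to illustrate the cache point (the generated Dockerfile above already does a version of this): copy the dependency manifests before the rest of the app, so the expensive install layer stays cached when only application code changes.
# Manifests change rarely, so this layer and the install below it
# are reused from cache on most builds
COPY Gemfile Gemfile.lock ./
RUN bundle install

# App code changes often; copying it last means only the layers from
# here down get rebuilt
COPY . .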
If python and node are only needed to build stuff, it may not need any other changes. If it needs to run python and node stuff at runtime, then I agree that it probably needs to be split up. I only say probably because nothing is ever as simple as we want it to be.
I don’t see how I could get rid of node. It’s required to compile assets during the build. I’m not sure why python is there, probably a dependency of a gem. I’ll look into it
That’s why I was suggesting you combine the base and build stages and change what the final stage is based on. You only need node for building assets, but since it’s currently being installed in the base stage and your final image is based on the base stage, you’ll have node in your final image - but you don’t need it there.
@Antarr Byrd - you’d do something like this (rough compose sketch below):
- make a node container
- in the image of that container, run your asset compilation commands, and generate these in a dir mounted between your container and local machine
- put both the ruby and the node container into a docker-compose file
- use depends_on to make the ruby container spin up after the node one
- mount the assets generated on your local machine into your ruby container
- do the ruby things in the ruby container
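A minimal sketch of that layout, with hypothetical service names, a node:14-slim image to match the NODE_VERSION above, and yarn build standing in for whatever command actually compiles the assets:
# docker-compose.yml (hypothetical names and paths)
services:
  assets:
    image: node:14-slim
    working_dir: /app
    volumes:
      - .:/app                                  # compiled assets land on the host
    command: sh -c "yarn install --frozen-lockfile && yarn build"

  web:
    build: .                                    # ruby-only image
    depends_on:
      assets:
        condition: service_completed_successfully   # wait for the asset build to finish
    volumes:
      - ./public/assets:/rails/public/assets    # reuse the compiled assets
    ports:
      - "3000:3000"
One caveat: a bare depends_on only orders startup; the service_completed_successfully condition is what actually makes the ruby container wait for the node one to finish.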
That seems like a much more complicated approach than using a multi-stage image where each stage has only exactly what it needs?
Yes, though it follows Docker best practices, which helps the image stand up faster, reduces the chance of errors thanks to resource management, and is much easier to debug.
I do think for now, he should follow your solution bradym, as it will get his product out the door quickly. Tho, sometime probably soon if my experience is worth anything, he’s gonna hit a very bizarre problem (probably on someone’s local machine as they try to stand it up for localdev purposes) and it will be related to how this just isn’t quite correct.
This is one of those, “you can get away with this for now, but it has merely bought you some time” kind of things. I used to do remarkably similar architectures, and I got burned very hard thanks to it.
We’re gonna have to agree to disagree here. We use multi-stage dockerfiles in several places and have not run into any issues. It probably depends on how you run your local environment.
I hope that didn’t sound condescending or dismissive of your experience, that was not at all intended if it did.
Thanks both of you
No one uses docker locally so probably won’t have to deal with that for a while
I have nothing against multi-stage dockerfiles. If you try to spin up a container that is multiple gigabytes in size because you’re breaking best practices and putting multiple technologies into a single image, it puts you at a higher likelihood of erroring out. That’s the specific part of this I am trying to warn about. Docker themselves say not to do this.
@Antarr Byrd yea, I say go with bradym’s solution then. Unless you’re working with serverless or some other system that will charge you based on time, or local machines, this is fine.
I agree completely on avoiding multiple gigabyte sized images.
2023-09-25
Hey folks, Docker recently removed the buildinfo metadata from buildx outputs because they were moving stuff like that to attestations and provenance outputs. Has anyone been able to actually get that data back out, though? I recently went about changing our build action to try pulling that data and I’m only seeing partial results. In particular, we have a multi-platform build that should output both amd64 and arm64 outputs. The output below shows the former, but the latter renders as just unknown:
{
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:819b2e9960f85e1667e924ebba1cd9e0bca7924a7afdaf5a194630bb2ed1e750",
      "size": 1800,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:8de8b55a1b364eacb4b2b89e839f52120492a4c6c05110e982b3f3e9b663eb51",
      "size": 566,
      "annotations": {
        "vnd.docker.reference.digest": "sha256:819b2e9960f85e1667e924ebba1cd9e0bca7924a7afdaf5a194630bb2ed1e750",
        "vnd.docker.reference.type": "attestation-manifest"
      },
      "platform": {
        "architecture": "unknown",
        "os": "unknown"
      }
    }
  ]
}
I just joined this slack so excuse the delayed reply. I love these tests in the code you linked. So few teams seem to test their Docker images.
If I can find time these next few days, I am gonna kick this around. Tho do respond if you’ve fixed it already haha
I do like that the point of attestations and provenance is to help secure containers using standardized metadata outputs. I regret that in practice the tooling is a barrier. More specifically, you can easily get them to work if you don’t need multi-platform builds. It’s the intersection of multi-platform and auditing that breaks things down (one seems not to concern itself with the needs of the other). As such, I’ve not made much headway on this issue, but I’ve found a lot of ways to *not* get the attestations to actually render properly with multi-platform builds… which increasingly convinces me that maybe they are working properly and somehow the docker images are actually not built the way the latest buildx wants them to be built.
my next plan is to simply punt on multi-platform builds, and instead have the machine build both images in parallel. It’s less efficient, sure, but either you have correct or you have efficient… choose one.
oh! See above. So basically, the amd64 (the ec2 instance) comes through loud and clear, but the arm64 is listed as unknown. It must be part of how the system uses qemu, I think…
oop, I replied before I saw you edited, but yeah
hahaha you caught my deleted comment oh no
out of curiosity, are you using docker desktop? I really wanted to replicate what you were doing though my linux box is giving me grief when trying to run buildx
gotcha. It’s all through GitHub Actions workflows. The one in question is here:
name: Test docker multi-platform

on:
  # # Uncomment when test added first time to register workflow and comment it back after workflow would be registered
  # #
  # # Added pull_request to register workflow from the PR.
  # # Read more https://stackoverflow.com/questions/63362126/github-actions-how-to-run-a-workflow-created-on-a-non-master-branch-from-the-wo
  # pull_request: {}
  workflow_dispatch: {}

jobs:
  setup:
    runs-on: ubuntu-latest
    steps:
      - name: Setup
        run: echo "Do setup"

  test:
    runs-on: ubuntu-latest
    needs: [setup]
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - uses: ./
        id: current
        with:
          workdir: ./test/custom
          file: Dockerfile_multi_platform
          organization: ${{ github.event.repository.owner.login }}
          repository: ${{ github.event.repository.name }}
          registry: registry.hub.docker.com
          login: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_PASSWORD }}
          platforms: linux/amd64,linux/arm64

      - uses: nick-fields/assert-action@v1
        with:
          expected: 'registry.hub.docker.com/cloudposse/github-action-docker-build-push'
          actual: ${{ steps.current.outputs.image }}

      - uses: nick-fields/assert-action@v1
        with:
          expected: sha-${{ github.sha }}
          actual: ${{ steps.current.outputs.tag }}

      - uses: nick-fields/assert-action@v1
        with:
          expected: 'containerimage.buildinfo/linux/amd64'
          actual: ${{ steps.current.outputs.metadata }}
          comparison: contains

      - uses: nick-fields/assert-action@v1
        with:
          expected: 'containerimage.buildinfo/linux/arm64'
          actual: ${{ steps.current.outputs.metadata }}
          comparison: contains

  teardown:
    runs-on: ubuntu-latest
    needs: [test]
    if: ${{ always() }}
    steps:
      - name: Tear down
        run: echo "Do Tear down"
testing locally has been… inconclusive. I think I can torture the toolchain into giving me what I want, but I’m trying to avoid that
buildx hasn’t been too much of a hassle for me on either linux or macos… but I admit I fart around with docker quite a lot. If you have any opaque errors, I can try to give some guesses on the cause/root of the issue
Understandable. Once you get into “things Docker Engine doesn’t have built-in for free” it gets daunting.
I can’t help but wonder if docker buildx build --attest=type=provenance gets you what you need, or if you could just build each architecture individually and assign the build the correct label via the --label parameter.
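For the second idea, a hedged sketch of what per-architecture builds with a custom label might look like (the image name and the com.example.arch label key are made up for illustration, and the images need to be pushed or loaded before you can inspect them):
# Build each architecture separately and stamp it with a custom label
docker buildx build --platform linux/amd64 \
  --label com.example.arch=amd64 \
  -t example.org/myorg/myapp:amd64 --push .

docker buildx build --platform linux/arm64 \
  --label com.example.arch=arm64 \
  -t example.org/myorg/myapp:arm64 --push .

# Read the label back from a pulled image
docker image inspect --format '{{json .Config.Labels}}' example.org/myorg/myapp:amd64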
Oh, you’re in this channel doing multi-platform builds. You don’t gotta worry about showing the nerd card off. I personally have never gotten to work with multi-platform stuff and find this fascinating
well, at the risk of implicitly chanting “one of us. one of us.”, holler if you need to do more buildx. I wouldn’t describe it ‘fascinating’ as much as ‘training a process to run through the flaming json hoops of local and remote ci’
I think I might have a solution, assuming --push also adds the image to docker images. If this doesn’t work tho, I’ll kick ghcr.io around. I’ve never had an excuse to use one outside of my old job, but hey, this can suffice.
this is some stuff to at least spin up one of the platforms provided locally. It seems --push needs to be used if you want to test images on more than just the host machine.
we build it up, two platforms…
docker buildx build --output type=local,dest=. -t test:latest --platform linux/amd64,linux/386 --file images/api.Dockerfile .
[+] Building 7.6s (16/16) FINISHED docker-container:wonderful_lamport
=> [internal] load build definition from api.Dockerfile 0.0s
=> => transferring dockerfile: 347B 0.0s
=> [linux/amd64 internal] load metadata for docker.io/library/python:3.11 0.1s
=> [linux/386 internal] load metadata for docker.io/library/python:3.11 0.1s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [linux/386 1/5] FROM docker.io/library/python:3.11@sha256:2e376990a11f1c1e03796d08db0e99c36eadb4bb6491372b227f1e53c3482914 0.0s
=> => resolve docker.io/library/python:3.11@sha256:2e376990a11f1c1e03796d08db0e99c36eadb4bb6491372b227f1e53c3482914 0.0s
=> [linux/amd64 1/5] FROM docker.io/library/python:3.11@sha256:2e376990a11f1c1e03796d08db0e99c36eadb4bb6491372b227f1e53c3482914 0.0s
=> => resolve docker.io/library/python:3.11@sha256:2e376990a11f1c1e03796d08db0e99c36eadb4bb6491372b227f1e53c3482914 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 430B 0.0s
=> CACHED [linux/amd64 2/5] WORKDIR /api 0.0s
=> CACHED [linux/amd64 3/5] COPY ./api/requirements.txt requirements.txt 0.0s
=> CACHED [linux/amd64 4/5] RUN pip install --requirement requirements.txt 0.0s
=> [linux/amd64 5/5] COPY ./api /api 0.0s
=> CACHED [linux/386 2/5] WORKDIR /api 0.0s
=> CACHED [linux/386 3/5] COPY ./api/requirements.txt requirements.txt 0.0s
=> CACHED [linux/386 4/5] RUN pip install --requirement requirements.txt 0.0s
=> [linux/386 5/5] COPY ./api /api 0.0s
=> exporting to client directory 7.5s
=> => copying files linux/386 1.00GB 7.4s
=> => copying files linux/amd64 1.03GB
we load (presumably) the linux/amd64 image into docker images…
kim@kimtalkstech:~/Personal/spa-with-rest-api-container-stack $ docker buildx build -t test:latest --load --file images/api.Dockerfile .
[+] Building 13.1s (11/11) FINISHED docker-container:wonderful_lamport
=> [internal] load build definition from api.Dockerfile 0.0s
=> => transferring dockerfile: 347B 0.0s
=> [internal] load metadata for docker.io/library/python:3.11 5.1s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [linux/amd64 1/5] FROM docker.io/library/python:3.11@sha256:2e376990a11f1c1e03796d08db0e99c36eadb4bb6491372b227f1e53c3482914 0.0s
=> => resolve docker.io/library/python:3.11@sha256:2e376990a11f1c1e03796d08db0e99c36eadb4bb6491372b227f1e53c3482914 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 95B 0.0s
=> CACHED [linux/amd64 2/5] WORKDIR /api 0.0s
=> CACHED [3/5] COPY ./api/requirements.txt requirements.txt 0.0s
=> CACHED [4/5] RUN pip install --requirement requirements.txt 0.0s
=> CACHED [5/5] COPY ./api /api 0.0s
=> exporting to docker image format 7.9s
=> => exporting layers 0.0s
=> => exporting manifest sha256:8fcd494099e2d8c53a589c76005454ac419a3811ce40e91eea41b7b4b3405412 0.0s
=> => exporting config sha256:71e7aa29f61f291733436bae2f84643878b84ff81abb396d83b28db6406baf95 0.0s
=> => sending tarball 7.9s
=> importing to docker 6.2s
kim@kimtalkstech:~/Personal/spa-with-rest-api-container-stack $ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
test latest 71e7aa29f61f About a minute ago 1.04GB
techlab-project latest 623d7f848cd6 27 hours ago 329MB
moby/buildkit buildx-stable-1 9291fad3b41c 5 weeks ago 172MB
and then we inspect that image to see what architecture it’s using, and fetch it via good ol’ jq:
$ docker inspect test | jq ".[] | .Architecture"
"amd64"
were you able to get both amd64 and arm64 at the same time from that? I think that’s where the problem arises
I don’t have a docker hub or ghcr setup to play with, but I did some digging and realized you guys have a public docker hub. Scratch everything I wrote above. I can just pull data from your own container registry stuff. Hello world.
Ok, so, looking at https://hub.docker.com/r/cloudposse/github-action-docker-build-push/tags?page=1, I see that there’s a tag feat-caching-support-no-cache that successfully created two diff architectures.
Looking through the docs over at Docker, there’s this https://docs.docker.com/build/attestations/slsa-definitions/#buildconfig.
So if we do a little something like this:
docker buildx imagetools inspect cloudposse/github-action-docker-build-push:feat-caching-support-no-cache --format "{{json .Provenance}}" | jq ".[] | .SLSA.buildConfig"
We can pull in that platform blurb successfully for the two architectures.
hopefully that’s more in line with what you’re trying to do.
If you run that command on your most recent stuff and the platform is once again blank, you may have to set --attest type=provenance,mode=max
https://docs.docker.com/build/attestations/slsa-provenance/#create-provenance-attestations
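A sketch of what that could look like end to end, reusing the imagetools/jq incantation from above (the provenance-test tag is hypothetical, and the image has to be pushed so the attestation manifests travel with it):
# Build multi-platform with full provenance attestations
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --attest type=provenance,mode=max \
  -t registry.hub.docker.com/cloudposse/github-action-docker-build-push:provenance-test \
  --push .

# Then read the per-platform buildConfig back out of the pushed image
docker buildx imagetools inspect \
  registry.hub.docker.com/cloudposse/github-action-docker-build-push:provenance-test \
  --format "{{json .Provenance}}" | jq ".[] | .SLSA.buildConfig"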