Most Dockerfiles for Python projects will have a couple of lines to install their Python dependencies, though:

```dockerfile
COPY requirements.txt ./
RUN pip install -r requirements.txt
```
If you're building the image on a CI server, Docker can't cache that step because the files won't match the cache due to timestamps/permissions/etc. The same is true on other developers' machines.
This is a problem if your requirements include anything that uses C extensions, like mysql/postgresql libs or PIL, since missing the cache means recompiling those on every build.
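The usual way to make the most of that cache is to copy only the dependency manifest before the rest of the source, so source edits don't invalidate the pip layer. A minimal sketch (the base image, `WORKDIR`, and `app.py` entrypoint here are illustrative assumptions, not from the thread):

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Copy only the dependency manifest first: the layer below is reused
# as long as the contents of requirements.txt are unchanged.
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the source last; edits here leave the pip layer cached.
COPY . .

CMD ["python", "app.py"]
```

This only helps, of course, if the builder actually has the earlier layers available, which is the point of contention in the replies below.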
To be clear, the only file in question is requirements.txt; Docker has no idea what files `pip install ...` is pulling and doesn't factor them into any kind of cache check. Beyond that, I didn't realize that timestamps were factored into the hash; and if they were, I'd expect git or similar to set them "correctly" such that Docker does the right thing. (I still think Docker's build tooling is insane, but I'm surprised that it breaks in this case.)
> For the ADD and COPY instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the file(s) are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the file(s), such as the contents and metadata, then the cache is invalidated.
It's not been an issue for me using GitLab CI runners, at least..? Which may be because GitLab CI keeps working copies of your repos.
If the CI system keeps the source tree the Dockerfile is built from, rather than removing it all after every build, Docker caches layers as normal.
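On runners that do start from a clean Docker state, one common workaround is to pull the previously pushed image and seed the cache with `docker build --cache-from`. A GitLab CI sketch (the `$CI_REGISTRY_*` variables are GitLab's predefined ones; the image versions and `:latest` tagging scheme are assumptions):

```yaml
build:
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    # Pull the last pushed image so its layers can seed the build cache;
    # "|| true" keeps the first-ever build from failing when no image exists yet.
    - docker pull "$CI_REGISTRY_IMAGE:latest" || true
    - docker build --cache-from "$CI_REGISTRY_IMAGE:latest" -t "$CI_REGISTRY_IMAGE:latest" .
    - docker push "$CI_REGISTRY_IMAGE:latest"
```

With this, the `pip install` layer is reused whenever requirements.txt's contents match the pulled image, even though the runner itself kept nothing between jobs.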