If by "checking" you mean to examine the archive structure to determine whether ...

hinkley · on July 5, 2019

Have you ever done any work in this area? Because it sounds like you know what you’re talking about, except it’s all nonsense.

Zip format can be de/compressed progressively, which is one reason why it’s nice for HTTP transport encoding. The file format is decompressed one record at a time and many or most libraries can give you this as a stream, so it never has to hit disk or be “sent to dev/null”.

If you take responsibility for streaming the records to disk (trivial), then you can check the canonical path before writing, and any other filesystem sanity tests you want to do.
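To make that concrete, here is a minimal sketch in Python using the standard `zipfile` module. The function name `extract_checked` and the policy of rejecting any entry that resolves outside the destination are assumptions for illustration, not something taken from a particular tool:

```python
import os
import shutil
import zipfile


def extract_checked(archive_path, dest_dir):
    """Stream each record to disk, refusing paths that escape dest_dir.

    Hypothetical helper: returns the list of files written.
    """
    dest_root = os.path.realpath(dest_dir)
    written = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            # Resolve the would-be output path before touching the disk.
            target = os.path.realpath(os.path.join(dest_root, info.filename))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"entry escapes destination: {info.filename!r}")
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # zf.open() returns a file-like stream, so the record is
            # decompressed and copied in chunks rather than loaded whole.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Any other filesystem checks (size limits, symlink policy, allowed extensions) would slot in next to the path test, before the output file is opened.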

charonn0 · on July 5, 2019

Last year I implemented zip reading and zip writing in a hobby project of mine. I'm not an expert, but I know enough to write a working zip reader/writer.

> Zip format can be de/compressed progressively, which is one reason why it’s nice for HTTP transport encoding.

Do you mean HTTP transfer encoding? If so, then it's not the zip archive format that's used, but rather the deflate compression algorithm (which zip also uses).
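A rough Python illustration of the distinction, using only the standard library. The names `payload` and `inflate_in_chunks` are made up for the example, and the chunk size is arbitrary. A bare deflate stream (here in the zlib wrapper) can be inflated piece by piece, but it is not a zip archive; the zip container adds headers and a directory around deflated data:

```python
import io
import zipfile
import zlib

payload = b"hello zip " * 1000

# A zlib-wrapped deflate stream: compressed data only, no archive structure.
deflate_stream = zlib.compress(payload)


def inflate_in_chunks(data, chunk_size=64):
    """Feed the compressed bytes to the decompressor a little at a time."""
    decompressor = zlib.decompressobj()
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        out += decompressor.decompress(data[start:start + chunk_size])
    out += decompressor.flush()
    return bytes(out)


# A zip archive holding the same payload, compressed with deflate.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("payload.bin", payload)

stream_is_zip = zipfile.is_zipfile(io.BytesIO(deflate_stream))
archive_is_zip = zipfile.is_zipfile(io.BytesIO(zip_buffer.getvalue()))
```

Here `stream_is_zip` comes out false and `archive_is_zip` true, even though both hold the same deflated payload.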

> The file format is decompressed one record at a time

But not necessarily in the order they appear.

> many or most libraries can give you this as a stream, so it never has to hit disk or be “sent to dev/null”.

My point is that the compressed bytes have to be decompressed and checksummed for both extraction and checking; only after that are the bytes either written or discarded.
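As a sketch of what I mean by "checking", in Python with the standard `zipfile` module: every record is still decompressed and its CRC-32 verified, and the only difference from extraction is that the output is thrown away. The function name `check_archive` is hypothetical, and `ZipFile.testzip()` does a similar job out of the box:

```python
import zipfile


def check_archive(archive, chunk_size=64 * 1024):
    """Decompress every record and discard the bytes.

    `archive` may be a path or a seekable file-like object.
    Returns the total number of uncompressed bytes read.
    """
    total = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            with zf.open(info) as src:
                # Reading to the end makes zipfile compare the running
                # CRC-32 with the stored value; a mismatch raises BadZipFile.
                while chunk := src.read(chunk_size):
                    total += len(chunk)  # counted, then discarded
    return total
```

Swapping the `total += len(chunk)` line for a write to an output file turns this into extraction; the decompression and checksum work is the same either way.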

> If you take responsibility for streaming the records to disk (trivial), then you can check the canonical path before writing, and any other filesystem sanity tests you want to do.

That's true, but there's nothing wrong with the paths in this case.