From: "René Scharfe" <l.s.r@web.de>
To: Justin Tobler <jltobler@gmail.com>, git@vger.kernel.org
Cc: toon@iotcl.com
Subject: Re: [PATCH] archive: flush deflate stream until Z_STREAM_END
Date: Sun, 3 Aug 2025 11:52:19 +0200 [thread overview]
Message-ID: <5817e9ed-5b5c-4539-85ea-07c601c0ac7e@web.de> (raw)
In-Reply-To: <20250802220803.95137-1-jltobler@gmail.com>
On 8/3/25 12:08 AM, Justin Tobler wrote:
> In `archive-zip.c:write_zip_entry()` when using a stream as input for
> deflating a file, the call to `git_deflate()` with Z_FINISH always
> expects Z_STREAM_END to be returned. Per zlib documentation[1]:
>
> If the parameter flush is set to Z_FINISH, pending input is
> processed, pending output is flushed and deflate returns with
> Z_STREAM_END if there was enough output space. If deflate
> returns with Z_OK or Z_BUF_ERROR, this function must be called
> again with Z_FINISH and more output space (updated avail_out)
> but no more input data, until it returns with Z_STREAM_END or an
> error. After deflate has returned Z_STREAM_END, the only
> possible operations on the stream are deflateReset or
> deflateEnd.
>
> In scenarios where the output buffer is not large enough to write all
> the compressed data, it is perfectly valid for the underlying
> `deflate()` to return Z_OK. Thus, expecting a single pass of `deflate()`
> here to always return Z_STREAM_END is a bug. Update the code to flush
> the deflate stream until Z_STREAM_END is returned.
>
> [1]: https://zlib.net/manual.html
Good find. I guess back then I thought making the output buffer twice
as big as the input buffer was sufficient, as deflateBound() guarantees
compression is possible with a much lower ratio. But this doesn't take
the internal state of a stream into account. Oof!
> Helped-by: Toon Claes <toon@iotcl.com>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
>
> Greetings,
>
> At GitLab, we received a report of a user getting the following error
> when generating a zip archive of their repository via git-archive(1):
>
> fatal: deflate error (0)
>
> I've so far only been able to reproduce this issue in the chromium.git
> repository with a specific file:
>
> git clone --depth=1 https://github.com/chromium/chromium.git
> cd chromium
> git -c core.bigFileThreshold=1 archive -o foo.zip --format=zip HEAD -- \
> ui/events/ozone/evdev/touch_filter/palm_model/onedevice_train_palm_detection_filter_inference.cc
>
> In the above example, `core.bigFileThreshold` is set to a low value to
> cause more files to use a stream as input while being deflated. This is
> the codepath that produces the specific error.
>
> I've tested the patch against this specific file, and it fixes the
> issue, but I'm uncertain how to reproduce and test this issue more
> generically. I'm open to suggestions if anyone has some ideas :)
Not sure how to fill up zlib's pending buffer most efficiently.
Reducing the size of the output buffer would make the bug easier to
trigger, though.
> Thanks,
> -Justin
>
> ---
> archive-zip.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/archive-zip.c b/archive-zip.c
> index df8866d5bae..29e7c9f5e3f 100644
> --- a/archive-zip.c
> +++ b/archive-zip.c
> @@ -492,14 +492,22 @@ static int write_zip_entry(struct archiver_args *args,
>
> zstream.next_in = buf;
> zstream.avail_in = 0;
> - result = git_deflate(&zstream, Z_FINISH);
> - if (result != Z_STREAM_END)
> - die("deflate error (%d)", result);
> +
> + do {
> + result = git_deflate(&zstream, Z_FINISH);
> + if (result != Z_OK && result != Z_STREAM_END)
> + die("deflate error (%d)", result);
> +
> + out_len = zstream.next_out - compressed;
> + if (out_len > 0) {
> + write_or_die(1, compressed, out_len);
> + compressed_size += out_len;
> + zstream.next_out = compressed;
> + zstream.avail_out = sizeof(compressed);
> + }
> + } while (result != Z_STREAM_END);
>
> git_deflate_end(&zstream);
> - out_len = zstream.next_out - compressed;
> - write_or_die(1, compressed, out_len);
> - compressed_size += out_len;
Looks good. Could probably rolled into the first loop, but that just
would make this fix more complicated.
> zip_offset += compressed_size;
>
> write_zip_data_desc(size, compressed_size, crc);
next prev parent reply other threads:[~2025-08-03 9:52 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-02 22:08 [PATCH] archive: flush deflate stream until Z_STREAM_END Justin Tobler
2025-08-03 9:52 ` René Scharfe [this message]
2025-08-03 15:34 ` Justin Tobler
2025-08-04 17:00 ` Toon Claes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5817e9ed-5b5c-4539-85ea-07c601c0ac7e@web.de \
--to=l.s.r@web.de \
--cc=git@vger.kernel.org \
--cc=jltobler@gmail.com \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).