git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Justin Tobler <jltobler@gmail.com>, git@vger.kernel.org
Cc: toon@iotcl.com
Subject: Re: [PATCH] archive: flush deflate stream until Z_STREAM_END
Date: Sun, 3 Aug 2025 11:52:19 +0200	[thread overview]
Message-ID: <5817e9ed-5b5c-4539-85ea-07c601c0ac7e@web.de> (raw)
In-Reply-To: <20250802220803.95137-1-jltobler@gmail.com>

On 8/3/25 12:08 AM, Justin Tobler wrote:
> In `archive-zip.c:write_zip_entry()` when using a stream as input for
> deflating a file, the call to `git_deflate()` with Z_FINISH always
> expects Z_STREAM_END to be returned. Per zlib documentation[1]:
> 
>         If the parameter flush is set to Z_FINISH, pending input is
>         processed, pending output is flushed and deflate returns with
>         Z_STREAM_END if there was enough output space. If deflate
>         returns with Z_OK or Z_BUF_ERROR, this function must be called
>         again with Z_FINISH and more output space (updated avail_out)
>         but no more input data, until it returns with Z_STREAM_END or an
>         error. After deflate has returned Z_STREAM_END, the only
>         possible operations on the stream are deflateReset or
>         deflateEnd.
> 
> In scenarios where the output buffer is not large enough to write all
> the compressed data, it is perfectly valid for the underlying
> `deflate()` to return Z_OK. Thus, expecting a single pass of `deflate()`
> here to always return Z_STREAM_END is a bug. Update the code to flush
> the deflate stream until Z_STREAM_END is returned.
> 
> [1]: https://zlib.net/manual.html

Good find.  I guess back then I thought making the output buffer twice
as big as the input buffer was sufficient, as deflateBound() guarantees
compression is possible with a much lower ratio.  But this doesn't take
the internal state of a stream into account.  Oof!

> Helped-by: Toon Claes <toon@iotcl.com>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
> 
> Greetings,
> 
> At GitLab, we received a report of a user getting the following error
> when generating a zip archive of their repository via git-archive(1):
> 
>         fatal: deflate error (0)
> 
> I've so far only been able to reproduce this issue in the chromium.git
> repository with a specific file:
> 
>         git clone --depth=1 https://github.com/chromium/chromium.git
>         cd chromium
>         git -c core.bigFileThreshold=1 archive -o foo.zip --format=zip HEAD -- \
>                 ui/events/ozone/evdev/touch_filter/palm_model/onedevice_train_palm_detection_filter_inference.cc
> 
> In the above example, `core.bigFileThreshold` is set to a low value to
> cause more files to use a stream as input while being deflated. This is
> the codepath that produces the specific error.
> 
> I've tested the patch against this specific file, and it fixes the
> issue, but I'm uncertain how to reproduce and test this issue more
> generically. I'm open to suggestions if anyone has some ideas :)

Not sure how to fill up zlib's pending buffer most efficiently.
Reducing the size of the output buffer would make the bug easier to
trigger, though.

> Thanks,
> -Justin
> 
> ---
>  archive-zip.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/archive-zip.c b/archive-zip.c
> index df8866d5bae..29e7c9f5e3f 100644
> --- a/archive-zip.c
> +++ b/archive-zip.c
> @@ -492,14 +492,22 @@ static int write_zip_entry(struct archiver_args *args,
>  
>  		zstream.next_in = buf;
>  		zstream.avail_in = 0;
> -		result = git_deflate(&zstream, Z_FINISH);
> -		if (result != Z_STREAM_END)
> -			die("deflate error (%d)", result);
> +
> +		do {
> +			result = git_deflate(&zstream, Z_FINISH);
> +			if (result != Z_OK && result != Z_STREAM_END)
> +				die("deflate error (%d)", result);
> +
> +			out_len = zstream.next_out - compressed;
> +			if (out_len > 0) {
> +				write_or_die(1, compressed, out_len);
> +				compressed_size += out_len;
> +				zstream.next_out = compressed;
> +				zstream.avail_out = sizeof(compressed);
> +			}
> +		} while (result != Z_STREAM_END);
>  
>  		git_deflate_end(&zstream);
> -		out_len = zstream.next_out - compressed;
> -		write_or_die(1, compressed, out_len);
> -		compressed_size += out_len;

Looks good.  Could probably rolled into the first loop, but that just
would make this fix more complicated.

>  		zip_offset += compressed_size;
>  
>  		write_zip_data_desc(size, compressed_size, crc);


  reply	other threads:[~2025-08-03  9:52 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-02 22:08 [PATCH] archive: flush deflate stream until Z_STREAM_END Justin Tobler
2025-08-03  9:52 ` René Scharfe [this message]
2025-08-03 15:34   ` Justin Tobler
2025-08-04 17:00     ` Toon Claes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5817e9ed-5b5c-4539-85ea-07c601c0ac7e@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=toon@iotcl.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).