git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Toon Claes <toon@iotcl.com>
To: "Justin Tobler" <jltobler@gmail.com>, "René Scharfe" <l.s.r@web.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] archive: flush deflate stream until Z_STREAM_END
Date: Mon, 04 Aug 2025 19:00:53 +0200	[thread overview]
Message-ID: <87cy9buim2.fsf@iotcl.com> (raw)
In-Reply-To: <t4rkigpst2h42r7iouwli7lj74lydz6nmehojzajt6impibpui@kvgu7pfkuf6k>

Justin Tobler <jltobler@gmail.com> writes:

> On 25/08/03 11:52AM, René Scharfe wrote:
>> On 8/3/25 12:08 AM, Justin Tobler wrote:
>> > I've tested the patch against this specific file, and it fixes the
>> > issue, but I'm uncertain how to reproduce and test this issue more
>> > generically. I'm open to suggestions if anyone has some ideas :)
>> 
>> Not sure how to fill up zlib's pending buffer most efficiently.
>> Reducing the size of the output buffer would make the bug easier to
>> trigger, though.

Yes, in my local testing I also have been reducing the size of the
output buffer.

> Ya, I was able to trigger this issue more frequently by making the
> output buffer smaller than the input buffer. I was really hoping though
> to find a way to reproduce this without code changes so we could add a
> test. Not sure if that is really feasible though in this case.

I think it's really nice you were able find a repository that reproduces
this issue. Having an existing example in the "wild" is better proof to
verify this fix work.

>> > I've so far only been able to reproduce this issue in the chromium.git
>> > repository with a specific file:
>> >
>> >         git clone --depth=1 https://github.com/chromium/chromium.git
>> >         cd chromium
>> >         git -c core.bigFileThreshold=1 archive -o foo.zip --format=zip HEAD -- \
>> >                 ui/events/ozone/evdev/touch_filter/palm_model/onedevice_train_palm_detection_filter_inference.cc

Weird, for me this file didn't trigger the error, but the following file
did:

    chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js

But also with this file, I can confirm this fix works.

>> > diff --git a/archive-zip.c b/archive-zip.c
>> > index df8866d5bae..29e7c9f5e3f 100644
>> > --- a/archive-zip.c
>> > +++ b/archive-zip.c
>> > @@ -492,14 +492,22 @@ static int write_zip_entry(struct archiver_args *args,
>> >  
>> >  		zstream.next_in = buf;
>> >  		zstream.avail_in = 0;
>> > -		result = git_deflate(&zstream, Z_FINISH);
>> > -		if (result != Z_STREAM_END)
>> > -			die("deflate error (%d)", result);
>> > +
>> > +		do {
>> > +			result = git_deflate(&zstream, Z_FINISH);
>> > +			if (result != Z_OK && result != Z_STREAM_END)
>> > +				die("deflate error (%d)", result);
>> > +
>> > +			out_len = zstream.next_out - compressed;
>> > +			if (out_len > 0) {
>> > +				write_or_die(1, compressed, out_len);
>> > +				compressed_size += out_len;
>> > +				zstream.next_out = compressed;
>> > +				zstream.avail_out = sizeof(compressed);
>> > +			}
>> > +		} while (result != Z_STREAM_END);
>> >  
>> >  		git_deflate_end(&zstream);
>> > -		out_len = zstream.next_out - compressed;
>> > -		write_or_die(1, compressed, out_len);
>> > -		compressed_size += out_len;
>> 
>> Looks good.  Could probably rolled into the first loop, but that just
>> would make this fix more complicated.
>
> I was also considering rolling this into the first loop, but ultimately
> went with the minimal patch to fix the issue. I don't mind rerolling if
> we prefer it the other way though. :)

When I was researching, I have been working on a version that rolls the
changes into the first loop, but that diff is a lot more substantial.
The thing is, you cannot seem to avoid introducing another (nested)
loop.

My refactoring is based on the zlib example page[1] page and example
source code[2], and also there two nested loops are used. Ideally, we'd
modify our code so it follows the example code more closely, but I'm
worried this might introduce new breakage, so it's probably not worth
it.

On the other hand, I was able to trigger failure when I made the output
buffer equally large as the input buffer. In this case git-archive
completes successfully, but when inflated, the content is mangled.
It seems my version is protected against that.

I've submitted my version too[3]. I'm on the fence which solution is
better, so I'd love to get input from the both of you. So I'm happy to
hear any thoughts.

[1]: https://zlib.net/zlib_how.html
[2]: https://zlib.net/zpipe.c
[3]: https://lore.kernel.org/git/20250804-toon-archive-zip-fix-v1-0-ca89858e5eaa@iotcl.com/

-- 
Cheers,
Toon

      reply	other threads:[~2025-08-04 17:01 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-02 22:08 [PATCH] archive: flush deflate stream until Z_STREAM_END Justin Tobler
2025-08-03  9:52 ` René Scharfe
2025-08-03 15:34   ` Justin Tobler
2025-08-04 17:00     ` Toon Claes [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87cy9buim2.fsf@iotcl.com \
    --to=toon@iotcl.com \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).