git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] archive: flush deflate stream until Z_STREAM_END
@ 2025-08-02 22:08 Justin Tobler
  2025-08-03  9:52 ` René Scharfe
  0 siblings, 1 reply; 4+ messages in thread
From: Justin Tobler @ 2025-08-02 22:08 UTC (permalink / raw)
  To: git; +Cc: toon, Justin Tobler

In `archive-zip.c:write_zip_entry()` when using a stream as input for
deflating a file, the call to `git_deflate()` with Z_FINISH always
expects Z_STREAM_END to be returned. Per zlib documentation[1]:

        If the parameter flush is set to Z_FINISH, pending input is
        processed, pending output is flushed and deflate returns with
        Z_STREAM_END if there was enough output space. If deflate
        returns with Z_OK or Z_BUF_ERROR, this function must be called
        again with Z_FINISH and more output space (updated avail_out)
        but no more input data, until it returns with Z_STREAM_END or an
        error. After deflate has returned Z_STREAM_END, the only
        possible operations on the stream are deflateReset or
        deflateEnd.

In scenarios where the output buffer is not large enough to write all
the compressed data, it is perfectly valid for the underlying
`deflate()` to return Z_OK. Thus, expecting a single pass of `deflate()`
here to always return Z_STREAM_END is a bug. Update the code to flush
the deflate stream until Z_STREAM_END is returned.

[1]: https://zlib.net/manual.html

Helped-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
---

Greetings,

At GitLab, we received a report of a user getting the following error
when generating a zip archive of their repository via git-archive(1):

        fatal: deflate error (0)

I've so far only been able to reproduce this issue in the chromium.git
repository with a specific file:

        git clone --depth=1 https://github.com/chromium/chromium.git
        cd chromium
        git -c core.bigFileThreshold=1 archive -o foo.zip --format=zip HEAD -- \
                ui/events/ozone/evdev/touch_filter/palm_model/onedevice_train_palm_detection_filter_inference.cc

In the above example, `core.bigFileThreshold` is set to a low value to
cause more files to use a stream as input while being deflated. This is
the codepath that produces the specific error.

I've tested the patch against this specific file, and it fixes the
issue, but I'm uncertain how to reproduce and test this issue more
generically. I'm open to suggestions if anyone has some ideas :)

Thanks,
-Justin

---
 archive-zip.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index df8866d5bae..29e7c9f5e3f 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -492,14 +492,22 @@ static int write_zip_entry(struct archiver_args *args,
 
 		zstream.next_in = buf;
 		zstream.avail_in = 0;
-		result = git_deflate(&zstream, Z_FINISH);
-		if (result != Z_STREAM_END)
-			die("deflate error (%d)", result);
+
+		do {
+			result = git_deflate(&zstream, Z_FINISH);
+			if (result != Z_OK && result != Z_STREAM_END)
+				die("deflate error (%d)", result);
+
+			out_len = zstream.next_out - compressed;
+			if (out_len > 0) {
+				write_or_die(1, compressed, out_len);
+				compressed_size += out_len;
+				zstream.next_out = compressed;
+				zstream.avail_out = sizeof(compressed);
+			}
+		} while (result != Z_STREAM_END);
 
 		git_deflate_end(&zstream);
-		out_len = zstream.next_out - compressed;
-		write_or_die(1, compressed, out_len);
-		compressed_size += out_len;
 		zip_offset += compressed_size;
 
 		write_zip_data_desc(size, compressed_size, crc);
-- 
2.50.1.214.ga30f80fde9


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] archive: flush deflate stream until Z_STREAM_END
  2025-08-02 22:08 [PATCH] archive: flush deflate stream until Z_STREAM_END Justin Tobler
@ 2025-08-03  9:52 ` René Scharfe
  2025-08-03 15:34   ` Justin Tobler
  0 siblings, 1 reply; 4+ messages in thread
From: René Scharfe @ 2025-08-03  9:52 UTC (permalink / raw)
  To: Justin Tobler, git; +Cc: toon

On 8/3/25 12:08 AM, Justin Tobler wrote:
> In `archive-zip.c:write_zip_entry()` when using a stream as input for
> deflating a file, the call to `git_deflate()` with Z_FINISH always
> expects Z_STREAM_END to be returned. Per zlib documentation[1]:
> 
>         If the parameter flush is set to Z_FINISH, pending input is
>         processed, pending output is flushed and deflate returns with
>         Z_STREAM_END if there was enough output space. If deflate
>         returns with Z_OK or Z_BUF_ERROR, this function must be called
>         again with Z_FINISH and more output space (updated avail_out)
>         but no more input data, until it returns with Z_STREAM_END or an
>         error. After deflate has returned Z_STREAM_END, the only
>         possible operations on the stream are deflateReset or
>         deflateEnd.
> 
> In scenarios where the output buffer is not large enough to write all
> the compressed data, it is perfectly valid for the underlying
> `deflate()` to return Z_OK. Thus, expecting a single pass of `deflate()`
> here to always return Z_STREAM_END is a bug. Update the code to flush
> the deflate stream until Z_STREAM_END is returned.
> 
> [1]: https://zlib.net/manual.html

Good find.  I guess back then I thought making the output buffer twice
as big as the input buffer was sufficient, as deflateBound() guarantees
compression is possible with a much lower ratio.  But this doesn't take
the internal state of a stream into account.  Oof!

> Helped-by: Toon Claes <toon@iotcl.com>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
> 
> Greetings,
> 
> At GitLab, we received a report of a user getting the following error
> when generating a zip archive of their repository via git-archive(1):
> 
>         fatal: deflate error (0)
> 
> I've so far only been able to reproduce this issue in the chromium.git
> repository with a specific file:
> 
>         git clone --depth=1 https://github.com/chromium/chromium.git
>         cd chromium
>         git -c core.bigFileThreshold=1 archive -o foo.zip --format=zip HEAD -- \
>                 ui/events/ozone/evdev/touch_filter/palm_model/onedevice_train_palm_detection_filter_inference.cc
> 
> In the above example, `core.bigFileThreshold` is set to a low value to
> cause more files to use a stream as input while being deflated. This is
> the codepath that produces the specific error.
> 
> I've tested the patch against this specific file, and it fixes the
> issue, but I'm uncertain how to reproduce and test this issue more
> generically. I'm open to suggestions if anyone has some ideas :)

Not sure how to fill up zlib's pending buffer most efficiently.
Reducing the size of the output buffer would make the bug easier to
trigger, though.

> Thanks,
> -Justin
> 
> ---
>  archive-zip.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/archive-zip.c b/archive-zip.c
> index df8866d5bae..29e7c9f5e3f 100644
> --- a/archive-zip.c
> +++ b/archive-zip.c
> @@ -492,14 +492,22 @@ static int write_zip_entry(struct archiver_args *args,
>  
>  		zstream.next_in = buf;
>  		zstream.avail_in = 0;
> -		result = git_deflate(&zstream, Z_FINISH);
> -		if (result != Z_STREAM_END)
> -			die("deflate error (%d)", result);
> +
> +		do {
> +			result = git_deflate(&zstream, Z_FINISH);
> +			if (result != Z_OK && result != Z_STREAM_END)
> +				die("deflate error (%d)", result);
> +
> +			out_len = zstream.next_out - compressed;
> +			if (out_len > 0) {
> +				write_or_die(1, compressed, out_len);
> +				compressed_size += out_len;
> +				zstream.next_out = compressed;
> +				zstream.avail_out = sizeof(compressed);
> +			}
> +		} while (result != Z_STREAM_END);
>  
>  		git_deflate_end(&zstream);
> -		out_len = zstream.next_out - compressed;
> -		write_or_die(1, compressed, out_len);
> -		compressed_size += out_len;

Looks good.  Could probably rolled into the first loop, but that just
would make this fix more complicated.

>  		zip_offset += compressed_size;
>  
>  		write_zip_data_desc(size, compressed_size, crc);


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] archive: flush deflate stream until Z_STREAM_END
  2025-08-03  9:52 ` René Scharfe
@ 2025-08-03 15:34   ` Justin Tobler
  2025-08-04 17:00     ` Toon Claes
  0 siblings, 1 reply; 4+ messages in thread
From: Justin Tobler @ 2025-08-03 15:34 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, toon

On 25/08/03 11:52AM, René Scharfe wrote:
> On 8/3/25 12:08 AM, Justin Tobler wrote:
> > I've tested the patch against this specific file, and it fixes the
> > issue, but I'm uncertain how to reproduce and test this issue more
> > generically. I'm open to suggestions if anyone has some ideas :)
> 
> Not sure how to fill up zlib's pending buffer most efficiently.
> Reducing the size of the output buffer would make the bug easier to
> trigger, though.

Ya, I was able to trigger this issue more frequently by making the
output buffer smaller than the input buffer. I was really hoping though
to find a way to reproduce this without code changes so we could add a
test. Not sure if that is really feasible though in this case.

> > diff --git a/archive-zip.c b/archive-zip.c
> > index df8866d5bae..29e7c9f5e3f 100644
> > --- a/archive-zip.c
> > +++ b/archive-zip.c
> > @@ -492,14 +492,22 @@ static int write_zip_entry(struct archiver_args *args,
> >  
> >  		zstream.next_in = buf;
> >  		zstream.avail_in = 0;
> > -		result = git_deflate(&zstream, Z_FINISH);
> > -		if (result != Z_STREAM_END)
> > -			die("deflate error (%d)", result);
> > +
> > +		do {
> > +			result = git_deflate(&zstream, Z_FINISH);
> > +			if (result != Z_OK && result != Z_STREAM_END)
> > +				die("deflate error (%d)", result);
> > +
> > +			out_len = zstream.next_out - compressed;
> > +			if (out_len > 0) {
> > +				write_or_die(1, compressed, out_len);
> > +				compressed_size += out_len;
> > +				zstream.next_out = compressed;
> > +				zstream.avail_out = sizeof(compressed);
> > +			}
> > +		} while (result != Z_STREAM_END);
> >  
> >  		git_deflate_end(&zstream);
> > -		out_len = zstream.next_out - compressed;
> > -		write_or_die(1, compressed, out_len);
> > -		compressed_size += out_len;
> 
> Looks good.  Could probably rolled into the first loop, but that just
> would make this fix more complicated.

I was also considering rolling this into the first loop, but ultimately
went with the minimal patch to fix the issue. I don't mind rerolling if
we prefer it the other way though. :)

Thanks for the review!

-Justin

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] archive: flush deflate stream until Z_STREAM_END
  2025-08-03 15:34   ` Justin Tobler
@ 2025-08-04 17:00     ` Toon Claes
  0 siblings, 0 replies; 4+ messages in thread
From: Toon Claes @ 2025-08-04 17:00 UTC (permalink / raw)
  To: Justin Tobler, René Scharfe; +Cc: git

Justin Tobler <jltobler@gmail.com> writes:

> On 25/08/03 11:52AM, René Scharfe wrote:
>> On 8/3/25 12:08 AM, Justin Tobler wrote:
>> > I've tested the patch against this specific file, and it fixes the
>> > issue, but I'm uncertain how to reproduce and test this issue more
>> > generically. I'm open to suggestions if anyone has some ideas :)
>> 
>> Not sure how to fill up zlib's pending buffer most efficiently.
>> Reducing the size of the output buffer would make the bug easier to
>> trigger, though.

Yes, in my local testing I also have been reducing the size of the
output buffer.

> Ya, I was able to trigger this issue more frequently by making the
> output buffer smaller than the input buffer. I was really hoping though
> to find a way to reproduce this without code changes so we could add a
> test. Not sure if that is really feasible though in this case.

I think it's really nice you were able find a repository that reproduces
this issue. Having an existing example in the "wild" is better proof to
verify this fix work.

>> > I've so far only been able to reproduce this issue in the chromium.git
>> > repository with a specific file:
>> >
>> >         git clone --depth=1 https://github.com/chromium/chromium.git
>> >         cd chromium
>> >         git -c core.bigFileThreshold=1 archive -o foo.zip --format=zip HEAD -- \
>> >                 ui/events/ozone/evdev/touch_filter/palm_model/onedevice_train_palm_detection_filter_inference.cc

Weird, for me this file didn't trigger the error, but the following file
did:

    chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js

But also with this file, I can confirm this fix works.

>> > diff --git a/archive-zip.c b/archive-zip.c
>> > index df8866d5bae..29e7c9f5e3f 100644
>> > --- a/archive-zip.c
>> > +++ b/archive-zip.c
>> > @@ -492,14 +492,22 @@ static int write_zip_entry(struct archiver_args *args,
>> >  
>> >  		zstream.next_in = buf;
>> >  		zstream.avail_in = 0;
>> > -		result = git_deflate(&zstream, Z_FINISH);
>> > -		if (result != Z_STREAM_END)
>> > -			die("deflate error (%d)", result);
>> > +
>> > +		do {
>> > +			result = git_deflate(&zstream, Z_FINISH);
>> > +			if (result != Z_OK && result != Z_STREAM_END)
>> > +				die("deflate error (%d)", result);
>> > +
>> > +			out_len = zstream.next_out - compressed;
>> > +			if (out_len > 0) {
>> > +				write_or_die(1, compressed, out_len);
>> > +				compressed_size += out_len;
>> > +				zstream.next_out = compressed;
>> > +				zstream.avail_out = sizeof(compressed);
>> > +			}
>> > +		} while (result != Z_STREAM_END);
>> >  
>> >  		git_deflate_end(&zstream);
>> > -		out_len = zstream.next_out - compressed;
>> > -		write_or_die(1, compressed, out_len);
>> > -		compressed_size += out_len;
>> 
>> Looks good.  Could probably rolled into the first loop, but that just
>> would make this fix more complicated.
>
> I was also considering rolling this into the first loop, but ultimately
> went with the minimal patch to fix the issue. I don't mind rerolling if
> we prefer it the other way though. :)

When I was researching, I have been working on a version that rolls the
changes into the first loop, but that diff is a lot more substantial.
The thing is, you cannot seem to avoid introducing another (nested)
loop.

My refactoring is based on the zlib example page[1] page and example
source code[2], and also there two nested loops are used. Ideally, we'd
modify our code so it follows the example code more closely, but I'm
worried this might introduce new breakage, so it's probably not worth
it.

On the other hand, I was able to trigger failure when I made the output
buffer equally large as the input buffer. In this case git-archive
completes successfully, but when inflated, the content is mangled.
It seems my version is protected against that.

I've submitted my version too[3]. I'm on the fence which solution is
better, so I'd love to get input from the both of you. So I'm happy to
hear any thoughts.

[1]: https://zlib.net/zlib_how.html
[2]: https://zlib.net/zpipe.c
[3]: https://lore.kernel.org/git/20250804-toon-archive-zip-fix-v1-0-ca89858e5eaa@iotcl.com/

-- 
Cheers,
Toon

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-08-04 17:01 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-02 22:08 [PATCH] archive: flush deflate stream until Z_STREAM_END Justin Tobler
2025-08-03  9:52 ` René Scharfe
2025-08-03 15:34   ` Justin Tobler
2025-08-04 17:00     ` Toon Claes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).