* [PATCH] zlib: properly clamp to uLong
@ 2026-06-18 13:50 Johannes Schindelin via GitGitGadget
2026-06-18 17:03 ` Junio C Hamano
0 siblings, 1 reply; 3+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2026-06-18 13:50 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Johannes Schindelin
From: Johannes Schindelin <johannes.schindelin@gmx.de>
On platforms where `unsigned long` and `size_t` differ in bit size, we
want to clamp the buffers we pass to zlib to the former's size, as per
d05d666977 (git-zlib: handle data streams larger than 4GB, 2026-05-08).
The logic introduced in that commit performs a clamping to the bits,
though, which fails to do what is needed here: If too many bytes are
available in the buffers, we need to clamp to the maximum value of an
`unsigned long`. Otherwise, we ask zlib to use too small buffers, in the
worst case using 0 as the size (think: a value whose 32 lowest bits are
all zero).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
zlib: properly clamp to uLong
I re-read this logic earlier this week... and I am quite convinced that
it needs to be fixed.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2153%2Fdscho%2Ffix-ulong-clamping-for-zlib-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2153/dscho/fix-ulong-clamping-for-zlib-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2153
git-zlib.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/git-zlib.c b/git-zlib.c
index b91cb323ae..d21adb3bf5 100644
--- a/git-zlib.c
+++ b/git-zlib.c
@@ -38,12 +38,17 @@ static inline uInt zlib_buf_cap(unsigned long len)
return (ZLIB_BUF_MAX < len) ? ZLIB_BUF_MAX : len;
}
+static inline uLong zlib_uLong_cap(size_t s)
+{
+ return s < ULONG_MAX_VALUE ? (uLong)s : ULONG_MAX_VALUE;
+}
+
static void zlib_pre_call(git_zstream *s)
{
s->z.next_in = s->next_in;
s->z.next_out = s->next_out;
- s->z.total_in = (uLong)(s->total_in & ULONG_MAX_VALUE);
- s->z.total_out = (uLong)(s->total_out & ULONG_MAX_VALUE);
+ s->z.total_in = zlib_uLong_cap(s->total_in);
+ s->z.total_out = zlib_uLong_cap(s->total_out);
s->z.avail_in = zlib_buf_cap(s->avail_in);
s->z.avail_out = zlib_buf_cap(s->avail_out);
}
@@ -60,7 +65,7 @@ static void zlib_post_call(git_zstream *s, int status)
* We track our own totals and verify only the low bits match.
*/
if ((s->z.total_out & ULONG_MAX_VALUE) !=
- ((s->total_out + bytes_produced) & ULONG_MAX_VALUE))
+ ((zlib_uLong_cap(s->total_out) + bytes_produced) & ULONG_MAX_VALUE))
BUG("total_out mismatch");
/*
* zlib does not update total_in when it returns Z_NEED_DICT,
@@ -68,7 +73,7 @@ static void zlib_post_call(git_zstream *s, int status)
*/
if (status != Z_NEED_DICT &&
(s->z.total_in & ULONG_MAX_VALUE) !=
- ((s->total_in + bytes_consumed) & ULONG_MAX_VALUE))
+ ((zlib_uLong_cap(s->total_in) + bytes_consumed) & ULONG_MAX_VALUE))
BUG("total_in mismatch");
s->total_out += bytes_produced;
base-commit: 7a094d68a27e321a99c8ab6b700909e503904bd9
--
gitgitgadget
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH] zlib: properly clamp to uLong 2026-06-18 13:50 [PATCH] zlib: properly clamp to uLong Johannes Schindelin via GitGitGadget @ 2026-06-18 17:03 ` Junio C Hamano 2026-06-19 7:41 ` Johannes Schindelin 0 siblings, 1 reply; 3+ messages in thread From: Junio C Hamano @ 2026-06-18 17:03 UTC (permalink / raw) To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin The original in js/objects-larger-than-4gb-on-windows says things like: + s->z.total_in = (uLong)(s->total_in & ULONG_MAX_VALUE); + s->z.total_out = (uLong)(s->total_out & ULONG_MAX_VALUE); Your patch ... > +static inline uLong zlib_uLong_cap(size_t s) > +{ > + return s < ULONG_MAX_VALUE ? (uLong)s : ULONG_MAX_VALUE; > +} > + > static void zlib_pre_call(git_zstream *s) > { > s->z.next_in = s->next_in; > s->z.next_out = s->next_out; > - s->z.total_in = (uLong)(s->total_in & ULONG_MAX_VALUE); > - s->z.total_out = (uLong)(s->total_out & ULONG_MAX_VALUE); > + s->z.total_in = zlib_uLong_cap(s->total_in); > + s->z.total_out = zlib_uLong_cap(s->total_out); ... is an obvious fix for that. > @@ -60,7 +65,7 @@ static void zlib_post_call(git_zstream *s, int status) > * We track our own totals and verify only the low bits match. > */ > if ((s->z.total_out & ULONG_MAX_VALUE) != > - ((s->total_out + bytes_produced) & ULONG_MAX_VALUE)) > + ((zlib_uLong_cap(s->total_out) + bytes_produced) & ULONG_MAX_VALUE)) > BUG("total_out mismatch"); Because we now clamp (not "taking lower bits of") s->total_out to a value between 0..4GB and store it in s->z.total_out in pre-call, let zlib do its thing that increments s->z.total_out modulo 4GB, and we clamp the s->total_out (before incrementing) the same way in post_call here, both sides of "!=" above even out. But the comment before this comparison that claims that "we ... verify only the low bits match" is a bit off the reality, I suspect. > @@ -68,7 +73,7 @@ static void zlib_post_call(git_zstream *s, int status) > */ > if (status != Z_NEED_DICT && > (s->z.total_in & ULONG_MAX_VALUE) != > - ((s->total_in + bytes_consumed) & ULONG_MAX_VALUE)) > + ((zlib_uLong_cap(s->total_in) + bytes_consumed) & ULONG_MAX_VALUE)) > BUG("total_in mismatch"); > > s->total_out += bytes_produced; > > base-commit: 7a094d68a27e321a99c8ab6b700909e503904bd9 ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] zlib: properly clamp to uLong 2026-06-18 17:03 ` Junio C Hamano @ 2026-06-19 7:41 ` Johannes Schindelin 0 siblings, 0 replies; 3+ messages in thread From: Johannes Schindelin @ 2026-06-19 7:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git Hi Junio, On Thu, 18 Jun 2026, Junio C Hamano wrote: > [...] > > > @@ -60,7 +65,7 @@ static void zlib_post_call(git_zstream *s, int status) > > * We track our own totals and verify only the low bits match. > > */ > > if ((s->z.total_out & ULONG_MAX_VALUE) != > > - ((s->total_out + bytes_produced) & ULONG_MAX_VALUE)) > > + ((zlib_uLong_cap(s->total_out) + bytes_produced) & ULONG_MAX_VALUE)) > > BUG("total_out mismatch"); > > Because we now clamp (not "taking lower bits of") s->total_out to a > value between 0..4GB and store it in s->z.total_out in pre-call, let > zlib do its thing that increments s->z.total_out modulo 4GB, and we > clamp the s->total_out (before incrementing) the same way in post_call > here, both sides of "!=" above even out. Technically, the range is 0..(4GB-1), but yes, that's exactly the idea. If we clamped bit-wise, i.e. to the lower bits as is currently done, we would _also_ stay within that range, but we'd restrict the total size unnecessarily in most cases (i.e. in all cases where `total_out` isn't one less than an exact multiple of 4GB). In the worst case, we'd restrict to 0 bytes, in which case we would run into an infinite loop because zlib has no space to work with and we'd try again and again to whittle away a chunk of that large input. > But the comment before this comparison that claims that "we ... verify > only the low bits match" is a bit off the reality, I suspect. I am afraid that the comment is still true. The thing is, we're trying to compare the _real_ `total_out + bytes_produced` to zlib's necessarily restricted `total_out` (we cannot change the data type of that attribute of `struct z_stream_s`, it's not ours to change, it'll remain `uLong` because zlib made the same mistake as Git to choose that imprecise data type for memory size calculations). The sum `total_out + bytes_produced` is of type `size_t`, the attribute `s->z.total_out` is of type `uLong`. Therefore, we still need to clamp bit-wise, as the _real_ `total_out + bytes_produced` may very well exceed the maximal value of `s->z.total_out`, and the zlib operation will _still_ have produced the expected number of bytes, i.e. that sanity check should _pass_. If anything, we _could_ consider dropping that masking of `s->z.total_out` to the maximal `unsigned long` value, seeing as `s->z.total_out` _is_ of that data type and therefore cannot reasonably exceed that. But then, there might emerge a zlib variant in the future that recapitulates Git's effort to use `size_t` where `size_t` is due, and compiling/linking against _that_ zlib variant would need this mask, otherwise the sanity check could fail for completely bogus reasons. So: The comment is still correct, even with the adjusted logic. Ciao, Johannes > > > @@ -68,7 +73,7 @@ static void zlib_post_call(git_zstream *s, int status) > > */ > > if (status != Z_NEED_DICT && > > (s->z.total_in & ULONG_MAX_VALUE) != > > - ((s->total_in + bytes_consumed) & ULONG_MAX_VALUE)) > > + ((zlib_uLong_cap(s->total_in) + bytes_consumed) & ULONG_MAX_VALUE)) > > BUG("total_in mismatch"); > > > > s->total_out += bytes_produced; > > > > base-commit: 7a094d68a27e321a99c8ab6b700909e503904bd9 > ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-19 7:41 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-18 13:50 [PATCH] zlib: properly clamp to uLong Johannes Schindelin via GitGitGadget 2026-06-18 17:03 ` Junio C Hamano 2026-06-19 7:41 ` Johannes Schindelin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox