From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Subject: Re: [PATCH] zlib: properly clamp to uLong
Date: Fri, 19 Jun 2026 09:41:03 +0200 (CEST) [thread overview]
Message-ID: <ef345cb3-e54b-65ce-9a41-bde56ea7108f@gmx.de> (raw)
In-Reply-To: <xmqqzf0rrdbp.fsf@gitster.g>
Hi Junio,
On Thu, 18 Jun 2026, Junio C Hamano wrote:
> [...]
>
> > @@ -60,7 +65,7 @@ static void zlib_post_call(git_zstream *s, int status)
> > * We track our own totals and verify only the low bits match.
> > */
> > if ((s->z.total_out & ULONG_MAX_VALUE) !=
> > - ((s->total_out + bytes_produced) & ULONG_MAX_VALUE))
> > + ((zlib_uLong_cap(s->total_out) + bytes_produced) & ULONG_MAX_VALUE))
> > BUG("total_out mismatch");
>
> Because we now clamp (not "taking lower bits of") s->total_out to a
> value between 0..4GB and store it in s->z.total_out in pre-call, let
> zlib do its thing that increments s->z.total_out modulo 4GB, and we
> clamp the s->total_out (before incrementing) the same way in post_call
> here, both sides of "!=" above even out.
Technically, the range is 0..(4GB-1), but yes, that's exactly the idea.
If we clamped bit-wise, i.e. to the lower bits as is currently done, we
would _also_ stay within that range, but we'd restrict the total size
unnecessarily in most cases (i.e. in all cases where `total_out` isn't one
less than an exact multiple of 4GB). In the worst case, we'd restrict to 0
bytes, in which case we would run into an infinite loop because zlib has
no space to work with and we'd try again and again to whittle away a chunk
of that large input.
> But the comment before this comparison that claims that "we ... verify
> only the low bits match" is a bit off the reality, I suspect.
I am afraid that the comment is still true. The thing is, we're trying to
compare the _real_ `total_out + bytes_produced` to zlib's necessarily
restricted `total_out` (we cannot change the data type of that attribute
of `struct z_stream_s`, it's not ours to change, it'll remain `uLong`
because zlib made the same mistake as Git to choose that imprecise data
type for memory size calculations). The sum `total_out + bytes_produced`
is of type `size_t`, the attribute `s->z.total_out` is of type `uLong`.
Therefore, we still need to clamp bit-wise, as the _real_ `total_out +
bytes_produced` may very well exceed the maximal value of
`s->z.total_out`, and the zlib operation will _still_ have produced the
expected number of bytes, i.e. that sanity check should _pass_.
If anything, we _could_ consider dropping that masking of `s->z.total_out`
to the maximal `unsigned long` value, seeing as `s->z.total_out` _is_ of
that data type and therefore cannot reasonably exceed that. But then,
there might emerge a zlib variant in the future that recapitulates Git's
effort to use `size_t` where `size_t` is due, and compiling/linking
against _that_ zlib variant would need this mask, otherwise the sanity
check could fail for completely bogus reasons.
So: The comment is still correct, even with the adjusted logic.
Ciao,
Johannes
>
> > @@ -68,7 +73,7 @@ static void zlib_post_call(git_zstream *s, int status)
> > */
> > if (status != Z_NEED_DICT &&
> > (s->z.total_in & ULONG_MAX_VALUE) !=
> > - ((s->total_in + bytes_consumed) & ULONG_MAX_VALUE))
> > + ((zlib_uLong_cap(s->total_in) + bytes_consumed) & ULONG_MAX_VALUE))
> > BUG("total_in mismatch");
> >
> > s->total_out += bytes_produced;
> >
> > base-commit: 7a094d68a27e321a99c8ab6b700909e503904bd9
>
prev parent reply other threads:[~2026-06-19 7:41 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-18 13:50 [PATCH] zlib: properly clamp to uLong Johannes Schindelin via GitGitGadget
2026-06-18 17:03 ` Junio C Hamano
2026-06-19 7:41 ` Johannes Schindelin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ef345cb3-e54b-65ce-9a41-bde56ea7108f@gmx.de \
--to=johannes.schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox