Git development
 help / color / mirror / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
	 git@vger.kernel.org
Subject: Re: [PATCH] zlib: properly clamp to uLong
Date: Fri, 19 Jun 2026 09:41:03 +0200 (CEST)	[thread overview]
Message-ID: <ef345cb3-e54b-65ce-9a41-bde56ea7108f@gmx.de> (raw)
In-Reply-To: <xmqqzf0rrdbp.fsf@gitster.g>

Hi Junio,

On Thu, 18 Jun 2026, Junio C Hamano wrote:

> [...]
>
> > @@ -60,7 +65,7 @@ static void zlib_post_call(git_zstream *s, int status)
> >  	 * We track our own totals and verify only the low bits match.
> >  	 */
> >  	if ((s->z.total_out & ULONG_MAX_VALUE) !=
> > -	    ((s->total_out + bytes_produced) & ULONG_MAX_VALUE))
> > +	    ((zlib_uLong_cap(s->total_out) + bytes_produced) & ULONG_MAX_VALUE))
> >  		BUG("total_out mismatch");
> 
> Because we now clamp (not "taking lower bits of") s->total_out to a
> value between 0..4GB and store it in s->z.total_out in pre-call, let
> zlib do its thing that increments s->z.total_out modulo 4GB, and we
> clamp the s->total_out (before incrementing) the same way in post_call
> here, both sides of "!=" above even out.

Technically, the range is 0..(4GB-1), but yes, that's exactly the idea.

If we clamped bit-wise, i.e. to the lower bits as is currently done, we
would _also_ stay within that range, but we'd restrict the total size
unnecessarily in most cases (i.e. in all cases where `total_out` isn't one
less than an exact multiple of 4GB). In the worst case, we'd restrict to 0
bytes, in which case we would run into an infinite loop because zlib has
no space to work with and we'd try again and again to whittle away a chunk
of that large input.

> But the comment before this comparison that claims that "we ... verify
> only the low bits match" is a bit off the reality, I suspect.

I am afraid that the comment is still true. The thing is, we're trying to
compare the _real_ `total_out + bytes_produced` to zlib's necessarily
restricted `total_out` (we cannot change the data type of that attribute
of `struct z_stream_s`, it's not ours to change, it'll remain `uLong`
because zlib made the same mistake as Git to choose that imprecise data
type for memory size calculations). The sum `total_out + bytes_produced`
is of type `size_t`, the attribute `s->z.total_out` is of type `uLong`.

Therefore, we still need to clamp bit-wise, as the _real_ `total_out +
bytes_produced` may very well exceed the maximal value of
`s->z.total_out`, and the zlib operation will _still_ have produced the
expected number of bytes, i.e. that sanity check should _pass_.

If anything, we _could_ consider dropping that masking of `s->z.total_out`
to the maximal `unsigned long` value, seeing as `s->z.total_out` _is_ of
that data type and therefore cannot reasonably exceed that. But then,
there might emerge a zlib variant in the future that recapitulates Git's
effort to use `size_t` where `size_t` is due, and compiling/linking
against _that_ zlib variant would need this mask, otherwise the sanity
check could fail for completely bogus reasons.

So: The comment is still correct, even with the adjusted logic.

Ciao,
Johannes

> 
> > @@ -68,7 +73,7 @@ static void zlib_post_call(git_zstream *s, int status)
> >  	 */
> >  	if (status != Z_NEED_DICT &&
> >  	    (s->z.total_in & ULONG_MAX_VALUE) !=
> > -	    ((s->total_in + bytes_consumed) & ULONG_MAX_VALUE))
> > +	    ((zlib_uLong_cap(s->total_in) + bytes_consumed) & ULONG_MAX_VALUE))
> >  		BUG("total_in mismatch");
> >  
> >  	s->total_out += bytes_produced;
> >
> > base-commit: 7a094d68a27e321a99c8ab6b700909e503904bd9
> 

      reply	other threads:[~2026-06-19  7:41 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-18 13:50 [PATCH] zlib: properly clamp to uLong Johannes Schindelin via GitGitGadget
2026-06-18 17:03 ` Junio C Hamano
2026-06-19  7:41   ` Johannes Schindelin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ef345cb3-e54b-65ce-9a41-bde56ea7108f@gmx.de \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox