Git development
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "René Scharfe" <l.s.r@web.de>
Cc: Git List <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
Date: Fri, 15 May 2026 22:51:19 -0400	[thread overview]
Message-ID: <20260516025119.GA832077@coredump.intra.peff.net> (raw)
In-Reply-To: <1d79d7bc-5441-4e72-9cb0-e8900f57172c@web.de>

On Sat, May 16, 2026 at 01:01:05AM +0200, René Scharfe wrote:

> > I think as long as the behavior remains "slow, but we do not overflow
> > any buffers" when you reach these limits, that's OK. Nobody is going to
> > do it in practice, and we just want to make sure that malicious inputs
> > cannot get out-of-bounds writes. It might be worth adding a comment,
> > though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
> > "alloc" in that macro.
> There is no overflow check in either version (yet), so neither is safe
> to operate close to the boundary.  Close meaning the intermediate term
> (alloc + 16) * 3 being bigger than the maximum value.

Yes, but for some definition of safe. Both before and after your patch,
as we get close to the boundary the allocation will grow slower than it
should, but we'll never write out of bounds. The behavior for the "git
foo" I showed earlier is slightly different:

  - before your patch, ~2GB we stop doubling and instead start growing
    the array by one at each ALLOC_GROW() call. This is because
    alloc_nr() overflows to a small value, but the:

      if (alloc_nr(alloc) < (nr))
              alloc = (nr);

    check kicks in.

  - after your patch we grow to ~4GB, and then things get super slow.
    This is because we correctly compute the new allocation as a size_t,
    but then truncate it while assigning to alloc. So on the next
    ALLOC_GROW() call, we'll think the buffer is way too small and try
    to realloc again. I don't know why this is so much slower than the
    grow-by-one above, but it is.

Neither is really correct, but both are in the realm of OK: stupidly
large input doesn't perform well, but there's no buffer overflow
vulnerability.

What I was worried about is what happens if you tweak your patch like
this:

diff --git a/git-compat-util.h b/git-compat-util.h
index 2bc1f43f48..0730dd24ad 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -870,7 +870,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
 		size_t alloc_grow_new_alloc_; \
 		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
 			alloc = alloc_grow_new_alloc_; \
-			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
+			REALLOC_ARRAY(x, alloc); \
 		} \
 	} while (0)
 

In that case we really do end up with too-small allocations and
out-of-bounds writes.

Maybe you saw that coming and that's why you wrote it as you did. But it
is definitely subtle enough that I think it would merit a big warning
comment that "alloc" and "alloc_grow_new_alloc_" are not necessarily the
same type, and hence not necessarily the same value.

> Here's a demo program exercising the arithmetic part of the macros:

I think the difference isn't in the arithmetic values that come out, but
in what is fed to realloc() itself. And in your harness, realloc is just
"x = true". If you actually store the value that would be passed to
realloc() like this:

diff --git a/foo.c.orig b/foo.c
index 2fbce8c..7498f36 100644
--- a/foo.c.orig
+++ b/foo.c
@@ -11,7 +11,7 @@
 				alloc = (nr); \
 			else \
 				alloc = alloc_nr(alloc); \
-			x = true; \
+			x = alloc; \
 		} \
 	} while (0)
 
@@ -31,7 +31,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
 		size_t alloc_grow_new_alloc_; \
 		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
 			alloc = alloc_grow_new_alloc_; \
-			x = true; \
+			x = alloc_grow_new_alloc_; \
 		} \
 	} while (0)
 
@@ -44,7 +44,7 @@ int main(int argc, char **argv)
 	for (T i = 0;; i++) {
 		for (T j = MIN;; j++) {
 			T alloc1 = j, alloc2 = j;
-			bool allocated1 = false, allocated2 = false;
+			size_t allocated1 = 0, allocated2 = 0;
 			ALLOC_GROW1(allocated1, i, alloc1);
 			ALLOC_GROW2(allocated2, i, alloc2);
 			if (alloc1 != alloc2 || allocated1 != allocated2)

then you see the differences. For negative values, yeah, you end up with
big size_t values. But for an unsigned type you get different small
allocations.

-Peff

  reply	other threads:[~2026-05-16  2:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-15 18:16 [PATCH] evaluate the second argument of ALLOC_GROW only once René Scharfe
2026-05-15 19:08 ` Jeff King
2026-05-15 19:50   ` Jeff King
2026-05-15 23:01     ` René Scharfe
2026-05-16  2:51       ` Jeff King [this message]
2026-05-16  6:55     ` Johannes Sixt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260516025119.GA832077@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox