From: Jeff King <peff@peff.net>
To: "René Scharfe" <l.s.r@web.de>
Cc: Git List <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
Date: Fri, 15 May 2026 15:50:49 -0400 [thread overview]
Message-ID: <20260515195049.GA149960@coredump.intra.peff.net> (raw)
In-Reply-To: <20260515190818.GA98370@coredump.intra.peff.net>
On Fri, May 15, 2026 at 03:08:18PM -0400, Jeff King wrote:
> On Fri, May 15, 2026 at 08:16:50PM +0200, René Scharfe wrote:
>
> > + size_t alloc_grow_new_alloc_; \
> > + if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
> > + alloc = alloc_grow_new_alloc_; \
> > + REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
> > } \
>
> What happens if a caller passes in an argument that isn't a size_t?
> We'll check for overflow in the size_t space, and then truncate it when
> we assign to alloc, I think.
>
> I think we generally try to hold allocations in size_t these days, but
> I'd be surprised if there weren't a few "int" holdouts. Grepping around,
> alloc_node() seems to be an example.
>
> BTW, non-size_t arguments nullifies my earlier hand-waving around "nr +
> 1 overflowing implies we've filled up the address space". But we are
> still protected in the existing code by the:
>
> if (alloc_nr(alloc) < (nr))
> alloc = (nr);
>
> logic. But with your patch, that all happens in the size_t space, so I
> think it would actually introduce possible array overflows when the
> caller is using a smaller type.
Hmm, playing with it and looking a little closer, I think we don't end
up overflowing the buffer because you use the size_t for
REALLOC_ARRAY(). So the result is big, but then "alloc" is truncated.
And then on the next call, we think "oh no, the allocation is way too
small" because we are using the truncated value. So we try to size up
for every single allocation, even though it's actually big enough, and
the program slows to a crawl. ;)
For reference, IU was using this hack to play around and demonstrate:
diff --git a/git.c b/git.c
index 5a40eab8a2..638bbc69e4 100644
--- a/git.c
+++ b/git.c
@@ -969,6 +969,19 @@ int cmd_main(int argc, const char **argv)
cmd = argv[0];
+ if (!strcmp(cmd, "foo")) {
+ unsigned char *buf = NULL;
+ unsigned nr = 0, alloc = 0;
+ for (unsigned i = 0; i < UINT_MAX; i++) {
+ ALLOC_GROW(buf, nr + 1, alloc);
+ if (i % 313370 == 0)
+ warning("at i=%u, alloc=%u, nr=%u", i, alloc, nr);
+ buf[nr++] = i % 256;
+ }
+ printf("done, final nr=%u, alloc=%u\n", nr, alloc);
+ return 0;
+ }
+
/*
* We use PATH to find git commands, but we prepend some higher
* precedence paths: the "--exec-path" option, the GIT_EXEC_PATH
The same problem exists in several places in actual code, but I'm not
sure how practical it is to trigger. The alloc_node() is counting not
just objects, but blocks of objects. So you'd need 2*31 * 1024 objects
of one type, which is probably going to run afoul of other limitations.
Other cases are similar; for example "yes | git fetch-pack --stdin foo"
will grow an array indefinitely, but at one strbuf per line it starts
swapping on my 64GB machine at only 350M entries.
I think as long as the behavior remains "slow, but we do not overflow
any buffers" when you reach these limits, that's OK. Nobody is going to
do it in practice, and we just want to make sure that malicious inputs
cannot get out-of-bounds writes. It might be worth adding a comment,
though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
"alloc" in that macro.
-Peff
next prev parent reply other threads:[~2026-05-15 19:50 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-15 18:16 [PATCH] evaluate the second argument of ALLOC_GROW only once René Scharfe
2026-05-15 19:08 ` Jeff King
2026-05-15 19:50 ` Jeff King [this message]
2026-05-15 23:01 ` René Scharfe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260515195049.GA149960@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=l.s.r@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.