From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: Git List <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
Date: Sat, 16 May 2026 01:01:05 +0200 [thread overview]
Message-ID: <1d79d7bc-5441-4e72-9cb0-e8900f57172c@web.de> (raw)
In-Reply-To: <20260515195049.GA149960@coredump.intra.peff.net>
On 5/15/26 9:50 PM, Jeff King wrote:
> On Fri, May 15, 2026 at 03:08:18PM -0400, Jeff King wrote:
>
>> On Fri, May 15, 2026 at 08:16:50PM +0200, René Scharfe wrote:
>>
>>> + size_t alloc_grow_new_alloc_; \
>>> + if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
>>> + alloc = alloc_grow_new_alloc_; \
>>> + REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
>>> } \
>>
>> What happens if a caller passes in an argument that isn't a size_t?
>> We'll check for overflow in the size_t space, and then truncate it when
>> we assign to alloc, I think.
>>
>> I think we generally try to hold allocations in size_t these days, but
>> I'd be surprised if there weren't a few "int" holdouts. Grepping around,
>> alloc_node() seems to be an example.
>>
>> BTW, non-size_t arguments nullifies my earlier hand-waving around "nr +
>> 1 overflowing implies we've filled up the address space". But we are
>> still protected in the existing code by the:
>>
>> if (alloc_nr(alloc) < (nr))
>> alloc = (nr);
>>
>> logic. But with your patch, that all happens in the size_t space, so I
>> think it would actually introduce possible array overflows when the
>> caller is using a smaller type.
>
> Hmm, playing with it and looking a little closer, I think we don't end
> up overflowing the buffer because you use the size_t for
> REALLOC_ARRAY(). So the result is big, but then "alloc" is truncated.
>
> And then on the next call, we think "oh no, the allocation is way too
> small" because we are using the truncated value. So we try to size up
> for every single allocation, even though it's actually big enough, and
> the program slows to a crawl. ;)
>
> For reference, IU was using this hack to play around and demonstrate:
>
> diff --git a/git.c b/git.c
> index 5a40eab8a2..638bbc69e4 100644
> --- a/git.c
> +++ b/git.c
> @@ -969,6 +969,19 @@ int cmd_main(int argc, const char **argv)
>
> cmd = argv[0];
>
> + if (!strcmp(cmd, "foo")) {
> + unsigned char *buf = NULL;
> + unsigned nr = 0, alloc = 0;
> + for (unsigned i = 0; i < UINT_MAX; i++) {
> + ALLOC_GROW(buf, nr + 1, alloc);
> + if (i % 313370 == 0)
> + warning("at i=%u, alloc=%u, nr=%u", i, alloc, nr);
> + buf[nr++] = i % 256;
> + }
> + printf("done, final nr=%u, alloc=%u\n", nr, alloc);
> + return 0;
> + }
> +
> /*
> * We use PATH to find git commands, but we prepend some higher
> * precedence paths: the "--exec-path" option, the GIT_EXEC_PATH
>
>
> The same problem exists in several places in actual code, but I'm not
> sure how practical it is to trigger. The alloc_node() is counting not
> just objects, but blocks of objects. So you'd need 2*31 * 1024 objects
> of one type, which is probably going to run afoul of other limitations.
> Other cases are similar; for example "yes | git fetch-pack --stdin foo"
> will grow an array indefinitely, but at one strbuf per line it starts
> swapping on my 64GB machine at only 350M entries.
>
> I think as long as the behavior remains "slow, but we do not overflow
> any buffers" when you reach these limits, that's OK. Nobody is going to
> do it in practice, and we just want to make sure that malicious inputs
> cannot get out-of-bounds writes. It might be worth adding a comment,
> though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
> "alloc" in that macro.
There is no overflow check in either version (yet), so neither is safe
to operate close to the boundary. Close meaning the intermediate term
(alloc + 16) * 3 being bigger than the maximum value.
Does the size_t arithmetic make matters worse? The only change I can
see is that it interprets negative values as big unsigned ones and
then doesn't reallocate. The outcome for positive values is the same,
overflow and all, no?
Here's a demo program exercising the arithmetic part of the macros:
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#define alloc_nr(x) (((x)+16)*3/2)
#define ALLOC_GROW1(x, nr, alloc) \
do { \
if ((nr) > alloc) { \
if (alloc_nr(alloc) < (nr)) \
alloc = (nr); \
else \
alloc = alloc_nr(alloc); \
x = true; \
} \
} while (0)
static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
{
if (nr > alloc) {
size_t out = alloc_nr(alloc);
*outp = out < nr ? nr : out;
return true;
}
return false;
}
#define ALLOC_GROW2(x, nr, alloc) \
do { \
size_t alloc_grow_new_alloc_; \
if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
alloc = alloc_grow_new_alloc_; \
x = true; \
} \
} while (0)
#define T signed char
#define MIN 0
#define MAX SCHAR_MAX
int main(int argc, char **argv)
{
for (T i = 0;; i++) {
for (T j = MIN;; j++) {
T alloc1 = j, alloc2 = j;
bool allocated1 = false, allocated2 = false;
ALLOC_GROW1(allocated1, i, alloc1);
ALLOC_GROW2(allocated2, i, alloc2);
if (alloc1 != alloc2 || allocated1 != allocated2)
printf("%zu %zu %d %zu %d %zu\n",
(size_t)i, (size_t)j,
allocated1, (size_t)alloc1,
allocated2, (size_t)alloc2);
if (j == MAX)
break;
}
if (i == MAX)
break;
}
return 0;
}
prev parent reply other threads:[~2026-05-15 23:01 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-15 18:16 [PATCH] evaluate the second argument of ALLOC_GROW only once René Scharfe
2026-05-15 19:08 ` Jeff King
2026-05-15 19:50 ` Jeff King
2026-05-15 23:01 ` René Scharfe [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1d79d7bc-5441-4e72-9cb0-e8900f57172c@web.de \
--to=l.s.r@web.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.