From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: Git List <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
Date: Sat, 16 May 2026 01:01:05 +0200 [thread overview]
Message-ID: <1d79d7bc-5441-4e72-9cb0-e8900f57172c@web.de> (raw)
In-Reply-To: <20260515195049.GA149960@coredump.intra.peff.net>
On 5/15/26 9:50 PM, Jeff King wrote:
> On Fri, May 15, 2026 at 03:08:18PM -0400, Jeff King wrote:
>
>> On Fri, May 15, 2026 at 08:16:50PM +0200, René Scharfe wrote:
>>
>>> + size_t alloc_grow_new_alloc_; \
>>> + if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
>>> + alloc = alloc_grow_new_alloc_; \
>>> + REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
>>> } \
>>
>> What happens if a caller passes in an argument that isn't a size_t?
>> We'll check for overflow in the size_t space, and then truncate it when
>> we assign to alloc, I think.
>>
>> I think we generally try to hold allocations in size_t these days, but
>> I'd be surprised if there weren't a few "int" holdouts. Grepping around,
>> alloc_node() seems to be an example.
>>
>> BTW, non-size_t arguments nullifies my earlier hand-waving around "nr +
>> 1 overflowing implies we've filled up the address space". But we are
>> still protected in the existing code by the:
>>
>> if (alloc_nr(alloc) < (nr))
>> alloc = (nr);
>>
>> logic. But with your patch, that all happens in the size_t space, so I
>> think it would actually introduce possible array overflows when the
>> caller is using a smaller type.
>
> Hmm, playing with it and looking a little closer, I think we don't end
> up overflowing the buffer because you use the size_t for
> REALLOC_ARRAY(). So the result is big, but then "alloc" is truncated.
>
> And then on the next call, we think "oh no, the allocation is way too
> small" because we are using the truncated value. So we try to size up
> for every single allocation, even though it's actually big enough, and
> the program slows to a crawl. ;)
>
> For reference, IU was using this hack to play around and demonstrate:
>
> diff --git a/git.c b/git.c
> index 5a40eab8a2..638bbc69e4 100644
> --- a/git.c
> +++ b/git.c
> @@ -969,6 +969,19 @@ int cmd_main(int argc, const char **argv)
>
> cmd = argv[0];
>
> + if (!strcmp(cmd, "foo")) {
> + unsigned char *buf = NULL;
> + unsigned nr = 0, alloc = 0;
> + for (unsigned i = 0; i < UINT_MAX; i++) {
> + ALLOC_GROW(buf, nr + 1, alloc);
> + if (i % 313370 == 0)
> + warning("at i=%u, alloc=%u, nr=%u", i, alloc, nr);
> + buf[nr++] = i % 256;
> + }
> + printf("done, final nr=%u, alloc=%u\n", nr, alloc);
> + return 0;
> + }
> +
> /*
> * We use PATH to find git commands, but we prepend some higher
> * precedence paths: the "--exec-path" option, the GIT_EXEC_PATH
>
>
> The same problem exists in several places in actual code, but I'm not
> sure how practical it is to trigger. The alloc_node() is counting not
> just objects, but blocks of objects. So you'd need 2*31 * 1024 objects
> of one type, which is probably going to run afoul of other limitations.
> Other cases are similar; for example "yes | git fetch-pack --stdin foo"
> will grow an array indefinitely, but at one strbuf per line it starts
> swapping on my 64GB machine at only 350M entries.
>
> I think as long as the behavior remains "slow, but we do not overflow
> any buffers" when you reach these limits, that's OK. Nobody is going to
> do it in practice, and we just want to make sure that malicious inputs
> cannot get out-of-bounds writes. It might be worth adding a comment,
> though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
> "alloc" in that macro.
There is no overflow check in either version (yet), so neither is safe
to operate close to the boundary. Close meaning the intermediate term
(alloc + 16) * 3 being bigger than the maximum value.
Does the size_t arithmetic make matters worse? The only change I can
see is that it interprets negative values as big unsigned ones and
then doesn't reallocate. The outcome for positive values is the same,
overflow and all, no?
Here's a demo program exercising the arithmetic part of the macros:
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#define alloc_nr(x) (((x)+16)*3/2)
#define ALLOC_GROW1(x, nr, alloc) \
do { \
if ((nr) > alloc) { \
if (alloc_nr(alloc) < (nr)) \
alloc = (nr); \
else \
alloc = alloc_nr(alloc); \
x = true; \
} \
} while (0)
static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
{
if (nr > alloc) {
size_t out = alloc_nr(alloc);
*outp = out < nr ? nr : out;
return true;
}
return false;
}
#define ALLOC_GROW2(x, nr, alloc) \
do { \
size_t alloc_grow_new_alloc_; \
if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
alloc = alloc_grow_new_alloc_; \
x = true; \
} \
} while (0)
#define T signed char
#define MIN 0
#define MAX SCHAR_MAX
int main(int argc, char **argv)
{
for (T i = 0;; i++) {
for (T j = MIN;; j++) {
T alloc1 = j, alloc2 = j;
bool allocated1 = false, allocated2 = false;
ALLOC_GROW1(allocated1, i, alloc1);
ALLOC_GROW2(allocated2, i, alloc2);
if (alloc1 != alloc2 || allocated1 != allocated2)
printf("%zu %zu %d %zu %d %zu\n",
(size_t)i, (size_t)j,
allocated1, (size_t)alloc1,
allocated2, (size_t)alloc2);
if (j == MAX)
break;
}
if (i == MAX)
break;
}
return 0;
}
prev parent reply other threads:[~2026-05-15 23:01 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-15 18:16 [PATCH] evaluate the second argument of ALLOC_GROW only once René Scharfe
2026-05-15 19:08 ` Jeff King
2026-05-15 19:50 ` Jeff King
2026-05-15 23:01 ` René Scharfe [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1d79d7bc-5441-4e72-9cb0-e8900f57172c@web.de \
--to=l.s.r@web.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox