Re: [PATCH] evaluate the second argument of ALLOC_GROW only once

Git development
 help / color / mirror / Atom feed

From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: Git List <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
Date: Sat, 16 May 2026 01:01:05 +0200	[thread overview]
Message-ID: <1d79d7bc-5441-4e72-9cb0-e8900f57172c@web.de> (raw)
In-Reply-To: <20260515195049.GA149960@coredump.intra.peff.net>

On 5/15/26 9:50 PM, Jeff King wrote:
> On Fri, May 15, 2026 at 03:08:18PM -0400, Jeff King wrote:
> 
>> On Fri, May 15, 2026 at 08:16:50PM +0200, René Scharfe wrote:
>>
>>> +		size_t alloc_grow_new_alloc_; \
>>> +		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
>>> +			alloc = alloc_grow_new_alloc_; \
>>> +			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
>>>  		} \
>>
>> What happens if a caller passes in an argument that isn't a size_t?
>> We'll check for overflow in the size_t space, and then truncate it when
>> we assign to alloc, I think.
>>
>> I think we generally try to hold allocations in size_t these days, but
>> I'd be surprised if there weren't a few "int" holdouts. Grepping around,
>> alloc_node() seems to be an example.
>>
>> BTW, non-size_t arguments nullifies my earlier hand-waving around "nr +
>> 1 overflowing implies we've filled up the address space". But we are
>> still protected in the existing code by the:
>>
>>   if (alloc_nr(alloc) < (nr))
>> 	alloc = (nr);
>>
>> logic. But with your patch, that all happens in the size_t space, so I
>> think it would actually introduce possible array overflows when the
>> caller is using a smaller type.
> 
> Hmm, playing with it and looking a little closer, I think we don't end
> up overflowing the buffer because you use the size_t for
> REALLOC_ARRAY(). So the result is big, but then "alloc" is truncated.
> 
> And then on the next call, we think "oh no, the allocation is way too
> small" because we are using the truncated value. So we try to size up
> for every single allocation, even though it's actually big enough, and
> the program slows to a crawl. ;)
> 
> For reference, IU was using this hack to play around and demonstrate:
> 
> diff --git a/git.c b/git.c
> index 5a40eab8a2..638bbc69e4 100644
> --- a/git.c
> +++ b/git.c
> @@ -969,6 +969,19 @@ int cmd_main(int argc, const char **argv)
>  
>  	cmd = argv[0];
>  
> +	if (!strcmp(cmd, "foo")) {
> +		unsigned char *buf = NULL;
> +		unsigned nr = 0, alloc = 0;
> +		for (unsigned i = 0; i < UINT_MAX; i++) {
> +			ALLOC_GROW(buf, nr + 1, alloc);
> +			if (i % 313370 == 0)
> +				warning("at i=%u, alloc=%u, nr=%u", i, alloc, nr);
> +			buf[nr++] = i % 256;
> +		}
> +		printf("done, final nr=%u, alloc=%u\n", nr, alloc);
> +		return 0;
> +	}
> +
>  	/*
>  	 * We use PATH to find git commands, but we prepend some higher
>  	 * precedence paths: the "--exec-path" option, the GIT_EXEC_PATH
> 
> 
> The same problem exists in several places in actual code, but I'm not
> sure how practical it is to trigger. The alloc_node() is counting not
> just objects, but blocks of objects. So you'd need 2*31 * 1024 objects
> of one type, which is probably going to run afoul of other limitations.
> Other cases are similar; for example "yes | git fetch-pack --stdin foo"
> will grow an array indefinitely, but at one strbuf per line it starts
> swapping on my 64GB machine at only 350M entries.
> 
> I think as long as the behavior remains "slow, but we do not overflow
> any buffers" when you reach these limits, that's OK. Nobody is going to
> do it in practice, and we just want to make sure that malicious inputs
> cannot get out-of-bounds writes. It might be worth adding a comment,
> though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
> "alloc" in that macro.
There is no overflow check in either version (yet), so neither is safe
to operate close to the boundary.  Close meaning the intermediate term
(alloc + 16) * 3 being bigger than the maximum value.

Does the size_t arithmetic make matters worse?  The only change I can
see is that it interprets negative values as big unsigned ones and
then doesn't reallocate.  The outcome for positive values is the same,
overflow and all, no?

Here's a demo program exercising the arithmetic part of the macros:


#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define alloc_nr(x) (((x)+16)*3/2)

#define ALLOC_GROW1(x, nr, alloc) \
	do { \
		if ((nr) > alloc) { \
			if (alloc_nr(alloc) < (nr)) \
				alloc = (nr); \
			else \
				alloc = alloc_nr(alloc); \
			x = true; \
		} \
	} while (0)

static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
{
	if (nr > alloc) {
		size_t out = alloc_nr(alloc);
		*outp = out < nr ? nr : out;
		return true;
	}
	return false;
}


#define ALLOC_GROW2(x, nr, alloc) \
	do { \
		size_t alloc_grow_new_alloc_; \
		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
			alloc = alloc_grow_new_alloc_; \
			x = true; \
		} \
	} while (0)

#define T signed char
#define MIN 0
#define MAX SCHAR_MAX

int main(int argc, char **argv)
{
	for (T i = 0;; i++) {
		for (T j = MIN;; j++) {
			T alloc1 = j, alloc2 = j;
			bool allocated1 = false, allocated2 = false;
			ALLOC_GROW1(allocated1, i, alloc1);
			ALLOC_GROW2(allocated2, i, alloc2);
			if (alloc1 != alloc2 || allocated1 != allocated2)
				printf("%zu %zu %d %zu %d %zu\n",
				       (size_t)i, (size_t)j,
				       allocated1, (size_t)alloc1,
				       allocated2, (size_t)alloc2);
			if (j == MAX)
				break;
		}
		if (i == MAX)
			break;
	}
	return 0;
}

     prev parent reply	other threads:[~2026-05-15 23:01 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-15 18:16 [PATCH] evaluate the second argument of ALLOC_GROW only once René Scharfe
2026-05-15 19:08 ` Jeff King
2026-05-15 19:50   ` Jeff King
2026-05-15 23:01     ` René Scharfe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1d79d7bc-5441-4e72-9cb0-e8900f57172c@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox