All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] evaluate the second argument of ALLOC_GROW only once
@ 2026-05-15 18:16 René Scharfe
  2026-05-15 19:08 ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: René Scharfe @ 2026-05-15 18:16 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano

Allow the new element count passed to ALLOC_GROW to be a complex
expression with side-effects by evaluating it only once, as a parameter
to a new helper function.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 git-compat-util.h | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index ae1bdc90a4..2bc1f43f48 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -812,6 +812,16 @@ static inline void move_array(void *dst, const void *src, size_t n, size_t size)
 
 #define alloc_nr(x) (((x)+16)*3/2)
 
+static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
+{
+	if (nr > alloc) {
+		size_t out = alloc_nr(alloc);
+		*outp = out < nr ? nr : out;
+		return true;
+	}
+	return false;
+}
+
 /**
  * Dynamically growing an array using realloc() is error prone and boring.
  *
@@ -857,12 +867,10 @@ static inline void move_array(void *dst, const void *src, size_t n, size_t size)
  */
 #define ALLOC_GROW(x, nr, alloc) \
 	do { \
-		if ((nr) > alloc) { \
-			if (alloc_nr(alloc) < (nr)) \
-				alloc = (nr); \
-			else \
-				alloc = alloc_nr(alloc); \
-			REALLOC_ARRAY(x, alloc); \
+		size_t alloc_grow_new_alloc_; \
+		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
+			alloc = alloc_grow_new_alloc_; \
+			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
 		} \
 	} while (0)
 
-- 
2.54.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
  2026-05-15 18:16 [PATCH] evaluate the second argument of ALLOC_GROW only once René Scharfe
@ 2026-05-15 19:08 ` Jeff King
  2026-05-15 19:50   ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2026-05-15 19:08 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano

On Fri, May 15, 2026 at 08:16:50PM +0200, René Scharfe wrote:

> +		size_t alloc_grow_new_alloc_; \
> +		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
> +			alloc = alloc_grow_new_alloc_; \
> +			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
>  		} \

What happens if a caller passes in an argument that isn't a size_t?
We'll check for overflow in the size_t space, and then truncate it when
we assign to alloc, I think.

I think we generally try to hold allocations in size_t these days, but
I'd be surprised if there weren't a few "int" holdouts. Grepping around,
alloc_node() seems to be an example.

BTW, non-size_t arguments nullifies my earlier hand-waving around "nr +
1 overflowing implies we've filled up the address space". But we are
still protected in the existing code by the:

  if (alloc_nr(alloc) < (nr))
	alloc = (nr);

logic. But with your patch, that all happens in the size_t space, so I
think it would actually introduce possible array overflows when the
caller is using a smaller type.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
  2026-05-15 19:08 ` Jeff King
@ 2026-05-15 19:50   ` Jeff King
  2026-05-15 23:01     ` René Scharfe
  2026-05-16  6:55     ` Johannes Sixt
  0 siblings, 2 replies; 6+ messages in thread
From: Jeff King @ 2026-05-15 19:50 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano

On Fri, May 15, 2026 at 03:08:18PM -0400, Jeff King wrote:

> On Fri, May 15, 2026 at 08:16:50PM +0200, René Scharfe wrote:
> 
> > +		size_t alloc_grow_new_alloc_; \
> > +		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
> > +			alloc = alloc_grow_new_alloc_; \
> > +			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
> >  		} \
> 
> What happens if a caller passes in an argument that isn't a size_t?
> We'll check for overflow in the size_t space, and then truncate it when
> we assign to alloc, I think.
> 
> I think we generally try to hold allocations in size_t these days, but
> I'd be surprised if there weren't a few "int" holdouts. Grepping around,
> alloc_node() seems to be an example.
> 
> BTW, non-size_t arguments nullifies my earlier hand-waving around "nr +
> 1 overflowing implies we've filled up the address space". But we are
> still protected in the existing code by the:
> 
>   if (alloc_nr(alloc) < (nr))
> 	alloc = (nr);
> 
> logic. But with your patch, that all happens in the size_t space, so I
> think it would actually introduce possible array overflows when the
> caller is using a smaller type.

Hmm, playing with it and looking a little closer, I think we don't end
up overflowing the buffer because you use the size_t for
REALLOC_ARRAY(). So the result is big, but then "alloc" is truncated.

And then on the next call, we think "oh no, the allocation is way too
small" because we are using the truncated value. So we try to size up
for every single allocation, even though it's actually big enough, and
the program slows to a crawl. ;)

For reference, IU was using this hack to play around and demonstrate:

diff --git a/git.c b/git.c
index 5a40eab8a2..638bbc69e4 100644
--- a/git.c
+++ b/git.c
@@ -969,6 +969,19 @@ int cmd_main(int argc, const char **argv)
 
 	cmd = argv[0];
 
+	if (!strcmp(cmd, "foo")) {
+		unsigned char *buf = NULL;
+		unsigned nr = 0, alloc = 0;
+		for (unsigned i = 0; i < UINT_MAX; i++) {
+			ALLOC_GROW(buf, nr + 1, alloc);
+			if (i % 313370 == 0)
+				warning("at i=%u, alloc=%u, nr=%u", i, alloc, nr);
+			buf[nr++] = i % 256;
+		}
+		printf("done, final nr=%u, alloc=%u\n", nr, alloc);
+		return 0;
+	}
+
 	/*
 	 * We use PATH to find git commands, but we prepend some higher
 	 * precedence paths: the "--exec-path" option, the GIT_EXEC_PATH


The same problem exists in several places in actual code, but I'm not
sure how practical it is to trigger. The alloc_node() is counting not
just objects, but blocks of objects. So you'd need 2*31 * 1024 objects
of one type, which is probably going to run afoul of other limitations.
Other cases are similar; for example "yes | git fetch-pack --stdin foo"
will grow an array indefinitely, but at one strbuf per line it starts
swapping on my 64GB machine at only 350M entries.

I think as long as the behavior remains "slow, but we do not overflow
any buffers" when you reach these limits, that's OK. Nobody is going to
do it in practice, and we just want to make sure that malicious inputs
cannot get out-of-bounds writes. It might be worth adding a comment,
though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
"alloc" in that macro.

-Peff

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
  2026-05-15 19:50   ` Jeff King
@ 2026-05-15 23:01     ` René Scharfe
  2026-05-16  2:51       ` Jeff King
  2026-05-16  6:55     ` Johannes Sixt
  1 sibling, 1 reply; 6+ messages in thread
From: René Scharfe @ 2026-05-15 23:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano

On 5/15/26 9:50 PM, Jeff King wrote:
> On Fri, May 15, 2026 at 03:08:18PM -0400, Jeff King wrote:
> 
>> On Fri, May 15, 2026 at 08:16:50PM +0200, René Scharfe wrote:
>>
>>> +		size_t alloc_grow_new_alloc_; \
>>> +		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
>>> +			alloc = alloc_grow_new_alloc_; \
>>> +			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
>>>  		} \
>>
>> What happens if a caller passes in an argument that isn't a size_t?
>> We'll check for overflow in the size_t space, and then truncate it when
>> we assign to alloc, I think.
>>
>> I think we generally try to hold allocations in size_t these days, but
>> I'd be surprised if there weren't a few "int" holdouts. Grepping around,
>> alloc_node() seems to be an example.
>>
>> BTW, non-size_t arguments nullifies my earlier hand-waving around "nr +
>> 1 overflowing implies we've filled up the address space". But we are
>> still protected in the existing code by the:
>>
>>   if (alloc_nr(alloc) < (nr))
>> 	alloc = (nr);
>>
>> logic. But with your patch, that all happens in the size_t space, so I
>> think it would actually introduce possible array overflows when the
>> caller is using a smaller type.
> 
> Hmm, playing with it and looking a little closer, I think we don't end
> up overflowing the buffer because you use the size_t for
> REALLOC_ARRAY(). So the result is big, but then "alloc" is truncated.
> 
> And then on the next call, we think "oh no, the allocation is way too
> small" because we are using the truncated value. So we try to size up
> for every single allocation, even though it's actually big enough, and
> the program slows to a crawl. ;)
> 
> For reference, IU was using this hack to play around and demonstrate:
> 
> diff --git a/git.c b/git.c
> index 5a40eab8a2..638bbc69e4 100644
> --- a/git.c
> +++ b/git.c
> @@ -969,6 +969,19 @@ int cmd_main(int argc, const char **argv)
>  
>  	cmd = argv[0];
>  
> +	if (!strcmp(cmd, "foo")) {
> +		unsigned char *buf = NULL;
> +		unsigned nr = 0, alloc = 0;
> +		for (unsigned i = 0; i < UINT_MAX; i++) {
> +			ALLOC_GROW(buf, nr + 1, alloc);
> +			if (i % 313370 == 0)
> +				warning("at i=%u, alloc=%u, nr=%u", i, alloc, nr);
> +			buf[nr++] = i % 256;
> +		}
> +		printf("done, final nr=%u, alloc=%u\n", nr, alloc);
> +		return 0;
> +	}
> +
>  	/*
>  	 * We use PATH to find git commands, but we prepend some higher
>  	 * precedence paths: the "--exec-path" option, the GIT_EXEC_PATH
> 
> 
> The same problem exists in several places in actual code, but I'm not
> sure how practical it is to trigger. The alloc_node() is counting not
> just objects, but blocks of objects. So you'd need 2*31 * 1024 objects
> of one type, which is probably going to run afoul of other limitations.
> Other cases are similar; for example "yes | git fetch-pack --stdin foo"
> will grow an array indefinitely, but at one strbuf per line it starts
> swapping on my 64GB machine at only 350M entries.
> 
> I think as long as the behavior remains "slow, but we do not overflow
> any buffers" when you reach these limits, that's OK. Nobody is going to
> do it in practice, and we just want to make sure that malicious inputs
> cannot get out-of-bounds writes. It might be worth adding a comment,
> though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
> "alloc" in that macro.
There is no overflow check in either version (yet), so neither is safe
to operate close to the boundary.  Close meaning the intermediate term
(alloc + 16) * 3 being bigger than the maximum value.

Does the size_t arithmetic make matters worse?  The only change I can
see is that it interprets negative values as big unsigned ones and
then doesn't reallocate.  The outcome for positive values is the same,
overflow and all, no?

Here's a demo program exercising the arithmetic part of the macros:


#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define alloc_nr(x) (((x)+16)*3/2)

#define ALLOC_GROW1(x, nr, alloc) \
	do { \
		if ((nr) > alloc) { \
			if (alloc_nr(alloc) < (nr)) \
				alloc = (nr); \
			else \
				alloc = alloc_nr(alloc); \
			x = true; \
		} \
	} while (0)

static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
{
	if (nr > alloc) {
		size_t out = alloc_nr(alloc);
		*outp = out < nr ? nr : out;
		return true;
	}
	return false;
}


#define ALLOC_GROW2(x, nr, alloc) \
	do { \
		size_t alloc_grow_new_alloc_; \
		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
			alloc = alloc_grow_new_alloc_; \
			x = true; \
		} \
	} while (0)

#define T signed char
#define MIN 0
#define MAX SCHAR_MAX

int main(int argc, char **argv)
{
	for (T i = 0;; i++) {
		for (T j = MIN;; j++) {
			T alloc1 = j, alloc2 = j;
			bool allocated1 = false, allocated2 = false;
			ALLOC_GROW1(allocated1, i, alloc1);
			ALLOC_GROW2(allocated2, i, alloc2);
			if (alloc1 != alloc2 || allocated1 != allocated2)
				printf("%zu %zu %d %zu %d %zu\n",
				       (size_t)i, (size_t)j,
				       allocated1, (size_t)alloc1,
				       allocated2, (size_t)alloc2);
			if (j == MAX)
				break;
		}
		if (i == MAX)
			break;
	}
	return 0;
}



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
  2026-05-15 23:01     ` René Scharfe
@ 2026-05-16  2:51       ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2026-05-16  2:51 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano

On Sat, May 16, 2026 at 01:01:05AM +0200, René Scharfe wrote:

> > I think as long as the behavior remains "slow, but we do not overflow
> > any buffers" when you reach these limits, that's OK. Nobody is going to
> > do it in practice, and we just want to make sure that malicious inputs
> > cannot get out-of-bounds writes. It might be worth adding a comment,
> > though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
> > "alloc" in that macro.
> There is no overflow check in either version (yet), so neither is safe
> to operate close to the boundary.  Close meaning the intermediate term
> (alloc + 16) * 3 being bigger than the maximum value.

Yes, but for some definition of safe. Both before and after your patch,
as we get close to the boundary the allocation will grow slower than it
should, but we'll never write out of bounds. The behavior for the "git
foo" I showed earlier is slightly different:

  - before your patch, ~2GB we stop doubling and instead start growing
    the array by one at each ALLOC_GROW() call. This is because
    alloc_nr() overflows to a small value, but the:

      if (alloc_nr(alloc) < (nr))
              alloc = (nr);

    check kicks in.

  - after your patch we grow to ~4GB, and then things get super slow.
    This is because we correctly compute the new allocation as a size_t,
    but then truncate it while assigning to alloc. So on the next
    ALLOC_GROW() call, we'll think the buffer is way too small and try
    to realloc again. I don't know why this is so much slower than the
    grow-by-one above, but it is.

Neither is really correct, but both are in the realm of OK: stupidly
large input doesn't perform well, but there's no buffer overflow
vulnerability.

What I was worried about is what happens if you tweak your patch like
this:

diff --git a/git-compat-util.h b/git-compat-util.h
index 2bc1f43f48..0730dd24ad 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -870,7 +870,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
 		size_t alloc_grow_new_alloc_; \
 		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
 			alloc = alloc_grow_new_alloc_; \
-			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
+			REALLOC_ARRAY(x, alloc); \
 		} \
 	} while (0)
 

In that case we really do end up with too-small allocations and
out-of-bounds writes.

Maybe you saw that coming and that's why you wrote it as you did. But it
is definitely subtle enough that I think it would merit a big warning
comment that "alloc" and "alloc_grow_new_alloc_" are not necessarily the
same type, and hence not necessarily the same value.

> Here's a demo program exercising the arithmetic part of the macros:

I think the difference isn't in the arithmetic values that come out, but
in what is fed to realloc() itself. And in your harness, realloc is just
"x = true". If you actually store the value that would be passed to
realloc() like this:

diff --git a/foo.c.orig b/foo.c
index 2fbce8c..7498f36 100644
--- a/foo.c.orig
+++ b/foo.c
@@ -11,7 +11,7 @@
 				alloc = (nr); \
 			else \
 				alloc = alloc_nr(alloc); \
-			x = true; \
+			x = alloc; \
 		} \
 	} while (0)
 
@@ -31,7 +31,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
 		size_t alloc_grow_new_alloc_; \
 		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
 			alloc = alloc_grow_new_alloc_; \
-			x = true; \
+			x = alloc_grow_new_alloc_; \
 		} \
 	} while (0)
 
@@ -44,7 +44,7 @@ int main(int argc, char **argv)
 	for (T i = 0;; i++) {
 		for (T j = MIN;; j++) {
 			T alloc1 = j, alloc2 = j;
-			bool allocated1 = false, allocated2 = false;
+			size_t allocated1 = 0, allocated2 = 0;
 			ALLOC_GROW1(allocated1, i, alloc1);
 			ALLOC_GROW2(allocated2, i, alloc2);
 			if (alloc1 != alloc2 || allocated1 != allocated2)

then you see the differences. For negative values, yeah, you end up with
big size_t values. But for an unsigned type you get different small
allocations.

-Peff

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
  2026-05-15 19:50   ` Jeff King
  2026-05-15 23:01     ` René Scharfe
@ 2026-05-16  6:55     ` Johannes Sixt
  1 sibling, 0 replies; 6+ messages in thread
From: Johannes Sixt @ 2026-05-16  6:55 UTC (permalink / raw)
  To: Jeff King, René Scharfe; +Cc: Git List, Junio C Hamano

Am 15.05.26 um 21:50 schrieb Jeff King:
> On Fri, May 15, 2026 at 03:08:18PM -0400, Jeff King wrote:
> 
>> On Fri, May 15, 2026 at 08:16:50PM +0200, René Scharfe wrote:
>>
>>> +		size_t alloc_grow_new_alloc_; \
>>> +		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
>>> +			alloc = alloc_grow_new_alloc_; \
>>> +			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
>>>  		} \
>>
>> What happens if a caller passes in an argument that isn't a size_t?
>> We'll check for overflow in the size_t space, and then truncate it when
>> we assign to alloc, I think.

> 
> Hmm, playing with it and looking a little closer, I think we don't end
> up overflowing the buffer because you use the size_t for
> REALLOC_ARRAY(). So the result is big, but then "alloc" is truncated.

Protect against double-evaluation of "alloc", too, using

	size_t *palloc = &(alloc);

and use *palloc in the two places, then all callers are forced to work
with a size_t as third argument. Don't know what the damage would be,
though.

-- Hannes


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-16  6:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-15 18:16 [PATCH] evaluate the second argument of ALLOC_GROW only once René Scharfe
2026-05-15 19:08 ` Jeff King
2026-05-15 19:50   ` Jeff King
2026-05-15 23:01     ` René Scharfe
2026-05-16  2:51       ` Jeff King
2026-05-16  6:55     ` Johannes Sixt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.