* [PATCH 1/2] strbuf: use st_add3() in strbuf_grow()
@ 2026-05-14 15:11 René Scharfe
2026-05-14 15:13 ` [PATCH 2/2] use __builtin_add_overflow() in st_add() with Clang René Scharfe
2026-05-14 19:07 ` [PATCH 1/2] strbuf: use st_add3() in strbuf_grow() Junio C Hamano
0 siblings, 2 replies; 8+ messages in thread
From: René Scharfe @ 2026-05-14 15:11 UTC (permalink / raw)
To: Git List
Simplify the code by calling st_add3() to do overflow checks instead of
open-coding it. This changes the error message to include the offending
summands, which can be helpful when tracking down the cause.
Signed-off-by: René Scharfe <l.s.r@web.de>
---
strbuf.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/strbuf.c b/strbuf.c
index 3e04addc22..bb04d3910e 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -106,12 +106,9 @@ void strbuf_attach(struct strbuf *sb, void *buf, size_t len, size_t alloc)
void strbuf_grow(struct strbuf *sb, size_t extra)
{
int new_buf = !sb->alloc;
- if (unsigned_add_overflows(extra, 1) ||
- unsigned_add_overflows(sb->len, extra + 1))
- die("you want to use way too much memory");
if (new_buf)
sb->buf = NULL;
- ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
+ ALLOC_GROW(sb->buf, st_add3(sb->len, extra, 1), sb->alloc);
if (new_buf)
sb->buf[0] = '\0';
}
--
2.54.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] use __builtin_add_overflow() in st_add() with Clang
2026-05-14 15:11 [PATCH 1/2] strbuf: use st_add3() in strbuf_grow() René Scharfe
@ 2026-05-14 15:13 ` René Scharfe
2026-05-14 19:12 ` Junio C Hamano
2026-05-15 4:40 ` Jeff King
2026-05-14 19:07 ` [PATCH 1/2] strbuf: use st_add3() in strbuf_grow() Junio C Hamano
1 sibling, 2 replies; 8+ messages in thread
From: René Scharfe @ 2026-05-14 15:13 UTC (permalink / raw)
To: Git List; +Cc: Jeff King
Clang and GCC optimize away comparisons of overflow checks by checking
the carry flag on x64. GCC does the same on ARM64, but Clang currently
(version 22.1) doesn't.
Provide a variant of st_add() that wraps __builtin_add_overflow() to
help Clang optimize it. Use it on all platforms for simplicity.
On an Apple M1 I get a nice speedup for a command that builds lots of
strings using a strbuf, which exercises the st_add3() in strbuf_grow()
for every line of output:
Benchmark 1: ./git_main cat-file --batch-all-objects --batch-check='%(objectname)'
Time (mean ± σ): 119.8 ms ± 0.2 ms [User: 113.0 ms, System: 5.8 ms]
Range (min … max): 119.6 ms … 120.4 ms 24 runs
Benchmark 2: ./git cat-file --batch-all-objects --batch-check='%(objectname)'
Time (mean ± σ): 114.6 ms ± 0.1 ms [User: 107.6 ms, System: 6.0 ms]
Range (min … max): 114.4 ms … 114.9 ms 25 runs
Summary
./git cat-file --batch-all-objects --batch-check='%(objectname)' ran
1.05 ± 0.00 times faster than ./git_main cat-file --batch-all-objects --batch-check='%(objectname)'
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
git-compat-util.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/git-compat-util.h b/git-compat-util.h
index ae1bdc90a4..aa088d04bb 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -614,6 +614,17 @@ static inline bool strip_suffix(const char *str, const char *suffix,
int git_open_cloexec(const char *name, int flags);
#define git_open(name) git_open_cloexec(name, O_RDONLY)
+/* Help Clang; GCC generates the same code for both variants. */
+#if defined(__clang__)
+static inline size_t st_add(size_t a, size_t b)
+{
+ size_t sum;
+ if (__builtin_add_overflow(a, b, &sum))
+ die("size_t overflow: %"PRIuMAX" + %"PRIuMAX,
+ (uintmax_t)a, (uintmax_t)b);
+ return sum;
+}
+#else
static inline size_t st_add(size_t a, size_t b)
{
if (unsigned_add_overflows(a, b))
@@ -621,6 +632,7 @@ static inline size_t st_add(size_t a, size_t b)
(uintmax_t)a, (uintmax_t)b);
return a + b;
}
+#endif
#define st_add3(a,b,c) st_add(st_add((a),(b)),(c))
#define st_add4(a,b,c,d) st_add(st_add3((a),(b),(c)),(d))
--
2.54.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] strbuf: use st_add3() in strbuf_grow()
2026-05-14 15:11 [PATCH 1/2] strbuf: use st_add3() in strbuf_grow() René Scharfe
2026-05-14 15:13 ` [PATCH 2/2] use __builtin_add_overflow() in st_add() with Clang René Scharfe
@ 2026-05-14 19:07 ` Junio C Hamano
2026-05-14 20:13 ` René Scharfe
1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2026-05-14 19:07 UTC (permalink / raw)
To: René Scharfe; +Cc: Git List
René Scharfe <l.s.r@web.de> writes:
> Simplify the code by calling st_add3() to do overflow checks instead of
> open-coding it. This changes the error message to include the offending
> summands, which can be helpful when tracking down the cause.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> strbuf.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/strbuf.c b/strbuf.c
> index 3e04addc22..bb04d3910e 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -106,12 +106,9 @@ void strbuf_attach(struct strbuf *sb, void *buf, size_t len, size_t alloc)
> void strbuf_grow(struct strbuf *sb, size_t extra)
> {
> int new_buf = !sb->alloc;
> - if (unsigned_add_overflows(extra, 1) ||
> - unsigned_add_overflows(sb->len, extra + 1))
> - die("you want to use way too much memory");
> if (new_buf)
> sb->buf = NULL;
> - ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
> + ALLOC_GROW(sb->buf, st_add3(sb->len, extra, 1), sb->alloc);
ALLOC_GROW() being a macro that references its second argument three
times, doesn't this rewrite rely on the compiler being clever enough
to notice that checking for the same overflow three times is
pointless and does it only once? I guess the original has the same
issue already, so this may not be so bad but it makes me feel a bit
queasy.
> if (new_buf)
> sb->buf[0] = '\0';
> }
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] use __builtin_add_overflow() in st_add() with Clang
2026-05-14 15:13 ` [PATCH 2/2] use __builtin_add_overflow() in st_add() with Clang René Scharfe
@ 2026-05-14 19:12 ` Junio C Hamano
2026-05-14 20:17 ` René Scharfe
2026-05-15 4:40 ` Jeff King
1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2026-05-14 19:12 UTC (permalink / raw)
To: René Scharfe; +Cc: Git List, Jeff King
René Scharfe <l.s.r@web.de> writes:
> Provide a variant of st_add() that wraps __builtin_add_overflow() to
> help Clang optimize it. Use it on all platforms for simplicity.
> ...
> +/* Help Clang; GCC generates the same code for both variants. */
> +#if defined(__clang__)
> +static inline size_t st_add(size_t a, size_t b)
> +{
> + size_t sum;
> + if (__builtin_add_overflow(a, b, &sum))
> + die("size_t overflow: %"PRIuMAX" + %"PRIuMAX,
> + (uintmax_t)a, (uintmax_t)b);
> + return sum;
> +}
> +#else
> static inline size_t st_add(size_t a, size_t b)
> {
> if (unsigned_add_overflows(a, b))
> @@ -621,6 +632,7 @@ static inline size_t st_add(size_t a, size_t b)
> (uintmax_t)a, (uintmax_t)b);
> return a + b;
> }
> +#endif
Makes me wonder if we tweaked unsigned_add_overflows() to take an
extra *dst parameter to match __builtin_add_overflow(), which of
course requires us to all of 18 callsites, it might make the whole
thing a bit simpler. New uses of unsigned_add_overflows(), if we
ever add them, would automatically benefit, right?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] strbuf: use st_add3() in strbuf_grow()
2026-05-14 19:07 ` [PATCH 1/2] strbuf: use st_add3() in strbuf_grow() Junio C Hamano
@ 2026-05-14 20:13 ` René Scharfe
2026-05-15 4:36 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: René Scharfe @ 2026-05-14 20:13 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git List
On 5/14/26 9:07 PM, Junio C Hamano wrote:
> René Scharfe <l.s.r@web.de> writes:
>
>> Simplify the code by calling st_add3() to do overflow checks instead of
>> open-coding it. This changes the error message to include the offending
>> summands, which can be helpful when tracking down the cause.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>> strbuf.c | 5 +----
>> 1 file changed, 1 insertion(+), 4 deletions(-)
>>
>> diff --git a/strbuf.c b/strbuf.c
>> index 3e04addc22..bb04d3910e 100644
>> --- a/strbuf.c
>> +++ b/strbuf.c
>> @@ -106,12 +106,9 @@ void strbuf_attach(struct strbuf *sb, void *buf, size_t len, size_t alloc)
>> void strbuf_grow(struct strbuf *sb, size_t extra)
>> {
>> int new_buf = !sb->alloc;
>> - if (unsigned_add_overflows(extra, 1) ||
>> - unsigned_add_overflows(sb->len, extra + 1))
>> - die("you want to use way too much memory");
>> if (new_buf)
>> sb->buf = NULL;
>> - ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>> + ALLOC_GROW(sb->buf, st_add3(sb->len, extra, 1), sb->alloc);
>
> ALLOC_GROW() being a macro that references its second argument three
> times, doesn't this rewrite rely on the compiler being clever enough
> to notice that checking for the same overflow three times is
> pointless and does it only once? I guess the original has the same
> issue already, so this may not be so bad but it makes me feel a bit
> queasy.
As long as it has no side-effect (as is the case for addition) and we
keep compiler optimization enabled we should be fine.
I guess the reason for having multiple references as opposed to
loading the value once into a private variable is to support arbitrary
types. In the end it needs to fit into a size_t, though. Something
like this could bring the reference count down to one:
--- 8< ---
diff --git a/git-compat-util.h b/git-compat-util.h
index ae1bdc90a4..ca89cfb0b3 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -812,6 +812,15 @@ static inline void move_array(void *dst, const void *src, size_t n, size_t size)
#define alloc_nr(x) (((x)+16)*3/2)
+static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *out)
+{
+ if (nr > alloc) {
+ *out = alloc_nr(alloc) < nr ? nr : alloc_nr(alloc);
+ return true;
+ }
+ return false;
+}
+
/**
* Dynamically growing an array using realloc() is error prone and boring.
*
@@ -857,12 +866,10 @@ static inline void move_array(void *dst, const void *src, size_t n, size_t size)
*/
#define ALLOC_GROW(x, nr, alloc) \
do { \
- if ((nr) > alloc) { \
- if (alloc_nr(alloc) < (nr)) \
- alloc = (nr); \
- else \
- alloc = alloc_nr(alloc); \
- REALLOC_ARRAY(x, alloc); \
+ size_t alloc_grow_new_alloc_; \
+ if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
+ alloc = alloc_grow_new_alloc_; \
+ REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
} \
} while (0)
--- >8 ---
Hmm, alloc_nr() doesn't do any overflow checking. It should, though,
shouldn't it?
René
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] use __builtin_add_overflow() in st_add() with Clang
2026-05-14 19:12 ` Junio C Hamano
@ 2026-05-14 20:17 ` René Scharfe
0 siblings, 0 replies; 8+ messages in thread
From: René Scharfe @ 2026-05-14 20:17 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git List, Jeff King
On 5/14/26 9:12 PM, Junio C Hamano wrote:
> René Scharfe <l.s.r@web.de> writes:
>
>> Provide a variant of st_add() that wraps __builtin_add_overflow() to
>> help Clang optimize it. Use it on all platforms for simplicity.
>> ...
>> +/* Help Clang; GCC generates the same code for both variants. */
>> +#if defined(__clang__)
>> +static inline size_t st_add(size_t a, size_t b)
>> +{
>> + size_t sum;
>> + if (__builtin_add_overflow(a, b, &sum))
>> + die("size_t overflow: %"PRIuMAX" + %"PRIuMAX,
>> + (uintmax_t)a, (uintmax_t)b);
>> + return sum;
>> +}
>> +#else
>> static inline size_t st_add(size_t a, size_t b)
>> {
>> if (unsigned_add_overflows(a, b))
>> @@ -621,6 +632,7 @@ static inline size_t st_add(size_t a, size_t b)
>> (uintmax_t)a, (uintmax_t)b);
>> return a + b;
>> }
>> +#endif
>
> Makes me wonder if we tweaked unsigned_add_overflows() to take an
> extra *dst parameter to match __builtin_add_overflow(), which of
> course requires us to all of 18 callsites, it might make the whole
> thing a bit simpler. New uses of unsigned_add_overflows(), if we
> ever add them, would automatically benefit, right?
Hmm. It sounds like a lot of churn, but it would make sure that
we use the checked result and not check a + b and then go on and
use x + y because the code de-synced at some point.
How to do it, though? It needs to be generic and evaluate its
arguments only once. Perhaps like this?
diff --git a/git-compat-util.h b/git-compat-util.h
index ca89cfb0b3..27fbb622d7 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -103,6 +103,21 @@ struct strbuf;
#define unsigned_add_overflows(a, b) \
((b) > maximum_unsigned_value_of_type(a) - (a))
+static bool uint_add_overflow(uintmax_t a, uintmax_t b,
+ uintmax_t *out, size_t out_size)
+{
+ if (b > UINTMAX_MAX - a)
+ return true;
+ a += b;
+ if (a > (UINTMAX_MAX >> (bitsizeof(uintmax_t) - CHAR_BIT * out_size)))
+ return true;
+ *out = a;
+ return false;
+}
+
+#define UINT_ADD_OVERFLOW(a, b, out) \
+ uint_add_overflow((a), (b), (out), sizeof(a))
+
/*
* Returns true if the multiplication of "a" and "b" will
* overflow. The types of "a" and "b" must match and must be unsigned.
@@ -616,10 +631,11 @@ int git_open_cloexec(const char *name, int flags);
static inline size_t st_add(size_t a, size_t b)
{
- if (unsigned_add_overflows(a, b))
+ size_t ret;
+ if (UINT_ADD_OVERFLOW(a, b, &ret))
die("size_t overflow: %"PRIuMAX" + %"PRIuMAX,
(uintmax_t)a, (uintmax_t)b);
- return a + b;
+ return ret;
}
#define st_add3(a,b,c) st_add(st_add((a),(b)),(c))
#define st_add4(a,b,c,d) st_add(st_add3((a),(b),(c)),(d))
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] strbuf: use st_add3() in strbuf_grow()
2026-05-14 20:13 ` René Scharfe
@ 2026-05-15 4:36 ` Jeff King
0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2026-05-15 4:36 UTC (permalink / raw)
To: René Scharfe; +Cc: Junio C Hamano, Git List
On Thu, May 14, 2026 at 10:13:19PM +0200, René Scharfe wrote:
> Hmm, alloc_nr() doesn't do any overflow checking. It should, though,
> shouldn't it?
Yes, probably. It's a known blind spot in the overflow checking, but
I think is OK in practice because:
1. We are growing an existing buffer by ~3/2. So even with ordering
the multiplication first, an overflow implies that you have a
single buffer consuming ~1/3 of your address space.
On 64-bit systems that's impractically large, and on 32-bit systems I
think you generally run into fragmentation and address-space issues
first.
2. If alloc_nr(alloc) is less than the desired nr, we just use that nr
directly. So even if we did overflow, I think the result is
too-slow allocation, and not a buffer overflow.
But it would be nice to be less hand-wavy. One of the reasons I hadn't
dug into it further is that I wanted to start making use of intrinsics
to avoid slowdowns. But since you're already doing that (and finding
that the compiler was doing the fast thing anyway!) it might be a good
time to make the jump.
That's all assuming that no overflow happens before ALLOC_GROW() gets
the values. We also tend to do unchecked computions for the "nr" field
there, but it's usually just "nr_foo + 1", so the same logic applies:
you'd have to have an existing array consuming the entire address space
minus one byte to trigger an overflow.
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] use __builtin_add_overflow() in st_add() with Clang
2026-05-14 15:13 ` [PATCH 2/2] use __builtin_add_overflow() in st_add() with Clang René Scharfe
2026-05-14 19:12 ` Junio C Hamano
@ 2026-05-15 4:40 ` Jeff King
1 sibling, 0 replies; 8+ messages in thread
From: Jeff King @ 2026-05-15 4:40 UTC (permalink / raw)
To: René Scharfe; +Cc: Git List
On Thu, May 14, 2026 at 05:13:46PM +0200, René Scharfe wrote:
> Clang and GCC optimize away comparisons of overflow checks by checking
> the carry flag on x64. GCC does the same on ARM64, but Clang currently
> (version 22.1) doesn't.
>
> Provide a variant of st_add() that wraps __builtin_add_overflow() to
> help Clang optimize it. Use it on all platforms for simplicity.
OK. I probably would have just used the intrinsic everywhere with
__GNUC__, but if gcc is already figuring it out, it doesn't matter in
practice.
> +/* Help Clang; GCC generates the same code for both variants. */
> +#if defined(__clang__)
> +static inline size_t st_add(size_t a, size_t b)
> +{
> + size_t sum;
> + if (__builtin_add_overflow(a, b, &sum))
> + die("size_t overflow: %"PRIuMAX" + %"PRIuMAX,
> + (uintmax_t)a, (uintmax_t)b);
> + return sum;
> +}
> +#else
> static inline size_t st_add(size_t a, size_t b)
> {
> if (unsigned_add_overflows(a, b))
It's a shame we can't share more code here, especially the die message.
I guess the ideal primitive is probably a wrapper with the same
interface as __builtin_add_overflow(), which could then be used
everywhere that unsigned_add_overflows() with some minor conversion.
But it gets awkward to do as a macro, and using an inline function runs
into type questions.
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-05-15 4:41 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-14 15:11 [PATCH 1/2] strbuf: use st_add3() in strbuf_grow() René Scharfe
2026-05-14 15:13 ` [PATCH 2/2] use __builtin_add_overflow() in st_add() with Clang René Scharfe
2026-05-14 19:12 ` Junio C Hamano
2026-05-14 20:17 ` René Scharfe
2026-05-15 4:40 ` Jeff King
2026-05-14 19:07 ` [PATCH 1/2] strbuf: use st_add3() in strbuf_grow() Junio C Hamano
2026-05-14 20:13 ` René Scharfe
2026-05-15 4:36 ` Jeff King
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.