[PATCH] pretty: drop strbuf pre-sizing from add

Git development
 help / color / mirror / Atom feed

* [PATCH] pretty: drop strbuf pre-sizing from add_rfc2047()
@ 2026-05-12 16:20 Jeff King
  2026-05-13  1:03 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff King @ 2026-05-12 16:20 UTC (permalink / raw)
  To: git; +Cc: Luke Martin

At the top of add_rfc2047() we do this:

  strbuf_grow(sb, len * 3 + strlen(encoding) + 100);

where "len" is the size of the header (like an author name) we are about
to encode into the buffer. This pre-sizing is purely an optimization; we
use strbuf_addf() and friends to actually write into the buffer, and
they will grow the buffer as necessary.

But there's a problem with the code above: the input can be arbitrarily
large, so we might overflow a size_t while doing that computation,
ending up with a too-small allocation request. Overflowing requires an
impractically large input on a 64-bit system, but is easy to demonstrate
on a 32-bit system with a commit whose author name is ~1.4GB.

Because this pre-sizing is just an optimization, there's no real harm.
We'll start with a smaller buffer and grow it as necessary. But it
_looks_ like a vulnerability, since some other code may pre-size a
strbuf and then write directly into its buffer. So it's worth avoiding
the overflow in the first place.

The obvious way to do that is via checked operations like st_add() and
friends. But taking a step back, is this pre-sizing actually helping
anything?

The computation goes all the way back to 4234a76167 (Extend
--pretty=oneline to cover the first paragraph,, 2007-06-11), but back
then we really were sizing the array to write into directly! In
674d172730 (Rework pretty_print_commit to use strbufs instead of custom
buffers., 2007-09-10) that switched to a strbuf, and at that point it
was a pure optimization.

Is the optimization helping? I don't think so. Even for a gigantic case
like the 1.4GB author name, I couldn't measure any slowdown when
removing it. And most input will be much smaller, and added to a running
strbuf containing the rest of the email-header output. We can just rely
on strbuf's usual amortized-linear growth.

So deleting the line seems like the best way to go. It eliminates the
integer overflow and makes the code a tiny bit simpler.

Reported-by: Luke Martin <lmartin@paramenoeng.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 pretty.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/pretty.c b/pretty.c
index 814803980b..7328aecf5d 100644
--- a/pretty.c
+++ b/pretty.c
@@ -399,7 +399,6 @@ static void add_rfc2047(struct strbuf *sb, const char *line, size_t len,
 	int i;
 	int line_len = last_line_length(sb);

-	strbuf_grow(sb, len * 3 + strlen(encoding) + 100);
 	strbuf_addf(sb, "=?%s?q?", encoding);
 	line_len += strlen(encoding) + 5; /* 5 for =??q? */

-- 
2.54.0.420.gf0bcdff42b

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] pretty: drop strbuf pre-sizing from add_rfc2047()
  2026-05-12 16:20 [PATCH] pretty: drop strbuf pre-sizing from add_rfc2047() Jeff King
@ 2026-05-13  1:03 ` Junio C Hamano
  2026-05-13 18:54   ` Jeff King
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2026-05-13  1:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Luke Martin

Jeff King <peff@peff.net> writes:

> At the top of add_rfc2047() we do this:
>
>   strbuf_grow(sb, len * 3 + strlen(encoding) + 100);
>
> where "len" is the size of the header (like an author name) we are about
> to encode into the buffer. This pre-sizing is purely an optimization; we
> use strbuf_addf() and friends to actually write into the buffer, and
> they will grow the buffer as necessary.
> ...
> Is the optimization helping? I don't think so. Even for a gigantic case
> like the 1.4GB author name, I couldn't measure any slowdown when
> removing it. And most input will be much smaller, and added to a running
> strbuf containing the rest of the email-header output. We can just rely
> on strbuf's usual amortized-linear growth.
>
> So deleting the line seems like the best way to go. It eliminates the
> integer overflow and makes the code a tiny bit simpler.

Very nice.

Someday we may want to go through the output from

    $ git grep -e 'strbuf_grow(' \*.c

and remove this ineffective presizing.  I think any call to
strbuf_grow() that is immediately followed by a call to
strbuf_addX() is suspect, like in the following illustration (there
are others in these files).

diff --git c/apply.c w/apply.c
index 3de4aa4d2e..b0f146276e 100644
--- c/apply.c
+++ w/apply.c
@@ -3305,7 +3305,6 @@ static int apply_fragments(struct apply_state *state, struct image *img, struct
 static int read_blob_object(struct strbuf *buf, const struct object_id *oid, unsigned mode)
 {
 	if (S_ISGITLINK(mode)) {
-		strbuf_grow(buf, 100);
 		strbuf_addf(buf, "Subproject commit %s\n", oid_to_hex(oid));
 	} else {
 		enum object_type type;
diff --git c/archive.c w/archive.c
index fcd474c682..47b8725f0a 100644
--- c/archive.c
+++ w/archive.c
@@ -164,7 +164,6 @@ static int write_archive_entry(const struct object_id *oid, const char *base,
 
 	args->convert = 0;
 	strbuf_reset(&path);
-	strbuf_grow(&path, PATH_MAX);
 	strbuf_add(&path, args->base, args->baselen);
 	strbuf_add(&path, base, baselen);
 	strbuf_addstr(&path, filename);






^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] pretty: drop strbuf pre-sizing from add_rfc2047()
  2026-05-13  1:03 ` Junio C Hamano
@ 2026-05-13 18:54   ` Jeff King
  0 siblings, 0 replies; 3+ messages in thread
From: Jeff King @ 2026-05-13 18:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Luke Martin

On Wed, May 13, 2026 at 10:03:31AM +0900, Junio C Hamano wrote:

> Someday we may want to go through the output from
> 
>     $ git grep -e 'strbuf_grow(' \*.c
> 
> and remove this ineffective presizing.  I think any call to
> strbuf_grow() that is immediately followed by a call to
> strbuf_addX() is suspect, like in the following illustration (there
> are others in these files).

Yup. I think this could be #leftoverbits material, but anybody who wants
to pick this up should be careful to read through the whole function and
make sure there's no subtle dependency on the grown buffer.

Skimming through, it looks like most are just leftovers from when old
code was converted to strbuf, and the pre-growth was kept mostly out of
conservatism.

Some of them are truly ugly to look at, like:

  http-backend.c: strbuf_grow(&buf, cnt * 53 + 2);

and I think in some cases we can even drop some surrounding code, like:

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 070a5af3e4..aa32ebc8ab 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -1337,21 +1337,14 @@ static int tecmp1 (const void *_a, const void *_b)
 
 static void mktree(struct tree_content *t, int v, struct strbuf *b)
 {
-	size_t maxlen = 0;
 	unsigned int i;
 
 	if (!v)
 		QSORT(t->entries, t->entry_count, tecmp0);
 	else
 		QSORT(t->entries, t->entry_count, tecmp1);
 
-	for (i = 0; i < t->entry_count; i++) {
-		if (t->entries[i]->versions[v].mode)
-			maxlen += t->entries[i]->name->str_len + 34;
-	}
-
 	strbuf_reset(b);
-	strbuf_grow(b, maxlen);
 	for (i = 0; i < t->entry_count; i++) {
 		struct tree_entry *e = t->entries[i];
 		if (!e->versions[v].mode)

So probably some satisfying cleanup opportunities available for somebody
who wants to spend a little time with it. ;)

-Peff

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-13 18:54 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-12 16:20 [PATCH] pretty: drop strbuf pre-sizing from add_rfc2047() Jeff King
2026-05-13  1:03 ` Junio C Hamano
2026-05-13 18:54   ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox