* [PATCH] rerere: update to modern representation of empty strbufs
@ 2026-03-19 7:15 Junio C Hamano
2026-03-19 7:57 ` Patrick Steinhardt
2026-03-19 22:14 ` [RFC] cocci: .buf in a strbuf object can never be NULL Junio C Hamano
0 siblings, 2 replies; 20+ messages in thread
From: Junio C Hamano @ 2026-03-19 7:15 UTC (permalink / raw)
To: git
Back when b4833a2c (rerere: Fix use of an empty strbuf.buf,
2007-09-26) was written, a freshly initialized empty strbuf
had NULL in its .buf member, with .len set to 0. The code this
patch touches in rerere.c was written to _fix_ the original code
that assumed that the .buf member is always pointing at a NUL-terminated
string, even for an empty string, which did not hold back then.
That changed in b315c5c0 (strbuf change: be sure ->buf is never ever
NULL., 2007-09-27), and it has again become safe to assume that .buf
is never NULL, and .buf[0] has '\0' for an empty string (i.e., a
strbuf with its .len member set to 0).
A funny thing is, this piece of code has been moved around from
builtin-rerere.c to rerere.c and also adjusted for updates to the
hash function API over the years, but nobody bothered to question
if this special casing for an empty strbuf was still necessary:
b4833a2c62 (rerere: Fix use of an empty strbuf.buf, 2007-09-26)
5b2fd95606 (rerere: Separate libgit and builtin functions, 2008-07-09)
9126f0091f (fix openssl headers conflicting with custom SHA1 implementations, 2008-10-01)
c0f16f8e14 (rerere: factor out handle_conflict function, 2018-08-05)
0d7c419a94 (rerere: convert to use the_hash_algo, 2018-10-15)
0578f1e66a (global: adapt callers to use generic hash context helpers, 2025-01-31)
Finally get rid of the special casing that was unnecessary for the
last 19 years.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
rerere.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/rerere.c b/rerere.c
index 6ec55964e2..0296700f9f 100644
--- a/rerere.c
+++ b/rerere.c
@@ -403,12 +403,8 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io,
strbuf_addbuf(out, &two);
rerere_strbuf_putconflict(out, '>', marker_size);
if (ctx) {
- git_hash_update(ctx, one.buf ?
- one.buf : "",
- one.len + 1);
- git_hash_update(ctx, two.buf ?
- two.buf : "",
- two.len + 1);
+ git_hash_update(ctx, one.buf, one.len + 1);
+ git_hash_update(ctx, two.buf, two.len + 1);
}
break;
} else if (hunk == RR_SIDE_1)
--
2.53.0-781-gf5b2cca52b
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH] rerere: update to modern representation of empty strbufs 2026-03-19 7:15 [PATCH] rerere: update to modern representation of empty strbufs Junio C Hamano @ 2026-03-19 7:57 ` Patrick Steinhardt 2026-03-19 22:14 ` [RFC] cocci: .buf in a strbuf object can never be NULL Junio C Hamano 1 sibling, 0 replies; 20+ messages in thread From: Patrick Steinhardt @ 2026-03-19 7:57 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, Mar 19, 2026 at 12:15:59AM -0700, Junio C Hamano wrote: > Back when b4833a2c (rerere: Fix use of an empty strbuf.buf, > 2007-09-26) was written, a freshly initialized empty strbuf > had NULL in its .buf member, with .len set to 0. The code this > patch touches in rerere.c was written to _fix_ the original code > that assumed that the .buf member is always pointing at a NUL-terminated > string, even for an empty string, which did not hold back then. > > That changed in b315c5c0 (strbuf change: be sure ->buf is never ever > NULL., 2007-09-27), and it has again become safe to assume that .buf > is never NULL, and .buf[0] has '\0' for an empty string (i.e., a > strbuf with its .len member set to 0). Right. We always require strbufs to be initialized, and that will always cause us to set the `.buf` member. > diff --git a/rerere.c b/rerere.c > index 6ec55964e2..0296700f9f 100644 > --- a/rerere.c > +++ b/rerere.c > @@ -403,12 +403,8 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io, > strbuf_addbuf(out, &two); > rerere_strbuf_putconflict(out, '>', marker_size); > if (ctx) { > - git_hash_update(ctx, one.buf ? > - one.buf : "", > - one.len + 1); > - git_hash_update(ctx, two.buf ? > - two.buf : "", > - two.len + 1); > + git_hash_update(ctx, one.buf, one.len + 1); > + git_hash_update(ctx, two.buf, two.len + 1); > } Yup, this should be indeed equivalent given that we declare the `.buf` thing as `char strbuf_slopbuf[1]`. So we'll get a single NUL byte here, which is the same as the empty string. So this looks good to me, thanks! Patrick ^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-19 7:15 [PATCH] rerere: update to modern representation of empty strbufs Junio C Hamano 2026-03-19 7:57 ` Patrick Steinhardt @ 2026-03-19 22:14 ` Junio C Hamano 2026-03-19 23:35 ` Jeff King 2026-03-21 16:24 ` Junio C Hamano 1 sibling, 2 replies; 20+ messages in thread From: Junio C Hamano @ 2026-03-19 22:14 UTC (permalink / raw) To: git Junio C Hamano <gitster@pobox.com> writes: > Subject: Re: [PATCH] rerere: update to modern representation of empty strbufs > > Finally get rid of the special casing that was unnecessary for the > last 19 years. > > Signed-off-by: Junio C Hamano <gitster@pobox.com> > --- > rerere.c | 8 ++------ > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/rerere.c b/rerere.c > index 6ec55964e2..0296700f9f 100644 > --- a/rerere.c > +++ b/rerere.c > @@ -403,12 +403,8 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io, > strbuf_addbuf(out, &two); > rerere_strbuf_putconflict(out, '>', marker_size); > if (ctx) { > - git_hash_update(ctx, one.buf ? > - one.buf : "", > - one.len + 1); > - git_hash_update(ctx, two.buf ? > - two.buf : "", > - two.len + 1); > + git_hash_update(ctx, one.buf, one.len + 1); > + git_hash_update(ctx, two.buf, two.len + 1); > } > break; > } else if (hunk == RR_SIDE_1) I wrote a trivial Coccinele rule (attached at the end) to rewrite SB.buf ? SB.buf : "" into SB.buf and this found only the above instance, which is good. However, a related rule, "it is nonsense to expect that SB.buf could sometimes be false", finds two questionable instances. One is in list-objects-filter-options.c::parse_list_objects_filter() void parse_list_objects_filter( struct list_objects_filter_options *filter_options, const char *arg) { struct strbuf errbuf = STRBUF_INIT; if (!filter_options->filter_spec.buf) BUG("filter_options not properly initialized"); The filter_options variable points at a list_objects_filter_options structure, which has an embedded "struct strbuf". This BUG() is unnecessary if the structure is properly initialized, either by the LIST_OBJECTS_FILTER_INIT macro or a list_objects_filter_init() call. But it is easy to memset(&lofo, 0, sizeof(lofo)) or zero initialize with "= {0}", so I think it is OK to special case and allow for checking the possibility that .buf might be NULL. The other exception comes from use of getdelim() in strbuf_getwholeline(), whose early part reads like this: int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) { ... /* Translate slopbuf to NULL, as we cannot call realloc on it */ if (!sb->alloc) sb->buf = NULL; errno = 0; r = getdelim(&sb->buf, &sb->alloc, term, fp); if (r > 0) { sb->len = r; return 0; } Before calling getdelim(), we deliberately break the strbuf invariant ".buf is never NULL; it can point at the slopbuf if .len is 0". If we read even a single byte, we are OK, as the invariant is restored. Upon EOF, later in the function we have if (!sb->buf) strbuf_init(sb, 0); else strbuf_reset(sb); return EOF; to recover the strbuf invariant. Because strbuf_getwholeline() discards what is originally in sb and replaces it with what getdelim() returns, I have a suspicion that working with bare char * and size_t to interact with getdelim() and then using strbuf_attach() on the success case would be simpler to read and maintain. Once such a rewrite of this function is done (#leftoverbits), the special case we see in the Coccinelle rule can be lifted. Thoughts? contrib/coccinelle/strbuf.cocci | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git c/contrib/coccinelle/strbuf.cocci w/contrib/coccinelle/strbuf.cocci index 5f06105df6..3dc5cd02a3 100644 --- c/contrib/coccinelle/strbuf.cocci +++ w/contrib/coccinelle/strbuf.cocci @@ -60,3 +60,18 @@ expression E1, E2; @@ - strbuf_addstr(E1, real_path(E2)); + strbuf_add_real_path(E1, E2); + +@@ +struct strbuf SB; +@@ +- SB.buf ? SB.buf : "" ++ SB.buf + +@@ +identifier funcname != { strbuf_getwholeline, parse_list_objects_filter }; +struct strbuf SB; +@@ + funcname(...) {<... +- !SB.buf ++ 0 + ...>} ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-19 22:14 ` [RFC] cocci: .buf in a strbuf object can never be NULL Junio C Hamano @ 2026-03-19 23:35 ` Jeff King 2026-03-20 1:46 ` Junio C Hamano 2026-03-21 20:47 ` René Scharfe 2026-03-21 16:24 ` Junio C Hamano 1 sibling, 2 replies; 20+ messages in thread From: Jeff King @ 2026-03-19 23:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, Mar 19, 2026 at 03:14:27PM -0700, Junio C Hamano wrote: > One is in list-objects-filter-options.c::parse_list_objects_filter() > > void parse_list_objects_filter( > struct list_objects_filter_options *filter_options, > const char *arg) > { > struct strbuf errbuf = STRBUF_INIT; > > if (!filter_options->filter_spec.buf) > BUG("filter_options not properly initialized"); > > The filter_options variable points at a list_objects_filter_options > structure, which has an embedded "struct strbuf". This BUG() is > unnecessary if the structure is properly initialized, either by the > LIST_OBJECTS_FILTER_INIT macro or a list_objects_filter_init() call. > But it is easy to memset(&lofo, 0, sizeof(lofo)) or zero initialize > with "= {0}", so I think it is OK to special case and allow for > checking the possibility that .buf might be NULL. Yeah, this is about catching _other_ code which accidentally violates the invariant. I don't think there is any choice between special-casing it or just removing the BUG() check. It is probably OK to do the latter at this point. As part of the transition to LIST_OBJECTS_FILTER_INIT it was a bigger risk, but that is less likely now. So I am OK either way. > Because strbuf_getwholeline() discards what is originally in sb and > replaces it with what getdelim() returns, I have a suspicion that > working with bare char * and size_t to interact with getdelim() and > then using strbuf_attach() on the success case would be simpler to > read and maintain. Once such a rewrite of this function is done > (#leftoverbits), the special case we see in the Coccinelle rule can > be lifted. Hmm. I think that is something like this: diff --git a/strbuf.c b/strbuf.c index 3939863cf3..0333aea261 100644 --- a/strbuf.c +++ b/strbuf.c @@ -631,6 +631,8 @@ int strbuf_getcwd(struct strbuf *sb) #ifdef HAVE_GETDELIM int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) { + char *buf; + size_t alloc; ssize_t r; if (feof(fp)) @@ -639,12 +641,14 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) strbuf_reset(sb); /* Translate slopbuf to NULL, as we cannot call realloc on it */ - if (!sb->alloc) - sb->buf = NULL; + alloc = sb->alloc; + buf = alloc ? sb->buf : NULL; errno = 0; - r = getdelim(&sb->buf, &sb->alloc, term, fp); + r = getdelim(&buf, &alloc, term, fp); if (r > 0) { + sb->buf = buf; + sb->alloc = alloc; sb->len = r; return 0; } @@ -669,10 +673,13 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) * we can just re-init, but otherwise we should make sure that our * length is empty, and that the result is NUL-terminated. */ - if (!sb->buf) + if (!buf) strbuf_init(sb, 0); - else - strbuf_reset(sb); + else { + sb->buf = buf; + sb->alloc = alloc; + strbuf_reset(&sb); + } return EOF; } #else So I don't know that it makes anything simpler. We have to copy the values back into the strbuf either way, and we still have to handle restoring the strbuf invariants. Even the strbuf_init() case is still needed, because we don't know whether getdelim() just didn't allocate (in which case we could leave the strbuf alone) or if it actually ate the allocation we passed in (which was just a copy of sb->buf). -Peff ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-19 23:35 ` Jeff King @ 2026-03-20 1:46 ` Junio C Hamano 2026-03-20 4:18 ` Jeff King 2026-03-21 20:47 ` René Scharfe 1 sibling, 1 reply; 20+ messages in thread From: Junio C Hamano @ 2026-03-20 1:46 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: >> Because strbuf_getwholeline() discards what is originally in sb and >> replaces it with what getdelim() returns, I have a suspicion that >> working with bare char * and size_t to interact with getdelim() and >> then using strbuf_attach() on the success case would be simpler to >> read and maintain. Once such a rewrite of this function is done >> (#leftoverbits), the special case we see in the Coccinelle rule can >> be lifted. > > Hmm. I think that is something like this: > > diff --git a/strbuf.c b/strbuf.c > index 3939863cf3..0333aea261 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -631,6 +631,8 @@ int strbuf_getcwd(struct strbuf *sb) > #ifdef HAVE_GETDELIM > int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > { > + char *buf; > + size_t alloc; > ssize_t r; > > if (feof(fp)) > @@ -639,12 +641,14 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > strbuf_reset(sb); > > /* Translate slopbuf to NULL, as we cannot call realloc on it */ > - if (!sb->alloc) > - sb->buf = NULL; > + alloc = sb->alloc; > + buf = alloc ? sb->buf : NULL; > errno = 0; I actually was hoping that all lines in this hunk before this point can be removed, i.e., strbuf_release(sb), buf = NULL, alloc = 0. > - r = getdelim(&sb->buf, &sb->alloc, term, fp); > + r = getdelim(&buf, &alloc, term, fp); > > if (r > 0) { > + sb->buf = buf; > + sb->alloc = alloc; > sb->len = r; > return 0; > } Yes. > @@ -669,10 +673,13 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > * we can just re-init, but otherwise we should make sure that our > * length is empty, and that the result is NUL-terminated. > */ > - if (!sb->buf) > + if (!buf) > strbuf_init(sb, 0); > - else > - strbuf_reset(sb); > + else { > + sb->buf = buf; > + sb->alloc = alloc; > + strbuf_reset(&sb); > + } I do not get all these conditionals. This is an EOF code path; we have no data in buf to return. We resetted the caller's strbuf already. Can't we return buf (if allocated) to the system and return without doing any further damage to sb at this point? > return EOF; > } > #else > > So I don't know that it makes anything simpler. We have to copy the > values back into the strbuf either way, and we still have to handle > restoring the strbuf invariants. Even the strbuf_init() case is still > needed, because we don't know whether getdelim() just didn't allocate > (in which case we could leave the strbuf alone) or if it actually ate > the allocation we passed in (which was just a copy of sb->buf). > > -Peff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-20 1:46 ` Junio C Hamano @ 2026-03-20 4:18 ` Jeff King 2026-03-20 5:45 ` Junio C Hamano 2026-03-21 13:14 ` René Scharfe 0 siblings, 2 replies; 20+ messages in thread From: Jeff King @ 2026-03-20 4:18 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, Mar 19, 2026 at 06:46:33PM -0700, Junio C Hamano wrote: > > diff --git a/strbuf.c b/strbuf.c > > index 3939863cf3..0333aea261 100644 > > --- a/strbuf.c > > +++ b/strbuf.c > > @@ -631,6 +631,8 @@ int strbuf_getcwd(struct strbuf *sb) > > #ifdef HAVE_GETDELIM > > int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > > { > > + char *buf; > > + size_t alloc; > > ssize_t r; > > > > if (feof(fp)) > > @@ -639,12 +641,14 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > > strbuf_reset(sb); > > > > /* Translate slopbuf to NULL, as we cannot call realloc on it */ > > - if (!sb->alloc) > > - sb->buf = NULL; > > + alloc = sb->alloc; > > + buf = alloc ? sb->buf : NULL; > > errno = 0; > > I actually was hoping that all lines in this hunk before this point > can be removed, i.e., strbuf_release(sb), buf = NULL, alloc = 0. I'm not quite sure what you mean. The function right now looks like: ssize_t r; if (feof(fp)) return EOF; strbuf_reset(sb); /* Translate slopbuf to NULL, as we cannot call realloc on it */ if (!sb->alloc) sb->buf = NULL; errno = 0; I think the strbuf_reset() could go away even without any other changes. We always adjust sb->len in the end to match what happened with getdelim(), so there is no point in doing it up front. We could strbuf_release() and set buf to NULL, but that would defeat the purpose of the function, wouldn't it? We want to reuse sb->buf in each call, not allocate it fresh each time. I.e., in a loop like: while (strbuf_getline(&sb) != EOF) { ...look at sb.buf... } we want to use the same buffer over and over. > > @@ -669,10 +673,13 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > > * we can just re-init, but otherwise we should make sure that our > > * length is empty, and that the result is NUL-terminated. > > */ > > - if (!sb->buf) > > + if (!buf) > > strbuf_init(sb, 0); > > - else > > - strbuf_reset(sb); > > + else { > > + sb->buf = buf; > > + sb->alloc = alloc; > > + strbuf_reset(&sb); > > + } > > I do not get all these conditionals. This is an EOF code path; we > have no data in buf to return. We resetted the caller's strbuf > already. Can't we return buf (if allocated) to the system and > return without doing any further damage to sb at this point? The conditional is trying to keep any allocated buffer returned from getdelim() attached to the strbuf. Since this is EOF (or error), I agree it would probably be OK to just free it. Even in a loop like the one above, the loop will generally end at EOF, and we don't care about reusing the buffer further. But it's not quite enough to just do: free(buf); Because "buf" is a copy of sb->buf, and we handed "buf" off to getdelim(), we need to make sure sb->buf is not still pointing there. I think it would be enough to do: free(buf); strbuf_init(sb, 0); If we did a strbuf_release() at the top of the function then that is not a concern (you know that sb->buf is pointing at the slopbuf). But I don't think that is a good idea for the reason I gave above. -Peff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-20 4:18 ` Jeff King @ 2026-03-20 5:45 ` Junio C Hamano 2026-03-20 5:57 ` Jeff King 2026-03-21 13:14 ` René Scharfe 1 sibling, 1 reply; 20+ messages in thread From: Junio C Hamano @ 2026-03-20 5:45 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > I'm not quite sure what you mean. The function right now looks like: > ... > But it's not quite enough to just do: > > free(buf); > > Because "buf" is a copy of sb->buf,... I may have phrased the idea very poorly. In short, the core of the idea is that we do not have to use the original content of the strbuf at all. I.e., "buf" does not have to be anything related to sb->buf. In the following patch, I am _removing_ strbuf_reset() near the beginning, but replacing it with strbuf_release() may illustrate the idea more clearly. I didn't do so primarily because the first thing strbuf_attach() does is to call strbuf_release(), so the call would be redundant in the normal code flow. When getdelim() did not return anything positive, after asserting the return value is -1 (i.e., EOF), we only need to return EOF while emptying the caller-supplied strbuf. As the "char *buf" we have and passed to getdelim() never had anything to do with caller-supplied strbuf *sb, there is no need to worry about slopbuf or anything associated with it. strbuf.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git c/strbuf.c w/strbuf.c index 3939863cf3..ef61bd5b14 100644 --- c/strbuf.c +++ w/strbuf.c @@ -631,21 +631,20 @@ int strbuf_getcwd(struct strbuf *sb) #ifdef HAVE_GETDELIM int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) { + size_t alloc; + char *buf; ssize_t r; if (feof(fp)) return EOF; - strbuf_reset(sb); - - /* Translate slopbuf to NULL, as we cannot call realloc on it */ - if (!sb->alloc) - sb->buf = NULL; + buf = NULL; + alloc = 0; errno = 0; - r = getdelim(&sb->buf, &sb->alloc, term, fp); + r = getdelim(&buf, &alloc, term, fp); if (r > 0) { - sb->len = r; + strbuf_attach(sb, buf, (size_t) r, alloc); return 0; } assert(r == -1); @@ -664,15 +663,14 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) if (errno == ENOMEM) die("Out of memory, getdelim failed"); - /* - * Restore strbuf invariants; if getdelim left us with a NULL pointer, - * we can just re-init, but otherwise we should make sure that our - * length is empty, and that the result is NUL-terminated. + /* + * We got an EOF. If getdelim() allocated any memory, we + * would return that to the system. */ - if (!sb->buf) - strbuf_init(sb, 0); - else - strbuf_reset(sb); + free(buf); + + /* And empty the strbuf */ + strbuf_release(sb); return EOF; } #else ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-20 5:45 ` Junio C Hamano @ 2026-03-20 5:57 ` Jeff King 2026-03-20 6:06 ` Junio C Hamano 2026-03-20 6:18 ` Jeff King 0 siblings, 2 replies; 20+ messages in thread From: Jeff King @ 2026-03-20 5:57 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, Mar 19, 2026 at 10:45:04PM -0700, Junio C Hamano wrote: > I may have phrased the idea very poorly. In short, the core of the > idea is that we do not have to use the original content of the > strbuf at all. I.e., "buf" does not have to be anything related to > sb->buf. This seems like a non-starter to me, though, as it means getdelim() will always allocate a fresh buffer, even though we had a buffer it could have used. I.e., here: > + buf = NULL; > + alloc = 0; > errno = 0; > - r = getdelim(&sb->buf, &sb->alloc, term, fp); > + r = getdelim(&buf, &alloc, term, fp); we will always get a new allocation. And so looping over strbuf_getline() will incur one allocation per call, rather than using the same buffer over and over. I haven't measured to see what the exact cost is, but I know that looping over a strbuf (with a reset in the loop, or the implied reset from a getline call) is a common optimization trick that does have a measurable improvement for some cases. -Peff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-20 5:57 ` Jeff King @ 2026-03-20 6:06 ` Junio C Hamano 2026-03-20 6:18 ` Jeff King 1 sibling, 0 replies; 20+ messages in thread From: Junio C Hamano @ 2026-03-20 6:06 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > I haven't measured to see what the exact cost is, but I know that > looping over a strbuf (with a reset in the loop, or the implied reset > from a getline call) is a common optimization trick that does have a > measurable improvement for some cases. Yeah, that part I missed. Thanks. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-20 5:57 ` Jeff King 2026-03-20 6:06 ` Junio C Hamano @ 2026-03-20 6:18 ` Jeff King 1 sibling, 0 replies; 20+ messages in thread From: Jeff King @ 2026-03-20 6:18 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Fri, Mar 20, 2026 at 01:57:09AM -0400, Jeff King wrote: > I haven't measured to see what the exact cost is, but I know that > looping over a strbuf (with a reset in the loop, or the implied reset > from a getline call) is a common optimization trick that does have a > measurable improvement for some cases. Try something like this: # input file is all objects in linux.git cd linux.git git repack -ad git show-index <.git/objects/pack/pack-*.idx | cut -d' ' -f2 >input # now do something that primarily reads a bunch of lines git cat-file --buffer --batch-check='%(objectname)" <input That's a somewhat silly command, though it does do something useful (it checks that each object exists). Here is master ("git.old") versus applying your patch ("git.new"): Benchmark 1: ./git.old cat-file --buffer --batch-check="%(objectname)" <input Time (mean ± σ): 3.152 s ± 0.057 s [User: 3.067 s, System: 0.085 s] Range (min … max): 3.048 s … 3.225 s 10 runs Benchmark 2: ./git.new cat-file --buffer --batch-check="%(objectname)" <input Time (mean ± σ): 3.377 s ± 0.065 s [User: 3.295 s, System: 0.082 s] Range (min … max): 3.279 s … 3.480 s 10 runs Summary ./git.old cat-file --buffer --batch-check="%(objectname)" <input ran 1.07 ± 0.03 times faster than ./git.new cat-file --buffer --batch-check="%(objectname)" <input That's a fairly extreme example, but I think shows that the extra allocations do have measurable overhead. If you ask it do more work (asking for %(objecttype) or something) the relative change becomes smaller, but the absolute slowdown (a few hundred ms) remains. -Peff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-20 4:18 ` Jeff King 2026-03-20 5:45 ` Junio C Hamano @ 2026-03-21 13:14 ` René Scharfe 2026-03-21 16:41 ` Jeff King 1 sibling, 1 reply; 20+ messages in thread From: René Scharfe @ 2026-03-21 13:14 UTC (permalink / raw) To: Jeff King, Junio C Hamano; +Cc: git On 3/20/26 5:18 AM, Jeff King wrote: > > ssize_t r; > > if (feof(fp)) > return EOF; > > strbuf_reset(sb); > > /* Translate slopbuf to NULL, as we cannot call realloc on it */ > if (!sb->alloc) > sb->buf = NULL; > errno = 0; > > I think the strbuf_reset() could go away even without any other changes. > We always adjust sb->len in the end to match what happened with > getdelim(), so there is no point in doing it up front. Yes. Same with the EOF check; getdelim(3) is (must be) prepared to handle that for us. An early return at the end of the file avoids the translate effort once per file, but adds the cost of checking for each line. René ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-21 13:14 ` René Scharfe @ 2026-03-21 16:41 ` Jeff King 0 siblings, 0 replies; 20+ messages in thread From: Jeff King @ 2026-03-21 16:41 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, git On Sat, Mar 21, 2026 at 02:14:51PM +0100, René Scharfe wrote: > > if (feof(fp)) > > return EOF; > > > > strbuf_reset(sb); > [...] > > I think the strbuf_reset() could go away even without any other changes. > > We always adjust sb->len in the end to match what happened with > > getdelim(), so there is no point in doing it up front. > > Yes. Same with the EOF check; getdelim(3) is (must be) prepared to handle > that for us. An early return at the end of the file avoids the translate > effort once per file, but adds the cost of checking for each line. I think you're probably right. I added it in the original as an attempt to simplify away a tricky case before manipulating the strbuf, but we have to eventually deal with those tricky cases anyway, since we may see the EOF fresh from getdelim(). -Peff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-19 23:35 ` Jeff King 2026-03-20 1:46 ` Junio C Hamano @ 2026-03-21 20:47 ` René Scharfe 2026-03-21 21:18 ` Jeff King 1 sibling, 1 reply; 20+ messages in thread From: René Scharfe @ 2026-03-21 20:47 UTC (permalink / raw) To: Jeff King, Junio C Hamano; +Cc: git On 3/20/26 12:35 AM, Jeff King wrote: > > @@ -669,10 +673,13 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > * we can just re-init, but otherwise we should make sure that our > * length is empty, and that the result is NUL-terminated. > */ > - if (!sb->buf) > + if (!buf) > strbuf_init(sb, 0); > - else > - strbuf_reset(sb); > + else { > + sb->buf = buf; > + sb->alloc = alloc; > + strbuf_reset(&sb); > + } > return EOF; > } > #else > > So I don't know that it makes anything simpler. We have to copy the > values back into the strbuf either way, and we still have to handle > restoring the strbuf invariants. Even the strbuf_init() case is still > needed, because we don't know whether getdelim() just didn't allocate > (in which case we could leave the strbuf alone) or if it actually ate > the allocation we passed in (which was just a copy of sb->buf). And yet this function can turn an empty strbuf into an allocated one without rolling it back on error, leaving code similar to this silly example here leaking: int copy_one_line(FILE *in, FILE *out, int term) { struct strbuf sb = STRBUF_INIT; if (strbuf_getwholeline(&sb, in, term)) return -1; fwrite(sb.buf, 1, sb.len, out); strbuf_release(&sb); return 0; } Some strbuf functions restore the original state in such a case by calling strbuf_release(), strbuf_getwholeline() doesn't. If we are OK with that then it could be simplified by growing the buffer upfront: int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) { ssize_t r; strbuf_grow(sb, 0); errno = 0; r = getdelim(&sb->buf, &sb->alloc, term, fp); if (r > 0) { sb->len = r; return 0; } assert(r == -1); if (errno == ENOMEM) die("Out of memory, getdelim failed"); strbuf_reset(sb); return EOF; } René ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-21 20:47 ` René Scharfe @ 2026-03-21 21:18 ` Jeff King 2026-03-21 23:41 ` René Scharfe 2026-03-22 1:22 ` Junio C Hamano 0 siblings, 2 replies; 20+ messages in thread From: Jeff King @ 2026-03-21 21:18 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, git On Sat, Mar 21, 2026 at 09:47:18PM +0100, René Scharfe wrote: > And yet this function can turn an empty strbuf into an allocated one > without rolling it back on error, leaving code similar to this silly > example here leaking: > > int copy_one_line(FILE *in, FILE *out, int term) > { > struct strbuf sb = STRBUF_INIT; > if (strbuf_getwholeline(&sb, in, term)) > return -1; > fwrite(sb.buf, 1, sb.len, out); > strbuf_release(&sb); > return 0; > } Yes, I almost pointed that out, but I think it's mostly a non-issue in practice because you'd generally call it multiple times (usually in a loop, but sometimes just multiple individual calls). And then you have to release if any call ever succeeded, which means either doing so after the loop ends or in a cleanup block. Grepping for 'if (strbuf_get.*line', the closest I found was get_mail_commit_oid(), which reads a single line. It doesn't have an early return, though, since it has to clean up the FILE pointer anyway. So I dunno. I don't think it's been a problem in practice, but I'm not opposed to future-proofing if it's easy to do. > Some strbuf functions restore the original state in such a case by > calling strbuf_release(), strbuf_getwholeline() doesn't. If we are OK > with that then it could be simplified by growing the buffer upfront: > > int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > { > ssize_t r; > > strbuf_grow(sb, 0); > errno = 0; > r = getdelim(&sb->buf, &sb->alloc, term, fp); This causes two allocations, but presumably only the first call of many, so not a big deal in practice. I feel like there's a lot of discussion in this thread but we're not achieving anything practical. If we do anything, I think it would be: - drop the feof and reset at the top of the function, which are redundant - make a noop read on an unallocated strbuf retain the unallocated state (your example above) Could the function be rewritten differently, or maybe even made a little simpler? Perhaps, but who cares? The function has been largely untouched for a decade and the behavior is fine. And there are a bunch of pitfalls that a rewrite risks falling into. -Peff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-21 21:18 ` Jeff King @ 2026-03-21 23:41 ` René Scharfe 2026-03-22 1:44 ` Jeff King 2026-03-22 1:22 ` Junio C Hamano 1 sibling, 1 reply; 20+ messages in thread From: René Scharfe @ 2026-03-21 23:41 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git On 3/21/26 10:18 PM, Jeff King wrote: > On Sat, Mar 21, 2026 at 09:47:18PM +0100, René Scharfe wrote: > >> And yet this function can turn an empty strbuf into an allocated one >> without rolling it back on error, leaving code similar to this silly >> example here leaking: >> >> int copy_one_line(FILE *in, FILE *out, int term) >> { >> struct strbuf sb = STRBUF_INIT; >> if (strbuf_getwholeline(&sb, in, term)) >> return -1; >> fwrite(sb.buf, 1, sb.len, out); >> strbuf_release(&sb); >> return 0; >> } > > Yes, I almost pointed that out, but I think it's mostly a non-issue > in practice because you'd generally call it multiple times (usually in a > loop, but sometimes just multiple individual calls). And then you have > to release if any call ever succeeded, which means either doing so after > the loop ends or in a cleanup block. > > Grepping for 'if (strbuf_get.*line', the closest I found was > get_mail_commit_oid(), which reads a single line. It doesn't have an > early return, though, since it has to clean up the FILE pointer anyway. Caller strbuf_appendwholeline() handles a single line and invokes strbuf_release() on error, so it swings in the opposite direction. > So I dunno. I don't think it's been a problem in practice, but I'm not > opposed to future-proofing if it's easy to do. I also don't think it's a problem. > I feel like there's a lot of discussion in this thread but we're not > achieving anything practical. Funny how attention works. > If we do anything, I think it would be: > > - drop the feof and reset at the top of the function, which are > redundant Easy win. We can also drop the feof(3) call from the non-getdelim(3) version, but need to keep the reset there. > - make a noop read on an unallocated strbuf retain the unallocated > state (your example above) That makes the function conform to the convention of rolling back on error. This transactional behavior is a bit easier to understand. The non-getdelim(3) version doesn't do that, though. It returns whatever it got and leaves error checking and rollback to its callers. getdelim(3) doesn't allow that -- it has no way to indicate the length of partial reads. If we are OK with throwing away partial lines then we better do that consistently in both versions? Sounds a bit messed up to bin perfectly good data just because some other platform has a fancy function that goes quiet when it stumbles. The alternative of having inconsistent behavior seems worse, though. René ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-21 23:41 ` René Scharfe @ 2026-03-22 1:44 ` Jeff King 0 siblings, 0 replies; 20+ messages in thread From: Jeff King @ 2026-03-22 1:44 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, git On Sun, Mar 22, 2026 at 12:41:04AM +0100, René Scharfe wrote: > > - make a noop read on an unallocated strbuf retain the unallocated > > state (your example above) > > That makes the function conform to the convention of rolling back on > error. This transactional behavior is a bit easier to understand. The > non-getdelim(3) version doesn't do that, though. It returns whatever > it got and leaves error checking and rollback to its callers. Yeah, I didn't look at the fallback version. They definitely should match if we are going to change the behavior on an unallocated strbuf. > getdelim(3) doesn't allow that -- it has no way to indicate the length > of partial reads. If we are OK with throwing away partial lines then > we better do that consistently in both versions? Sounds a bit messed > up to bin perfectly good data just because some other platform has a > fancy function that goes quiet when it stumbles. The alternative of > having inconsistent behavior seems worse, though. I'd expect a partial read via getdelim() to return the number of bytes read, and set an internal flag such that ferror(f) returns true (and return -1 next time). But that is based more on wishful thinking than looking at the implementation (and the details may even vary between implementations). To some degree, one you see an error on a FILE handle, all bets are off, and keeping or throwing away a partial line or not is not really important. You can't realistically go back and retry. -Peff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-21 21:18 ` Jeff King 2026-03-21 23:41 ` René Scharfe @ 2026-03-22 1:22 ` Junio C Hamano 2026-03-22 1:40 ` Jeff King 1 sibling, 1 reply; 20+ messages in thread From: Junio C Hamano @ 2026-03-22 1:22 UTC (permalink / raw) To: Jeff King; +Cc: René Scharfe, git Jeff King <peff@peff.net> writes: > Could the function be rewritten differently, or maybe even made a little > simpler? Perhaps, but who cares? The function has been largely untouched > for a decade and the behavior is fine. And there are a bunch of pitfalls > that a rewrite risks falling into. Well, my only interest in this codepath was to get rid of "if (!sb->buf)" so that I can lift the special case in the Coccinelle rule. Nothing else. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-22 1:22 ` Junio C Hamano @ 2026-03-22 1:40 ` Jeff King 0 siblings, 0 replies; 20+ messages in thread From: Jeff King @ 2026-03-22 1:40 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, git On Sat, Mar 21, 2026 at 06:22:38PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Could the function be rewritten differently, or maybe even made a little > > simpler? Perhaps, but who cares? The function has been largely untouched > > for a decade and the behavior is fine. And there are a bunch of pitfalls > > that a rewrite risks falling into. > > Well, my only interest in this codepath was to get rid of "if > (!sb->buf)" so that I can lift the special case in the Coccinelle > rule. Nothing else. Yeah. IMHO it is better just to keep the special case. -Peff ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-19 22:14 ` [RFC] cocci: .buf in a strbuf object can never be NULL Junio C Hamano 2026-03-19 23:35 ` Jeff King @ 2026-03-21 16:24 ` Junio C Hamano 2026-03-21 16:39 ` Jeff King 1 sibling, 1 reply; 20+ messages in thread From: Junio C Hamano @ 2026-03-21 16:24 UTC (permalink / raw) To: Jeff King, René Scharfe; +Cc: git Junio C Hamano <gitster@pobox.com> writes: > diff --git c/contrib/coccinelle/strbuf.cocci w/contrib/coccinelle/strbuf.cocci > index 5f06105df6..3dc5cd02a3 100644 > --- c/contrib/coccinelle/strbuf.cocci > +++ w/contrib/coccinelle/strbuf.cocci > ... > +@@ > +identifier funcname != { strbuf_getwholeline, parse_list_objects_filter }; > +struct strbuf SB; > +@@ > + funcname(...) {<... > +- !SB.buf > ++ 0 > + ...>} Here is my second try. strbuf_getwholeline() does not have to break strbuf invariants even tentatively. We just grab the guts of sb, let getdelim() possibly reallocate, and then return it in the normal case. In the EOF code path, the only special thing we need is when we started with slopbuf[] and getdelim() allocated some bytes yet returned EOF. We are expected to free it before returning. By the way, the big comment about xrealloc() in the middle, most of which is outside the post-context of the first hunk, should be updated, as our xrealloc() do not aggressively try to recover these days, if I understand correctly. I left it outside the scope of this patch, whose sole focus is to reduce the number of places in the codebase that check if sb->buf is NULL. strbuf.c | 31 +++++++++++++++++-------------- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git c/strbuf.c w/strbuf.c index 3939863cf3..89933c3814 100644 --- c/strbuf.c +++ w/strbuf.c @@ -632,24 +632,26 @@ int strbuf_getcwd(struct strbuf *sb) int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) { ssize_t r; + char *buf = sb->buf; + size_t alloc = sb->alloc; if (feof(fp)) return EOF; - strbuf_reset(sb); - /* Translate slopbuf to NULL, as we cannot call realloc on it */ - if (!sb->alloc) - sb->buf = NULL; + if (!alloc) + buf = NULL; errno = 0; - r = getdelim(&sb->buf, &sb->alloc, term, fp); + r = getdelim(&buf, &alloc, term, fp); if (r > 0) { + sb->buf = buf; + sb->alloc = alloc; sb->len = r; return 0; } - assert(r == -1); + assert(r == -1); /* * Normally we would have called xrealloc, which will try to free * memory and recover. But we have no way to tell getdelim() to do so. @@ -664,15 +666,16 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) if (errno == ENOMEM) die("Out of memory, getdelim failed"); - /* - * Restore strbuf invariants; if getdelim left us with a NULL pointer, - * we can just re-init, but otherwise we should make sure that our - * length is empty, and that the result is NUL-terminated. + /* + * If getdelim() allocated when we had no allocation, free it. + */ + if (!alloc) + free(buf); + + /* + * We haven't touched sb at all; as with the initial "were we + * already at EOF?" case, return EOF without touching sb. */ - if (!sb->buf) - strbuf_init(sb, 0); - else - strbuf_reset(sb); return EOF; } #else ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC] cocci: .buf in a strbuf object can never be NULL 2026-03-21 16:24 ` Junio C Hamano @ 2026-03-21 16:39 ` Jeff King 0 siblings, 0 replies; 20+ messages in thread From: Jeff King @ 2026-03-21 16:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, git On Sat, Mar 21, 2026 at 09:24:17AM -0700, Junio C Hamano wrote: > Here is my second try. strbuf_getwholeline() does not have to break > strbuf invariants even tentatively. We just grab the guts of sb, > let getdelim() possibly reallocate, and then return it in the normal > case. > > In the EOF code path, the only special thing we need is when we > started with slopbuf[] and getdelim() allocated some bytes yet > returned EOF. We are expected to free it before returning. This is similar to what I initially wrote (but revised before sending), but I don't think it works because... > + /* > + * We haven't touched sb at all; as with the initial "were we > + * already at EOF?" case, return EOF without touching sb. > */ ...this part isn't necessarily true. We handed sb->buf (copied via the local "buf") to getdelim(). It might have reallocated it behind our backs and returned the new pointer, and now sb->buf is dangling. And in that sense, assigning sb->buf to a local buf becomes _more_ confusing, because now we have two copies of a pointer that is being mutated. > By the way, the big comment about xrealloc() in the middle, most of > which is outside the post-context of the first hunk, should be > updated, as our xrealloc() do not aggressively try to recover these > days, if I understand correctly. I left it outside the scope of > this patch, whose sole focus is to reduce the number of places in > the codebase that check if sb->buf is NULL. Yes, I think you're right. -Peff ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2026-03-22 1:44 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-19 7:15 [PATCH] rerere: update to modern representation of empty strbufs Junio C Hamano 2026-03-19 7:57 ` Patrick Steinhardt 2026-03-19 22:14 ` [RFC] cocci: .buf in a strbuf object can never be NULL Junio C Hamano 2026-03-19 23:35 ` Jeff King 2026-03-20 1:46 ` Junio C Hamano 2026-03-20 4:18 ` Jeff King 2026-03-20 5:45 ` Junio C Hamano 2026-03-20 5:57 ` Jeff King 2026-03-20 6:06 ` Junio C Hamano 2026-03-20 6:18 ` Jeff King 2026-03-21 13:14 ` René Scharfe 2026-03-21 16:41 ` Jeff King 2026-03-21 20:47 ` René Scharfe 2026-03-21 21:18 ` Jeff King 2026-03-21 23:41 ` René Scharfe 2026-03-22 1:44 ` Jeff King 2026-03-22 1:22 ` Junio C Hamano 2026-03-22 1:40 ` Jeff King 2026-03-21 16:24 ` Junio C Hamano 2026-03-21 16:39 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox