From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Deveshi Dwivedi <deveshigurgaon@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH v2 2/2] list-objects-filter-options: avoid strbuf_split_str()
Date: Wed, 11 Mar 2026 13:45:48 -0400 [thread overview]
Message-ID: <20260311174548.GA1900488@coredump.intra.peff.net> (raw)
In-Reply-To: <xmqqo6kuqqje.fsf@gitster.g>
On Wed, Mar 11, 2026 at 09:28:21AM -0700, Junio C Hamano wrote:
> > + while (*p && !result) {
> > + const char *end = strchrnul(p, '+');
> > + char *sub = xmemdupz(p, end - p);
> > +
> > + result = parse_combine_subfilter(filter_options, sub, errbuf);
> > + free(sub);
> > + if (!*end)
> > + break;
> > + p = end + 1;
> > }
> [...]
> It is curious what would happen when the input were "combine:foo++",
> though. What happens is that the loop begins with p pointing at 'f'
> in the initial iteration, "end" points at the first '+', and a
> temporary copy of 'foo' is fed to parse_combine_subfilter(), and we
> move on to the second '+'. Then the second iteration finds NUL
> after that '+' in "end", and we end up calling the helper function
> with a temporary copy of '+'; gently_parse_list_objects_fiter() will
> reject it as an invalid filter-spec.
I don't think this is quite right. After we skip the first "+" and "p"
points to the second one, then strchrnul() will find that second "+",
not NUL. And so we have a 0-length spec, and feed the empty string to
parse_combine_subfilter(), which complains.
It is the same behavior when we see "++" in the middle of the string.
> Logically, "foo+" would be a combination of "foo" and "" (an empty
> string) and we ignore the empty string, and "foo++" would be a
> combination of "foo", "" and "" (two empty strings), but we barf at
> the empty string if it appears in the middle. And recall that "" we
> saw earlier at the beginning of this function was also triggered an
> error.
>
> Admittedly the original wasn't much better. It ignored an empty
> string in the middle (e.g., "foo++bar" would have fed 'foo', '', and
> 'bar' to parse_combine_subfilter() and an empty string would have
> become a no-op) but barfed at the trailing one "foo+". This new
> implementation swaps where it barfs, complaining an empty string in
> the middle and ignoring an empty string at the end.
>
> In any case, the error behaviour against an empty filter-spec feels
> a bit uneven.
>
> Tightening to reject empty string in the middle may appear to
> existing users as a regression if they are using "combine:foo++bar"
> as they are forced to update it to lose the extra '+'.
But yeah, it is somewhat inconsistent that we complain about an empty
spec in the middle, but not at the end. If we were starting from
scratch, I'd probably forbid it everywhere. But since we allow it in
some cases now, it may be worth being more permissive.
It is easy to check in the loop, or even just teach the helper to make
empty specs a noop:
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 616c6c7faa..56e1c651f6 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -151,6 +151,9 @@ static int parse_combine_subfilter(
char *decoded;
int result;
+ if (!*subspec)
+ return 0;
+
ALLOC_GROW_BY(filter_options->sub, filter_options->sub_nr, 1,
filter_options->sub_alloc);
list_objects_filter_init(&filter_options->sub[new_index]);
> By the way, instead of making a temporary copy and discarding it
> repeatedly in a loop, it might be cheaper to reuse an allocated
> temporary with the common pattern:
>
> struct strbuf temp = STRBUF_INIT;
> while (... loop ...) {
> const char *end = ...;
> strbuf_reset(&temp);
> strbuf_add(&temp, p, end - p);
> ... use temp.buf ...
> }
> strbuf_release(&temp);
>
> because _reset() only resets the len member of the strbuf without
> releasing the resource, if the next piece of memory you need a
> temporary copy for is shorter than the pieces you have ever used the
> strbuf for, you can make the copy without a new allocation.
As a general strategy, I agree this is a good one. But I'd be quite
surprised if it ever made a measurable difference for this loop, which
we'd expect to trigger a handful of times (and which allocates in the
sub-function anyway).
-Peff
next prev parent reply other threads:[~2026-03-11 17:45 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-11 13:20 [PATCH v2 0/2] avoid unnecessary strbuf_split*() and strbuf-by-value usage Deveshi Dwivedi
2026-03-11 13:20 ` [PATCH v2 1/2] worktree: do not pass strbuf by value Deveshi Dwivedi
2026-03-11 13:20 ` [PATCH v2 2/2] list-objects-filter-options: avoid strbuf_split_str() Deveshi Dwivedi
2026-03-11 16:28 ` Junio C Hamano
2026-03-11 17:45 ` Jeff King [this message]
2026-03-11 18:07 ` Junio C Hamano
2026-03-11 17:33 ` [PATCH v3 0/2] avoid unnecessary strbuf_split*() and strbuf-by-value usage Deveshi Dwivedi
2026-03-11 17:33 ` [PATCH v3 1/2] worktree: do not pass strbuf by value Deveshi Dwivedi
2026-03-11 17:33 ` [PATCH v3 2/2] list-objects-filter-options: avoid strbuf_split_str() Deveshi Dwivedi
2026-03-11 17:48 ` Jeff King
2026-03-11 18:13 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260311174548.GA1900488@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=deveshigurgaon@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox