public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Deveshi Dwivedi <deveshigurgaon@gmail.com>
Cc: git@vger.kernel.org,  peff@peff.net
Subject: Re: [PATCH v2 2/2] list-objects-filter-options: avoid strbuf_split_str()
Date: Wed, 11 Mar 2026 09:28:21 -0700	[thread overview]
Message-ID: <xmqqo6kuqqje.fsf@gitster.g> (raw)
In-Reply-To: <20260311132041.12044-3-deveshigurgaon@gmail.com> (Deveshi Dwivedi's message of "Wed, 11 Mar 2026 13:20:41 +0000")

Deveshi Dwivedi <deveshigurgaon@gmail.com> writes:

> Walk the input string directly with strchrnul() to find each '+'.
> strchrnul() returns a pointer to the terminating '\0' when the
> delimiter is not found, so no separate "found separator?" branch is
> needed.  Copy each sub-spec into a temporary buffer using end - p,
> which naturally excludes the '+', so the separator is always stripped
> cleanly.  A trailing '+' causes the outer while (*p) test to fail on
> the next iteration rather than passing an empty string to the parser.

OK.  The above description is a bit more excessively focused on the
implementation (which readers can find in the patch text) than the
level of detail we usually aim for, but let's let it pass.

> +	if (!*p) {
>  		strbuf_addstr(errbuf, _("expected something after combine:"));
>  		result = 1;
>  		goto cleanup;
>  	}

This complains when "combine:" is not followed by anything.

> +	while (*p && !result) {
> +		const char *end = strchrnul(p, '+');
> +		char *sub = xmemdupz(p, end - p);
> +
> +		result = parse_combine_subfilter(filter_options, sub, errbuf);
> +		free(sub);
> +		if (!*end)
> +			break;
> +		p = end + 1;
>  	}

The usual "process up to the next '+' and skip over that '+' to find the
next piece" pattern is here.  There is nothing surprising.  Good.

The "trailing '+' problem" the proposed log message talks about
happens when the input is "combine:foo+" (this loop starts scanning
from 'f').  We find the '+' at the end of the string in "end", make
a temporary copy of 'foo' and feed it to parse_combine_subfilter(),
and move on to the NUL after '+', at which the loop control notices
that we are at the end.

It is curious what would happen when the input were "combine:foo++",
though.  What happens is that the loop begins with p pointing at 'f'
in the initial iteration, "end" points at the first '+', and a
temporary copy of 'foo' is fed to parse_combine_subfilter(), and we
move on to the second '+'.  Then the second iteration finds NUL
after that '+' in "end", and we end up calling the helper function
with a temporary copy of '+'; gently_parse_list_objects_fiter() will
reject it as an invalid filter-spec.

Logically, "foo+" would be a combination of "foo" and "" (an empty
string) and we ignore the empty string, and "foo++" would be a
combination of "foo", "" and "" (two empty strings), but we barf at
the empty string if it appears in the middle.  And recall that "" we
saw earlier at the beginning of this function was also triggered an
error.

Admittedly the original wasn't much better.  It ignored an empty
string in the middle (e.g., "foo++bar" would have fed 'foo', '', and
'bar' to parse_combine_subfilter() and an empty string would have
become a no-op) but barfed at the trailing one "foo+".  This new
implementation swaps where it barfs, complaining an empty string in
the middle and ignoring an empty string at the end.

In any case, the error behaviour against an empty filter-spec feels
a bit uneven.

Tightening to reject empty string in the middle may appear to
existing users as a regression if they are using "combine:foo++bar"
as they are forced to update it to lose the extra '+'.

By the way, instead of making a temporary copy and discarding it
repeatedly in a loop, it might be cheaper to reuse an allocated
temporary with the common pattern:

	struct strbuf temp = STRBUF_INIT;
	while (... loop ...) {
		const char *end = ...;
		strbuf_reset(&temp);
		strbuf_add(&temp, p, end - p);
		... use temp.buf ...
	}
	strbuf_release(&temp);

because _reset() only resets the len member of the strbuf without
releasing the resource, if the next piece of memory you need a
temporary copy for is shorter than the pieces you have ever used the
strbuf for, you can make the copy without a new allocation.

> -test_expect_success 'validate err msg for "combine:<valid-filter>+"' '
> -	expect_invalid_filter_spec combine:tree:2+ "expected .tree:<depth>."
> -'


  reply	other threads:[~2026-03-11 16:28 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-11 13:20 [PATCH v2 0/2] avoid unnecessary strbuf_split*() and strbuf-by-value usage Deveshi Dwivedi
2026-03-11 13:20 ` [PATCH v2 1/2] worktree: do not pass strbuf by value Deveshi Dwivedi
2026-03-11 13:20 ` [PATCH v2 2/2] list-objects-filter-options: avoid strbuf_split_str() Deveshi Dwivedi
2026-03-11 16:28   ` Junio C Hamano [this message]
2026-03-11 17:45     ` Jeff King
2026-03-11 18:07       ` Junio C Hamano
2026-03-11 17:33 ` [PATCH v3 0/2] avoid unnecessary strbuf_split*() and strbuf-by-value usage Deveshi Dwivedi
2026-03-11 17:33   ` [PATCH v3 1/2] worktree: do not pass strbuf by value Deveshi Dwivedi
2026-03-11 17:33   ` [PATCH v3 2/2] list-objects-filter-options: avoid strbuf_split_str() Deveshi Dwivedi
2026-03-11 17:48     ` Jeff King
2026-03-11 18:13       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqo6kuqqje.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=deveshigurgaon@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox