From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2 00/11] do not overuse strbuf_split*()
Date: Sat, 2 Aug 2025 14:47:15 -0400 [thread overview]
Message-ID: <20250802184715.GD1773585@coredump.intra.peff.net> (raw)
In-Reply-To: <xmqq4iupira0.fsf@gitster.g>
On Sat, Aug 02, 2025 at 10:09:59AM -0700, Junio C Hamano wrote:
> > 2. Is the handling of repeated delimiters the same? E.g., if I split
> > on "/" and we see "foo//bar", do both split implementations yield
> > the same output. I could see either of ("foo", "", "bar") and
> > ("foo", "bar") being produced.
>
> Same between? strbuf_split*() seems to be happy to produce an empty
> piece in the result (and there is no "omit empty ones" feature).
> You can choose with STRING_LIST_SPLIT_NOEMPTY both results, but for
> a straight conversion, the vanilla one without that flag would do
> what we want, I think.
Yeah, I meant same between strbuf_split and string_list_split. I didn't
actually peek into the behavior of either. It was more of a "did you
check this it?" question. It sounds like you did. Good.
> > Some of these sites do strbuf_rtrim() on the split pieces. But
> > STRING_LIST_SPLIT_TRIM does both left and right. Is this OK? I think so
> > because in both cases we are already splitting on space, so we wouldn't
> > expect left-hand spaces (of course you could have a stray tab or
> > something, but I suspect the new code matches the intent more closely).
>
> I did wonder about these "rtrim-only" callsites strbuf_split(), but
> I did not think they needed ltrim, since they were splitting at SP
> and rejected when the element were empty (remember, strbuf_split()
> happily creates an empty element in the result).
>
> Their use of rtrim() is primarily when they split at SP they want to
> remove that SP (remember, another misdesign of strbuf_split() is
> that it leaves the delimiter itself at the end of each split piece).
> Both 05/11 step for merge-tree and 06/11 step for notes are good
> examples.
Makes sense. I did not even realize all of the horrible gotchas of
strbuf_split(). :)
> So in short, yes, the new code is being a bit more lenient (the
> original parsers were more strict and insisted on non-trimming
> behaviour). We might consider tightening them by removing the
> STRING_LIST_SPLIT_TRIM bit to be more faithful to the original, but
> as you may have noticed, doing so would make it more strict at the
> right end. The original that split at SP and then rtrimmed allowed
> trailing HT after the data to be removed, but I do not think it was
> part of the designed behaviour of the original anyway. So I am
> inclined to say we should probably drop STRING_LIST_SPLIT_TRIM;
> nobody would scream.
Hmm, yeah, I see what you mean. The documentation for merge-tree is
quite vague about whether it is a single space or if multiple spaces are
allowed. I'd be inclined to leave it on the lenient side, since I don't
think there is any ambiguity.
The git-notes one does specifically say "SP", so anybody who would be
broken by losing the extra trim is already violating the docs. But
again, I'd be tempted to just err on the friendly side.
I doubt either case matters too much either way, though.
-Peff
prev parent reply other threads:[~2025-08-02 18:47 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-31 7:41 [PATCH 0/9] do not overuse strbuf_split*() Junio C Hamano
2025-07-31 7:41 ` [PATCH 1/9] wt-status: avoid strbuf_split*() Junio C Hamano
2025-07-31 7:41 ` [PATCH 2/9] clean: do not pass strbuf by value Junio C Hamano
2025-07-31 7:41 ` [PATCH 3/9] clean: do not use strbuf_split*() [part 1] Junio C Hamano
2025-07-31 7:41 ` [PATCH 4/9] clean: do not use strbuf_split*() [part 2] Junio C Hamano
2025-07-31 7:41 ` [PATCH 5/9] merge-tree: do not use strbuf_split*() Junio C Hamano
2025-07-31 7:41 ` [PATCH 6/9] notes: " Junio C Hamano
2025-07-31 20:14 ` Eric Sunshine
2025-07-31 7:41 ` [PATCH 7/9] config: do not use strbuf_split() Junio C Hamano
2025-07-31 20:15 ` Eric Sunshine
2025-07-31 7:41 ` [PATCH 8/9] environment: do not use strbuf_split*() Junio C Hamano
2025-07-31 7:41 ` [PATCH 9/9] sub-process: " Junio C Hamano
2025-07-31 8:50 ` Christian Couder
2025-07-31 14:30 ` Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 01/11] wt-status: avoid strbuf_split*() Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 02/11] clean: do not pass strbuf by value Junio C Hamano
2025-08-02 8:38 ` Jeff King
2025-08-02 16:44 ` Junio C Hamano
2025-08-02 18:40 ` Jeff King
2025-07-31 22:54 ` [PATCH v2 03/11] clean: do not use strbuf_split*() [part 1] Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 04/11] clean: do not use strbuf_split*() [part 2] Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 05/11] merge-tree: do not use strbuf_split*() Junio C Hamano
2025-08-02 8:55 ` Jeff King
2025-07-31 22:54 ` [PATCH v2 06/11] notes: " Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 07/11] config: do not use strbuf_split() Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 08/11] environment: do not use strbuf_split*() Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 09/11] sub-process: " Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 10/11] trace2: trim_trailing_newline followed by trim is a no-op Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 11/11] trace2: do not use strbuf_split*() Junio C Hamano
2025-08-02 9:08 ` [PATCH v2 00/11] do not overuse strbuf_split*() Jeff King
2025-08-02 17:09 ` Junio C Hamano
2025-08-02 18:47 ` Jeff King [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250802184715.GD1773585@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).