From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Subject: [PATCH v3 00/12] do not overuse strbuf_split*()
Date: Sat, 2 Aug 2025 23:52:52 -0700
Message-ID: <20250803065304.3325286-1-gitster@pobox.com>
In-Reply-To: <20250801220423.1230969-1-gitster@pobox.com>

strbuf is a very good data structure to work with string data
without having to worry about running past the end of the string.
But an array of strbuf is often the wrong data structure. You rarely
need to be able to edit multiple strings represented by such an
array simultaneously. And strbuf_split*(), which produces its result
in such a shape, is a misdesigned API function.

The most common use case of the strbuf_split*() family of functions
seems to be to trim away the whitespace around each piece of the
split string. With the modern string_list_split*(), using
strbuf_split*() for that purpose is often no longer necessary.

This series builds on top of the other series that extends the string
list API to allow string_list_split() to take more than one delimiter
byte, and to optionally trim the resulting string pieces.
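
To illustrate the shape of result argued for above, here is a minimal
standalone sketch. It is not git's string_list API; the helper name
split_trim_in_place() and its signature are made up for illustration.
It splits a buffer in place at any of several delimiter bytes and trims
each piece, handing back plain NUL-terminated strings instead of an
array of growable buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for this sketch; not part of git. */
static int is_space(char c)
{
	return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/*
 * Split buf in place at any byte found in delims, trim whitespace
 * around each piece, and store pointers to the pieces in out[].
 * Returns the number of pieces stored; anything past max_out
 * pieces is ignored.
 */
static size_t split_trim_in_place(char *buf, const char *delims,
				  char **out, size_t max_out)
{
	size_t n = 0;
	char *p = buf;

	assert(buf && delims && out);
	while (n < max_out) {
		char *end = p + strcspn(p, delims);
		int last = (*end == '\0');
		char *tail = end;

		*end = '\0';			/* cut at the delimiter */
		while (*p && is_space(*p))	/* trim the left side */
			p++;
		while (tail > p && is_space(tail[-1]))
			*--tail = '\0';		/* trim the right side */
		out[n++] = p;
		if (last)
			break;
		p = end + 1;
	}
	return n;
}
```

A caller would pass a writable copy of the input, e.g. a char array
holding " foo , bar;baz " with ",;" as the delimiters, and read the
pieces back as ordinary C strings. Nothing here needs the pieces to be
individually editable, which is the point made about arrays of strbuf.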

I do not plan to eradicate all the uses of strbuf_split*() myself,
not because I found some valid use cases in the existing code (I
haven't yet), but because these patches would give interested others
enough material to study and mimic to continue the effort, so I can
safely leave the remaining rewrites as #leftoverbits.

Relative to v2, this iteration (v3) adds one more clean-up step to
correct a callee that insists on taking a whole strbuf when it can
work with any NUL-terminated string, and comes with a handful of
typofixes.
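
The kind of clean-up described above can be sketched in isolation as
follows. The struct and function names are hypothetical stand-ins, not
code from builtin/clean.c: the callee only reads a NUL-terminated
string, so it takes a const char * and callers that happen to hold a
buffer structure pass its string member.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a growable string buffer; not git's strbuf. */
struct text_buf {
	size_t alloc;
	size_t len;
	char *buf;
};

/*
 * Narrow interface: the callee only reads the string, so it asks
 * for a NUL-terminated string rather than the whole structure.
 */
static size_t count_char(const char *s, char c)
{
	size_t n = 0;

	assert(s);
	for (; *s; s++)
		if (*s == c)
			n++;
	return n;
}
```

With this signature, both count_char(tb.buf, ',') for a struct text_buf
tb and count_char("a literal", ',') work, so callers are no longer
forced to build a buffer structure just to call the function.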

Junio C Hamano (12):
  wt-status: avoid strbuf_split*()
  clean: do not pass strbuf by value
  clean: do not use strbuf_split*() [part 1]
  clean: do not pass the whole structure when it is not necessary
  clean: do not use strbuf_split*() [part 2]
  merge-tree: do not use strbuf_split*()
  notes: do not use strbuf_split*()
  config: do not use strbuf_split()
  environment: do not use strbuf_split*()
  sub-process: do not use strbuf_split*()
  trace2: trim_trailing_newline followed by trim is a no-op
  trace2: do not use strbuf_split*()

 builtin/clean.c      | 74 ++++++++++++++++++++--------------------
 builtin/merge-tree.c | 30 +++++++++--------
 builtin/notes.c      | 23 +++++++------
 config.c             | 23 ++++++-------
 environment.c        | 19 +++++++----
 sub-process.c        | 15 ++++-----
 trace2/tr2_cfg.c     | 80 +++++++++++++++-----------------------------
 wt-status.c          | 31 ++++++-----------
 8 files changed, 129 insertions(+), 166 deletions(-)

Range-diff against v2:
1: 27de3d9a92 = 1: 2efe707054 wt-status: avoid strbuf_split*()
2: 8f096e5a2d = 2: 899ff9c175 clean: do not pass strbuf by value
3: 768b08907e = 3: 7a4acc3607 clean: do not use strbuf_split*() [part 1]
-: ---------- > 4: 4985f72ea5 clean: do not pass the whole structure when it is not necessary
4: 0f8583e798 = 5: 4f60672f6f clean: do not use strbuf_split*() [part 2]
5: cefc2ec9f5 = 6: d33091220d merge-tree: do not use strbuf_split*()
6: 1c8ea097f6 ! 7: 566e910495 notes: do not use strbuf_split*()
@@ Metadata
## Commit message ##
notes: do not use strbuf_split*()
- When reading the copy instruction from the standard input, the
- program reads a line, splits it into tokens at whitespace, and trims
- each of the tokens before using. We no longer need to use strbuf
- just to be able to trimming, as string_list_split*() family now can
- trim while splitting a string.
+ When reading copy instructions from the standard input, the program
+ reads a line, splits it into tokens at whitespace, and trims each of
+ the tokens before using. We no longer need to use strbuf just to be
+ able to trim, as string_list_split*() family now can trim while
+ splitting a string.
- Retire the use of strbuf_split().
+ Retire the use of strbuf_split() from this code path.
Note that this loop is a bit sloppy in that it ensures at least
there are two tokens on each line, but ignores if there are extra
7: a472688ec1 ! 8: dcecac2580 config: do not use strbuf_split()
@@ Commit message
config: do not use strbuf_split()
When parsing an old-style GIT_CONFIG_PARAMETERS environment
- variable, the code parses the key=value pair by spliting them at '='
- into an array of strbuf's. As strbuf_split() leafes the delimiter
+ variable, the code parses key=value pairs by splitting them at '='
+ into an array of strbuf's. As strbuf_split() leaves the delimiter
at the end of the split piece, the code has to manually trim it.
If we split with string_list_split(), that becomes unnecessary.
- Retire the use of strbuf_split().
+ Retire the use of strbuf_split() from this code path.
Note that the max parameter of string_list_split() is of
an ergonomically iffy design---it specifies the maximum number of
8: 2b9957f31c = 9: b894d4481f environment: do not use strbuf_split*()
9: 4a5599836d = 10: d6fd08bd76 sub-process: do not use strbuf_split*()
10: cf6ecd2090 ! 11: cb8e82a641 trace2: trim_trailing_newline followed by trim is a no-op
@@ Commit message
of a string. If the code plans to call strbuf_trim() immediately
after doing so, the code is better off skipping the EOL trimming in
the first place. After all, LF/CRLF at the end is a mere special
- case of whitespaces at the right end of the string, which will be
- removed by strbuf_rtrim().
+ case of whitespaces at the end of the string, which will be removed
+ by strbuf_rtrim() anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11: c2578b6b1c ! 12: 838fe56920 trace2: do not use strbuf_split*()
@@ Metadata
## Commit message ##
trace2: do not use strbuf_split*()
- tr2_cfg_load_patterns() and tr2_load_env_vars() functions are copied
- and pasted pair of functions that each reads an environment
- variable, split the value at ',' boundaries and trims the resulting
- string pieces into an array of strbufs. But the code paths that
- later use these strbufs take no advantage of the strbuf-ness of the
- result (they do not benefit from <ptr,len> representation to avoid
- having to run strlne(<ptr>), for example).
+ tr2_cfg_load_patterns() and tr2_load_env_vars() functions are
+ functions with very similar structure that each reads an environment
+ variable, splits its value at the ',' boundaries, and trims the
+ resulting string pieces into an array of strbufs.
+
+ But the code paths that later use these strbufs take no advantage of
+ the strbuf-ness of the result (they do not benefit from <ptr,len>
+ representation to avoid having to run strlen(<ptr>), for example).
Simplify the code by teaching these functions to split into a string
- list instead.
+ list instead; even the trimming comes for free ;-).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
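
The observation behind patch 11 above (stripping a trailing LF/CRLF and
then trimming whitespace gives the same result as trimming alone) can be
checked with a small standalone sketch. The helpers below are simplified
stand-ins written for this illustration, working on plain char arrays;
they are not git's strbuf functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for this sketch; not part of git. */
static int is_space(char c)
{
	return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Remove one trailing LF or CRLF, if present. */
static void chomp_eol(char *s)
{
	size_t len = strlen(s);

	if (len && s[len - 1] == '\n') {
		s[--len] = '\0';
		if (len && s[len - 1] == '\r')
			s[--len] = '\0';
	}
}

/* Remove all trailing whitespace, which includes CR and LF. */
static void rtrim(char *s)
{
	size_t len = strlen(s);

	while (len && is_space(s[len - 1]))
		s[--len] = '\0';
}
```

Running chomp_eol() followed by rtrim() on one copy of a string, and
rtrim() alone on another copy, leaves the two copies identical, because
the end-of-line bytes are themselves trailing whitespace.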
--
2.50.1-633-g69dfdd50af
Thread overview: 72+ messages
2025-07-31 6:39 [PATCH 0/5] string_list_split*() updates Junio C Hamano
2025-07-31 6:39 ` [PATCH 1/5] string-list: report programming error with BUG Junio C Hamano
2025-07-31 19:33 ` Eric Sunshine
2025-07-31 22:16 ` Junio C Hamano
2025-07-31 6:39 ` [PATCH 2/5] string-list: align string_list_split() with its _in_place() counterpart Junio C Hamano
2025-07-31 19:36 ` Eric Sunshine
2025-07-31 6:39 ` [PATCH 3/5] string-list: unify string_list_split* functions Junio C Hamano
2025-07-31 6:39 ` [PATCH 4/5] string-list: optionally trim string pieces split by string_list_split() Junio C Hamano
2025-07-31 6:39 ` [PATCH 5/5] diff: simplify parsing of diff.colormovedws Junio C Hamano
2025-07-31 19:45 ` Eric Sunshine
2025-07-31 22:45 ` [PATCH v2 0/7] string_list_split*() updates Junio C Hamano
2025-07-31 22:46 ` [PATCH v2 1/7] string-list: report programming error with BUG Junio C Hamano
2025-07-31 22:46 ` [PATCH v2 2/7] string-list: align string_list_split() with its _in_place() counterpart Junio C Hamano
2025-08-01 2:33 ` shejialuo
2025-08-01 3:43 ` Junio C Hamano
2025-08-01 3:55 ` shejialuo
2025-08-01 23:10 ` Junio C Hamano
2025-07-31 22:46 ` [PATCH v2 3/7] string-list: unify string_list_split* functions Junio C Hamano
2025-08-01 3:00 ` shejialuo
2025-07-31 22:46 ` [PATCH v2 4/7] string-list: optionally trim string pieces split by string_list_split*() Junio C Hamano
2025-08-01 3:18 ` shejialuo
2025-08-01 3:47 ` Junio C Hamano
2025-08-01 4:04 ` shejialuo
2025-08-01 23:09 ` Junio C Hamano
2025-08-02 1:51 ` shejialuo
2025-08-01 8:47 ` Patrick Steinhardt
2025-08-01 16:26 ` Junio C Hamano
2025-07-31 22:46 ` [PATCH v2 5/7] diff: simplify parsing of diff.colormovedws Junio C Hamano
2025-08-01 8:47 ` Patrick Steinhardt
2025-07-31 22:46 ` [PATCH v2 6/7] string-list: optionally omit empty string pieces in string_list_split*() Junio C Hamano
2025-07-31 22:54 ` Eric Sunshine
2025-08-01 3:33 ` shejialuo
2025-08-01 8:47 ` Patrick Steinhardt
2025-08-01 16:38 ` Junio C Hamano
2025-07-31 22:46 ` [PATCH v2 7/7] string-list: split-then-remove-empty can be done while splitting Junio C Hamano
2025-08-01 8:47 ` Patrick Steinhardt
2025-08-01 22:04 ` [PATCH v3 0/7] string_list_split*() updates Junio C Hamano
2025-08-01 22:04 ` [PATCH v3 1/7] string-list: report programming error with BUG Junio C Hamano
2025-08-01 22:04 ` [PATCH v3 2/7] string-list: align string_list_split() with its _in_place() counterpart Junio C Hamano
2025-08-02 8:22 ` Jeff King
2025-08-02 16:34 ` Junio C Hamano
2025-08-02 18:38 ` Jeff King
2025-08-01 22:04 ` [PATCH v3 3/7] string-list: unify string_list_split* functions Junio C Hamano
2025-08-01 22:04 ` [PATCH v3 4/7] string-list: optionally trim string pieces split by string_list_split*() Junio C Hamano
2025-08-02 8:26 ` Jeff King
2025-08-02 16:38 ` Junio C Hamano
2025-08-02 18:39 ` Jeff King
2025-08-01 22:04 ` [PATCH v3 5/7] diff: simplify parsing of diff.colormovedws Junio C Hamano
2025-08-01 22:04 ` [PATCH v3 6/7] string-list: optionally omit empty string pieces in string_list_split*() Junio C Hamano
2025-08-01 22:04 ` [PATCH v3 7/7] string-list: split-then-remove-empty can be done while splitting Junio C Hamano
2025-08-03 6:52 ` [PATCH v4 0/7] string_list_split*() updates Junio C Hamano
2025-08-03 6:52 ` [PATCH v4 1/7] string-list: report programming error with BUG Junio C Hamano
2025-08-03 6:52 ` [PATCH v4 2/7] string-list: align string_list_split() with its _in_place() counterpart Junio C Hamano
2025-08-03 6:52 ` [PATCH v4 3/7] string-list: unify string_list_split* functions Junio C Hamano
2025-08-03 6:52 ` [PATCH v4 4/7] string-list: optionally trim string pieces split by string_list_split*() Junio C Hamano
2025-08-03 6:52 ` [PATCH v4 5/7] diff: simplify parsing of diff.colormovedws Junio C Hamano
2025-08-03 6:52 ` [PATCH v4 6/7] string-list: optionally omit empty string pieces in string_list_split*() Junio C Hamano
2025-08-03 6:52 ` [PATCH v4 7/7] string-list: split-then-remove-empty can be done while splitting Junio C Hamano
2025-08-04 6:24 ` [PATCH v4 0/7] string_list_split*() updates Patrick Steinhardt
2025-08-03 6:52 ` Junio C Hamano [this message]
2025-08-03 6:52 ` [PATCH v3 01/12] wt-status: avoid strbuf_split*() Junio C Hamano
2025-08-03 6:52 ` [PATCH v3 02/12] clean: do not pass strbuf by value Junio C Hamano
2025-08-03 6:52 ` [PATCH v3 03/12] clean: do not use strbuf_split*() [part 1] Junio C Hamano
2025-08-03 6:52 ` [PATCH v3 04/12] clean: do not pass the whole structure when it is not necessary Junio C Hamano
2025-08-03 6:52 ` [PATCH v3 05/12] clean: do not use strbuf_split*() [part 2] Junio C Hamano
2025-08-03 6:52 ` [PATCH v3 06/12] merge-tree: do not use strbuf_split*() Junio C Hamano
2025-08-03 6:52 ` [PATCH v3 07/12] notes: " Junio C Hamano
2025-08-03 6:53 ` [PATCH v3 08/12] config: do not use strbuf_split() Junio C Hamano
2025-08-03 6:53 ` [PATCH v3 09/12] environment: do not use strbuf_split*() Junio C Hamano
2025-08-03 6:53 ` [PATCH v3 10/12] sub-process: " Junio C Hamano
2025-08-03 6:53 ` [PATCH v3 11/12] trace2: trim_trailing_newline followed by trim is a no-op Junio C Hamano
2025-08-03 6:53 ` [PATCH v3 12/12] trace2: do not use strbuf_split*() Junio C Hamano