git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Subject: [PATCH v3 00/12] do not overuse strbuf_split*()
Date: Sat,  2 Aug 2025 23:52:52 -0700
Message-ID: <20250803065304.3325286-1-gitster@pobox.com>
In-Reply-To: <20250801220423.1230969-1-gitster@pobox.com>

strbuf is a very good data structure to work with string data
without having to worry about running past the end of the string.

But an array of strbufs is often the wrong data structure.  You
rarely need to edit multiple strings held in such an array
simultaneously.  So strbuf_split*(), which produces its result in
such a shape, is a misdesigned API function.

The most common reason to use the strbuf_split*() family of
functions seems to be to trim away whitespace around each piece of
the split string.  With the modern string_list_split*(), which can
trim while splitting, going through strbufs is often no longer
necessary.

This series builds on top of the other series ("string_list_split*()
updates") that extends the string-list API to allow
string_list_split() to take more than one delimiter byte, and to
optionally trim the resulting string pieces.
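To make the shape of the result concrete, here is a minimal,
self-contained C sketch of splitting at any of several delimiter
bytes with optional trimming, producing plain NUL-terminated strings
instead of an array of strbufs.  It does not use Git's string-list
API; struct piece_list, split_pieces(), piece_list_clear() and the
SPLIT_TRIM flag are hypothetical names chosen only for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string list: a growable array of
 * NUL-terminated strings.  This is NOT Git's struct string_list. */
struct piece_list {
	char **items;
	size_t nr, alloc;
};

#define SPLIT_TRIM 1 /* hypothetical flag: trim whitespace around each piece */

static void piece_list_append(struct piece_list *list, const char *s, size_t len)
{
	if (list->nr == list->alloc) {
		list->alloc = list->alloc ? 2 * list->alloc : 8;
		list->items = realloc(list->items, list->alloc * sizeof(*list->items));
		if (!list->items)
			abort();
	}
	char *copy = malloc(len + 1);
	if (!copy)
		abort();
	memcpy(copy, s, len);
	copy[len] = '\0';
	list->items[list->nr++] = copy;
}

/* Split "string" at any byte found in "delim"; with SPLIT_TRIM, drop
 * leading and trailing whitespace from each piece before storing it.
 * Returns the number of pieces appended. */
static size_t split_pieces(struct piece_list *list, const char *string,
			   const char *delim, unsigned flags)
{
	size_t count = 0;
	const char *p = string;

	assert(string && delim);
	for (;;) {
		size_t len = strcspn(p, delim); /* bytes before the next delimiter */
		const char *start = p, *end = p + len;

		if (flags & SPLIT_TRIM) {
			while (start < end && isspace((unsigned char)*start))
				start++;
			while (end > start && isspace((unsigned char)end[-1]))
				end--;
		}
		piece_list_append(list, start, (size_t)(end - start));
		count++;
		if (!p[len])
			break;        /* hit the terminating NUL */
		p += len + 1;         /* step over the delimiter byte */
	}
	return count;
}

static void piece_list_clear(struct piece_list *list)
{
	for (size_t i = 0; i < list->nr; i++)
		free(list->items[i]);
	free(list->items);
	list->items = NULL;
	list->nr = list->alloc = 0;
}
```

Unlike strbuf_split*(), which leaves the delimiter at the end of each
piece, the pieces here never contain a delimiter, so the caller has
nothing left to trim by hand.  A trailing delimiter yields an empty
last piece: " a , b\tc ," split at ",\t" with SPLIT_TRIM gives "a",
"b", "c" and "".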

I do not plan to eradicate all the uses of strbuf_split*() myself.
That is not because I found some valid use cases in the existing
code (I haven't yet), but because these patches should give
interested others enough material to study and mimic to continue the
effort, so I can safely leave rewriting the rest as #leftoverbits.

Relative to v2, this iteration v3 adds one more clean-up step
(patch 4/12) to correct a callee that insists on taking a whole
strbuf when it can work with any NUL-terminated string, and comes
with a handful of typofixes.
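The kind of signature change meant here can be illustrated with a
small sketch.  struct mini_strbuf, is_blank_sb() and is_blank() are
hypothetical stand-ins, not the actual functions touched in
builtin/clean.c; the point is only that a callee which reads nothing
but the NUL-terminated buf member can just as well accept a
const char *.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical minimal stand-in for Git's strbuf:
 * a NUL-terminated buffer plus its length. */
struct mini_strbuf {
	char *buf;
	size_t len;
};

/* Before: demands a whole buffer structure, yet only ever reads
 * sb->buf as a NUL-terminated string (sb->len is never used). */
static int is_blank_sb(const struct mini_strbuf *sb)
{
	assert(sb && sb->buf);
	for (const char *p = sb->buf; *p; p++)
		if (!isspace((unsigned char)*p))
			return 0;
	return 1;
}

/* After: accepts any NUL-terminated string, so a literal, an argv[]
 * element, or a piece from a string list can be passed directly. */
static int is_blank(const char *s)
{
	for (; *s; s++)
		if (!isspace((unsigned char)*s))
			return 0;
	return 1;
}
```

A caller that does hold a strbuf loses nothing, as it simply passes
sb.buf; every other caller no longer has to build a strbuf just to
make the call.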

Junio C Hamano (12):
  wt-status: avoid strbuf_split*()
  clean: do not pass strbuf by value
  clean: do not use strbuf_split*() [part 1]
  clean: do not pass the whole structure when it is not necessary
  clean: do not use strbuf_split*() [part 2]
  merge-tree: do not use strbuf_split*()
  notes: do not use strbuf_split*()
  config: do not use strbuf_split()
  environment: do not use strbuf_split*()
  sub-process: do not use strbuf_split*()
  trace2: trim_trailing_newline followed by trim is a no-op
  trace2: do not use strbuf_split*()

 builtin/clean.c      | 74 ++++++++++++++++++++--------------------
 builtin/merge-tree.c | 30 +++++++++--------
 builtin/notes.c      | 23 +++++++------
 config.c             | 23 ++++++-------
 environment.c        | 19 +++++++----
 sub-process.c        | 15 ++++-----
 trace2/tr2_cfg.c     | 80 +++++++++++++++-----------------------------
 wt-status.c          | 31 ++++++-----------
 8 files changed, 129 insertions(+), 166 deletions(-)

Range-diff against v2:
 1:  27de3d9a92 =  1:  2efe707054 wt-status: avoid strbuf_split*()
 2:  8f096e5a2d =  2:  899ff9c175 clean: do not pass strbuf by value
 3:  768b08907e =  3:  7a4acc3607 clean: do not use strbuf_split*() [part 1]
 -:  ---------- >  4:  4985f72ea5 clean: do not pass the whole structure when it is not necessary
 4:  0f8583e798 =  5:  4f60672f6f clean: do not use strbuf_split*() [part 2]
 5:  cefc2ec9f5 =  6:  d33091220d merge-tree: do not use strbuf_split*()
 6:  1c8ea097f6 !  7:  566e910495 notes: do not use strbuf_split*()
    @@ Metadata
      ## Commit message ##
         notes: do not use strbuf_split*()
     
    -    When reading the copy instruction from the standard input, the
    -    program reads a line, splits it into tokens at whitespace, and trims
    -    each of the tokens before using.  We no longer need to use strbuf
    -    just to be able to trimming, as string_list_split*() family now can
    -    trim while splitting a string.
    +    When reading copy instructions from the standard input, the program
    +    reads a line, splits it into tokens at whitespace, and trims each of
    +    the tokens before using.  We no longer need to use strbuf just to be
    +    able to trim, as string_list_split*() family now can trim while
    +    splitting a string.
     
    -    Retire the use of strbuf_split().
    +    Retire the use of strbuf_split() from this code path.
     
         Note that this loop is a bit sloppy in that it ensures at least
         there are two tokens on each line, but ignores if there are extra
 7:  a472688ec1 !  8:  dcecac2580 config: do not use strbuf_split()
    @@ Commit message
         config: do not use strbuf_split()
     
         When parsing an old-style GIT_CONFIG_PARAMETERS environment
    -    variable, the code parses the key=value pair by spliting them at '='
    -    into an array of strbuf's.  As strbuf_split() leafes the delimiter
    +    variable, the code parses key=value pairs by splitting them at '='
    +    into an array of strbuf's.  As strbuf_split() leaves the delimiter
         at the end of the split piece, the code has to manually trim it.
     
         If we split with string_list_split(), that becomes unnecessary.
    -    Retire the use of strbuf_split().
    +    Retire the use of strbuf_split() from this code path.
     
         Note that the max parameter of string_list_split() is of
         an ergonomically iffy design---it specifies the maximum number of
 8:  2b9957f31c =  9:  b894d4481f environment: do not use strbuf_split*()
 9:  4a5599836d = 10:  d6fd08bd76 sub-process: do not use strbuf_split*()
10:  cf6ecd2090 ! 11:  cb8e82a641 trace2: trim_trailing_newline followed by trim is a no-op
    @@ Commit message
         of a string.  If the code plans to call strbuf_trim() immediately
         after doing so, the code is better off skipping the EOL trimming in
         the first place.  After all, LF/CRLF at the end is a mere special
    -    case of whitespaces at the right end of the string, which will be
    -    removed by strbuf_rtrim().
    +    case of whitespaces at the end of the string, which will be removed
    +    by strbuf_rtrim() anyway.
     
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
11:  c2578b6b1c ! 12:  838fe56920 trace2: do not use strbuf_split*()
    @@ Metadata
      ## Commit message ##
         trace2: do not use strbuf_split*()
     
    -    tr2_cfg_load_patterns() and tr2_load_env_vars() functions are copied
    -    and pasted pair of functions that each reads an environment
    -    variable, split the value at ',' boundaries and trims the resulting
    -    string pieces into an array of strbufs.  But the code paths that
    -    later use these strbufs take no advantage of the strbuf-ness of the
    -    result (they do not benefit from <ptr,len> representation to avoid
    -    having to run strlne(<ptr>), for example).
    +    tr2_cfg_load_patterns() and tr2_load_env_vars() functions are
    +    functions with very similar structure that each reads an environment
    +    variable, splits its value at the ',' boundaries, and trims the
    +    resulting string pieces into an array of strbufs.
    +
    +    But the code paths that later use these strbufs take no advantage of
    +    the strbuf-ness of the result (they do not benefit from <ptr,len>
    +    representation to avoid having to run strlen(<ptr>), for example).
     
         Simplify the code by teaching these functions to split into a string
    -    list instead.
    +    list instead; even the trimming comes for free ;-).
     
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     
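The argument in patch 11/12 (LF/CRLF at the end is merely trailing
whitespace, so trimming it right before a right-trim is a no-op) can
be checked with a small stand-alone sketch.  strip_trailing_newline(),
rtrim() and newline_strip_is_redundant() are hypothetical plain-C
helpers modeled on the behavior the commit message describes; they
are not Git's strbuf functions.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical: drop one trailing "\n", and a "\r" just before it. */
static void strip_trailing_newline(char *s)
{
	size_t len = strlen(s);

	if (len && s[len - 1] == '\n') {
		s[--len] = '\0';
		if (len && s[len - 1] == '\r')
			s[--len] = '\0';
	}
}

/* Hypothetical right-trim: remove every trailing byte for which
 * isspace() is true; '\n' and '\r' are among them. */
static void rtrim(char *s)
{
	size_t len = strlen(s);

	while (len && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';
}

/* Returns 1 if rtrim() alone gives the same result as
 * strip_trailing_newline() followed by rtrim(). */
static int newline_strip_is_redundant(const char *input)
{
	char a[256], b[256];

	assert(strlen(input) < sizeof(a));
	strcpy(a, input);
	strcpy(b, input);
	strip_trailing_newline(a);
	rtrim(a);
	rtrim(b);
	return strcmp(a, b) == 0;
}
```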
-- 
2.50.1-633-g69dfdd50af

