* [PATCH v2 01/11] wt-status: avoid strbuf_split*()
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
@ 2025-07-31 22:54 ` Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 02/11] clean: do not pass strbuf by value Junio C Hamano
` (10 subsequent siblings)
11 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2025-07-31 22:54 UTC (permalink / raw)
To: git
strbuf is a very good data structure to work with string data
without having to worry about running past the end of the string,
but strbuf_split() is a wrong API and an array of strbuf that the
function produces is a wrong thing to use in general. You do not
edit these N strings split out of a single strbuf simultaneously.
Often it is much better off to split a string into string_list and
work with the resulting strings.
wt-status.c:abbrev_oid_in_line() takes one line of rebase todo list
(like "pick e813a0200a7121b97fec535f0d0b460b0a33356c title"), and
for instructions that has an object name as the second token on the
line, replace the object name with its unique abbreviation. After
splitting these tokens out of a single line, no simultaneous edit on
any of these pieces of string that takes advantage of strbuf API
takes place. The final string is composed with strbuf API, but
these split pieces are merely used as pieces of strings and there is
no need for them to be stored in individual strbuf.
Instead, split the line into a string_list, and compose the final
string using these pieces.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
wt-status.c | 31 ++++++++++---------------------
1 file changed, 10 insertions(+), 21 deletions(-)
diff --git a/wt-status.c b/wt-status.c
index 454601afa1..a34dc144ee 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1351,8 +1351,8 @@ static int split_commit_in_progress(struct wt_status *s)
*/
static void abbrev_oid_in_line(struct strbuf *line)
{
- struct strbuf **split;
- int i;
+ struct string_list split = STRING_LIST_INIT_DUP;
+ struct object_id oid;
if (starts_with(line->buf, "exec ") ||
starts_with(line->buf, "x ") ||
@@ -1360,26 +1360,15 @@ static void abbrev_oid_in_line(struct strbuf *line)
starts_with(line->buf, "l "))
return;
- split = strbuf_split_max(line, ' ', 3);
- if (split[0] && split[1]) {
- struct object_id oid;
-
- /*
- * strbuf_split_max left a space. Trim it and re-add
- * it after abbreviation.
- */
- strbuf_trim(split[1]);
- if (!repo_get_oid(the_repository, split[1]->buf, &oid)) {
- strbuf_reset(split[1]);
- strbuf_add_unique_abbrev(split[1], &oid,
- DEFAULT_ABBREV);
- strbuf_addch(split[1], ' ');
- strbuf_reset(line);
- for (i = 0; split[i]; i++)
- strbuf_addbuf(line, split[i]);
- }
+ if ((2 <= string_list_split(&split, line->buf, " ", 2)) &&
+ !repo_get_oid(the_repository, split.items[1].string, &oid)) {
+ strbuf_reset(line);
+ strbuf_addf(line, "%s ", split.items[0].string);
+ strbuf_add_unique_abbrev(line, &oid, DEFAULT_ABBREV);
+ for (size_t i = 2; i < split.nr; i++)
+ strbuf_addf(line, " %s", split.items[i].string);
}
- strbuf_list_free(split);
+ string_list_clear(&split, 0);
}
static int read_rebase_todolist(const char *fname, struct string_list *lines)
--
2.50.1-618-g45d530d26b
^ permalink raw reply related [flat|nested] 33+ messages in thread* [PATCH v2 02/11] clean: do not pass strbuf by value
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 01/11] wt-status: avoid strbuf_split*() Junio C Hamano
@ 2025-07-31 22:54 ` Junio C Hamano
2025-08-02 8:38 ` Jeff King
2025-07-31 22:54 ` [PATCH v2 03/11] clean: do not use strbuf_split*() [part 1] Junio C Hamano
` (9 subsequent siblings)
11 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2025-07-31 22:54 UTC (permalink / raw)
To: git
When you pass a structure by value, the callee can modify the
contents of the structure that was passed in without having to worry
about changing the structure the caller has. Passing structure by
value sometimes (but not very often) can be a valid way to give
callee a temporary variable it can freely modify.
But not a structure with members that are pointers, like a strbuf.
builtin/clean.c:list_and_choose() reads a line interactively from
the user, and passes the line (in a strbuf) to parse_choice() by
value, which then munges by replacing ',' with ' ' (to accept both
comma and space separated list of choices). But because the strbuf
passed by value still shares the underlying character array buf[],
this ends up munging the caller's strbuf contents.
This is a catastrophe waiting to happen. If the callee causes the
strbuf to be reallocated, the buf[] the caller has will become
dangling, and when the caller does strbuf_release(), it would result
in double-free.
Stop calling the function with misleading call-by-value with strbuf.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin/clean.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/builtin/clean.c b/builtin/clean.c
index 053c94fc6b..224551537e 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -477,7 +477,7 @@ static int find_unique(const char *choice, struct menu_stuff *menu_stuff)
*/
static int parse_choice(struct menu_stuff *menu_stuff,
int is_single,
- struct strbuf input,
+ struct strbuf *input,
int **chosen)
{
struct strbuf **choice_list, **ptr;
@@ -485,14 +485,14 @@ static int parse_choice(struct menu_stuff *menu_stuff,
int i;
if (is_single) {
- choice_list = strbuf_split_max(&input, '\n', 0);
+ choice_list = strbuf_split_max(input, '\n', 0);
} else {
- char *p = input.buf;
+ char *p = input->buf;
do {
if (*p == ',')
*p = ' ';
} while (*p++);
- choice_list = strbuf_split_max(&input, ' ', 0);
+ choice_list = strbuf_split_max(input, ' ', 0);
}
for (ptr = choice_list; *ptr; ptr++) {
@@ -630,7 +630,7 @@ static int *list_and_choose(struct menu_opts *opts, struct menu_stuff *stuff)
nr = parse_choice(stuff,
opts->flags & MENU_OPTS_SINGLETON,
- choice,
+ &choice,
&chosen);
if (opts->flags & MENU_OPTS_SINGLETON) {
--
2.50.1-618-g45d530d26b
^ permalink raw reply related [flat|nested] 33+ messages in thread* Re: [PATCH v2 02/11] clean: do not pass strbuf by value
2025-07-31 22:54 ` [PATCH v2 02/11] clean: do not pass strbuf by value Junio C Hamano
@ 2025-08-02 8:38 ` Jeff King
2025-08-02 16:44 ` Junio C Hamano
0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2025-08-02 8:38 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Thu, Jul 31, 2025 at 03:54:24PM -0700, Junio C Hamano wrote:
> This is a catastrophe waiting to happen. If the callee causes the
> strbuf to be reallocated, the buf[] the caller has will become
> dangling, and when the caller does strbuf_release(), it would result
> in double-free.
>
> Stop calling the function with misleading call-by-value with strbuf.
This is definitely an improvement, though I wonder if we could go
further.
When I saw the original version of this patch, my first thought was: if
we are passing a strbuf that is not modified, then why not just pass a
const string pointer? It is more flexible for the caller and makes our
intention more clear. (A careful reader will note that this does not
pass in the length, so it is different if we expect embedded NULs, but I
don't think that is the case here).
But it does not quite work in this case because of this:
> if (is_single) {
> - choice_list = strbuf_split_max(&input, '\n', 0);
> + choice_list = strbuf_split_max(input, '\n', 0);
> } else {
> - char *p = input.buf;
> + char *p = input->buf;
> do {
> if (*p == ',')
> *p = ' ';
> } while (*p++);
> - choice_list = strbuf_split_max(&input, ' ', 0);
> + choice_list = strbuf_split_max(input, ' ', 0);
> }
We do modify the string! However, that code goes away in your next patch
when we are able to switch to a split function that handles multiple
delimiters. You do still do an in-place split, but isn't strictly
needed.
If we swapped patches 2 and 3 in this series, then the strbuf cleanup
can just become:
diff --git a/builtin/clean.c b/builtin/clean.c
index a01d130f62..e4b075df70 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -477,17 +477,17 @@ static int find_unique(const char *choice, struct menu_stuff *menu_stuff)
*/
static int parse_choice(struct menu_stuff *menu_stuff,
int is_single,
- struct strbuf input,
+ const char *input,
int **chosen)
{
- struct string_list choice = STRING_LIST_INIT_NODUP;
+ struct string_list choice = STRING_LIST_INIT_DUP;
struct string_list_item *item;
int nr = 0;
int i;
- string_list_split_in_place_f(&choice, input.buf,
- is_single ? "\n" : ", ", -1,
- STRING_LIST_SPLIT_TRIM);
+ string_list_split_f(&choice, input,
+ is_single ? "\n" : ", ", -1,
+ STRING_LIST_SPLIT_TRIM);
for_each_string_list_item(item, &choice) {
const char *string;
@@ -626,7 +626,7 @@ static int *list_and_choose(struct menu_opts *opts, struct menu_stuff *stuff)
nr = parse_choice(stuff,
opts->flags & MENU_OPTS_SINGLETON,
- choice,
+ choice.buf,
&chosen);
if (opts->flags & MENU_OPTS_SINGLETON) {
I dunno. Maybe it is nitpicking, but I think "don't take a strbuf if you
only need a string" is a good general rule. Of course there is only one
caller here, so flexibility is probably not that important.
-Peff
^ permalink raw reply related [flat|nested] 33+ messages in thread* Re: [PATCH v2 02/11] clean: do not pass strbuf by value
2025-08-02 8:38 ` Jeff King
@ 2025-08-02 16:44 ` Junio C Hamano
2025-08-02 18:40 ` Jeff King
0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2025-08-02 16:44 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> On Thu, Jul 31, 2025 at 03:54:24PM -0700, Junio C Hamano wrote:
>
>> This is a catastrophe waiting to happen. If the callee causes the
>> strbuf to be reallocated, the buf[] the caller has will become
>> dangling, and when the caller does strbuf_release(), it would result
>> in double-free.
>>
>> Stop calling the function with misleading call-by-value with strbuf.
>
> This is definitely an improvement, though I wonder if we could go
> further.
Yes, but in short, between these two
- think twice before you pass struct by value
- do not insist taking the whole struct, take only what you need
lessons, I happened to pick the former one more important to carve
in stone than the latter one. Both are valuable guidance we should
give to our developers, though.
> nr = parse_choice(stuff,
> opts->flags & MENU_OPTS_SINGLETON,
> - choice,
> + choice.buf,
> &chosen);
>
> if (opts->flags & MENU_OPTS_SINGLETON) {
>
> I dunno. Maybe it is nitpicking, but I think "don't take a strbuf if you
> only need a string" is a good general rule. Of course there is only one
> caller here, so flexibility is probably not that important.
But I think we engrave both lessons in the history by keeping this
step as-is, do the string_list_split_in_place_f() thing, and then
add a new patch to pass just the .buf member to parse_choice().
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [PATCH v2 02/11] clean: do not pass strbuf by value
2025-08-02 16:44 ` Junio C Hamano
@ 2025-08-02 18:40 ` Jeff King
0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2025-08-02 18:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sat, Aug 02, 2025 at 09:44:47AM -0700, Junio C Hamano wrote:
> > I dunno. Maybe it is nitpicking, but I think "don't take a strbuf if you
> > only need a string" is a good general rule. Of course there is only one
> > caller here, so flexibility is probably not that important.
>
> But I think we engrave both lessons in the history by keeping this
> step as-is, do the string_list_split_in_place_f() thing, and then
> add a new patch to pass just the .buf member to parse_choice().
I would certainly be happy with that.
The other strbuf-by-value in write_worktree_linking_files() could also
just turn into a "const char *".
-Peff
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2 03/11] clean: do not use strbuf_split*() [part 1]
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 01/11] wt-status: avoid strbuf_split*() Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 02/11] clean: do not pass strbuf by value Junio C Hamano
@ 2025-07-31 22:54 ` Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 04/11] clean: do not use strbuf_split*() [part 2] Junio C Hamano
` (8 subsequent siblings)
11 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2025-07-31 22:54 UTC (permalink / raw)
To: git
builtin/clean.c:parse_choice() is fed a single line of input, which
is space or comma separated list of tokens, and a list of menu
items. It parses the tokens into number ranges (e.g. 1-3 that means
the first three items) or string prefix (e.g. 's' to choose the menu
item "(s)elect") that specify the elements in the menu item list,
and tells the caller which ones are chosen.
For parsing the input string, it uses strbuf_split() to split it
into bunch of strbufs. Instead use string_list_split_in_place(),
for a few reasons.
* strbuf_split() is a bad API function to use, that yields an array
of strbuf that is a bad data structure to use in general.
* string_list_split_in_place() allows you to split with "comma or
space"; the current code has to preprocess the input string to
replace comma with space because strbuf_split() does not allow
this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin/clean.c | 50 +++++++++++++++++++++++--------------------------
1 file changed, 23 insertions(+), 27 deletions(-)
diff --git a/builtin/clean.c b/builtin/clean.c
index 224551537e..708cd9344c 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -480,40 +480,36 @@ static int parse_choice(struct menu_stuff *menu_stuff,
struct strbuf *input,
int **chosen)
{
- struct strbuf **choice_list, **ptr;
+ struct string_list choice = STRING_LIST_INIT_NODUP;
+ struct string_list_item *item;
int nr = 0;
int i;
- if (is_single) {
- choice_list = strbuf_split_max(input, '\n', 0);
- } else {
- char *p = input->buf;
- do {
- if (*p == ',')
- *p = ' ';
- } while (*p++);
- choice_list = strbuf_split_max(input, ' ', 0);
- }
+ string_list_split_in_place_f(&choice, input->buf,
+ is_single ? "\n" : ", ", -1,
+ STRING_LIST_SPLIT_TRIM);
- for (ptr = choice_list; *ptr; ptr++) {
- char *p;
- int choose = 1;
+ for_each_string_list_item(item, &choice) {
+ const char *string;
+ int choose;
int bottom = 0, top = 0;
int is_range, is_number;
- strbuf_trim(*ptr);
- if (!(*ptr)->len)
+ string = item->string;
+ if (!*string)
continue;
/* Input that begins with '-'; unchoose */
- if (*(*ptr)->buf == '-') {
+ if (string[0] == '-') {
choose = 0;
- strbuf_remove((*ptr), 0, 1);
+ string++;
+ } else {
+ choose = 1;
}
is_range = 0;
is_number = 1;
- for (p = (*ptr)->buf; *p; p++) {
+ for (const char *p = string; *p; p++) {
if ('-' == *p) {
if (!is_range) {
is_range = 1;
@@ -531,27 +527,27 @@ static int parse_choice(struct menu_stuff *menu_stuff,
}
if (is_number) {
- bottom = atoi((*ptr)->buf);
+ bottom = atoi(string);
top = bottom;
} else if (is_range) {
- bottom = atoi((*ptr)->buf);
+ bottom = atoi(string);
/* a range can be specified like 5-7 or 5- */
- if (!*(strchr((*ptr)->buf, '-') + 1))
+ if (!*(strchr(string, '-') + 1))
top = menu_stuff->nr;
else
- top = atoi(strchr((*ptr)->buf, '-') + 1);
- } else if (!strcmp((*ptr)->buf, "*")) {
+ top = atoi(strchr(string, '-') + 1);
+ } else if (!strcmp(string, "*")) {
bottom = 1;
top = menu_stuff->nr;
} else {
- bottom = find_unique((*ptr)->buf, menu_stuff);
+ bottom = find_unique(string, menu_stuff);
top = bottom;
}
if (top <= 0 || bottom <= 0 || top > menu_stuff->nr || bottom > top ||
(is_single && bottom != top)) {
clean_print_color(CLEAN_COLOR_ERROR);
- printf(_("Huh (%s)?\n"), (*ptr)->buf);
+ printf(_("Huh (%s)?\n"), string);
clean_print_color(CLEAN_COLOR_RESET);
continue;
}
@@ -560,7 +556,7 @@ static int parse_choice(struct menu_stuff *menu_stuff,
(*chosen)[i-1] = choose;
}
- strbuf_list_free(choice_list);
+ string_list_clear(&choice, 0);
for (i = 0; i < menu_stuff->nr; i++)
nr += (*chosen)[i];
--
2.50.1-618-g45d530d26b
^ permalink raw reply related [flat|nested] 33+ messages in thread* [PATCH v2 04/11] clean: do not use strbuf_split*() [part 2]
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
` (2 preceding siblings ...)
2025-07-31 22:54 ` [PATCH v2 03/11] clean: do not use strbuf_split*() [part 1] Junio C Hamano
@ 2025-07-31 22:54 ` Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 05/11] merge-tree: do not use strbuf_split*() Junio C Hamano
` (7 subsequent siblings)
11 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2025-07-31 22:54 UTC (permalink / raw)
To: git
builtin/clean.c:filter_by_patterns_cmd() interactively reads a line
that has exclude patterns from the user and splits the line into a
list of patterns. It uses the strbuf_split() so that each split
piece can then trimmed.
There is no need to use strbuf anymore, thanks to the recent
enhancement to string_list_split*() family that allows us to trim
the pieces split into a string_list.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin/clean.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/builtin/clean.c b/builtin/clean.c
index 708cd9344c..56dabf7e03 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -674,12 +674,13 @@ static int filter_by_patterns_cmd(void)
{
struct dir_struct dir = DIR_INIT;
struct strbuf confirm = STRBUF_INIT;
- struct strbuf **ignore_list;
- struct string_list_item *item;
struct pattern_list *pl;
int changed = -1, i;
for (;;) {
+ struct string_list ignore_list = STRING_LIST_INIT_NODUP;
+ struct string_list_item *item;
+
if (!del_list.nr)
break;
@@ -697,14 +698,15 @@ static int filter_by_patterns_cmd(void)
break;
pl = add_pattern_list(&dir, EXC_CMDL, "manual exclude");
- ignore_list = strbuf_split_max(&confirm, ' ', 0);
- for (i = 0; ignore_list[i]; i++) {
- strbuf_trim(ignore_list[i]);
- if (!ignore_list[i]->len)
- continue;
+ string_list_split_in_place_f(&ignore_list, confirm.buf, " ", -1,
+ STRING_LIST_SPLIT_TRIM);
- add_pattern(ignore_list[i]->buf, "", 0, pl, -(i+1));
+ for (i = 0; i < ignore_list.nr; i++) {
+ item = &ignore_list.items[i];
+ if (!*item->string)
+ continue;
+ add_pattern(item->string, "", 0, pl, -(i+1));
}
changed = 0;
@@ -725,7 +727,7 @@ static int filter_by_patterns_cmd(void)
clean_print_color(CLEAN_COLOR_RESET);
}
- strbuf_list_free(ignore_list);
+ string_list_clear(&ignore_list, 0);
dir_clear(&dir);
}
--
2.50.1-618-g45d530d26b
^ permalink raw reply related [flat|nested] 33+ messages in thread* [PATCH v2 05/11] merge-tree: do not use strbuf_split*()
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
` (3 preceding siblings ...)
2025-07-31 22:54 ` [PATCH v2 04/11] clean: do not use strbuf_split*() [part 2] Junio C Hamano
@ 2025-07-31 22:54 ` Junio C Hamano
2025-08-02 8:55 ` Jeff King
2025-07-31 22:54 ` [PATCH v2 06/11] notes: " Junio C Hamano
` (6 subsequent siblings)
11 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2025-07-31 22:54 UTC (permalink / raw)
To: git
When reading merge instructions from the standard input, the program
reads from the standard input, splits the line into tokens at
whitespace, and trims each of them before using. We no longer need
to use strbuf just for trimming, as string_list_split*() family can
trim while splitting a string.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin/merge-tree.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index cf8b06cadc..70235856d7 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -618,32 +618,34 @@ int cmd_merge_tree(int argc,
"--merge-base", "--stdin");
line_termination = '\0';
while (strbuf_getline_lf(&buf, stdin) != EOF) {
- struct strbuf **split;
+ struct string_list split = STRING_LIST_INIT_NODUP;
const char *input_merge_base = NULL;
- split = strbuf_split(&buf, ' ');
- if (!split[0] || !split[1])
+ string_list_split_in_place_f(&split, buf.buf, " ", -1,
+ STRING_LIST_SPLIT_TRIM);
+
+ if (split.nr < 2)
die(_("malformed input line: '%s'."), buf.buf);
- strbuf_rtrim(split[0]);
- strbuf_rtrim(split[1]);
/* parse the merge-base */
- if (!strcmp(split[1]->buf, "--")) {
- input_merge_base = split[0]->buf;
+ if (!strcmp(split.items[1].string, "--")) {
+ input_merge_base = split.items[0].string;
}
- if (input_merge_base && split[2] && split[3] && !split[4]) {
- strbuf_rtrim(split[2]);
- strbuf_rtrim(split[3]);
- real_merge(&o, input_merge_base, split[2]->buf, split[3]->buf, prefix);
- } else if (!input_merge_base && !split[2]) {
- real_merge(&o, NULL, split[0]->buf, split[1]->buf, prefix);
+ if (input_merge_base && split.nr == 4) {
+ real_merge(&o, input_merge_base,
+ split.items[2].string, split.items[3].string,
+ prefix);
+ } else if (!input_merge_base && split.nr == 2) {
+ real_merge(&o, NULL,
+ split.items[0].string, split.items[1].string,
+ prefix);
} else {
die(_("malformed input line: '%s'."), buf.buf);
}
maybe_flush_or_die(stdout, "stdout");
- strbuf_list_free(split);
+ string_list_clear(&split, 0);
}
strbuf_release(&buf);
--
2.50.1-618-g45d530d26b
^ permalink raw reply related [flat|nested] 33+ messages in thread* [PATCH v2 06/11] notes: do not use strbuf_split*()
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
` (4 preceding siblings ...)
2025-07-31 22:54 ` [PATCH v2 05/11] merge-tree: do not use strbuf_split*() Junio C Hamano
@ 2025-07-31 22:54 ` Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 07/11] config: do not use strbuf_split() Junio C Hamano
` (5 subsequent siblings)
11 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2025-07-31 22:54 UTC (permalink / raw)
To: git
When reading copy instructions from the standard input, the program
reads a line, splits it into tokens at whitespace, and trims each of
the tokens before using. We no longer need to use strbuf just to be
able to trim, as string_list_split*() family now can trim while
splitting a string.
Retire the use of strbuf_split() from this code path.
Note that this loop is a bit sloppy in that it ensures at least
there are two tokens on each line, but ignores if there are extra
tokens on the line. Tightening it is outside the scope of this
series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin/notes.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/builtin/notes.c b/builtin/notes.c
index a9529b1696..4fb36a743c 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -375,18 +375,19 @@ static int notes_copy_from_stdin(int force, const char *rewrite_cmd)
while (strbuf_getline_lf(&buf, stdin) != EOF) {
struct object_id from_obj, to_obj;
- struct strbuf **split;
+ struct string_list split = STRING_LIST_INIT_NODUP;
int err;
- split = strbuf_split(&buf, ' ');
- if (!split[0] || !split[1])
+ string_list_split_in_place_f(&split, buf.buf, " ", -1,
+ STRING_LIST_SPLIT_TRIM);
+ if (split.nr < 2)
die(_("malformed input line: '%s'."), buf.buf);
- strbuf_rtrim(split[0]);
- strbuf_rtrim(split[1]);
- if (repo_get_oid(the_repository, split[0]->buf, &from_obj))
- die(_("failed to resolve '%s' as a valid ref."), split[0]->buf);
- if (repo_get_oid(the_repository, split[1]->buf, &to_obj))
- die(_("failed to resolve '%s' as a valid ref."), split[1]->buf);
+ if (repo_get_oid(the_repository, split.items[0].string, &from_obj))
+ die(_("failed to resolve '%s' as a valid ref."),
+ split.items[0].string);
+ if (repo_get_oid(the_repository, split.items[1].string, &to_obj))
+ die(_("failed to resolve '%s' as a valid ref."),
+ split.items[1].string);
if (rewrite_cmd)
err = copy_note_for_rewrite(c, &from_obj, &to_obj);
@@ -396,11 +397,11 @@ static int notes_copy_from_stdin(int force, const char *rewrite_cmd)
if (err) {
error(_("failed to copy notes from '%s' to '%s'"),
- split[0]->buf, split[1]->buf);
+ split.items[0].string, split.items[1].string);
ret = 1;
}
- strbuf_list_free(split);
+ string_list_clear(&split, 0);
}
if (!rewrite_cmd) {
--
2.50.1-618-g45d530d26b
^ permalink raw reply related [flat|nested] 33+ messages in thread* [PATCH v2 07/11] config: do not use strbuf_split()
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
` (5 preceding siblings ...)
2025-07-31 22:54 ` [PATCH v2 06/11] notes: " Junio C Hamano
@ 2025-07-31 22:54 ` Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 08/11] environment: do not use strbuf_split*() Junio C Hamano
` (4 subsequent siblings)
11 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2025-07-31 22:54 UTC (permalink / raw)
To: git
When parsing an old-style GIT_CONFIG_PARAMETERS environment
variable, the code parses key=value pairs by splitting them at '='
into an array of strbuf's. As strbuf_split() leaves the delimiter
at the end of the split piece, the code has to manually trim it.
If we split with string_list_split(), that becomes unnecessary.
Retire the use of strbuf_split() from this code path.
Note that the max parameter of string_list_split() is of
an ergonomically iffy design---it specifies the maximum number of
times the function is allowed to split, which means that in order to
split a text into up to 2 pieces, you have to pass 1, not 2.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
config.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/config.c b/config.c
index 8a2d0b7916..1769f15ee3 100644
--- a/config.c
+++ b/config.c
@@ -638,31 +638,28 @@ int git_config_parse_parameter(const char *text,
config_fn_t fn, void *data)
{
const char *value;
- struct strbuf **pair;
+ struct string_list pair = STRING_LIST_INIT_DUP;
int ret;
struct key_value_info kvi = KVI_INIT;
kvi_from_param(&kvi);
- pair = strbuf_split_str(text, '=', 2);
- if (!pair[0])
+ string_list_split(&pair, text, "=", 1);
+ if (!pair.nr)
return error(_("bogus config parameter: %s"), text);
- if (pair[0]->len && pair[0]->buf[pair[0]->len - 1] == '=') {
- strbuf_setlen(pair[0], pair[0]->len - 1);
- value = pair[1] ? pair[1]->buf : "";
- } else {
+ if (pair.nr == 1)
value = NULL;
- }
+ else
+ value = pair.items[1].string;
- strbuf_trim(pair[0]);
- if (!pair[0]->len) {
- strbuf_list_free(pair);
+ if (!*pair.items[0].string) {
+ string_list_clear(&pair, 0);
return error(_("bogus config parameter: %s"), text);
}
- ret = config_parse_pair(pair[0]->buf, value, &kvi, fn, data);
- strbuf_list_free(pair);
+ ret = config_parse_pair(pair.items[0].string, value, &kvi, fn, data);
+ string_list_clear(&pair, 0);
return ret;
}
--
2.50.1-618-g45d530d26b
^ permalink raw reply related [flat|nested] 33+ messages in thread* [PATCH v2 08/11] environment: do not use strbuf_split*()
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
` (6 preceding siblings ...)
2025-07-31 22:54 ` [PATCH v2 07/11] config: do not use strbuf_split() Junio C Hamano
@ 2025-07-31 22:54 ` Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 09/11] sub-process: " Junio C Hamano
` (3 subsequent siblings)
11 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2025-07-31 22:54 UTC (permalink / raw)
To: git
environment.c:get_git_namespace() learns the raw namespace from an
environment variable, splits it at "/", and appends them after
"refs/namespaces/"; the reason why it splits first is so that an
empty string resulting from double slashes can be omitted.
The split pieces do not need to be edited in any way, so an array of
strbufs is a wrong data structure to use. Instead split into a
string list and use the pieces from there.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
environment.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/environment.c b/environment.c
index 7c2480b22e..ab3ed08433 100644
--- a/environment.c
+++ b/environment.c
@@ -163,10 +163,10 @@ int have_git_dir(void)
const char *get_git_namespace(void)
{
static const char *namespace;
-
struct strbuf buf = STRBUF_INIT;
- struct strbuf **components, **c;
const char *raw_namespace;
+ struct string_list components = STRING_LIST_INIT_DUP;
+ struct string_list_item *item;
if (namespace)
return namespace;
@@ -178,12 +178,17 @@ const char *get_git_namespace(void)
}
strbuf_addstr(&buf, raw_namespace);
- components = strbuf_split(&buf, '/');
+
+ string_list_split(&components, buf.buf, "/", -1);
strbuf_reset(&buf);
- for (c = components; *c; c++)
- if (strcmp((*c)->buf, "/") != 0)
- strbuf_addf(&buf, "refs/namespaces/%s", (*c)->buf);
- strbuf_list_free(components);
+
+ for_each_string_list_item(item, &components) {
+ if (item->string[0])
+ strbuf_addf(&buf, "refs/namespaces/%s/", item->string);
+ }
+ string_list_clear(&components, 0);
+
+ strbuf_trim_trailing_dir_sep(&buf);
if (check_refname_format(buf.buf, 0))
die(_("bad git namespace path \"%s\""), raw_namespace);
strbuf_addch(&buf, '/');
--
2.50.1-618-g45d530d26b
^ permalink raw reply related [flat|nested] 33+ messages in thread* [PATCH v2 09/11] sub-process: do not use strbuf_split*()
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
` (7 preceding siblings ...)
2025-07-31 22:54 ` [PATCH v2 08/11] environment: do not use strbuf_split*() Junio C Hamano
@ 2025-07-31 22:54 ` Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 10/11] trace2: trim_trailing_newline followed by trim is a no-op Junio C Hamano
` (2 subsequent siblings)
11 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2025-07-31 22:54 UTC (permalink / raw)
To: git
The code to read status from subprocess reads one packet line and
tries to find "status=<foo>". It is way overkill to split the line
into an array of two strbufs to extract <foo>.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
sub-process.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/sub-process.c b/sub-process.c
index 1daf5a9752..83bf0a0e82 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -30,23 +30,20 @@ struct subprocess_entry *subprocess_find_entry(struct hashmap *hashmap, const ch
int subprocess_read_status(int fd, struct strbuf *status)
{
- struct strbuf **pair;
- char *line;
int len;
for (;;) {
+ char *line;
+ const char *value;
+
len = packet_read_line_gently(fd, NULL, &line);
if ((len < 0) || !line)
break;
- pair = strbuf_split_str(line, '=', 2);
- if (pair[0] && pair[0]->len && pair[1]) {
+ if (skip_prefix(line, "status=", &value)) {
/* the last "status=<foo>" line wins */
- if (!strcmp(pair[0]->buf, "status=")) {
- strbuf_reset(status);
- strbuf_addbuf(status, pair[1]);
- }
+ strbuf_reset(status);
+ strbuf_addstr(status, value);
}
- strbuf_list_free(pair);
}
return (len < 0) ? len : 0;
--
2.50.1-618-g45d530d26b
^ permalink raw reply related [flat|nested] 33+ messages in thread* [PATCH v2 10/11] trace2: trim_trailing_newline followed by trim is a no-op
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
` (8 preceding siblings ...)
2025-07-31 22:54 ` [PATCH v2 09/11] sub-process: " Junio C Hamano
@ 2025-07-31 22:54 ` Junio C Hamano
2025-07-31 22:54 ` [PATCH v2 11/11] trace2: do not use strbuf_split*() Junio C Hamano
2025-08-02 9:08 ` [PATCH v2 00/11] do not overuse strbuf_split*() Jeff King
11 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2025-07-31 22:54 UTC (permalink / raw)
To: git
strbuf_trim_trailing_newline() removes a LF or a CRLF from the tail
of a string. If the code plans to call strbuf_trim() immediately
after doing so, the code is better off skipping the EOL trimming in
the first place. After all, LF/CRLF at the end is a mere special
case of whitespaces at the end of the string, which will be removed
by strbuf_rtrim() anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
trace2/tr2_cfg.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c
index 22a99a0682..2b7cfcd10c 100644
--- a/trace2/tr2_cfg.c
+++ b/trace2/tr2_cfg.c
@@ -39,7 +39,6 @@ static int tr2_cfg_load_patterns(void)
if (buf->len && buf->buf[buf->len - 1] == ',')
strbuf_setlen(buf, buf->len - 1);
- strbuf_trim_trailing_newline(*s);
strbuf_trim(*s);
}
@@ -78,7 +77,6 @@ static int tr2_load_env_vars(void)
if (buf->len && buf->buf[buf->len - 1] == ',')
strbuf_setlen(buf, buf->len - 1);
- strbuf_trim_trailing_newline(*s);
strbuf_trim(*s);
}
--
2.50.1-618-g45d530d26b
^ permalink raw reply related [flat|nested] 33+ messages in thread* [PATCH v2 11/11] trace2: do not use strbuf_split*()
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
` (9 preceding siblings ...)
2025-07-31 22:54 ` [PATCH v2 10/11] trace2: trim_trailing_newline followed by trim is a no-op Junio C Hamano
@ 2025-07-31 22:54 ` Junio C Hamano
2025-08-02 9:08 ` [PATCH v2 00/11] do not overuse strbuf_split*() Jeff King
11 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2025-07-31 22:54 UTC (permalink / raw)
To: git
tr2_cfg_load_patterns() and tr2_load_env_vars() functions are
functions with very similar structure that each reads an environment
variable, splits its value at the ',' boundaries, and trims the
resulting string pieces into an array of strbufs.
But the code paths that later use these strbufs take no advantage of
the strbuf-ness of the result (they do not benefit from <ptr,len>
representation to avoid having to run strlen(<ptr>), for example).
Simplify the code by teaching these functions to split into a string
list instead; even the trimming comes for free ;-).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
trace2/tr2_cfg.c | 78 +++++++++++++++++-------------------------------
1 file changed, 27 insertions(+), 51 deletions(-)
diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c
index 2b7cfcd10c..bbcfeda60a 100644
--- a/trace2/tr2_cfg.c
+++ b/trace2/tr2_cfg.c
@@ -8,87 +8,65 @@
#include "trace2/tr2_sysenv.h"
#include "wildmatch.h"
-static struct strbuf **tr2_cfg_patterns;
-static int tr2_cfg_count_patterns;
+static struct string_list tr2_cfg_patterns = STRING_LIST_INIT_DUP;
static int tr2_cfg_loaded;
-static struct strbuf **tr2_cfg_env_vars;
-static int tr2_cfg_env_vars_count;
+static struct string_list tr2_cfg_env_vars = STRING_LIST_INIT_DUP;
static int tr2_cfg_env_vars_loaded;
/*
* Parse a string containing a comma-delimited list of config keys
- * or wildcard patterns into a list of strbufs.
+ * or wildcard patterns into a string list.
*/
-static int tr2_cfg_load_patterns(void)
+static size_t tr2_cfg_load_patterns(void)
{
- struct strbuf **s;
const char *envvar;
if (tr2_cfg_loaded)
- return tr2_cfg_count_patterns;
+ return tr2_cfg_patterns.nr;
tr2_cfg_loaded = 1;
envvar = tr2_sysenv_get(TR2_SYSENV_CFG_PARAM);
if (!envvar || !*envvar)
- return tr2_cfg_count_patterns;
+ return tr2_cfg_patterns.nr;
- tr2_cfg_patterns = strbuf_split_buf(envvar, strlen(envvar), ',', -1);
- for (s = tr2_cfg_patterns; *s; s++) {
- struct strbuf *buf = *s;
-
- if (buf->len && buf->buf[buf->len - 1] == ',')
- strbuf_setlen(buf, buf->len - 1);
- strbuf_trim(*s);
- }
-
- tr2_cfg_count_patterns = s - tr2_cfg_patterns;
- return tr2_cfg_count_patterns;
+ string_list_split_f(&tr2_cfg_patterns, envvar, ",", -1,
+ STRING_LIST_SPLIT_TRIM);
+ return tr2_cfg_patterns.nr;
}
void tr2_cfg_free_patterns(void)
{
- if (tr2_cfg_patterns)
- strbuf_list_free(tr2_cfg_patterns);
- tr2_cfg_count_patterns = 0;
+ if (tr2_cfg_patterns.nr)
+ string_list_clear(&tr2_cfg_patterns, 0);
tr2_cfg_loaded = 0;
}
/*
* Parse a string containing a comma-delimited list of environment variable
- * names into a list of strbufs.
+ * names into a string list.
*/
-static int tr2_load_env_vars(void)
+static size_t tr2_load_env_vars(void)
{
- struct strbuf **s;
const char *varlist;
if (tr2_cfg_env_vars_loaded)
- return tr2_cfg_env_vars_count;
+ return tr2_cfg_env_vars.nr;
tr2_cfg_env_vars_loaded = 1;
varlist = tr2_sysenv_get(TR2_SYSENV_ENV_VARS);
if (!varlist || !*varlist)
- return tr2_cfg_env_vars_count;
-
- tr2_cfg_env_vars = strbuf_split_buf(varlist, strlen(varlist), ',', -1);
- for (s = tr2_cfg_env_vars; *s; s++) {
- struct strbuf *buf = *s;
-
- if (buf->len && buf->buf[buf->len - 1] == ',')
- strbuf_setlen(buf, buf->len - 1);
- strbuf_trim(*s);
- }
+ return tr2_cfg_env_vars.nr;
- tr2_cfg_env_vars_count = s - tr2_cfg_env_vars;
- return tr2_cfg_env_vars_count;
+ string_list_split_f(&tr2_cfg_env_vars, varlist, ",", -1,
+ STRING_LIST_SPLIT_TRIM);
+ return tr2_cfg_env_vars.nr;
}
void tr2_cfg_free_env_vars(void)
{
- if (tr2_cfg_env_vars)
- strbuf_list_free(tr2_cfg_env_vars);
- tr2_cfg_env_vars_count = 0;
+ if (tr2_cfg_env_vars.nr)
+ string_list_clear(&tr2_cfg_env_vars, 0);
tr2_cfg_env_vars_loaded = 0;
}
@@ -103,12 +81,11 @@ struct tr2_cfg_data {
static int tr2_cfg_cb(const char *key, const char *value,
const struct config_context *ctx, void *d)
{
- struct strbuf **s;
+ struct string_list_item *item;
struct tr2_cfg_data *data = (struct tr2_cfg_data *)d;
- for (s = tr2_cfg_patterns; *s; s++) {
- struct strbuf *buf = *s;
- int wm = wildmatch(buf->buf, key, WM_CASEFOLD);
+ for_each_string_list_item(item, &tr2_cfg_patterns) {
+ int wm = wildmatch(item->string, key, WM_CASEFOLD);
if (wm == WM_MATCH) {
trace2_def_param_fl(data->file, data->line, key, value,
ctx->kvi);
@@ -130,17 +107,16 @@ void tr2_cfg_list_config_fl(const char *file, int line)
void tr2_list_env_vars_fl(const char *file, int line)
{
struct key_value_info kvi = KVI_INIT;
- struct strbuf **s;
+ struct string_list_item *item;
kvi_from_param(&kvi);
if (tr2_load_env_vars() <= 0)
return;
- for (s = tr2_cfg_env_vars; *s; s++) {
- struct strbuf *buf = *s;
- const char *val = getenv(buf->buf);
+ for_each_string_list_item(item, &tr2_cfg_env_vars) {
+ const char *val = getenv(item->string);
if (val && *val)
- trace2_def_param_fl(file, line, buf->buf, val, &kvi);
+ trace2_def_param_fl(file, line, item->string, val, &kvi);
}
}
--
2.50.1-618-g45d530d26b
^ permalink raw reply related [flat|nested] 33+ messages in thread* Re: [PATCH v2 00/11] do not overuse strbuf_split*()
2025-07-31 22:54 ` [PATCH v2 00/11] do not overuse strbuf_split*() Junio C Hamano
` (10 preceding siblings ...)
2025-07-31 22:54 ` [PATCH v2 11/11] trace2: do not use strbuf_split*() Junio C Hamano
@ 2025-08-02 9:08 ` Jeff King
2025-08-02 17:09 ` Junio C Hamano
11 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2025-08-02 9:08 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Thu, Jul 31, 2025 at 03:54:22PM -0700, Junio C Hamano wrote:
> strbuf is a very good data structure to work with string data
> without having to worry about running past the end of the string.
>
> But an array of strbuf is often a wrong data structure. You rarely
> have need to be able to edit multiple strings represented by such an
> array simultaneously. And strbuf_split*() that produces result in
> such a shape is a misdesigned API function.
Yeah, I agree it's a bad API, and I'd be happy if we could eventually
get rid of it.
Conversion to string_list mostly makes sense (though I especially like
the case in sub-process.c where we can just parse with skip_prefix()).
I looked over the patches and didn't see anything objectionable. But I
tried to think of subtle incompatibilities we might run into:
1. I saw the different handling of "max" you had to deal with in one
patch. Yuck. It might be worth tweaking string_list_split() to
count "pieces" (which is how every other split function I've seen,
like the one in perl, works). But that can be done later (and is
tricky to do safely, since we wouldn't be changing the function
signature).
2. Is the handling of repeated delimiters the same? E.g., if I split
on "/" and we see "foo//bar", do both split implementations yield
the same output. I could see either of ("foo", "", "bar") and
("foo", "bar") being produced.
And...
> The most common use case of strbuf_split*() family of functions
> seems to be to trim away the whitespaces around each piece of split
> string. With modern string_list_split*(), it is often no longer
> necessary.
Some of these sites do strbuf_rtrim() on the split pieces. But
STRING_LIST_SPLIT_TRIM does both left and right. Is this OK? I think so
because in both cases we are already splitting on space, so we wouldn't
expect left-hand spaces (of course you could have a stray tab or
something, but I suspect the new code matches the intent more closely).
-Peff
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [PATCH v2 00/11] do not overuse strbuf_split*()
2025-08-02 9:08 ` [PATCH v2 00/11] do not overuse strbuf_split*() Jeff King
@ 2025-08-02 17:09 ` Junio C Hamano
2025-08-02 18:47 ` Jeff King
0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2025-08-02 17:09 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> On Thu, Jul 31, 2025 at 03:54:22PM -0700, Junio C Hamano wrote:
>
>> strbuf is a very good data structure to work with string data
>> without having to worry about running past the end of the string.
>>
>> But an array of strbuf is often a wrong data structure. You rarely
>> have need to be able to edit multiple strings represented by such an
>> array simultaneously. And strbuf_split*() that produces result in
>> such a shape is a misdesigned API function.
>
> Yeah, I agree it's a bad API, and I'd be happy if we could eventually
> get rid of it.
I'd be happy to mark it #leftoverbits for aspiring Git hackers to
look at. Perhaps a good microproject material (there are still
several instances left---I talked Christian out of adding more in
the series he is iterating right now).
> Conversion to string_list mostly makes sense (though I especially like
> the case in sub-process.c where we can just parse with skip_prefix()).
Yes, for that one, strbuf_split() was really killing a butterfly
with a musket or cracking a nut with a sledgehammer; or whatever
translation of the idiom you'd like ;-).
> I looked over the patches and didn't see anything objectionable. But I
> tried to think of subtle incompatibilities we might run into:
>
> 1. I saw the different handling of "max" you had to deal with in one
> patch. Yuck. It might be worth tweaking string_list_split() to
> count "pieces" (which is how every other split function I've seen,
> like the one in perl, works). But that can be done later (and is
> tricky to do safely, since we wouldn't be changing the function
> signature).
Yes, I think I whined about it in one of the commit log messages.
That part is one of the things string_list API gets wrong, which we
may want to fix.
> 2. Is the handling of repeated delimiters the same? E.g., if I split
> on "/" and we see "foo//bar", do both split implementations yield
> the same output. I could see either of ("foo", "", "bar") and
> ("foo", "bar") being produced.
Same between? strbuf_split*() seems to be happy to produce an empty
piece in the result (and there is no "omit empty ones" feature).
You can choose with STRING_LIST_SPLIT_NOEMPTY both results, but for
a straight conversion, the vanilla one without that flag would do
what we want, I think.
> And...
>
>> The most common use case of strbuf_split*() family of functions
>> seems to be to trim away the whitespaces around each piece of split
>> string. With modern string_list_split*(), it is often no longer
>> necessary.
>
> Some of these sites do strbuf_rtrim() on the split pieces. But
> STRING_LIST_SPLIT_TRIM does both left and right. Is this OK? I think so
> because in both cases we are already splitting on space, so we wouldn't
> expect left-hand spaces (of course you could have a stray tab or
> something, but I suspect the new code matches the intent more closely).
I did wonder about these "rtrim-only" callsites strbuf_split(), but
I did not think they needed ltrim, since they were splitting at SP
and rejected when the element were empty (remember, strbuf_split()
happily creates an empty element in the result).
Their use of rtrim() is primarily when they split at SP they want to
remove that SP (remember, another misdesign of strbuf_split() is
that it leaves the delimiter itself at the end of each split piece).
Both 05/11 step for merge-tree and 06/11 step for notes are good
examples.
So in short, yes, the new code is being a bit more lenient (the
original parsers were more strict and insisted on non-trimming
behaviour). We might consider tightening them by removing the
STRING_LIST_SPLIT_TRIM bit to be more faithful to the original, but
as you may have noticed, doing so would make it more strict at the
right end. The original that split at SP and then rtrimmed allowed
trailing HT after the data to be removed, but I do not think it was
part of the designed behaviour of the original anyway. So I am
inclined to say we should probably drop STRING_LIST_SPLIT_TRIM;
nobody would scream.
Thanks for carefully reading.
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [PATCH v2 00/11] do not overuse strbuf_split*()
2025-08-02 17:09 ` Junio C Hamano
@ 2025-08-02 18:47 ` Jeff King
0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2025-08-02 18:47 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sat, Aug 02, 2025 at 10:09:59AM -0700, Junio C Hamano wrote:
> > 2. Is the handling of repeated delimiters the same? E.g., if I split
> > on "/" and we see "foo//bar", do both split implementations yield
> > the same output. I could see either of ("foo", "", "bar") and
> > ("foo", "bar") being produced.
>
> Same between? strbuf_split*() seems to be happy to produce an empty
> piece in the result (and there is no "omit empty ones" feature).
> You can choose with STRING_LIST_SPLIT_NOEMPTY both results, but for
> a straight conversion, the vanilla one without that flag would do
> what we want, I think.
Yeah, I meant same between strbuf_split and string_list_split. I didn't
actually peek into the behavior of either. It was more of a "did you
check this it?" question. It sounds like you did. Good.
> > Some of these sites do strbuf_rtrim() on the split pieces. But
> > STRING_LIST_SPLIT_TRIM does both left and right. Is this OK? I think so
> > because in both cases we are already splitting on space, so we wouldn't
> > expect left-hand spaces (of course you could have a stray tab or
> > something, but I suspect the new code matches the intent more closely).
>
> I did wonder about these "rtrim-only" callsites strbuf_split(), but
> I did not think they needed ltrim, since they were splitting at SP
> and rejected when the element were empty (remember, strbuf_split()
> happily creates an empty element in the result).
>
> Their use of rtrim() is primarily when they split at SP they want to
> remove that SP (remember, another misdesign of strbuf_split() is
> that it leaves the delimiter itself at the end of each split piece).
> Both 05/11 step for merge-tree and 06/11 step for notes are good
> examples.
Makes sense. I did not even realize all of the horrible gotchas of
strbuf_split(). :)
> So in short, yes, the new code is being a bit more lenient (the
> original parsers were more strict and insisted on non-trimming
> behaviour). We might consider tightening them by removing the
> STRING_LIST_SPLIT_TRIM bit to be more faithful to the original, but
> as you may have noticed, doing so would make it more strict at the
> right end. The original that split at SP and then rtrimmed allowed
> trailing HT after the data to be removed, but I do not think it was
> part of the designed behaviour of the original anyway. So I am
> inclined to say we should probably drop STRING_LIST_SPLIT_TRIM;
> nobody would scream.
Hmm, yeah, I see what you mean. The documentation for merge-tree is
quite vague about whether it is a single space or if multiple spaces are
allowed. I'd be inclined to leave it on the lenient side, since I don't
think there is any ambiguity.
The git-notes one does specifically say "SP", so anybody who would be
broken by losing the extra trim is already violating the docs. But
again, I'd be tempted to just err on the friendly side.
I doubt either case matters too much either way, though.
-Peff
^ permalink raw reply [flat|nested] 33+ messages in thread