From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Chris Torek <chris.torek@gmail.com>,
Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/6] banned: mark `strok()` as banned
Date: Tue, 18 Apr 2023 15:18:40 -0400 [thread overview]
Message-ID: <cover.1681845518.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1681428696.git.me@ttaylorr.com>
Here is a medium-sized reroll of my series to add `strtok()` to the list
of banned functions.
Some notable things that have changed since the first round[1] include:
- The behavior of `string_list_split_in_place()` is unchanged, and
`string_list_split_in_place_multi()` now has the more sane behavior
of removing all delimiter characters from its output, similar to
strtok().
- We now guard against dangerous cases of assigning `list->nr` with a
new `string_list_setlen()` function.
- More test cases are added in t0063.
- `strtok_r()` is no longer on the banned list, and `strtok()` now
*is* on the list, after I forgot to remove a stray `#if 0` left over
from debugging.
As always, a range-diff is included below for convenience. Thanks in
advance for your review!
[1]: https://lore.kernel.org/git/cover.1681428696.git.me@ttaylorr.com/
Taylor Blau (6):
string-list: introduce `string_list_split_in_place_multi()`
string-list: introduce `string_list_setlen()`
t/helper/test-hashmap.c: avoid using `strtok()`
t/helper/test-oidmap.c: avoid using `strtok()`
t/helper/test-json-writer.c: avoid using `strtok()`
banned.h: mark `strtok()` as banned
banned.h | 2 +
string-list.c | 35 ++++++++++--
string-list.h | 17 ++++++
t/helper/test-hashmap.c | 30 ++++++++---
t/helper/test-json-writer.c | 76 ++++++++++++++++----------
t/helper/test-oidmap.c | 20 ++++---
t/helper/test-string-list.c | 15 ++++++
t/t0063-string-list.sh | 105 +++++++++++++++++++++++++-----------
8 files changed, 224 insertions(+), 76 deletions(-)
Range-diff against v1:
1: dda218c8c1 ! 1: 6658b231a9 string-list: introduce `string_list_split_in_place_multi()`
@@ Commit message
`_multi` variant splits the given string any any character appearing in
the string `delim`.
+ Like `strtok()`, the `_multi` variant skips past sequential delimiting
+ characters. For example:
+
+ string_list_split_in_place(&xs, xstrdup("foo::bar::baz"), ":", -1);
+
+ would place in `xs` the elements "foo", "bar", and "baz".
+
Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
- `strpbrk(2)` to find the first occurrence of *any* character in the given
- delimiter string.
+ `strcspn(2)` to move past the initial segment of characters comprised of
+ any characters in the delimiting set.
+
+ When only a single delimiting character is provided, `strcspn(2)` has
+ equivalent performance to `strchr(2)`. Modern `strcspn(2)`
+ implementations treat an empty delimiter or the singleton delimiter as a
+ special case and fall back to calling strchrnul(). Both glibc[1] and
+ musl[2] implement `strcspn(2)` this way.
Since the `_multi` variant is a generalization of the original
implementation, reimplement `string_list_split_in_place()` in terms of
the more general function by providing a single-character string for the
list of accepted delimiters.
+ To avoid regressions, update t0063 in this patch as well. Any "common"
+ test cases (i.e., those that produce the same result whether you call
+ `string_list_split()` or `string_list_split_in_place_multi()`) are
+ grouped into a loop which is parameterized over the function to test.
+
+ Any cases which aren't common (of which there is one existing case, and
+ a handful of new ones added which are specific to the `_multi` variant)
+ are tested independently.
+
+ [1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
+ [2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11
+
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## string-list.c ##
@@ string-list.c: int string_list_split(struct string_list *list, const char *strin
-int string_list_split_in_place(struct string_list *list, char *string,
- int delim, int maxsplit)
-+int string_list_split_in_place_multi(struct string_list *list, char *string,
-+ const char *delim, int maxsplit)
++static int string_list_split_in_place_1(struct string_list *list, char *string,
++ const char *delim, int maxsplit,
++ unsigned runs)
{
int count = 0;
char *p = string, *end;
@@ string-list.c: int string_list_split_in_place(struct string_list *list, char *string,
+ die("internal error in string_list_split_in_place(): "
+ "list->strdup_strings must not be set");
+ for (;;) {
++ if (runs)
++ p += strspn(p, delim);
++
+ count++;
+ if (maxsplit >= 0 && count > maxsplit) {
string_list_append(list, p);
return count;
}
- end = strchr(p, delim);
-+ end = strpbrk(p, delim);
- if (end) {
+- if (end) {
++ end = p + strcspn(p, delim);
++ if (end && *end) {
*end = '\0';
string_list_append(list, p);
+ p = end + 1;
@@ string-list.c: int string_list_split_in_place(struct string_list *list, char *string,
}
}
}
+
++int string_list_split_in_place_multi(struct string_list *list, char *string,
++ const char *delim, int maxsplit)
++{
++ return string_list_split_in_place_1(list, string, delim, maxsplit, 1);
++}
++
+int string_list_split_in_place(struct string_list *list, char *string,
+ int delim, int maxsplit)
+{
+ char delim_s[2] = { delim, 0 };
+
-+ return string_list_split_in_place_multi(list, string, delim_s,
-+ maxsplit);
++ return string_list_split_in_place_1(list, string, delim_s, maxsplit, 0);
+}
## string-list.h ##
@@ string-list.h: int string_list_split(struct string_list *list, const char *strin
+ *
+ * The "_multi" variant splits the given string on any character
+ * appearing in "delim", and the non-"_multi" variant splits only on the
-+ * given character.
++ * given character. The "_multi" variant behaves like `strtok()` where
++ * no element contains the delimiting byte(s).
*/
+int string_list_split_in_place_multi(struct string_list *list, char *string,
+ const char *delim, int maxsplit);
int string_list_split_in_place(struct string_list *list, char *string,
int delim, int maxsplit);
#endif /* STRING_LIST_H */
+
+ ## t/helper/test-string-list.c ##
+@@ t/helper/test-string-list.c: int cmd__string_list(int argc, const char **argv)
+ return 0;
+ }
+
++ if (argc == 5 && !strcmp(argv[1], "split_in_place_multi")) {
++ struct string_list list = STRING_LIST_INIT_NODUP;
++ int i;
++ char *s = xstrdup(argv[2]);
++ const char *delim = argv[3];
++ int maxsplit = atoi(argv[4]);
++
++ i = string_list_split_in_place_multi(&list, s, delim, maxsplit);
++ printf("%d\n", i);
++ write_list(&list);
++ string_list_clear(&list, 0);
++ free(s);
++ return 0;
++ }
++
+ if (argc == 4 && !strcmp(argv[1], "filter")) {
+ /*
+ * Retain only the items that have the specified prefix.
+
+ ## t/t0063-string-list.sh ##
+@@ t/t0063-string-list.sh: test_split () {
+ "
+ }
+
+-test_split "foo:bar:baz" ":" "-1" <<EOF
+-3
+-[0]: "foo"
+-[1]: "bar"
+-[2]: "baz"
+-EOF
++test_split_in_place_multi () {
++ cat >expected &&
++ test_expect_success "split_in_place_multi $1 at $2, max $3" "
++ test-tool string-list split_in_place_multi '$1' '$2' '$3' >actual &&
++ test_cmp expected actual
++ "
++}
+
+-test_split "foo:bar:baz" ":" "0" <<EOF
+-1
+-[0]: "foo:bar:baz"
+-EOF
++for test_fn in test_split test_split_in_place_multi
++do
++ $test_fn "foo:bar:baz" ":" "-1" <<-\EOF
++ 3
++ [0]: "foo"
++ [1]: "bar"
++ [2]: "baz"
++ EOF
+
+-test_split "foo:bar:baz" ":" "1" <<EOF
+-2
+-[0]: "foo"
+-[1]: "bar:baz"
+-EOF
++ $test_fn "foo:bar:baz" ":" "0" <<-\EOF
++ 1
++ [0]: "foo:bar:baz"
++ EOF
+
+-test_split "foo:bar:baz" ":" "2" <<EOF
+-3
+-[0]: "foo"
+-[1]: "bar"
+-[2]: "baz"
+-EOF
++ $test_fn "foo:bar:baz" ":" "1" <<-\EOF
++ 2
++ [0]: "foo"
++ [1]: "bar:baz"
++ EOF
+
+-test_split "foo:bar:" ":" "-1" <<EOF
+-3
+-[0]: "foo"
+-[1]: "bar"
+-[2]: ""
+-EOF
++ $test_fn "foo:bar:baz" ":" "2" <<-\EOF
++ 3
++ [0]: "foo"
++ [1]: "bar"
++ [2]: "baz"
++ EOF
+
+-test_split "" ":" "-1" <<EOF
+-1
+-[0]: ""
+-EOF
++ $test_fn "foo:bar:" ":" "-1" <<-\EOF
++ 3
++ [0]: "foo"
++ [1]: "bar"
++ [2]: ""
++ EOF
++
++ $test_fn "" ":" "-1" <<-\EOF
++ 1
++ [0]: ""
++ EOF
++done
+
+ test_split ":" ":" "-1" <<EOF
+ 2
+@@ t/t0063-string-list.sh: test_split ":" ":" "-1" <<EOF
+ [1]: ""
+ EOF
+
++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "-1" <<-\EOF
++3
++[0]: "foo"
++[1]: "bar"
++[2]: "baz"
++EOF
++
++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "0" <<-\EOF
++1
++[0]: "foo:;:bar:;:baz"
++EOF
++
++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "1" <<-\EOF
++2
++[0]: "foo"
++[1]: "bar:;:baz"
++EOF
++
++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "2" <<-\EOF
++3
++[0]: "foo"
++[1]: "bar"
++[2]: "baz"
++EOF
++
++test_split_in_place_multi "foo:;:bar:;:" ":;" "-1" <<-\EOF
++3
++[0]: "foo"
++[1]: "bar"
++[2]: ""
++EOF
++
+ test_expect_success "test filter_string_list" '
+ test "x-" = "x$(test-tool string-list filter - y)" &&
+ test "x-" = "x$(test-tool string-list filter no y)" &&
-: ---------- > 2: 2a20ad8bc5 string-list: introduce `string_list_setlen()`
2: 0f199468a3 ! 3: 0ae07dec36 t/helper/test-hashmap.c: avoid using `strtok()`
@@ t/helper/test-hashmap.c: int cmd__hashmap(int argc, const char **argv)
+ * By doing so, we'll instead overwrite the existing
+ * entries and avoid re-allocating.
+ */
-+ parts.nr = 0;
++ string_list_setlen(&parts, 0);
/* break line into command and up to two parameters */
- cmd = strtok(line.buf, DELIM);
+ string_list_split_in_place_multi(&parts, line.buf, DELIM, 2);
3: 135222d04e ! 4: a659431e9c t/helper/test-oidmap.c: avoid using `strtok()`
@@ t/helper/test-oidmap.c: int cmd__oidmap(int argc UNUSED, const char **argv UNUSE
struct object_id oid;
+ /* see the comment in cmd__hashmap() */
-+ parts.nr = 0;
++ string_list_setlen(&parts, 0);
/* break line into command and up to two parameters */
- cmd = strtok(line.buf, DELIM);
+ string_list_split_in_place_multi(&parts, line.buf, DELIM, 2);
4: ae29d4d892 ! 5: fc6cd23698 t/helper/test-json-writer.c: avoid using `strtok()`
@@ t/helper/test-json-writer.c: static int scripted(void)
- verb = strtok(line, " ");
+ /* see the comment in cmd__hashmap() */
-+ parts.nr = 0;
++ string_list_setlen(&parts, 0);
+ /* break line into command and zero or more tokens */
+ string_list_split_in_place(&parts, line, ' ', -1);
+
5: 1d955f8bc6 ! 6: 56d2318a6d banned.h: mark `strtok()`, `strtok_r()` as banned
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- banned.h: mark `strtok()`, `strtok_r()` as banned
+ banned.h: mark `strtok()` as banned
`strtok_r()` is reentrant, but `strtok()` is not, meaning that using it
is not thread-safe.
- We could ban `strtok()` and force callers to use its reentrant
- counterpart, but there are a few drawbacks to doing so:
+ `strtok()` has a couple of drawbacks that make it undesirable to have
+ any new instances. In addition to being thread-unsafe, it also
+ encourages confusing data flows, where `strtok()` may be called from
+ multiple functions with its first argument as NULL, making it unclear
+ from the immediate context which string is being tokenized.
- - `strtok_r()` forces the caller to maintain an extra string pointer
- to pass as its `saveptr` value
+ Now that we have removed all instances of `strtok()` from the tree,
+ let's ban `strtok()` to avoid introducing new ones in the future. If new
+ callers should arise, they can either use:
- - `strtok_r()` also requires that its `saveptr` value be unmodified
- between calls.
+ - `string_list_split_in_place()`,
+ - `string_list_split_in_place_multi()`, or
+ - `strtok_r()`.
- - `strtok()` (and by extension, `strtok_r()`) is confusing when used
- across multiple functions, since the caller is supposed to pass NULL
- as its first argument after the first call. This makes it difficult
- to determine what string is actually being tokenized without clear
- dataflow.
+ Callers are encouraged to use either of the string_list functions when
+ appropriate over `strtok_r()`, since the latter suffers from the same
+ confusing data-flow problem as `strtok()` does.
- So while we could ban only `strtok()`, there really is no good reason to
- use either when callers could instead use the much friendlier
- `string_list_split_in_place()` API, which avoids the above issues.
+ But callers may prefer `strtok_r()` when the number of tokens in a given
+ string is unknown, and they want to split and process them one at a
+ time, so `strtok_r()` is left off the banned.h list.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
@@ banned.h
#define strncpy(x,y,n) BANNED(strncpy)
#undef strncat
#define strncat(x,y,n) BANNED(strncat)
-+#if 0
+#undef strtok
+#define strtok(x,y) BANNED(strtok)
-+#undef strtok_r
-+#define strtok_r(x,y,z) BANNED(strtok_r)
-+#endif
#undef sprintf
#undef vsprintf
--
2.40.0.358.g56d2318a6d
next prev parent reply other threads:[~2023-04-18 19:20 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-13 23:31 [PATCH 0/5] banned: mark `strok()`, `strtok_r()` as banned Taylor Blau
2023-04-13 23:31 ` [PATCH 1/5] string-list: introduce `string_list_split_in_place_multi()` Taylor Blau
2023-04-18 10:10 ` Jeff King
2023-04-18 17:08 ` Taylor Blau
2023-04-13 23:31 ` [PATCH 2/5] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-18 10:23 ` Jeff King
2023-04-18 18:06 ` Taylor Blau
2023-04-13 23:31 ` [PATCH 3/5] t/helper/test-oidmap.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 4/5] t/helper/test-json-writer.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 5/5] banned.h: mark `strtok()`, `strtok_r()` as banned Taylor Blau
2023-04-14 1:39 ` Junio C Hamano
2023-04-14 2:08 ` Chris Torek
2023-04-14 13:41 ` Taylor Blau
2023-04-18 19:18 ` Taylor Blau [this message]
2023-04-18 19:18 ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()` Taylor Blau
2023-04-18 19:39 ` Junio C Hamano
2023-04-18 20:54 ` Taylor Blau
2023-04-22 11:12 ` Jeff King
2023-04-22 15:53 ` René Scharfe
2023-04-23 0:35 ` Jeff King
2023-04-24 16:24 ` Junio C Hamano
2023-04-23 2:38 ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()`t Taylor Blau
2023-04-23 2:40 ` Taylor Blau
2023-04-18 19:18 ` [PATCH v2 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-22 11:14 ` Jeff King
2023-04-18 19:18 ` [PATCH v2 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-22 11:16 ` Jeff King
2023-04-24 21:19 ` Taylor Blau
2023-04-18 19:18 ` [PATCH v2 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-18 19:18 ` [PATCH v2 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-18 19:18 ` [PATCH v2 6/6] banned.h: mark `strtok()` as banned Taylor Blau
2023-04-24 22:20 ` [PATCH v3 0/6] banned: mark `strok()`, `strtok_r()` " Taylor Blau
2023-04-24 22:20 ` [PATCH v3 1/6] string-list: multi-delimiter `string_list_split_in_place()` Taylor Blau
2023-04-24 22:20 ` [PATCH v3 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-25 6:21 ` Jeff King
2023-04-25 21:00 ` Taylor Blau
2023-04-24 22:20 ` [PATCH v3 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-24 22:20 ` [PATCH v3 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-24 22:20 ` [PATCH v3 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-25 13:57 ` Jeff Hostetler
2023-04-24 22:20 ` [PATCH v3 6/6] banned.h: mark `strtok()` and `strtok_r()` as banned Taylor Blau
2023-04-24 22:25 ` Chris Torek
2023-04-24 23:00 ` Taylor Blau
2023-04-25 6:26 ` Jeff King
2023-04-25 21:02 ` Taylor Blau
2023-04-25 6:27 ` [PATCH v3 0/6] banned: mark `strok()`, " Jeff King
2023-04-25 21:03 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1681845518.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=chris.torek@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.