All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Chris Torek <chris.torek@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/6] banned: mark `strok()` as banned
Date: Tue, 18 Apr 2023 15:18:40 -0400	[thread overview]
Message-ID: <cover.1681845518.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1681428696.git.me@ttaylorr.com>

Here is a medium-sized reroll of my series to add `strtok()` to the list
of banned functions.

Some notable things that have changed since the first round[1] include:

  - The behavior of `string_list_split_in_place()` is unchanged, and
    `string_list_split_in_place_multi()` now has the more sane behavior
    of removing all delimiter characters from its output, similar to
    strtok().

  - We now guard against dangerous cases of assigning `list->nr` with a
    new `string_list_setlen()` function.

  - More test cases are added in t0063.

  - `strtok_r()` is no longer on the banned list, and `strtok()` now
    *is* on the list, after I forgot to remove a stray `#if 0` left over
    from debugging.

As always, a range-diff is included below for convenience. Thanks in
advance for your review!

[1]: https://lore.kernel.org/git/cover.1681428696.git.me@ttaylorr.com/

Taylor Blau (6):
  string-list: introduce `string_list_split_in_place_multi()`
  string-list: introduce `string_list_setlen()`
  t/helper/test-hashmap.c: avoid using `strtok()`
  t/helper/test-oidmap.c: avoid using `strtok()`
  t/helper/test-json-writer.c: avoid using `strtok()`
  banned.h: mark `strtok()` as banned

 banned.h                    |   2 +
 string-list.c               |  35 ++++++++++--
 string-list.h               |  17 ++++++
 t/helper/test-hashmap.c     |  30 ++++++++---
 t/helper/test-json-writer.c |  76 ++++++++++++++++----------
 t/helper/test-oidmap.c      |  20 ++++---
 t/helper/test-string-list.c |  15 ++++++
 t/t0063-string-list.sh      | 105 +++++++++++++++++++++++++-----------
 8 files changed, 224 insertions(+), 76 deletions(-)

Range-diff against v1:
1:  dda218c8c1 ! 1:  6658b231a9 string-list: introduce `string_list_split_in_place_multi()`
    @@ Commit message
         `_multi` variant splits the given string any any character appearing in
         the string `delim`.
     
    +    Like `strtok()`, the `_multi` variant skips past sequential delimiting
    +    characters. For example:
    +
    +        string_list_split_in_place(&xs, xstrdup("foo::bar::baz"), ":", -1);
    +
    +    would place in `xs` the elements "foo", "bar", and "baz".
    +
         Instead of using `strchr(2)` to locate the first occurrence of the given
         delimiter character, `string_list_split_in_place_multi()` uses
    -    `strpbrk(2)` to find the first occurrence of *any* character in the given
    -    delimiter string.
    +    `strcspn(2)` to move past the initial segment of characters comprised of
    +    any characters in the delimiting set.
    +
    +    When only a single delimiting character is provided, `strcspn(2)` has
    +    equivalent performance to `strchr(2)`. Modern `strcspn(2)`
    +    implementations treat an empty delimiter or the singleton delimiter as a
    +    special case and fall back to calling strchrnul(). Both glibc[1] and
    +    musl[2] implement `strcspn(2)` this way.
     
         Since the `_multi` variant is a generalization of the original
         implementation, reimplement `string_list_split_in_place()` in terms of
         the more general function by providing a single-character string for the
         list of accepted delimiters.
     
    +    To avoid regressions, update t0063 in this patch as well. Any "common"
    +    test cases (i.e., those that produce the same result whether you call
    +    `string_list_split()` or `string_list_split_in_place_multi()`) are
    +    grouped into a loop which is parameterized over the function to test.
    +
    +    Any cases which aren't common (of which there is one existing case, and
    +    a handful of new ones added which are specific to the `_multi` variant)
    +    are tested independently.
    +
    +    [1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
    +    [2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## string-list.c ##
    @@ string-list.c: int string_list_split(struct string_list *list, const char *strin
      
     -int string_list_split_in_place(struct string_list *list, char *string,
     -			       int delim, int maxsplit)
    -+int string_list_split_in_place_multi(struct string_list *list, char *string,
    -+				     const char *delim, int maxsplit)
    ++static int string_list_split_in_place_1(struct string_list *list, char *string,
    ++					const char *delim, int maxsplit,
    ++					unsigned runs)
      {
      	int count = 0;
      	char *p = string, *end;
     @@ string-list.c: int string_list_split_in_place(struct string_list *list, char *string,
    + 		die("internal error in string_list_split_in_place(): "
    + 		    "list->strdup_strings must not be set");
    + 	for (;;) {
    ++		if (runs)
    ++			p += strspn(p, delim);
    ++
    + 		count++;
    + 		if (maxsplit >= 0 && count > maxsplit) {
      			string_list_append(list, p);
      			return count;
      		}
     -		end = strchr(p, delim);
    -+		end = strpbrk(p, delim);
    - 		if (end) {
    +-		if (end) {
    ++		end = p + strcspn(p, delim);
    ++		if (end && *end) {
      			*end = '\0';
      			string_list_append(list, p);
    + 			p = end + 1;
     @@ string-list.c: int string_list_split_in_place(struct string_list *list, char *string,
      		}
      	}
      }
     +
    ++int string_list_split_in_place_multi(struct string_list *list, char *string,
    ++				     const char *delim, int maxsplit)
    ++{
    ++	return string_list_split_in_place_1(list, string, delim, maxsplit, 1);
    ++}
    ++
     +int string_list_split_in_place(struct string_list *list, char *string,
     +			       int delim, int maxsplit)
     +{
     +	char delim_s[2] = { delim, 0 };
     +
    -+	return string_list_split_in_place_multi(list, string, delim_s,
    -+						maxsplit);
    ++	return string_list_split_in_place_1(list, string, delim_s, maxsplit, 0);
     +}
     
      ## string-list.h ##
    @@ string-list.h: int string_list_split(struct string_list *list, const char *strin
     + *
     + * The "_multi" variant splits the given string on any character
     + * appearing in "delim", and the non-"_multi" variant splits only on the
    -+ * given character.
    ++ * given character. The "_multi" variant behaves like `strtok()` where
    ++ * no element contains the delimiting byte(s).
       */
     +int string_list_split_in_place_multi(struct string_list *list, char *string,
     +				     const char *delim, int maxsplit);
      int string_list_split_in_place(struct string_list *list, char *string,
      			       int delim, int maxsplit);
      #endif /* STRING_LIST_H */
    +
    + ## t/helper/test-string-list.c ##
    +@@ t/helper/test-string-list.c: int cmd__string_list(int argc, const char **argv)
    + 		return 0;
    + 	}
    + 
    ++	if (argc == 5 && !strcmp(argv[1], "split_in_place_multi")) {
    ++		struct string_list list = STRING_LIST_INIT_NODUP;
    ++		int i;
    ++		char *s = xstrdup(argv[2]);
    ++		const char *delim = argv[3];
    ++		int maxsplit = atoi(argv[4]);
    ++
    ++		i = string_list_split_in_place_multi(&list, s, delim, maxsplit);
    ++		printf("%d\n", i);
    ++		write_list(&list);
    ++		string_list_clear(&list, 0);
    ++		free(s);
    ++		return 0;
    ++	}
    ++
    + 	if (argc == 4 && !strcmp(argv[1], "filter")) {
    + 		/*
    + 		 * Retain only the items that have the specified prefix.
    +
    + ## t/t0063-string-list.sh ##
    +@@ t/t0063-string-list.sh: test_split () {
    + 	"
    + }
    + 
    +-test_split "foo:bar:baz" ":" "-1" <<EOF
    +-3
    +-[0]: "foo"
    +-[1]: "bar"
    +-[2]: "baz"
    +-EOF
    ++test_split_in_place_multi () {
    ++	cat >expected &&
    ++	test_expect_success "split_in_place_multi $1 at $2, max $3" "
    ++		test-tool string-list split_in_place_multi '$1' '$2' '$3' >actual &&
    ++		test_cmp expected actual
    ++	"
    ++}
    + 
    +-test_split "foo:bar:baz" ":" "0" <<EOF
    +-1
    +-[0]: "foo:bar:baz"
    +-EOF
    ++for test_fn in test_split test_split_in_place_multi
    ++do
    ++	$test_fn "foo:bar:baz" ":" "-1" <<-\EOF
    ++	3
    ++	[0]: "foo"
    ++	[1]: "bar"
    ++	[2]: "baz"
    ++	EOF
    + 
    +-test_split "foo:bar:baz" ":" "1" <<EOF
    +-2
    +-[0]: "foo"
    +-[1]: "bar:baz"
    +-EOF
    ++	$test_fn "foo:bar:baz" ":" "0" <<-\EOF
    ++	1
    ++	[0]: "foo:bar:baz"
    ++	EOF
    + 
    +-test_split "foo:bar:baz" ":" "2" <<EOF
    +-3
    +-[0]: "foo"
    +-[1]: "bar"
    +-[2]: "baz"
    +-EOF
    ++	$test_fn "foo:bar:baz" ":" "1" <<-\EOF
    ++	2
    ++	[0]: "foo"
    ++	[1]: "bar:baz"
    ++	EOF
    + 
    +-test_split "foo:bar:" ":" "-1" <<EOF
    +-3
    +-[0]: "foo"
    +-[1]: "bar"
    +-[2]: ""
    +-EOF
    ++	$test_fn "foo:bar:baz" ":" "2" <<-\EOF
    ++	3
    ++	[0]: "foo"
    ++	[1]: "bar"
    ++	[2]: "baz"
    ++	EOF
    + 
    +-test_split "" ":" "-1" <<EOF
    +-1
    +-[0]: ""
    +-EOF
    ++	$test_fn "foo:bar:" ":" "-1" <<-\EOF
    ++	3
    ++	[0]: "foo"
    ++	[1]: "bar"
    ++	[2]: ""
    ++	EOF
    ++
    ++	$test_fn "" ":" "-1" <<-\EOF
    ++	1
    ++	[0]: ""
    ++	EOF
    ++done
    + 
    + test_split ":" ":" "-1" <<EOF
    + 2
    +@@ t/t0063-string-list.sh: test_split ":" ":" "-1" <<EOF
    + [1]: ""
    + EOF
    + 
    ++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "-1" <<-\EOF
    ++3
    ++[0]: "foo"
    ++[1]: "bar"
    ++[2]: "baz"
    ++EOF
    ++
    ++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "0" <<-\EOF
    ++1
    ++[0]: "foo:;:bar:;:baz"
    ++EOF
    ++
    ++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "1" <<-\EOF
    ++2
    ++[0]: "foo"
    ++[1]: "bar:;:baz"
    ++EOF
    ++
    ++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "2" <<-\EOF
    ++3
    ++[0]: "foo"
    ++[1]: "bar"
    ++[2]: "baz"
    ++EOF
    ++
    ++test_split_in_place_multi "foo:;:bar:;:" ":;" "-1" <<-\EOF
    ++3
    ++[0]: "foo"
    ++[1]: "bar"
    ++[2]: ""
    ++EOF
    ++
    + test_expect_success "test filter_string_list" '
    + 	test "x-" = "x$(test-tool string-list filter - y)" &&
    + 	test "x-" = "x$(test-tool string-list filter no y)" &&
-:  ---------- > 2:  2a20ad8bc5 string-list: introduce `string_list_setlen()`
2:  0f199468a3 ! 3:  0ae07dec36 t/helper/test-hashmap.c: avoid using `strtok()`
    @@ t/helper/test-hashmap.c: int cmd__hashmap(int argc, const char **argv)
     +		 * By doing so, we'll instead overwrite the existing
     +		 * entries and avoid re-allocating.
     +		 */
    -+		parts.nr = 0;
    ++		string_list_setlen(&parts, 0);
      		/* break line into command and up to two parameters */
     -		cmd = strtok(line.buf, DELIM);
     +		string_list_split_in_place_multi(&parts, line.buf, DELIM, 2);
3:  135222d04e ! 4:  a659431e9c t/helper/test-oidmap.c: avoid using `strtok()`
    @@ t/helper/test-oidmap.c: int cmd__oidmap(int argc UNUSED, const char **argv UNUSE
      		struct object_id oid;
      
     +		/* see the comment in cmd__hashmap() */
    -+		parts.nr = 0;
    ++		string_list_setlen(&parts, 0);
      		/* break line into command and up to two parameters */
     -		cmd = strtok(line.buf, DELIM);
     +		string_list_split_in_place_multi(&parts, line.buf, DELIM, 2);
4:  ae29d4d892 ! 5:  fc6cd23698 t/helper/test-json-writer.c: avoid using `strtok()`
    @@ t/helper/test-json-writer.c: static int scripted(void)
      
     -		verb = strtok(line, " ");
     +		/* see the comment in cmd__hashmap() */
    -+		parts.nr = 0;
    ++		string_list_setlen(&parts, 0);
     +		/* break line into command and zero or more tokens */
     +		string_list_split_in_place(&parts, line, ' ', -1);
     +
5:  1d955f8bc6 ! 6:  56d2318a6d banned.h: mark `strtok()`, `strtok_r()` as banned
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    banned.h: mark `strtok()`, `strtok_r()` as banned
    +    banned.h: mark `strtok()` as banned
     
         `strtok_r()` is reentrant, but `strtok()` is not, meaning that using it
         is not thread-safe.
     
    -    We could ban `strtok()` and force callers to use its reentrant
    -    counterpart, but there are a few drawbacks to doing so:
    +    `strtok()` has a couple of drawbacks that make it undesirable to have
    +    any new instances. In addition to being thread-unsafe, it also
    +    encourages confusing data flows, where `strtok()` may be called from
    +    multiple functions with its first argument as NULL, making it unclear
    +    from the immediate context which string is being tokenized.
     
    -      - `strtok_r()` forces the caller to maintain an extra string pointer
    -        to pass as its `saveptr` value
    +    Now that we have removed all instances of `strtok()` from the tree,
    +    let's ban `strtok()` to avoid introducing new ones in the future. If new
    +    callers should arise, they can either use:
     
    -      - `strtok_r()` also requires that its `saveptr` value be unmodified
    -        between calls.
    +      - `string_list_split_in_place()`,
    +      - `string_list_split_in_place_multi()`, or
    +      - `strtok_r()`.
     
    -      - `strtok()` (and by extension, `strtok_r()`) is confusing when used
    -        across multiple functions, since the caller is supposed to pass NULL
    -        as its first argument after the first call. This makes it difficult
    -        to determine what string is actually being tokenized without clear
    -        dataflow.
    +    Callers are encouraged to use either of the string_list functions when
    +    appropriate over `strtok_r()`, since the latter suffers from the same
    +    confusing data-flow problem as `strtok()` does.
     
    -    So while we could ban only `strtok()`, there really is no good reason to
    -    use either when callers could instead use the much friendlier
    -    `string_list_split_in_place()` API, which avoids the above issues.
    +    But callers may prefer `strtok_r()` when the number of tokens in a given
    +    string is unknown, and they want to split and process them one at a
    +    time, so `strtok_r()` is left off the banned.h list.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ banned.h
      #define strncpy(x,y,n) BANNED(strncpy)
      #undef strncat
      #define strncat(x,y,n) BANNED(strncat)
    -+#if 0
     +#undef strtok
     +#define strtok(x,y) BANNED(strtok)
    -+#undef strtok_r
    -+#define strtok_r(x,y,z) BANNED(strtok_r)
    -+#endif
      
      #undef sprintf
      #undef vsprintf
-- 
2.40.0.358.g56d2318a6d

  parent reply	other threads:[~2023-04-18 19:20 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-13 23:31 [PATCH 0/5] banned: mark `strok()`, `strtok_r()` as banned Taylor Blau
2023-04-13 23:31 ` [PATCH 1/5] string-list: introduce `string_list_split_in_place_multi()` Taylor Blau
2023-04-18 10:10   ` Jeff King
2023-04-18 17:08     ` Taylor Blau
2023-04-13 23:31 ` [PATCH 2/5] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-18 10:23   ` Jeff King
2023-04-18 18:06     ` Taylor Blau
2023-04-13 23:31 ` [PATCH 3/5] t/helper/test-oidmap.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 4/5] t/helper/test-json-writer.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 5/5] banned.h: mark `strtok()`, `strtok_r()` as banned Taylor Blau
2023-04-14  1:39   ` Junio C Hamano
2023-04-14  2:08     ` Chris Torek
2023-04-14 13:41     ` Taylor Blau
2023-04-18 19:18 ` Taylor Blau [this message]
2023-04-18 19:18   ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()` Taylor Blau
2023-04-18 19:39     ` Junio C Hamano
2023-04-18 20:54       ` Taylor Blau
2023-04-22 11:12     ` Jeff King
2023-04-22 15:53       ` René Scharfe
2023-04-23  0:35         ` Jeff King
2023-04-24 16:24           ` Junio C Hamano
2023-04-23  2:38       ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()`t Taylor Blau
2023-04-23  2:40         ` Taylor Blau
2023-04-18 19:18   ` [PATCH v2 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-22 11:14     ` Jeff King
2023-04-18 19:18   ` [PATCH v2 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-22 11:16     ` Jeff King
2023-04-24 21:19       ` Taylor Blau
2023-04-18 19:18   ` [PATCH v2 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-18 19:18   ` [PATCH v2 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-18 19:18   ` [PATCH v2 6/6] banned.h: mark `strtok()` as banned Taylor Blau
2023-04-24 22:20 ` [PATCH v3 0/6] banned: mark `strok()`, `strtok_r()` " Taylor Blau
2023-04-24 22:20   ` [PATCH v3 1/6] string-list: multi-delimiter `string_list_split_in_place()` Taylor Blau
2023-04-24 22:20   ` [PATCH v3 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-25  6:21     ` Jeff King
2023-04-25 21:00       ` Taylor Blau
2023-04-24 22:20   ` [PATCH v3 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-24 22:20   ` [PATCH v3 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-24 22:20   ` [PATCH v3 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-25 13:57     ` Jeff Hostetler
2023-04-24 22:20   ` [PATCH v3 6/6] banned.h: mark `strtok()` and `strtok_r()` as banned Taylor Blau
2023-04-24 22:25     ` Chris Torek
2023-04-24 23:00       ` Taylor Blau
2023-04-25  6:26     ` Jeff King
2023-04-25 21:02       ` Taylor Blau
2023-04-25  6:27   ` [PATCH v3 0/6] banned: mark `strok()`, " Jeff King
2023-04-25 21:03     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1681845518.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=chris.torek@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.