* [PATCH 0/4] Add some string_list-related functions @ 2012-09-09 5:53 Michael Haggerty 2012-09-09 5:53 ` [PATCH 1/4] Add a new function, string_list_split_in_place() Michael Haggerty ` (3 more replies) 0 siblings, 4 replies; 23+ messages in thread From: Michael Haggerty @ 2012-09-09 5:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Michael Haggerty This patch series adds a few functions to the string_list API. They will be used in two upcoming patch series. Unfortunately, both of the series (which are otherwise logically independent) need the same function; therefore, I am submitting these string-list enhancements as a separate series on which the other two can depend. This patch series applies to current master. Michael Haggerty (4): Add a new function, string_list_split_in_place() Add a new function, filter_string_list() Add a new function, string_list_remove_duplicates() Add a function string_list_longest_prefix() .gitignore | 1 + Documentation/technical/api-string-list.txt | 32 ++++++++++ Makefile | 1 + string-list.c | 77 ++++++++++++++++++++++++ string-list.h | 41 +++++++++++++ t/t0063-string-list.sh | 93 +++++++++++++++++++++++++++++ test-string-list.c | 47 +++++++++++++++ 7 files changed, 292 insertions(+) create mode 100755 t/t0063-string-list.sh create mode 100644 test-string-list.c -- 1.7.11.3 ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/4] Add a new function, string_list_split_in_place() 2012-09-09 5:53 [PATCH 0/4] Add some string_list-related functions Michael Haggerty @ 2012-09-09 5:53 ` Michael Haggerty 2012-09-09 9:35 ` Junio C Hamano 2012-09-09 5:53 ` [PATCH 2/4] Add a new function, filter_string_list() Michael Haggerty ` (2 subsequent siblings) 3 siblings, 1 reply; 23+ messages in thread From: Michael Haggerty @ 2012-09-09 5:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Michael Haggerty Split a string into a string_list on a separator character. This is similar to the strbuf_split_*() functions except that it works with the more powerful string_list interface. If strdup_strings is false, it reuses the memory from the input string (thereby needing no string memory allocations, though of course allocations are still needed for the string_list_items array). Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- In the tests, I use here documents to specify the expected output. Is this OK? (It is certainly convenient.) .gitignore | 1 + Documentation/technical/api-string-list.txt | 12 ++++++ Makefile | 1 + string-list.c | 23 +++++++++++ string-list.h | 19 +++++++++ t/t0063-string-list.sh | 63 +++++++++++++++++++++++++++++ test-string-list.c | 25 ++++++++++++ 7 files changed, 144 insertions(+) create mode 100755 t/t0063-string-list.sh create mode 100644 test-string-list.c diff --git a/.gitignore b/.gitignore index bb5c91e..0ca7df8 100644 --- a/.gitignore +++ b/.gitignore @@ -193,6 +193,7 @@ /test-run-command /test-sha1 /test-sigchain +/test-string-list /test-subprocess /test-svn-fe /common-cmds.h diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt index 5a0c14f..3b959a2 100644 --- a/Documentation/technical/api-string-list.txt +++ b/Documentation/technical/api-string-list.txt @@ -124,6 +124,18 @@ counterpart for sorted lists, which performs a binary search. is set. The third parameter controls if the `util` pointer of the items should be freed or not. +`string_list_split_in_place`:: + + Split string into substrings on character delim and append the + substrings to a string_list. The delimiter characters in + string are overwritten with NULs in the process. If maxsplit + is a positive integer, then split at most maxsplit times. If + list.strdup_strings is not set, then the new string_list_items + point into string, which therefore must not be modified or + freed while the string_list is in use. Return the number of + substrings appended to the list. + + Data structures --------------- diff --git a/Makefile b/Makefile index 66e8216..ebbb381 100644 --- a/Makefile +++ b/Makefile @@ -501,6 +501,7 @@ TEST_PROGRAMS_NEED_X += test-run-command TEST_PROGRAMS_NEED_X += test-scrap-cache-tree TEST_PROGRAMS_NEED_X += test-sha1 TEST_PROGRAMS_NEED_X += test-sigchain +TEST_PROGRAMS_NEED_X += test-string-list TEST_PROGRAMS_NEED_X += test-subprocess TEST_PROGRAMS_NEED_X += test-svn-fe diff --git a/string-list.c b/string-list.c index d9810ab..110449c 100644 --- a/string-list.c +++ b/string-list.c @@ -194,3 +194,26 @@ void unsorted_string_list_delete_item(struct string_list *list, int i, int free_ list->items[i] = list->items[list->nr-1]; list->nr--; } + +int string_list_split_in_place(struct string_list *list, char *string, + int delim, int maxsplit) +{ + int count = 0; + char *p = string, *end; + for (;;) { + count++; + if (maxsplit > 0 && count > maxsplit) { + string_list_append(list, p); + return count; + } + end = strchr(p, delim); + if (end) { + *end = '\0'; + string_list_append(list, p); + p = end + 1; + } else { + string_list_append(list, p); + return count; + } + } +} diff --git a/string-list.h b/string-list.h index 0684cb7..7e51d03 100644 --- a/string-list.h +++ b/string-list.h @@ -45,4 +45,23 @@ int unsorted_string_list_has_string(struct string_list *list, const char *string struct string_list_item *unsorted_string_list_lookup(struct string_list *list, const char *string); void unsorted_string_list_delete_item(struct string_list *list, int i, int free_util); + +/* + * Split string into substrings on character delim and append the + * substrings to list. The delimiter characters in string are + * overwritten with NULs in the process. If maxsplit is a positive + * integer, then split at most maxsplit times. If list.strdup_strings + * is not set, then the new string_list_items point into string, which + * therefore must not be modified or freed while the string_list + * is in use. Return the number of substrings appended to list. + * + * Examples: + * string_list_split_in_place(l, "foo:bar:baz", ':', -1) -> ["foo", "bar", "baz"] + * string_list_split_in_place(l, "foo:bar:baz", ':', 1) -> ["foo", "bar:baz"] + * string_list_split_in_place(l, "foo:bar:", ':', -1) -> ["foo", "bar", ""] + * string_list_split_in_place(l, "", ':', -1) -> [""] + * string_list_split_in_place(l, ":", ':', -1) -> ["", ""] + */ +int string_list_split_in_place(struct string_list *list, char *string, + int delim, int maxsplit); #endif /* STRING_LIST_H */ diff --git a/t/t0063-string-list.sh b/t/t0063-string-list.sh new file mode 100755 index 0000000..0eede83 --- /dev/null +++ b/t/t0063-string-list.sh @@ -0,0 +1,63 @@ +#!/bin/sh +# +# Copyright (c) 2012 Michael Haggerty +# + +test_description='Test string list functionality' + +. ./test-lib.sh + +string_list_split_in_place() { + cat >split-expected && + test_expect_success "split $1 $2 $3" " + test-string-list split_in_place '$1' '$2' '$3' >split-actual && + test_cmp split-expected split-actual + " +} + +string_list_split_in_place "foo:bar:baz" ":" "-1" <<EOF +3 +[0]: "foo" +[1]: "bar" +[2]: "baz" +EOF + +string_list_split_in_place "foo:bar:baz" ":" "0" <<EOF +3 +[0]: "foo" +[1]: "bar" +[2]: "baz" +EOF + +string_list_split_in_place "foo:bar:baz" ":" "1" <<EOF +2 +[0]: "foo" +[1]: "bar:baz" +EOF + +string_list_split_in_place "foo:bar:baz" ":" "2" <<EOF +3 +[0]: "foo" +[1]: "bar" +[2]: "baz" +EOF + +string_list_split_in_place "foo:bar:" ":" "-1" <<EOF +3 +[0]: "foo" +[1]: "bar" +[2]: "" +EOF + +string_list_split_in_place "" ":" "-1" <<EOF +1 +[0]: "" +EOF + +string_list_split_in_place ":" ":" "-1" <<EOF +2 +[0]: "" +[1]: "" +EOF + +test_done diff --git a/test-string-list.c b/test-string-list.c new file mode 100644 index 0000000..f08d3cc --- /dev/null +++ b/test-string-list.c @@ -0,0 +1,25 @@ +#include "cache.h" +#include "string-list.h" + +int main(int argc, char **argv) +{ + if ((argc == 4 || argc == 5) && !strcmp(argv[1], "split_in_place")) { + struct string_list list = STRING_LIST_INIT_NODUP; + int i; + char *s = xstrdup(argv[2]); + int delim = *argv[3]; + int maxsplit = (argc == 5) ? atoi(argv[4]) : -1; + + i = string_list_split_in_place(&list, s, delim, maxsplit); + printf("%d\n", i); + for (i = 0; i < list.nr; i++) + printf("[%d]: \"%s\"\n", i, list.items[i].string); + string_list_clear(&list, 0); + free(s); + return 0; + } + + fprintf(stderr, "%s: unknown function name: %s\n", argv[0], + argv[1] ? argv[1] : "(there was none)"); + return 1; +} -- 1.7.11.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/4] Add a new function, string_list_split_in_place() 2012-09-09 5:53 ` [PATCH 1/4] Add a new function, string_list_split_in_place() Michael Haggerty @ 2012-09-09 9:35 ` Junio C Hamano 2012-09-10 4:45 ` Michael Haggerty 0 siblings, 1 reply; 23+ messages in thread From: Junio C Hamano @ 2012-09-09 9:35 UTC (permalink / raw) To: Michael Haggerty; +Cc: git Michael Haggerty <mhagger@alum.mit.edu> writes: > Split a string into a string_list on a separator character. > > This is similar to the strbuf_split_*() functions except that it works > with the more powerful string_list interface. If strdup_strings is > false, it reuses the memory from the input string (thereby needing no > string memory allocations, though of course allocations are still > needed for the string_list_items array). > > Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> > --- > > In the tests, I use here documents to specify the expected output. Is > this OK? (It is certainly convenient.) I offhand do not have an objection to that style, but it looked funny to see the helper test function "string_list_split_in_place" named without "test_" prefix. Maybe it's just me. > diff --git a/.gitignore b/.gitignore > index bb5c91e..0ca7df8 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -193,6 +193,7 @@ > /test-run-command > /test-sha1 > /test-sigchain > +/test-string-list > /test-subprocess > /test-svn-fe > /common-cmds.h > diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt > index 5a0c14f..3b959a2 100644 > --- a/Documentation/technical/api-string-list.txt > +++ b/Documentation/technical/api-string-list.txt > @@ -124,6 +124,18 @@ counterpart for sorted lists, which performs a binary search. > is set. The third parameter controls if the `util` pointer of the > items should be freed or not. > > +`string_list_split_in_place`:: > + > + Split string into substrings on character delim and append the > + substrings to a string_list. The delimiter characters in > + string are overwritten with NULs in the process. If maxsplit > + is a positive integer, then split at most maxsplit times. If So passing 0 is the natural way to say "pay attention to all the delimiters", which matches my intuition. > + list.strdup_strings is not set, then the new string_list_items > + point into string, which therefore must not be modified or > + freed while the string_list is in use. Return the number of > + substrings appended to the list. I am not sure about this strdup_strings business; it smells somewhat fishy from the API design point of view. If you are not mucking with the input string and not splitting in place, it would not be possible to do this without strdup_strings, but if you are doing the in-place splitting, is there any reason for the caller to ask for strdup_strings? In such a case, the reason the caller cannot promise that the input string will not go away to the string_list (hence it has to ask to make a copy) is because it does not own the string in the first place, and in such a case, I would imagine it cannot allow the delimiters in the string to be overwritten with NULs. I would sort-of-kind-of understand a function "string_list_split" that bases its decision to do an in-place splitting or not on the strdup_strings flag in the string list that was passed in. But it would make the use of the function a bit limited (e.g. you cannot sanely mix and match tokens from different kind of strings). > diff --git a/string-list.c b/string-list.c > index d9810ab..110449c 100644 > --- a/string-list.c > +++ b/string-list.c > @@ -194,3 +194,26 @@ void unsorted_string_list_delete_item(struct string_list *list, int i, int free_ > list->items[i] = list->items[list->nr-1]; > list->nr--; > } > + > +int string_list_split_in_place(struct string_list *list, char *string, > + int delim, int maxsplit) > +{ > + int count = 0; > + char *p = string, *end; Blank line here. > + for (;;) { > + count++; > + if (maxsplit > 0 && count > maxsplit) { > + string_list_append(list, p); > + return count; > + } > + end = strchr(p, delim); > + if (end) { > + *end = '\0'; > + string_list_append(list, p); > + p = end + 1; > + } else { > + string_list_append(list, p); > + return count; > + } > + } > +} > diff --git a/string-list.h b/string-list.h > index 0684cb7..7e51d03 100644 > --- a/string-list.h > +++ b/string-list.h > @@ -45,4 +45,23 @@ int unsorted_string_list_has_string(struct string_list *list, const char *string > struct string_list_item *unsorted_string_list_lookup(struct string_list *list, > const char *string); > void unsorted_string_list_delete_item(struct string_list *list, int i, int free_util); > + > +/* > + * Split string into substrings on character delim and append the > + * substrings to list. The delimiter characters in string are > + * overwritten with NULs in the process. If maxsplit is a positive > + * integer, then split at most maxsplit times. If list.strdup_strings > + * is not set, then the new string_list_items point into string, which > + * therefore must not be modified or freed while the string_list > + * is in use. Return the number of substrings appended to list. > + * > + * Examples: > + * string_list_split_in_place(l, "foo:bar:baz", ':', -1) -> ["foo", "bar", "baz"] > + * string_list_split_in_place(l, "foo:bar:baz", ':', 1) -> ["foo", "bar:baz"] > + * string_list_split_in_place(l, "foo:bar:", ':', -1) -> ["foo", "bar", ""] I would find it more natural to see a sentinel value against "positive" to be 0, not -1. "-1" gives an impression as if "-2" might do something different from "-1", but Zero is a lot more special. > diff --git a/t/t0063-string-list.sh b/t/t0063-string-list.sh > new file mode 100755 > index 0000000..0eede83 > --- /dev/null > +++ b/t/t0063-string-list.sh > @@ -0,0 +1,63 @@ > +#!/bin/sh > +# > +# Copyright (c) 2012 Michael Haggerty > +# > + > +test_description='Test string list functionality' > + > +. ./test-lib.sh > + > +string_list_split_in_place() { SP before ()? > + cat >split-expected && > + test_expect_success "split $1 $2 $3" " "split '$1' at '$2', max $3", or something? > + test-string-list split_in_place '$1' '$2' '$3' >split-actual && > + test_cmp split-expected split-actual > + " > +} What is it buying us to use "split-" before expected and actual to make things "unusual" from many other tests? > +string_list_split_in_place "foo:bar:baz" ":" "-1" <<EOF Likewise about "-1" (I think your test helper does not even require "-1" to be spelled out here). > +3 > +[0]: "foo" > +[1]: "bar" > +[2]: "baz" > +EOF > + > +string_list_split_in_place "foo:bar:baz" ":" "0" <<EOF > +3 > +[0]: "foo" > +[1]: "bar" > +[2]: "baz" > +EOF > + > +string_list_split_in_place "foo:bar:baz" ":" "1" <<EOF > +2 > +[0]: "foo" > +[1]: "bar:baz" > +EOF > + > +string_list_split_in_place "foo:bar:baz" ":" "2" <<EOF > +3 > +[0]: "foo" > +[1]: "bar" > +[2]: "baz" > +EOF > + > +string_list_split_in_place "foo:bar:" ":" "-1" <<EOF > +3 > +[0]: "foo" > +[1]: "bar" > +[2]: "" > +EOF > + > +string_list_split_in_place "" ":" "-1" <<EOF > +1 > +[0]: "" > +EOF > + > +string_list_split_in_place ":" ":" "-1" <<EOF > +2 > +[0]: "" > +[1]: "" > +EOF > + > +test_done > diff --git a/test-string-list.c b/test-string-list.c > new file mode 100644 > index 0000000..f08d3cc > --- /dev/null > +++ b/test-string-list.c > @@ -0,0 +1,25 @@ > +#include "cache.h" > +#include "string-list.h" > + > +int main(int argc, char **argv) > +{ > + if ((argc == 4 || argc == 5) && !strcmp(argv[1], "split_in_place")) { > + struct string_list list = STRING_LIST_INIT_NODUP; > + int i; > + char *s = xstrdup(argv[2]); > + int delim = *argv[3]; > + int maxsplit = (argc == 5) ? atoi(argv[4]) : -1; Likewise on "-1". > + i = string_list_split_in_place(&list, s, delim, maxsplit); > + printf("%d\n", i); > + for (i = 0; i < list.nr; i++) > + printf("[%d]: \"%s\"\n", i, list.items[i].string); > + string_list_clear(&list, 0); > + free(s); > + return 0; > + } > + > + fprintf(stderr, "%s: unknown function name: %s\n", argv[0], > + argv[1] ? argv[1] : "(there was none)"); > + return 1; > +} ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/4] Add a new function, string_list_split_in_place() 2012-09-09 9:35 ` Junio C Hamano @ 2012-09-10 4:45 ` Michael Haggerty 2012-09-10 5:47 ` Junio C Hamano 0 siblings, 1 reply; 23+ messages in thread From: Michael Haggerty @ 2012-09-10 4:45 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 09/09/2012 11:35 AM, Junio C Hamano wrote: > Michael Haggerty <mhagger@alum.mit.edu> writes: > >> Split a string into a string_list on a separator character. >> >> This is similar to the strbuf_split_*() functions except that it works >> with the more powerful string_list interface. If strdup_strings is >> false, it reuses the memory from the input string (thereby needing no >> string memory allocations, though of course allocations are still >> needed for the string_list_items array). >> >> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> >> --- >> >> In the tests, I use here documents to specify the expected output. Is >> this OK? (It is certainly convenient.) > > I offhand do not have an objection to that style, but it looked > funny to see the helper test function "string_list_split_in_place" > named without "test_" prefix. Maybe it's just me. I renamed the test function to "test_split_in_place". (The "string_list" part can be dispensed with in a test called t0063-string-list.sh, I think.) >> diff --git a/.gitignore b/.gitignore >> index bb5c91e..0ca7df8 100644 >> --- a/.gitignore >> +++ b/.gitignore >> @@ -193,6 +193,7 @@ >> /test-run-command >> /test-sha1 >> /test-sigchain >> +/test-string-list >> /test-subprocess >> /test-svn-fe >> /common-cmds.h >> diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt >> index 5a0c14f..3b959a2 100644 >> --- a/Documentation/technical/api-string-list.txt >> +++ b/Documentation/technical/api-string-list.txt >> @@ -124,6 +124,18 @@ counterpart for sorted lists, which performs a binary search. >> is set. The third parameter controls if the `util` pointer of the >> items should be freed or not. >> >> +`string_list_split_in_place`:: >> + >> + Split string into substrings on character delim and append the >> + substrings to a string_list. The delimiter characters in >> + string are overwritten with NULs in the process. If maxsplit >> + is a positive integer, then split at most maxsplit times. If > > So passing 0 is the natural way to say "pay attention to all the > delimiters", which matches my intuition. See below. >> + list.strdup_strings is not set, then the new string_list_items >> + point into string, which therefore must not be modified or >> + freed while the string_list is in use. Return the number of >> + substrings appended to the list. > > I am not sure about this strdup_strings business; it smells somewhat > fishy from the API design point of view. > > If you are not mucking with the input string and not splitting in > place, it would not be possible to do this without strdup_strings, > but if you are doing the in-place splitting, is there any reason for > the caller to ask for strdup_strings? > > In such a case, the reason the caller cannot promise that the input > string will not go away to the string_list (hence it has to ask to > make a copy) is because it does not own the string in the first > place, and in such a case, I would imagine it cannot allow the > delimiters in the string to be overwritten with NULs. On one level it is immaterial whether this is a sensible usage; the caller *can* set strdup_strings on any string_list and pass that string_list into the function. In that case the function *has* to strdup the strings because when strdup_strings is set, the string_list functions are allowed to free() a string whenever they want. But there are situations when this usage is also convenient; namely when the lifetime of the string being split is shorter than the lifetime of the string_list. Consider something like struct string_list *split_file_into_words(FILE *f) { char buf[1024]; struct string_list *list = new string list; list->strdup_strings = 1; while (not EOF) { read_line_into_buf(); string_list_split_in_place(list, buf, ' ', -1); } return list; } Consider also that the string_list passed to string_list_split_in_place() might already have contents from elsewhere. Since a string_list has a single allocation policy covering all of its entries, it might be more convenient for the caller to allow the split strings to be copied than to change the allocation policy of the pre-inserted strings. > I would sort-of-kind-of understand a function "string_list_split" > that bases its decision to do an in-place splitting or not on the > strdup_strings flag in the string list that was passed in. But it > would make the use of the function a bit limited (e.g. you cannot > sanely mix and match tokens from different kind of strings). Here is my thinking: I want to have a very low overhead way to use string_list, without a memory allocation per string (so that people don't have an excuse to avoid the API). Thus the "in-place" option. It would be easy (and have no run-time cost) to change the split function to *not* modify the input string if strdup_strings is set on the string_list. But (1) I don't think it is a good idea for the handling of the string argument to depend on an option buried in another parameters, and (2) the string argument could not be declared "const", so many situations when this variant were useful would require const to be cast away. So if/when such a need arises, I think it would be better to invent a new string_list_split() variant that *never* overwrites its input string (though this function could *only* work correctly if strdup_strings is set). >> diff --git a/string-list.c b/string-list.c >> index d9810ab..110449c 100644 >> --- a/string-list.c >> +++ b/string-list.c >> @@ -194,3 +194,26 @@ void unsorted_string_list_delete_item(struct string_list *list, int i, int free_ >> list->items[i] = list->items[list->nr-1]; >> list->nr--; >> } >> + >> +int string_list_split_in_place(struct string_list *list, char *string, >> + int delim, int maxsplit) >> +{ >> + int count = 0; >> + char *p = string, *end; > > Blank line here. Thanks. >> + for (;;) { >> + count++; >> + if (maxsplit > 0 && count > maxsplit) { >> + string_list_append(list, p); >> + return count; >> + } >> + end = strchr(p, delim); >> + if (end) { >> + *end = '\0'; >> + string_list_append(list, p); >> + p = end + 1; >> + } else { >> + string_list_append(list, p); >> + return count; >> + } >> + } >> +} >> diff --git a/string-list.h b/string-list.h >> index 0684cb7..7e51d03 100644 >> --- a/string-list.h >> +++ b/string-list.h >> @@ -45,4 +45,23 @@ int unsorted_string_list_has_string(struct string_list *list, const char *string >> struct string_list_item *unsorted_string_list_lookup(struct string_list *list, >> const char *string); >> void unsorted_string_list_delete_item(struct string_list *list, int i, int free_util); >> + >> +/* >> + * Split string into substrings on character delim and append the >> + * substrings to list. The delimiter characters in string are >> + * overwritten with NULs in the process. If maxsplit is a positive >> + * integer, then split at most maxsplit times. If list.strdup_strings >> + * is not set, then the new string_list_items point into string, which >> + * therefore must not be modified or freed while the string_list >> + * is in use. Return the number of substrings appended to list. >> + * >> + * Examples: >> + * string_list_split_in_place(l, "foo:bar:baz", ':', -1) -> ["foo", "bar", "baz"] >> + * string_list_split_in_place(l, "foo:bar:baz", ':', 1) -> ["foo", "bar:baz"] >> + * string_list_split_in_place(l, "foo:bar:", ':', -1) -> ["foo", "bar", ""] > > I would find it more natural to see a sentinel value against > "positive" to be 0, not -1. "-1" gives an impression as if "-2" > might do something different from "-1", but Zero is a lot more > special. You have raised a good point and I think there is a flaw in the API, but I'm not sure I agree with you what the flaw is... The "maxsplit" argument limits the number of times the string should be split. I.e., if maxsplit is set, then the output will have at most (maxsplit + 1) strings. When I used "-1" as the special value for "unlimited", I was subconsciously thinking that "0" is an imaginable edge case; it could mean "don't split string at all"; i.e., return the whole input as the single string in the output string_list. This would be a kindof silly usage, but might perhaps remove the need to handle 0 specially at a caller if "maxsplit" is derived from another source. But of course my definition of string_list_split_in_place() *doesn't* treat 0 this way. I think *that* was my mistake. So I propose to change the meaning of maxsplit to If maxsplit is a non-negative integer, then split at most maxsplit times. and to continue using -1 as the "special" value for unlimited splits. Are you OK with that? >> diff --git a/t/t0063-string-list.sh b/t/t0063-string-list.sh >> new file mode 100755 >> index 0000000..0eede83 >> --- /dev/null >> +++ b/t/t0063-string-list.sh >> @@ -0,0 +1,63 @@ >> +#!/bin/sh >> +# >> +# Copyright (c) 2012 Michael Haggerty >> +# >> + >> +test_description='Test string list functionality' >> + >> +. ./test-lib.sh >> + >> +string_list_split_in_place() { > > SP before ()? Thanks. >> + cat >split-expected && >> + test_expect_success "split $1 $2 $3" " > > "split '$1' at '$2', max $3", or something? Good idea. >> + test-string-list split_in_place '$1' '$2' '$3' >split-actual && >> + test_cmp split-expected split-actual >> + " >> +} > > What is it buying us to use "split-" before expected and actual to > make things "unusual" from many other tests? Nothing, I guess. Changed. >> +string_list_split_in_place "foo:bar:baz" ":" "-1" <<EOF > > Likewise about "-1" (I think your test helper does not even require > "-1" to be spelled out here). Since we're testing the C API I think it's best that the tests be explicit about the value to be passed to the function. Therefore I guess I will change the test program to *require* the maxsplit option, since testing is its only purpose. >> +3 >> +[0]: "foo" >> +[1]: "bar" >> +[2]: "baz" >> +EOF >> + >> +string_list_split_in_place "foo:bar:baz" ":" "0" <<EOF >> +3 >> +[0]: "foo" >> +[1]: "bar" >> +[2]: "baz" >> +EOF >> + >> +string_list_split_in_place "foo:bar:baz" ":" "1" <<EOF >> +2 >> +[0]: "foo" >> +[1]: "bar:baz" >> +EOF >> + >> +string_list_split_in_place "foo:bar:baz" ":" "2" <<EOF >> +3 >> +[0]: "foo" >> +[1]: "bar" >> +[2]: "baz" >> +EOF >> + >> +string_list_split_in_place "foo:bar:" ":" "-1" <<EOF >> +3 >> +[0]: "foo" >> +[1]: "bar" >> +[2]: "" >> +EOF >> + >> +string_list_split_in_place "" ":" "-1" <<EOF >> +1 >> +[0]: "" >> +EOF >> + >> +string_list_split_in_place ":" ":" "-1" <<EOF >> +2 >> +[0]: "" >> +[1]: "" >> +EOF >> + >> +test_done >> diff --git a/test-string-list.c b/test-string-list.c >> new file mode 100644 >> index 0000000..f08d3cc >> --- /dev/null >> +++ b/test-string-list.c >> @@ -0,0 +1,25 @@ >> +#include "cache.h" >> +#include "string-list.h" >> + >> +int main(int argc, char **argv) >> +{ >> + if ((argc == 4 || argc == 5) && !strcmp(argv[1], "split_in_place")) { >> + struct string_list list = STRING_LIST_INIT_NODUP; >> + int i; >> + char *s = xstrdup(argv[2]); >> + int delim = *argv[3]; >> + int maxsplit = (argc == 5) ? atoi(argv[4]) : -1; > > Likewise on "-1". > >> + i = string_list_split_in_place(&list, s, delim, maxsplit); >> + printf("%d\n", i); >> + for (i = 0; i < list.nr; i++) >> + printf("[%d]: \"%s\"\n", i, list.items[i].string); >> + string_list_clear(&list, 0); >> + free(s); >> + return 0; >> + } >> + >> + fprintf(stderr, "%s: unknown function name: %s\n", argv[0], >> + argv[1] ? argv[1] : "(there was none)"); >> + return 1; >> +} Thanks for your helpful feedback! I will send a revised patch series as soon as I can. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/4] Add a new function, string_list_split_in_place() 2012-09-10 4:45 ` Michael Haggerty @ 2012-09-10 5:47 ` Junio C Hamano 2012-09-10 11:48 ` Michael Haggerty 0 siblings, 1 reply; 23+ messages in thread From: Junio C Hamano @ 2012-09-10 5:47 UTC (permalink / raw) To: Michael Haggerty; +Cc: git Michael Haggerty <mhagger@alum.mit.edu> writes: > ... Consider something like > > struct string_list *split_file_into_words(FILE *f) > { > char buf[1024]; > struct string_list *list = new string list; > list->strdup_strings = 1; > while (not EOF) { > read_line_into_buf(); > string_list_split_in_place(list, buf, ' ', -1); > } > return list; > } That is a prime example to argue that string_list_split() would make more sense, no? The caller does _not_ mind if the function mucks with buf, but the resulting list is not allowed to point into buf. In such a case, the caller shouldn't have to _care_ if it wants to allow buf to be mucked with; it is already asking that the resulting list _not_ point into buf by setting strdup_strings (by the way, that is part of the function input, so think of it like various *opt variables passed into functions to tweak their behaviour). If the implementation can do so without sacrificing performance (and in this case, as you said, it can), it should take "const char *buf". The above caller shouldn't have to choose between sl_split() and sl_split_in_place(), in other words. So it appears to me that sl_split_in_place(), if implemented, should be kept as a special case for performance-minded callers that have full control of the lifetime rules of the variables they use, can set strdup_strings to false, and can let buf modified in place, and can accept list that point into buf. >>> + * Examples: >>> + * string_list_split_in_place(l, "foo:bar:baz", ':', -1) -> ["foo", "bar", "baz"] >>> + * string_list_split_in_place(l, "foo:bar:baz", ':', 1) -> ["foo", "bar:baz"] >>> + * string_list_split_in_place(l, "foo:bar:", ':', -1) -> ["foo", "bar", ""] >> >> I would find it more natural to see a sentinel value against >> "positive" to be 0, not -1. "-1" gives an impression as if "-2" >> might do something different from "-1", but Zero is a lot more >> special. > > You have raised a good point and I think there is a flaw in the API, but > I'm not sure I agree with you what the flaw is... > > The "maxsplit" argument limits the number of times the string should be > split. I.e., if maxsplit is set, then the output will have at most > (maxsplit + 1) strings. So "do not split, just give me the whole thing" would be maxsplit == 0 to split into (maxsplit+1) == 1 string. I think we are in agreement that your "-1" does not make any sense, and your documentation that said "positive" is the saner thing to do, no? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/4] Add a new function, string_list_split_in_place() 2012-09-10 5:47 ` Junio C Hamano @ 2012-09-10 11:48 ` Michael Haggerty 2012-09-10 16:09 ` Junio C Hamano 0 siblings, 1 reply; 23+ messages in thread From: Michael Haggerty @ 2012-09-10 11:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 09/10/2012 07:47 AM, Junio C Hamano wrote: > Michael Haggerty <mhagger@alum.mit.edu> writes: > >> ... Consider something like >> >> struct string_list *split_file_into_words(FILE *f) >> { >> char buf[1024]; >> struct string_list *list = new string list; >> list->strdup_strings = 1; >> while (not EOF) { >> read_line_into_buf(); >> string_list_split_in_place(list, buf, ' ', -1); >> } >> return list; >> } > > That is a prime example to argue that string_list_split() would make > more sense, no? The caller does _not_ mind if the function mucks > with buf, but the resulting list is not allowed to point into buf. > > In such a case, the caller shouldn't have to _care_ if it wants to > allow buf to be mucked with; it is already asking that the resulting > list _not_ point into buf by setting strdup_strings (by the way, > that is part of the function input, so think of it like various *opt > variables passed into functions to tweak their behaviour). If the > implementation can do so without sacrificing performance (and in > this case, as you said, it can), it should take "const char *buf". You're right, I was thinking that a caller of string_list_split_in_place() could choose to remain ignorant of whether strdup_strings is set, but this is incorrect because it needs to know whether to do its own memory management of the strings that it passes into string_list_append(). > So it appears to me that sl_split_in_place(), if implemented, should > be kept as a special case for performance-minded callers that have > full control of the lifetime rules of the variables they use, can > set strdup_strings to false, and can let buf modified in place, and > can accept list that point into buf. OK, so the bottom line would be to have two versions of the function. One takes a (const char *) and *requires* strdup_strings to be set on its input list: int string_list_split(struct string_list *list, const char *string, int delim, int maxsplit) { assert(list->strdup_strings); ... } The other takes a (char *) and modifies it in-place, and maybe even requires strdup_strings to be false on its input list: int string_list_split_in_place(struct string_list *list, char *string, int delim, int maxsplit) { /* not an error per se but a strong suggestion of one: */ assert(!list->strdup_strings); ... } (The latter (modulo assert) is the one that I have implemented, but it might not be needed immediately.) Do you agree? >>>> + * Examples: >>>> + * string_list_split_in_place(l, "foo:bar:baz", ':', -1) -> ["foo", "bar", "baz"] >>>> + * string_list_split_in_place(l, "foo:bar:baz", ':', 1) -> ["foo", "bar:baz"] >>>> + * string_list_split_in_place(l, "foo:bar:", ':', -1) -> ["foo", "bar", ""] >>> >>> I would find it more natural to see a sentinel value against >>> "positive" to be 0, not -1. "-1" gives an impression as if "-2" >>> might do something different from "-1", but Zero is a lot more >>> special. >> >> You have raised a good point and I think there is a flaw in the API, but >> I'm not sure I agree with you what the flaw is... >> >> The "maxsplit" argument limits the number of times the string should be >> split. I.e., if maxsplit is set, then the output will have at most >> (maxsplit + 1) strings. > > So "do not split, just give me the whole thing" would be maxsplit == 0 > to split into (maxsplit+1) == 1 string. I think we are in agreement > that your "-1" does not make any sense, and your documentation that > said "positive" is the saner thing to do, no? No. I think that my handling of maxsplit=0 was incorrect but that we should continue using -1 as the special value. I see maxsplit=0 as a legitimate degenerate case meaning "split into 1 string". Granted, it would only be useful in specialized situations [1]. Moreover, "-1" makes a much more obvious special value than "0"; somebody reading code with maxsplit=-1 knows immediately that this is a special value, whereas the handling of maxsplit=0 isn't quite so blindingly obvious unless the reader knows the outcome of our current discussion :-) Therefore I still prefer treating only negative values of maxsplit to mean "unlimited" and fixing maxsplit=0 as described. But if you insist on the other convention, let me know and I will change it. Michael [1] A case I can think of would be parsing a format like NUMPARENTS [PARENT...] SUMMARY where "string_list_split(list, rest_of_line, ' ', numparents)" does the right thing even if numparents==0. -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/4] Add a new function, string_list_split_in_place() 2012-09-10 11:48 ` Michael Haggerty @ 2012-09-10 16:09 ` Junio C Hamano 0 siblings, 0 replies; 23+ messages in thread From: Junio C Hamano @ 2012-09-10 16:09 UTC (permalink / raw) To: Michael Haggerty; +Cc: git Michael Haggerty <mhagger@alum.mit.edu> writes: > OK, so the bottom line would be to have two versions of the function. > One takes a (const char *) and *requires* strdup_strings to be set on > its input list: > > int string_list_split(struct string_list *list, const char *string, > int delim, int maxsplit) > { > assert(list->strdup_strings); > ... > } > > The other takes a (char *) and modifies it in-place, and maybe even > requires strdup_strings to be false on its input list: > > int string_list_split_in_place(struct string_list *list, char *string, > int delim, int maxsplit) > { > /* not an error per se but a strong suggestion of one: */ > assert(!list->strdup_strings); > ... > } > > (The latter (modulo assert) is the one that I have implemented, but it > might not be needed immediately.) Do you agree? OK; I do not offhand know which one you immediately needed, but I think that is a sensible way to structure the API. > [1] A case I can think of would be parsing a format like > > NUMPARENTS [PARENT...] SUMMARY > > where "string_list_split(list, rest_of_line, ' ', numparents)" does the > right thing even if numparents==0. OK. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/4] Add a new function, filter_string_list() 2012-09-09 5:53 [PATCH 0/4] Add some string_list-related functions Michael Haggerty 2012-09-09 5:53 ` [PATCH 1/4] Add a new function, string_list_split_in_place() Michael Haggerty @ 2012-09-09 5:53 ` Michael Haggerty 2012-09-09 9:40 ` Junio C Hamano 2012-09-09 5:53 ` [PATCH 3/4] Add a new function, string_list_remove_duplicates() Michael Haggerty 2012-09-09 5:53 ` [PATCH 4/4] Add a function string_list_longest_prefix() Michael Haggerty 3 siblings, 1 reply; 23+ messages in thread From: Michael Haggerty @ 2012-09-09 5:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Michael Haggerty Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- Documentation/technical/api-string-list.txt | 8 ++++++++ string-list.c | 17 +++++++++++++++++ string-list.h | 9 +++++++++ 3 files changed, 34 insertions(+) diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt index 3b959a2..15b8072 100644 --- a/Documentation/technical/api-string-list.txt +++ b/Documentation/technical/api-string-list.txt @@ -60,6 +60,14 @@ Functions * General ones (works with sorted and unsorted lists as well) +`filter_string_list`:: + + Apply a function to each item in a list, retaining only the + items for which the function returns true. If free_util is + true, call free() on the util members of any items that have + to be deleted. Preserve the order of the items that are + retained. + `print_string_list`:: Dump a string_list to stdout, useful mainly for debugging purposes. It diff --git a/string-list.c b/string-list.c index 110449c..72610ce 100644 --- a/string-list.c +++ b/string-list.c @@ -102,6 +102,23 @@ int for_each_string_list(struct string_list *list, return ret; } +void filter_string_list(struct string_list *list, int free_util, + string_list_each_func_t fn, void *cb_data) +{ + int src, dst = 0; + for (src = 0; src < list->nr; src++) { + if (fn(&list->items[src], cb_data)) { + list->items[dst++] = list->items[src]; + } else { + if (list->strdup_strings) + free(list->items[src].string); + if (free_util) + free(list->items[src].util); + } + } + list->nr = dst; +} + void string_list_clear(struct string_list *list, int free_util) { if (list->items) { diff --git a/string-list.h b/string-list.h index 7e51d03..84996aa 100644 --- a/string-list.h +++ b/string-list.h @@ -29,6 +29,15 @@ int for_each_string_list(struct string_list *list, #define for_each_string_list_item(item,list) \ for (item = (list)->items; item < (list)->items + (list)->nr; ++item) +/* + * Apply fn to each item in list, retaining only the ones for which + * the function returns true. If free_util is true, call free() on + * the util members of any items that have to be deleted. Preserve + * the order of the items that are retained. + */ +void filter_string_list(struct string_list *list, int free_util, + string_list_each_func_t fn, void *cb_data); + /* Use these functions only on sorted lists: */ int string_list_has_string(const struct string_list *list, const char *string); int string_list_find_insert_index(const struct string_list *list, const char *string, -- 1.7.11.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] Add a new function, filter_string_list() 2012-09-09 5:53 ` [PATCH 2/4] Add a new function, filter_string_list() Michael Haggerty @ 2012-09-09 9:40 ` Junio C Hamano 2012-09-10 8:58 ` Michael Haggerty 0 siblings, 1 reply; 23+ messages in thread From: Junio C Hamano @ 2012-09-09 9:40 UTC (permalink / raw) To: Michael Haggerty; +Cc: git Michael Haggerty <mhagger@alum.mit.edu> writes: > Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> > --- > Documentation/technical/api-string-list.txt | 8 ++++++++ > string-list.c | 17 +++++++++++++++++ > string-list.h | 9 +++++++++ > 3 files changed, 34 insertions(+) > > diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt > index 3b959a2..15b8072 100644 > --- a/Documentation/technical/api-string-list.txt > +++ b/Documentation/technical/api-string-list.txt > @@ -60,6 +60,14 @@ Functions > > * General ones (works with sorted and unsorted lists as well) > > +`filter_string_list`:: > + > + Apply a function to each item in a list, retaining only the > + items for which the function returns true. If free_util is > + true, call free() on the util members of any items that have > + to be deleted. Preserve the order of the items that are > + retained. In other words, this can safely be used on both sorted and unsorted string list. Good. > `print_string_list`:: > > Dump a string_list to stdout, useful mainly for debugging purposes. It > diff --git a/string-list.c b/string-list.c > index 110449c..72610ce 100644 > --- a/string-list.c > +++ b/string-list.c > @@ -102,6 +102,23 @@ int for_each_string_list(struct string_list *list, > return ret; > } > > +void filter_string_list(struct string_list *list, int free_util, > + string_list_each_func_t fn, void *cb_data) > +{ > + int src, dst = 0; > + for (src = 0; src < list->nr; src++) { > + if (fn(&list->items[src], cb_data)) { > + list->items[dst++] = list->items[src]; > + } else { > + if (list->strdup_strings) > + free(list->items[src].string); > + if (free_util) > + free(list->items[src].util); > + } > + } > + list->nr = dst; > +} > + > void string_list_clear(struct string_list *list, int free_util) > { > if (list->items) { > diff --git a/string-list.h b/string-list.h > index 7e51d03..84996aa 100644 > --- a/string-list.h > +++ b/string-list.h > @@ -29,6 +29,15 @@ int for_each_string_list(struct string_list *list, > #define for_each_string_list_item(item,list) \ > for (item = (list)->items; item < (list)->items + (list)->nr; ++item) > > +/* > + * Apply fn to each item in list, retaining only the ones for which > + * the function returns true. If free_util is true, call free() on > + * the util members of any items that have to be deleted. Preserve > + * the order of the items that are retained. > + */ > +void filter_string_list(struct string_list *list, int free_util, > + string_list_each_func_t fn, void *cb_data); > + > /* Use these functions only on sorted lists: */ > int string_list_has_string(const struct string_list *list, const char *string); > int string_list_find_insert_index(const struct string_list *list, const char *string, Having seen that the previous patch introduced a new test helper for unit testing (which is a very good idea) and dedicated a new test number, I would have expected to see a new test for filtering here. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] Add a new function, filter_string_list() 2012-09-09 9:40 ` Junio C Hamano @ 2012-09-10 8:58 ` Michael Haggerty 0 siblings, 0 replies; 23+ messages in thread From: Michael Haggerty @ 2012-09-10 8:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 09/09/2012 11:40 AM, Junio C Hamano wrote: > Michael Haggerty <mhagger@alum.mit.edu> writes: > >> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> >> --- >> Documentation/technical/api-string-list.txt | 8 ++++++++ >> string-list.c | 17 +++++++++++++++++ >> string-list.h | 9 +++++++++ >> 3 files changed, 34 insertions(+) >> >> diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt >> index 3b959a2..15b8072 100644 >> --- a/Documentation/technical/api-string-list.txt >> +++ b/Documentation/technical/api-string-list.txt >> @@ -60,6 +60,14 @@ Functions >> >> * General ones (works with sorted and unsorted lists as well) >> >> +`filter_string_list`:: >> + >> + Apply a function to each item in a list, retaining only the >> + items for which the function returns true. If free_util is >> + true, call free() on the util members of any items that have >> + to be deleted. Preserve the order of the items that are >> + retained. > > In other words, this can safely be used on both sorted and unsorted > string list. Good. Preserving order (while retaining performance) is the main reason for this function. Otherwise, unsorted_string_list_delete_item() could be used in a loop. >> `print_string_list`:: >> >> Dump a string_list to stdout, useful mainly for debugging purposes. It >> diff --git a/string-list.c b/string-list.c >> index 110449c..72610ce 100644 >> --- a/string-list.c >> +++ b/string-list.c >> @@ -102,6 +102,23 @@ int for_each_string_list(struct string_list *list, >> return ret; >> } >> >> +void filter_string_list(struct string_list *list, int free_util, >> + string_list_each_func_t fn, void *cb_data) >> +{ >> + int src, dst = 0; >> + for (src = 0; src < list->nr; src++) { >> + if (fn(&list->items[src], cb_data)) { >> + list->items[dst++] = list->items[src]; >> + } else { >> + if (list->strdup_strings) >> + free(list->items[src].string); >> + if (free_util) >> + free(list->items[src].util); >> + } >> + } >> + list->nr = dst; >> +} >> + >> void string_list_clear(struct string_list *list, int free_util) >> { >> if (list->items) { >> diff --git a/string-list.h b/string-list.h >> index 7e51d03..84996aa 100644 >> --- a/string-list.h >> +++ b/string-list.h >> @@ -29,6 +29,15 @@ int for_each_string_list(struct string_list *list, >> #define for_each_string_list_item(item,list) \ >> for (item = (list)->items; item < (list)->items + (list)->nr; ++item) >> >> +/* >> + * Apply fn to each item in list, retaining only the ones for which >> + * the function returns true. If free_util is true, call free() on >> + * the util members of any items that have to be deleted. Preserve >> + * the order of the items that are retained. >> + */ >> +void filter_string_list(struct string_list *list, int free_util, >> + string_list_each_func_t fn, void *cb_data); >> + >> /* Use these functions only on sorted lists: */ >> int string_list_has_string(const struct string_list *list, const char *string); >> int string_list_find_insert_index(const struct string_list *list, const char *string, > > Having seen that the previous patch introduced a new test helper for > unit testing (which is a very good idea) and dedicated a new test > number, I would have expected to see a new test for filtering > here. I thought that the code was too trivial to warrant a test, especially considering that the memory handling aspect of the function can't be tested very well. But you've correctly shamed me into adding tests for this and also for patch 3/4, string_list_remove_duplicates(). Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 3/4] Add a new function, string_list_remove_duplicates() 2012-09-09 5:53 [PATCH 0/4] Add some string_list-related functions Michael Haggerty 2012-09-09 5:53 ` [PATCH 1/4] Add a new function, string_list_split_in_place() Michael Haggerty 2012-09-09 5:53 ` [PATCH 2/4] Add a new function, filter_string_list() Michael Haggerty @ 2012-09-09 5:53 ` Michael Haggerty 2012-09-09 9:45 ` Junio C Hamano 2012-09-09 5:53 ` [PATCH 4/4] Add a function string_list_longest_prefix() Michael Haggerty 3 siblings, 1 reply; 23+ messages in thread From: Michael Haggerty @ 2012-09-09 5:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Michael Haggerty Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- Documentation/technical/api-string-list.txt | 4 ++++ string-list.c | 17 +++++++++++++++++ string-list.h | 5 +++++ 3 files changed, 26 insertions(+) diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt index 15b8072..9206f8f 100644 --- a/Documentation/technical/api-string-list.txt +++ b/Documentation/technical/api-string-list.txt @@ -104,6 +104,10 @@ write `string_list_insert(...)->util = ...;`. Look up a given string in the string_list, returning the containing string_list_item. If the string is not found, NULL is returned. +`string_list_remove_duplicates`:: + + Remove all but the first entry that has a given string value. + * Functions for unsorted lists only `string_list_append`:: diff --git a/string-list.c b/string-list.c index 72610ce..bfef6cf 100644 --- a/string-list.c +++ b/string-list.c @@ -92,6 +92,23 @@ struct string_list_item *string_list_lookup(struct string_list *list, const char return list->items + i; } +void string_list_remove_duplicates(struct string_list *list, int free_util) +{ + if (list->nr > 1) { + int src, dst; + for (src = dst = 1; src < list->nr; src++) { + if (!strcmp(list->items[dst - 1].string, list->items[src].string)) { + if (list->strdup_strings) + free(list->items[src].string); + if (free_util) + free(list->items[src].util); + } else + list->items[dst++] = list->items[src]; + } + list->nr = dst; + } +} + int for_each_string_list(struct string_list *list, string_list_each_func_t fn, void *cb_data) { diff --git a/string-list.h b/string-list.h index 84996aa..c4dc659 100644 --- a/string-list.h +++ b/string-list.h @@ -38,6 +38,7 @@ int for_each_string_list(struct string_list *list, void filter_string_list(struct string_list *list, int free_util, string_list_each_func_t fn, void *cb_data); + /* Use these functions only on sorted lists: */ int string_list_has_string(const struct string_list *list, const char *string); int string_list_find_insert_index(const struct string_list *list, const char *string, @@ -47,6 +48,10 @@ struct string_list_item *string_list_insert_at_index(struct string_list *list, int insert_at, const char *string); struct string_list_item *string_list_lookup(struct string_list *list, const char *string); +/* Remove all but the first entry that has a given string value. */ +void string_list_remove_duplicates(struct string_list *list, int free_util); + + /* Use these functions only on unsorted lists: */ struct string_list_item *string_list_append(struct string_list *list, const char *string); void sort_string_list(struct string_list *list); -- 1.7.11.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 3/4] Add a new function, string_list_remove_duplicates() 2012-09-09 5:53 ` [PATCH 3/4] Add a new function, string_list_remove_duplicates() Michael Haggerty @ 2012-09-09 9:45 ` Junio C Hamano 2012-09-10 9:15 ` Michael Haggerty 0 siblings, 1 reply; 23+ messages in thread From: Junio C Hamano @ 2012-09-09 9:45 UTC (permalink / raw) To: Michael Haggerty; +Cc: git Michael Haggerty <mhagger@alum.mit.edu> writes: > Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> > --- > Documentation/technical/api-string-list.txt | 4 ++++ > string-list.c | 17 +++++++++++++++++ > string-list.h | 5 +++++ > 3 files changed, 26 insertions(+) > > diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt > index 15b8072..9206f8f 100644 > --- a/Documentation/technical/api-string-list.txt > +++ b/Documentation/technical/api-string-list.txt > @@ -104,6 +104,10 @@ write `string_list_insert(...)->util = ...;`. > Look up a given string in the string_list, returning the containing > string_list_item. If the string is not found, NULL is returned. > > +`string_list_remove_duplicates`:: > + > + Remove all but the first entry that has a given string value. Unlike the previous patch, free_util is not documented? It is kind of shame that the string list must be sorted for this function to work, but I guess you do not have a need for a version that works on both sorted and unsorted (yet). We can introduce a variant with "unsorted_" prefix later when it becomes necessary, so OK. > * Functions for unsorted lists only > > `string_list_append`:: > diff --git a/string-list.c b/string-list.c > index 72610ce..bfef6cf 100644 > --- a/string-list.c > +++ b/string-list.c > @@ -92,6 +92,23 @@ struct string_list_item *string_list_lookup(struct string_list *list, const char > return list->items + i; > } > > +void string_list_remove_duplicates(struct string_list *list, int free_util) > +{ > + if (list->nr > 1) { > + int src, dst; > + for (src = dst = 1; src < list->nr; src++) { > + if (!strcmp(list->items[dst - 1].string, list->items[src].string)) { > + if (list->strdup_strings) > + free(list->items[src].string); > + if (free_util) > + free(list->items[src].util); > + } else > + list->items[dst++] = list->items[src]; > + } > + list->nr = dst; > + } > +} > + > int for_each_string_list(struct string_list *list, > string_list_each_func_t fn, void *cb_data) > { > diff --git a/string-list.h b/string-list.h > index 84996aa..c4dc659 100644 > --- a/string-list.h > +++ b/string-list.h > @@ -38,6 +38,7 @@ int for_each_string_list(struct string_list *list, > void filter_string_list(struct string_list *list, int free_util, > string_list_each_func_t fn, void *cb_data); > > + > /* Use these functions only on sorted lists: */ > int string_list_has_string(const struct string_list *list, const char *string); > int string_list_find_insert_index(const struct string_list *list, const char *string, > @@ -47,6 +48,10 @@ struct string_list_item *string_list_insert_at_index(struct string_list *list, > int insert_at, const char *string); > struct string_list_item *string_list_lookup(struct string_list *list, const char *string); > > +/* Remove all but the first entry that has a given string value. */ > +void string_list_remove_duplicates(struct string_list *list, int free_util); > + > + > /* Use these functions only on unsorted lists: */ > struct string_list_item *string_list_append(struct string_list *list, const char *string); > void sort_string_list(struct string_list *list); ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/4] Add a new function, string_list_remove_duplicates() 2012-09-09 9:45 ` Junio C Hamano @ 2012-09-10 9:15 ` Michael Haggerty 0 siblings, 0 replies; 23+ messages in thread From: Michael Haggerty @ 2012-09-10 9:15 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 09/09/2012 11:45 AM, Junio C Hamano wrote: > Michael Haggerty <mhagger@alum.mit.edu> writes: > >> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> >> --- >> Documentation/technical/api-string-list.txt | 4 ++++ >> string-list.c | 17 +++++++++++++++++ >> string-list.h | 5 +++++ >> 3 files changed, 26 insertions(+) >> >> diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt >> index 15b8072..9206f8f 100644 >> --- a/Documentation/technical/api-string-list.txt >> +++ b/Documentation/technical/api-string-list.txt >> @@ -104,6 +104,10 @@ write `string_list_insert(...)->util = ...;`. >> Look up a given string in the string_list, returning the containing >> string_list_item. If the string is not found, NULL is returned. >> >> +`string_list_remove_duplicates`:: >> + >> + Remove all but the first entry that has a given string value. > > Unlike the previous patch, free_util is not documented? Fixed. > It is kind of shame that the string list must be sorted for this > function to work, but I guess you do not have a need for a version > that works on both sorted and unsorted (yet). We can introduce a > variant with "unsorted_" prefix later when it becomes necessary, so > OK. Not only that; for an unsorted list it is not quite as obvious what a caller would want. Often lists are used as a poor man's set, in which case the caller would probably not mind sorting the list anyway. There are two operations that one might conceive of for unsorted lists: (1) remove duplicates while preserving the order of the remaining entries, or (2) remove duplicates while not worrying about the order of the remaining entries. (Admittedly the first is not much more expensive than the second.) These are more complicated to program, require temporary space, and are of less obvious utility than removing duplicates from a sorted list. >> * Functions for unsorted lists only >> >> `string_list_append`:: >> diff --git a/string-list.c b/string-list.c >> index 72610ce..bfef6cf 100644 >> --- a/string-list.c >> +++ b/string-list.c >> @@ -92,6 +92,23 @@ struct string_list_item *string_list_lookup(struct string_list *list, const char >> return list->items + i; >> } >> >> +void string_list_remove_duplicates(struct string_list *list, int free_util) >> +{ >> + if (list->nr > 1) { >> + int src, dst; >> + for (src = dst = 1; src < list->nr; src++) { >> + if (!strcmp(list->items[dst - 1].string, list->items[src].string)) { >> + if (list->strdup_strings) >> + free(list->items[src].string); >> + if (free_util) >> + free(list->items[src].util); >> + } else >> + list->items[dst++] = list->items[src]; >> + } >> + list->nr = dst; >> + } >> +} >> + >> int for_each_string_list(struct string_list *list, >> string_list_each_func_t fn, void *cb_data) >> { >> diff --git a/string-list.h b/string-list.h >> index 84996aa..c4dc659 100644 >> --- a/string-list.h >> +++ b/string-list.h >> @@ -38,6 +38,7 @@ int for_each_string_list(struct string_list *list, >> void filter_string_list(struct string_list *list, int free_util, >> string_list_each_func_t fn, void *cb_data); >> >> + >> /* Use these functions only on sorted lists: */ >> int string_list_has_string(const struct string_list *list, const char *string); >> int string_list_find_insert_index(const struct string_list *list, const char *string, >> @@ -47,6 +48,10 @@ struct string_list_item *string_list_insert_at_index(struct string_list *list, >> int insert_at, const char *string); >> struct string_list_item *string_list_lookup(struct string_list *list, const char *string); >> >> +/* Remove all but the first entry that has a given string value. */ >> +void string_list_remove_duplicates(struct string_list *list, int free_util); >> + >> + >> /* Use these functions only on unsorted lists: */ >> struct string_list_item *string_list_append(struct string_list *list, const char *string); >> void sort_string_list(struct string_list *list); -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 4/4] Add a function string_list_longest_prefix() 2012-09-09 5:53 [PATCH 0/4] Add some string_list-related functions Michael Haggerty ` (2 preceding siblings ...) 2012-09-09 5:53 ` [PATCH 3/4] Add a new function, string_list_remove_duplicates() Michael Haggerty @ 2012-09-09 5:53 ` Michael Haggerty 2012-09-09 9:54 ` Junio C Hamano 3 siblings, 1 reply; 23+ messages in thread From: Michael Haggerty @ 2012-09-09 5:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Michael Haggerty Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- Documentation/technical/api-string-list.txt | 8 ++++++++ string-list.c | 20 +++++++++++++++++++ string-list.h | 8 ++++++++ t/t0063-string-list.sh | 30 +++++++++++++++++++++++++++++ test-string-list.c | 22 +++++++++++++++++++++ 5 files changed, 88 insertions(+) diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt index 9206f8f..291ac4c 100644 --- a/Documentation/technical/api-string-list.txt +++ b/Documentation/technical/api-string-list.txt @@ -68,6 +68,14 @@ Functions to be deleted. Preserve the order of the items that are retained. +`string_list_longest_prefix`:: + + Return the longest string within a string_list that is a + prefix (in the sense of prefixcmp()) of the specified string, + or NULL if no such prefix exists. This function does not + require the string_list to be sorted (it does a linear + search). + `print_string_list`:: Dump a string_list to stdout, useful mainly for debugging purposes. It diff --git a/string-list.c b/string-list.c index bfef6cf..043f6c4 100644 --- a/string-list.c +++ b/string-list.c @@ -136,6 +136,26 @@ void filter_string_list(struct string_list *list, int free_util, list->nr = dst; } +char *string_list_longest_prefix(const struct string_list *prefixes, + const char *string) +{ + int i, max_len = -1; + char *retval = NULL; + + for (i = 0; i < prefixes->nr; i++) { + char *prefix = prefixes->items[i].string; + if (!prefixcmp(string, prefix)) { + int len = strlen(prefix); + if (len > max_len) { + retval = prefix; + max_len = len; + } + } + } + + return retval; +} + void string_list_clear(struct string_list *list, int free_util) { if (list->items) { diff --git a/string-list.h b/string-list.h index c4dc659..680916c 100644 --- a/string-list.h +++ b/string-list.h @@ -38,6 +38,14 @@ int for_each_string_list(struct string_list *list, void filter_string_list(struct string_list *list, int free_util, string_list_each_func_t fn, void *cb_data); +/* + * Return the longest string in prefixes that is a prefix (in the + * sense of prefixcmp()) of string, or NULL if no such prefix exists. + * This function does not require the string_list to be sorted (it + * does a linear search). + */ +char *string_list_longest_prefix(const struct string_list *prefixes, const char *string); + /* Use these functions only on sorted lists: */ int string_list_has_string(const struct string_list *list, const char *string); diff --git a/t/t0063-string-list.sh b/t/t0063-string-list.sh index 0eede83..fa96eba 100755 --- a/t/t0063-string-list.sh +++ b/t/t0063-string-list.sh @@ -15,6 +15,14 @@ string_list_split_in_place() { " } +longest_prefix() { + test "$(test-string-list longest_prefix "$1" "$2")" = "$3" +} + +no_longest_prefix() { + test_must_fail test-string-list longest_prefix "$1" "$2" +} + string_list_split_in_place "foo:bar:baz" ":" "-1" <<EOF 3 [0]: "foo" @@ -60,4 +68,26 @@ string_list_split_in_place ":" ":" "-1" <<EOF [1]: "" EOF +test_expect_success "test longest_prefix" ' + no_longest_prefix - '' && + no_longest_prefix - x && + longest_prefix "" x "" && + longest_prefix x x x && + longest_prefix "" foo "" && + longest_prefix : foo "" && + longest_prefix f foo f && + longest_prefix foo foobar foo && + longest_prefix foo foo foo && + no_longest_prefix bar foo && + no_longest_prefix bar:bar foo && + no_longest_prefix foobar foo && + longest_prefix foo:bar foo foo && + longest_prefix foo:bar bar bar && + longest_prefix foo::bar foo foo && + longest_prefix foo:foobar foo foo && + longest_prefix foobar:foo foo foo && + longest_prefix foo: bar "" && + longest_prefix :foo bar "" +' + test_done diff --git a/test-string-list.c b/test-string-list.c index f08d3cc..c7e71f2 100644 --- a/test-string-list.c +++ b/test-string-list.c @@ -19,6 +19,28 @@ int main(int argc, char **argv) return 0; } + if (argc == 4 && !strcmp(argv[1], "longest_prefix")) { + /* arguments: <colon-separated-prefixes>|- <string> */ + struct string_list prefixes = STRING_LIST_INIT_NODUP; + int retval; + char *prefix_string = xstrdup(argv[2]); + char *string = argv[3]; + char *match; + + if (strcmp(prefix_string, "-")) + string_list_split_in_place(&prefixes, prefix_string, ':', -1); + match = string_list_longest_prefix(&prefixes, string); + if (match) { + printf("%s\n", match); + retval = 0; + } + else + retval = 1; + string_list_clear(&prefixes, 0); + free(prefix_string); + return retval; + } + fprintf(stderr, "%s: unknown function name: %s\n", argv[0], argv[1] ? argv[1] : "(there was none)"); return 1; -- 1.7.11.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] Add a function string_list_longest_prefix() 2012-09-09 5:53 ` [PATCH 4/4] Add a function string_list_longest_prefix() Michael Haggerty @ 2012-09-09 9:54 ` Junio C Hamano 2012-09-10 10:01 ` Michael Haggerty 0 siblings, 1 reply; 23+ messages in thread From: Junio C Hamano @ 2012-09-09 9:54 UTC (permalink / raw) To: Michael Haggerty; +Cc: git Michael Haggerty <mhagger@alum.mit.edu> writes: > Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> > --- > Documentation/technical/api-string-list.txt | 8 ++++++++ > string-list.c | 20 +++++++++++++++++++ > string-list.h | 8 ++++++++ > t/t0063-string-list.sh | 30 +++++++++++++++++++++++++++++ > test-string-list.c | 22 +++++++++++++++++++++ > 5 files changed, 88 insertions(+) > > diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt > index 9206f8f..291ac4c 100644 > --- a/Documentation/technical/api-string-list.txt > +++ b/Documentation/technical/api-string-list.txt > @@ -68,6 +68,14 @@ Functions > to be deleted. Preserve the order of the items that are > retained. > > +`string_list_longest_prefix`:: > + > + Return the longest string within a string_list that is a > + prefix (in the sense of prefixcmp()) of the specified string, > + or NULL if no such prefix exists. This function does not > + require the string_list to be sorted (it does a linear > + search). > + > `print_string_list`:: This may feel like outside the scope of this series, but since this series will be the main culprit for adding many new functions to this API in the recent history... - We may want to name things a bit more consistently so that people can tell which ones can be called on any string list, which ones are sorted list only, and which ones are unsorted one only. In addition, the last category _may_ need a bit more thought. Calling unsorted_string_list_lookup() on an already sorted list is not a crime---it is just a stupid thing to do. - Why are these new functions described at the top, not appended at the bottom? I would have expected either an alphabetical, or a more generic ones first (i.e. print and clear are a lot "easier" ones compared to filter and prefix that are very much more specialized). > diff --git a/string-list.c b/string-list.c > index bfef6cf..043f6c4 100644 > --- a/string-list.c > +++ b/string-list.c > @@ -136,6 +136,26 @@ void filter_string_list(struct string_list *list, int free_util, > list->nr = dst; > } > > +char *string_list_longest_prefix(const struct string_list *prefixes, > + const char *string) > +{ > + int i, max_len = -1; > + char *retval = NULL; > + > + for (i = 0; i < prefixes->nr; i++) { > + char *prefix = prefixes->items[i].string; > + if (!prefixcmp(string, prefix)) { > + int len = strlen(prefix); > + if (len > max_len) { > + retval = prefix; > + max_len = len; > + } > + } > + } > + > + return retval; > +} > + > void string_list_clear(struct string_list *list, int free_util) > { > if (list->items) { > diff --git a/string-list.h b/string-list.h > index c4dc659..680916c 100644 > --- a/string-list.h > +++ b/string-list.h > @@ -38,6 +38,14 @@ int for_each_string_list(struct string_list *list, > void filter_string_list(struct string_list *list, int free_util, > string_list_each_func_t fn, void *cb_data); > > +/* > + * Return the longest string in prefixes that is a prefix (in the > + * sense of prefixcmp()) of string, or NULL if no such prefix exists. > + * This function does not require the string_list to be sorted (it > + * does a linear search). > + */ > +char *string_list_longest_prefix(const struct string_list *prefixes, const char *string); > + > > /* Use these functions only on sorted lists: */ > int string_list_has_string(const struct string_list *list, const char *string); > diff --git a/t/t0063-string-list.sh b/t/t0063-string-list.sh > index 0eede83..fa96eba 100755 > --- a/t/t0063-string-list.sh > +++ b/t/t0063-string-list.sh > @@ -15,6 +15,14 @@ string_list_split_in_place() { > " > } > > +longest_prefix() { > + test "$(test-string-list longest_prefix "$1" "$2")" = "$3" > +} > + > +no_longest_prefix() { > + test_must_fail test-string-list longest_prefix "$1" "$2" > +} > + > string_list_split_in_place "foo:bar:baz" ":" "-1" <<EOF > 3 > [0]: "foo" > @@ -60,4 +68,26 @@ string_list_split_in_place ":" ":" "-1" <<EOF > [1]: "" > EOF > > +test_expect_success "test longest_prefix" ' > + no_longest_prefix - '' && > + no_longest_prefix - x && > + longest_prefix "" x "" && > + longest_prefix x x x && > + longest_prefix "" foo "" && > + longest_prefix : foo "" && > + longest_prefix f foo f && > + longest_prefix foo foobar foo && > + longest_prefix foo foo foo && > + no_longest_prefix bar foo && > + no_longest_prefix bar:bar foo && > + no_longest_prefix foobar foo && > + longest_prefix foo:bar foo foo && > + longest_prefix foo:bar bar bar && > + longest_prefix foo::bar foo foo && > + longest_prefix foo:foobar foo foo && > + longest_prefix foobar:foo foo foo && > + longest_prefix foo: bar "" && > + longest_prefix :foo bar "" > +' > + > test_done > diff --git a/test-string-list.c b/test-string-list.c > index f08d3cc..c7e71f2 100644 > --- a/test-string-list.c > +++ b/test-string-list.c > @@ -19,6 +19,28 @@ int main(int argc, char **argv) > return 0; > } > > + if (argc == 4 && !strcmp(argv[1], "longest_prefix")) { > + /* arguments: <colon-separated-prefixes>|- <string> */ > + struct string_list prefixes = STRING_LIST_INIT_NODUP; > + int retval; > + char *prefix_string = xstrdup(argv[2]); > + char *string = argv[3]; > + char *match; > + > + if (strcmp(prefix_string, "-")) > + string_list_split_in_place(&prefixes, prefix_string, ':', -1); > + match = string_list_longest_prefix(&prefixes, string); > + if (match) { > + printf("%s\n", match); > + retval = 0; > + } > + else > + retval = 1; > + string_list_clear(&prefixes, 0); > + free(prefix_string); > + return retval; > + } > + > fprintf(stderr, "%s: unknown function name: %s\n", argv[0], > argv[1] ? argv[1] : "(there was none)"); > return 1; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] Add a function string_list_longest_prefix() 2012-09-09 9:54 ` Junio C Hamano @ 2012-09-10 10:01 ` Michael Haggerty 2012-09-10 16:24 ` Junio C Hamano 0 siblings, 1 reply; 23+ messages in thread From: Michael Haggerty @ 2012-09-10 10:01 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 09/09/2012 11:54 AM, Junio C Hamano wrote: > Michael Haggerty <mhagger@alum.mit.edu> writes: > >> [...] >> diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt >> index 9206f8f..291ac4c 100644 >> --- a/Documentation/technical/api-string-list.txt >> +++ b/Documentation/technical/api-string-list.txt >> @@ -68,6 +68,14 @@ Functions >> to be deleted. Preserve the order of the items that are >> retained. >> >> +`string_list_longest_prefix`:: >> + >> + Return the longest string within a string_list that is a >> + prefix (in the sense of prefixcmp()) of the specified string, >> + or NULL if no such prefix exists. This function does not >> + require the string_list to be sorted (it does a linear >> + search). >> + >> `print_string_list`:: > > This may feel like outside the scope of this series, but since this > series will be the main culprit for adding many new functions to > this API in the recent history... > > - We may want to name things a bit more consistently so that people > can tell which ones can be called on any string list, which ones > are sorted list only, and which ones are unsorted one only. > > In addition, the last category _may_ need a bit more thought. > Calling unsorted_string_list_lookup() on an already sorted list > is not a crime---it is just a stupid thing to do. Yes, this could be clearer. Though I'm skeptical that a naming convention can capture all of the variation without being too cumbersome. Another idea: in string-list.h, one could name parameters "sorted_list" when they must be sorted as a precondition of the function. But before getting too hung up on finery, the effort might be better invested adding documentation for functions that are totally lacking it, like string_list_clear_func() for_each_string_list() for_each_string_list_item() string_list_find_insert_index() string_list_insert_at_index() While we're on the subject, it seems to me that documenting APIs like these in separate files under Documentation/technical rather than in the header files themselves - makes the documentation for a particular function harder to find, - makes it easier for the documentation to get out of sync with the actual collection of functions (e.g., the 5 undocumented functions listed above). - makes it awkward for the documentation to refer to particular function parameters by name. While it is nice to have a high-level prose description of an API, I am often frustrated by the lack of "docstrings" in the header file where a function is declared. The high-level description of an API could be put at the top of the header file. Also, better documentation in header files could enable the automatic generation of API docs (e.g., via doxygen). Is there some reason for the current documentation policy or is it historical and just needs somebody to put in the work to change it? > - Why are these new functions described at the top, not appended at > the bottom? I would have expected either an alphabetical, or a > more generic ones first (i.e. print and clear are a lot "easier" > ones compared to filter and prefix that are very much more > specialized). The order seemed logical to me at the time (given the constraint that functions are grouped by sorted/unsorted/don't-care): print_string_list() is only useful for debugging, so it seemed to belong below the "production" functions. string_list_clear() was already below print_string_list() (which I guessed was because it is logically used last in the life of a string_list) so I left it at the end of its section. My preference would probably have been to move print_string_list() below string_list_clear(), but somebody else made the opposite choice so I decided to respect it. That being said, I don't have anything against a different order. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] Add a function string_list_longest_prefix() 2012-09-10 10:01 ` Michael Haggerty @ 2012-09-10 16:24 ` Junio C Hamano 2012-09-10 16:33 ` Jeff King 0 siblings, 1 reply; 23+ messages in thread From: Junio C Hamano @ 2012-09-10 16:24 UTC (permalink / raw) To: Michael Haggerty; +Cc: git Michael Haggerty <mhagger@alum.mit.edu> writes: > Another idea: in string-list.h, one could name parameters "sorted_list" > when they must be sorted as a precondition of the function. That sounds like a very sensible thing to do. > But before getting too hung up on finery, the effort might be better > invested adding documentation for functions that are totally lacking it, > like > > string_list_clear_func() > for_each_string_list() > for_each_string_list_item() > string_list_find_insert_index() > string_list_insert_at_index() > > While we're on the subject, it seems to me that documenting APIs like > these in separate files under Documentation/technical rather than in the > header files themselves > > - makes the documentation for a particular function harder to find, > > - makes it easier for the documentation to get out of sync with the > actual collection of functions (e.g., the 5 undocumented functions > listed above). > > - makes it awkward for the documentation to refer to particular function > parameters by name. > > While it is nice to have a high-level prose description of an API, I am > often frustrated by the lack of "docstrings" in the header file where a > function is declared. The high-level description of an API could be put > at the top of the header file. > > Also, better documentation in header files could enable the automatic > generation of API docs (e.g., via doxygen). Yeah, perhaps you may want to look into doing an automated generation of Documentation/technical/api-*.txt files out of the headers. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] Add a function string_list_longest_prefix() 2012-09-10 16:24 ` Junio C Hamano @ 2012-09-10 16:33 ` Jeff King 2012-09-10 17:48 ` Andreas Ericsson 0 siblings, 1 reply; 23+ messages in thread From: Jeff King @ 2012-09-10 16:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: Michael Haggerty, git On Mon, Sep 10, 2012 at 09:24:17AM -0700, Junio C Hamano wrote: > > While we're on the subject, it seems to me that documenting APIs like > > these in separate files under Documentation/technical rather than in the > > header files themselves > > > > - makes the documentation for a particular function harder to find, > > > > - makes it easier for the documentation to get out of sync with the > > actual collection of functions (e.g., the 5 undocumented functions > > listed above). > > > > - makes it awkward for the documentation to refer to particular function > > parameters by name. > > > > While it is nice to have a high-level prose description of an API, I am > > often frustrated by the lack of "docstrings" in the header file where a > > function is declared. The high-level description of an API could be put > > at the top of the header file. > > > > Also, better documentation in header files could enable the automatic > > generation of API docs (e.g., via doxygen). > > Yeah, perhaps you may want to look into doing an automated > generation of Documentation/technical/api-*.txt files out of the > headers. I was just documenting something in technical/api-* the other day, and had the same feeling. I'd be very happy if we moved to some kind of literate-programming system. I have no idea which ones are good or bad, though. I have used doxygen, but all I remember is it being painfully baroque. I'd much rather have something simple and lightweight, with an easy markup format. For example, this: http://tomdoc.org/ Looks much nicer to me than most doxygen I've seen. But again, it's been a while, so maybe doxygen is nicer than I remember. -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] Add a function string_list_longest_prefix() 2012-09-10 16:33 ` Jeff King @ 2012-09-10 17:48 ` Andreas Ericsson 2012-09-10 19:21 ` Using doxygen (or something similar) to generate API docs [was [PATCH 4/4] Add a function string_list_longest_prefix()] Michael Haggerty 0 siblings, 1 reply; 23+ messages in thread From: Andreas Ericsson @ 2012-09-10 17:48 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Michael Haggerty, git On 09/10/2012 06:33 PM, Jeff King wrote: > On Mon, Sep 10, 2012 at 09:24:17AM -0700, Junio C Hamano wrote: > >>> While we're on the subject, it seems to me that documenting APIs like >>> these in separate files under Documentation/technical rather than in the >>> header files themselves >>> >>> - makes the documentation for a particular function harder to find, >>> >>> - makes it easier for the documentation to get out of sync with the >>> actual collection of functions (e.g., the 5 undocumented functions >>> listed above). >>> >>> - makes it awkward for the documentation to refer to particular function >>> parameters by name. >>> >>> While it is nice to have a high-level prose description of an API, I am >>> often frustrated by the lack of "docstrings" in the header file where a >>> function is declared. The high-level description of an API could be put >>> at the top of the header file. >>> >>> Also, better documentation in header files could enable the automatic >>> generation of API docs (e.g., via doxygen). >> >> Yeah, perhaps you may want to look into doing an automated >> generation of Documentation/technical/api-*.txt files out of the >> headers. > > I was just documenting something in technical/api-* the other day, and > had the same feeling. I'd be very happy if we moved to some kind of > literate-programming system. I have no idea which ones are good or bad, > though. I have used doxygen, but all I remember is it being painfully > baroque. I'd much rather have something simple and lightweight, with an > easy markup format. For example, this: > > http://tomdoc.org/ > > Looks much nicer to me than most doxygen I've seen. But again, it's been > a while, so maybe doxygen is nicer than I remember. > Doxygen has a the very nifty feature of being able to generate callgraphs though. We use it extensively at $dayjob, so if you need a hand building something sensible out of git's headers, I'd be happy to help. libgit2 uses doxygen btw, and has done since the start. If we ever merge the two, it would be neat to use the same. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Using doxygen (or something similar) to generate API docs [was [PATCH 4/4] Add a function string_list_longest_prefix()] 2012-09-10 17:48 ` Andreas Ericsson @ 2012-09-10 19:21 ` Michael Haggerty 2012-09-10 21:56 ` Jeff King 0 siblings, 1 reply; 23+ messages in thread From: Michael Haggerty @ 2012-09-10 19:21 UTC (permalink / raw) To: Andreas Ericsson; +Cc: Jeff King, Junio C Hamano, git I'm renaming this thread so that the bikeshedding can get over ASAP. On 09/10/2012 07:48 PM, Andreas Ericsson wrote: > On 09/10/2012 06:33 PM, Jeff King wrote: >> On Mon, Sep 10, 2012 at 09:24:17AM -0700, Junio C Hamano wrote: >>> Michael Haggerty <mhagger@alum.mit.edu> writes: >>>> Also, better documentation in header files could enable the automatic >>>> generation of API docs (e.g., via doxygen). >>> >>> Yeah, perhaps you may want to look into doing an automated >>> generation of Documentation/technical/api-*.txt files out of the >>> headers. >> >> I was just documenting something in technical/api-* the other day, and >> had the same feeling. I'd be very happy if we moved to some kind of >> literate-programming system. I have no idea which ones are good or bad, >> though. I have used doxygen, but all I remember is it being painfully >> baroque. I'd much rather have something simple and lightweight, with an >> easy markup format. For example, this: >> >> http://tomdoc.org/ >> >> Looks much nicer to me than most doxygen I've seen. But again, it's been >> a while, so maybe doxygen is nicer than I remember. I don't have a personal preference for what system is used. I mentioned doxygen only because it seems to be a well-known example. >From a glance at the URL you mentioned, it looks like TomDoc is only applicable to Ruby code. > Doxygen has a the very nifty feature of being able to generate > callgraphs though. We use it extensively at $dayjob, so if you need a > hand building something sensible out of git's headers, I'd be happy > to help. My plate is full. If you are able to work on this, it would be awesome. As far as I'm concerned, you are the new literate documentation czar :-) Most importantly, having a convenient system of converting header comments into documentation would hopefully motivate other people to add better header comments in the first place, and motivate reviewers to insist on them. It's shocking (to me) how few functions are documented, and how often I have to read masses of C code to figure out what a function is for, its pre- and post-conditions, its memory policy, etc. Often I find myself having to read functions three layers down the call tree to figure out the behavior of the top-layer function. I try to document things as I go, but it's only a drop in the bucket. > libgit2 uses doxygen btw, and has done since the start. If we ever > merge the two, it would be neat to use the same. That would be a nice bonus. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Using doxygen (or something similar) to generate API docs [was [PATCH 4/4] Add a function string_list_longest_prefix()] 2012-09-10 19:21 ` Using doxygen (or something similar) to generate API docs [was [PATCH 4/4] Add a function string_list_longest_prefix()] Michael Haggerty @ 2012-09-10 21:56 ` Jeff King 2012-09-10 22:09 ` Michael Haggerty 2012-09-11 1:01 ` Andreas Ericsson 0 siblings, 2 replies; 23+ messages in thread From: Jeff King @ 2012-09-10 21:56 UTC (permalink / raw) To: Michael Haggerty; +Cc: Andreas Ericsson, Junio C Hamano, git On Mon, Sep 10, 2012 at 09:21:12PM +0200, Michael Haggerty wrote: > I'm renaming this thread so that the bikeshedding can get over ASAP. Thanks. :) > >> http://tomdoc.org/ > >> > >> Looks much nicer to me than most doxygen I've seen. But again, it's been > >> a while, so maybe doxygen is nicer than I remember. > > I don't have a personal preference for what system is used. I mentioned > doxygen only because it seems to be a well-known example. > > From a glance at the URL you mentioned, it looks like TomDoc is only > applicable to Ruby code. Yeah, sorry, I should have been more clear; tomdoc is not an option because it doesn't do C. But what I like about it is the more natural markup syntax. I was wondering if there were other similar solutions. Looks like "NaturalDocs" is one: http://www.naturaldocs.org/documenting.html On the other hand, doxygen is well-known among open source folks, which counts for something. And from what I've read, recent versions support Markdown, but I'm not sure of the details. So maybe it is a lot better than I remember. > > Doxygen has a the very nifty feature of being able to generate > > callgraphs though. We use it extensively at $dayjob, so if you need a > > hand building something sensible out of git's headers, I'd be happy > > to help. It has been over a decade since I seriously used doxygen for anything, and then it was a medium-sized project. So take my opinion with a grain of salt. But I remember the callgraph feature being one of those things that _sounded_ really cool, but in practice was not all that useful. > My plate is full. If you are able to work on this, it would be awesome. > As far as I'm concerned, you are the new literate documentation czar :-) Lucky me? :) I think I'll leave it for the moment, and next time I start to add some api-level documentation I'll take a look at doxygen-ating them and see how I like it. And I'd invite anyone else to do the same (in doxygen, or whatever system you like -- the best way to evaluate a tool like this is to see how your real work would look). -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Using doxygen (or something similar) to generate API docs [was [PATCH 4/4] Add a function string_list_longest_prefix()] 2012-09-10 21:56 ` Jeff King @ 2012-09-10 22:09 ` Michael Haggerty 2012-09-11 1:01 ` Andreas Ericsson 1 sibling, 0 replies; 23+ messages in thread From: Michael Haggerty @ 2012-09-10 22:09 UTC (permalink / raw) To: Jeff King; +Cc: Andreas Ericsson, Junio C Hamano, git On 09/10/2012 11:56 PM, Jeff King wrote: > On Mon, Sep 10, 2012 at 09:21:12PM +0200, Michael Haggerty wrote: >> My plate is full. If you are able to work on this, it would be awesome. >> As far as I'm concerned, you are the new literate documentation czar :-) > > Lucky me? :) I was nominating Andreas, who rashly volunteered to help. But don't feel left out; there's enough work to go around :-) > I think I'll leave it for the moment, and next time I start to add some > api-level documentation I'll take a look at doxygen-ating them and see > how I like it. And I'd invite anyone else to do the same (in doxygen, or > whatever system you like -- the best way to evaluate a tool like this is > to see how your real work would look). I agree with that. A very good start would be to mark up a single API and build the docs (by hand if need be) using a proposed tool. This will let people get a feel for (1) what the markup has to look like and (2) what they get out of it. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Using doxygen (or something similar) to generate API docs [was [PATCH 4/4] Add a function string_list_longest_prefix()] 2012-09-10 21:56 ` Jeff King 2012-09-10 22:09 ` Michael Haggerty @ 2012-09-11 1:01 ` Andreas Ericsson 1 sibling, 0 replies; 23+ messages in thread From: Andreas Ericsson @ 2012-09-11 1:01 UTC (permalink / raw) To: Jeff King; +Cc: Michael Haggerty, Junio C Hamano, git On 09/10/2012 11:56 PM, Jeff King wrote: > On Mon, Sep 10, 2012 at 09:21:12PM +0200, Michael Haggerty wrote: > >> I'm renaming this thread so that the bikeshedding can get over ASAP. > > Thanks. :) > >>>> http://tomdoc.org/ >>>> >>>> Looks much nicer to me than most doxygen I've seen. But again, it's been >>>> a while, so maybe doxygen is nicer than I remember. >> >> I don't have a personal preference for what system is used. I mentioned >> doxygen only because it seems to be a well-known example. >> >> From a glance at the URL you mentioned, it looks like TomDoc is only >> applicable to Ruby code. > > Yeah, sorry, I should have been more clear; tomdoc is not an option > because it doesn't do C. But what I like about it is the more > natural markup syntax. I was wondering if there were other similar > solutions. Looks like "NaturalDocs" is one: > > http://www.naturaldocs.org/documenting.html > > On the other hand, doxygen is well-known among open source folks, which > counts for something. And from what I've read, recent versions support > Markdown, but I'm not sure of the details. So maybe it is a lot better > than I remember. > Markdown is supported, yes. There aren't really any details to it. I don't particularly like markdown, but my colleagues tend to use it for howto's and whatnot and it can be mixed with other doxygen styles without problem. >>> Doxygen has a the very nifty feature of being able to generate >>> callgraphs though. We use it extensively at $dayjob, so if you need a >>> hand building something sensible out of git's headers, I'd be happy >>> to help. > > It has been over a decade since I seriously used doxygen for anything, > and then it was a medium-sized project. So take my opinion with a grain > of salt. But I remember the callgraph feature being one of those things > that _sounded_ really cool, but in practice was not all that useful. > It's like all tools; Once you're used to it, it's immensely useful. I tend to prefer using it to find either code in dire need of refactoring (where the graph is too large), or engines and exit points. For those purposes, it's pretty hard to beat a good callgraph. >> My plate is full. If you are able to work on this, it would be awesome. >> As far as I'm concerned, you are the new literate documentation czar :-) > > Lucky me? :) > I think he was talking to me, but since you seem to have volunteered... ;) > I think I'll leave it for the moment, and next time I start to add some > api-level documentation I'll take a look at doxygen-ating them and see > how I like it. And I'd invite anyone else to do the same (in doxygen, or > whatever system you like -- the best way to evaluate a tool like this is > to see how your real work would look). > That's one of the problems. People follow what's already there, and there are no comments there now so there won't be any added in the future :-/ -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2012-09-11 1:01 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-09-09 5:53 [PATCH 0/4] Add some string_list-related functions Michael Haggerty 2012-09-09 5:53 ` [PATCH 1/4] Add a new function, string_list_split_in_place() Michael Haggerty 2012-09-09 9:35 ` Junio C Hamano 2012-09-10 4:45 ` Michael Haggerty 2012-09-10 5:47 ` Junio C Hamano 2012-09-10 11:48 ` Michael Haggerty 2012-09-10 16:09 ` Junio C Hamano 2012-09-09 5:53 ` [PATCH 2/4] Add a new function, filter_string_list() Michael Haggerty 2012-09-09 9:40 ` Junio C Hamano 2012-09-10 8:58 ` Michael Haggerty 2012-09-09 5:53 ` [PATCH 3/4] Add a new function, string_list_remove_duplicates() Michael Haggerty 2012-09-09 9:45 ` Junio C Hamano 2012-09-10 9:15 ` Michael Haggerty 2012-09-09 5:53 ` [PATCH 4/4] Add a function string_list_longest_prefix() Michael Haggerty 2012-09-09 9:54 ` Junio C Hamano 2012-09-10 10:01 ` Michael Haggerty 2012-09-10 16:24 ` Junio C Hamano 2012-09-10 16:33 ` Jeff King 2012-09-10 17:48 ` Andreas Ericsson 2012-09-10 19:21 ` Using doxygen (or something similar) to generate API docs [was [PATCH 4/4] Add a function string_list_longest_prefix()] Michael Haggerty 2012-09-10 21:56 ` Jeff King 2012-09-10 22:09 ` Michael Haggerty 2012-09-11 1:01 ` Andreas Ericsson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).