All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/4] Add a new function, string_list_split_in_place()
Date: Mon, 10 Sep 2012 13:48:53 +0200	[thread overview]
Message-ID: <504DD3A5.8000201@alum.mit.edu> (raw)
In-Reply-To: <7vhar6pgxs.fsf@alter.siamese.dyndns.org>

On 09/10/2012 07:47 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> ...  Consider something like
>>
>>     struct string_list *split_file_into_words(FILE *f)
>>     {
>>         char buf[1024];
>>         struct string_list *list = new string list;
>>         list->strdup_strings = 1;
>>         while (not EOF) {
>>             read_line_into_buf();
>>             string_list_split_in_place(list, buf, ' ', -1);
>>         }
>>         return list;
>>     }
> 
> That is a prime example to argue that string_list_split() would make
> more sense, no?  The caller does _not_ mind if the function mucks
> with buf, but the resulting list is not allowed to point into buf.
> 
> In such a case, the caller shouldn't have to _care_ if it wants to
> allow buf to be mucked with; it is already asking that the resulting
> list _not_ point into buf by setting strdup_strings (by the way,
> that is part of the function input, so think of it like various *opt
> variables passed into functions to tweak their behaviour).  If the
> implementation can do so without sacrificing performance (and in
> this case, as you said, it can), it should take "const char *buf".

You're right, I was thinking that a caller of
string_list_split_in_place() could choose to remain ignorant of whether
strdup_strings is set, but this is incorrect because it needs to know
whether to do its own memory management of the strings that it passes
into string_list_append().

> So it appears to me that sl_split_in_place(), if implemented, should
> be kept as a special case for performance-minded callers that have
> full control of the lifetime rules of the variables they use, can
> set strdup_strings to false, and can let buf modified in place, and
> can accept list that point into buf.

OK, so the bottom line would be to have two versions of the function.
One takes a (const char *) and *requires* strdup_strings to be set on
its input list:

int string_list_split(struct string_list *list, const char *string,
		      int delim, int maxsplit)
{
	assert(list->strdup_strings);
	...
}

The other takes a (char *) and modifies it in-place, and maybe even
requires strdup_strings to be false on its input list:

int string_list_split_in_place(struct string_list *list, char *string,
			       int delim, int maxsplit)
{
	/* not an error per se but a strong suggestion of one: */
	assert(!list->strdup_strings);
	...
}

(The latter (modulo assert) is the one that I have implemented, but it
might not be needed immediately.)  Do you agree?

>>>> + * Examples:
>>>> + *   string_list_split_in_place(l, "foo:bar:baz", ':', -1) -> ["foo", "bar", "baz"]
>>>> + *   string_list_split_in_place(l, "foo:bar:baz", ':', 1) -> ["foo", "bar:baz"]
>>>> + *   string_list_split_in_place(l, "foo:bar:", ':', -1) -> ["foo", "bar", ""]
>>>
>>> I would find it more natural to see a sentinel value against
>>> "positive" to be 0, not -1.  "-1" gives an impression as if "-2"
>>> might do something different from "-1", but Zero is a lot more
>>> special.
>>
>> You have raised a good point and I think there is a flaw in the API, but
>> I'm not sure I agree with you what the flaw is...
>>
>> The "maxsplit" argument limits the number of times the string should be
>> split.  I.e., if maxsplit is set, then the output will have at most
>> (maxsplit + 1) strings.
> 
> So "do not split, just give me the whole thing" would be maxsplit == 0
> to split into (maxsplit+1) == 1 string.  I think we are in agreement
> that your "-1" does not make any sense, and your documentation that
> said "positive" is the saner thing to do, no?

No.  I think that my handling of maxsplit=0 was incorrect but that we
should continue using -1 as the special value.

I see maxsplit=0 as a legitimate degenerate case meaning "split into 1
string".  Granted, it would only be useful in specialized situations
[1].  Moreover, "-1" makes a much more obvious special value than "0";
somebody reading code with maxsplit=-1 knows immediately that this is a
special value, whereas the handling of maxsplit=0 isn't quite so
blindingly obvious unless the reader knows the outcome of our current
discussion :-)

Therefore I still prefer treating only negative values of maxsplit to
mean "unlimited" and fixing maxsplit=0 as described.  But if you insist
on the other convention, let me know and I will change it.

Michael

[1] A case I can think of would be parsing a format like

    NUMPARENTS [PARENT...] SUMMARY

where "string_list_split(list, rest_of_line, ' ', numparents)" does the
right thing even if numparents==0.

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

  reply	other threads:[~2012-09-10 11:49 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-09  5:53 [PATCH 0/4] Add some string_list-related functions Michael Haggerty
2012-09-09  5:53 ` [PATCH 1/4] Add a new function, string_list_split_in_place() Michael Haggerty
2012-09-09  9:35   ` Junio C Hamano
2012-09-10  4:45     ` Michael Haggerty
2012-09-10  5:47       ` Junio C Hamano
2012-09-10 11:48         ` Michael Haggerty [this message]
2012-09-10 16:09           ` Junio C Hamano
2012-09-09  5:53 ` [PATCH 2/4] Add a new function, filter_string_list() Michael Haggerty
2012-09-09  9:40   ` Junio C Hamano
2012-09-10  8:58     ` Michael Haggerty
2012-09-09  5:53 ` [PATCH 3/4] Add a new function, string_list_remove_duplicates() Michael Haggerty
2012-09-09  9:45   ` Junio C Hamano
2012-09-10  9:15     ` Michael Haggerty
2012-09-09  5:53 ` [PATCH 4/4] Add a function string_list_longest_prefix() Michael Haggerty
2012-09-09  9:54   ` Junio C Hamano
2012-09-10 10:01     ` Michael Haggerty
2012-09-10 16:24       ` Junio C Hamano
2012-09-10 16:33         ` Jeff King
2012-09-10 17:48           ` Andreas Ericsson
2012-09-10 19:21             ` Using doxygen (or something similar) to generate API docs [was [PATCH 4/4] Add a function string_list_longest_prefix()] Michael Haggerty
2012-09-10 21:56               ` Jeff King
2012-09-10 22:09                 ` Michael Haggerty
2012-09-11  1:01                 ` Andreas Ericsson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=504DD3A5.8000201@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.