* [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split"
@ 2009-03-12 7:51 Christian Couder
[not found] ` <20090312190846.6117@nanako3.lavabit.com>
0 siblings, 1 reply; 9+ messages in thread
From: Christian Couder @ 2009-03-12 7:51 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Ingo Molnar, John Tapsell, Johannes Schindelin,
Pierre Habouzit
The "strbuf_split" function used to include the delimiter character
at the end of the splited strbufs it produced.
This behavior is not wanted in many cases, so this patch adds a new
"include_delim" parameter to the function to let us switch it on or
off as we want.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
builtin-mailinfo.c | 2 +-
imap-send.c | 2 +-
strbuf.c | 8 +++++---
strbuf.h | 2 +-
4 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index 2789ccd..b96ea1a 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -814,7 +814,7 @@ static void handle_body(void)
* multiple new lines. Pass only one chunk
* at a time to handle_filter()
*/
- lines = strbuf_split(&line, '\n');
+ lines = strbuf_split(&line, '\n', 1);
for (it = lines; (sb = *it); it++) {
if (*(it + 1) == NULL) /* The last line */
if (sb->buf[sb->len - 1] != '\n') {
diff --git a/imap-send.c b/imap-send.c
index cb518eb..a27f744 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -1289,7 +1289,7 @@ static void wrap_in_html(struct msg_data *msg)
int added_header = 0;
strbuf_attach(&buf, msg->data, msg->len, msg->len);
- lines = strbuf_split(&buf, '\n');
+ lines = strbuf_split(&buf, '\n', 1);
strbuf_release(&buf);
for (p = lines; *p; p++) {
if (! added_header) {
diff --git a/strbuf.c b/strbuf.c
index 6ed0684..4ef9084 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -97,7 +97,9 @@ void strbuf_tolower(struct strbuf *sb)
sb->buf[i] = tolower(sb->buf[i]);
}
-struct strbuf **strbuf_split(const struct strbuf *sb, int delim)
+struct strbuf **strbuf_split(const struct strbuf *sb,
+ int delim,
+ int include_delim)
{
int alloc = 2, pos = 0;
char *n, *p;
@@ -114,8 +116,8 @@ struct strbuf **strbuf_split(const struct strbuf *sb, int delim)
ret = xrealloc(ret, sizeof(struct strbuf *) * alloc);
}
if (!n)
- n = sb->buf + sb->len - 1;
- len = n - p + 1;
+ n = sb->buf + sb->len - (include_delim ? 1 : 0);
+ len = n - p + (include_delim ? 1 : 0);
t = xmalloc(sizeof(struct strbuf));
strbuf_init(t, len);
strbuf_add(t, p, len);
diff --git a/strbuf.h b/strbuf.h
index 89bd36e..2202d28 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -83,7 +83,7 @@ extern void strbuf_ltrim(struct strbuf *);
extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
extern void strbuf_tolower(struct strbuf *);
-extern struct strbuf **strbuf_split(const struct strbuf *, int delim);
+extern struct strbuf **strbuf_split(const struct strbuf *, int delim, int include_delim);
extern void strbuf_list_free(struct strbuf **);
/*----- add data in your buffer -----*/
--
1.6.2.83.g012a16.dirty
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split"
[not found] ` <20090312190846.6117@nanako3.lavabit.com>
@ 2009-03-13 4:48 ` Christian Couder
2009-03-13 5:17 ` Junio C Hamano
0 siblings, 1 reply; 9+ messages in thread
From: Christian Couder @ 2009-03-13 4:48 UTC (permalink / raw)
To: Nanako Shiraishi
Cc: Junio C Hamano, git, Ingo Molnar, John Tapsell,
Johannes Schindelin, Pierre Habouzit
Le jeudi 12 mars 2009, Nanako Shiraishi a écrit :
> Quoting Christian Couder <chriscool@tuxfamily.org>:
> > The "strbuf_split" function used to include the delimiter character
> > at the end of the splited strbufs it produced.
> >
> > This behavior is not wanted in many cases, so this patch adds a new
> > "include_delim" parameter to the function to let us switch it on or
> > off as we want.
>
> Sorry, but I don't understand the above claim. You say "not wanted in
> many cases" but your patch updates the existing callers, all of which do
> want to include the delimiter.
In many programming languages, like Perl and Python for example, there is
a "split" function that splits strings, and by default the resulting
strings don't include the delimiter.
In Git there are only 2 existing callers and I think this function could be
used a lot more if there was a way not to include the delimiter.
In my patch series I add one caller that don't want the delimiter so after
my patch series there are already half as many callers that don't want the
delimiter.
And by the way, when I mentored the GSoC sequencer project I suggested to
Stephan to use strbuf_split, but we also had the problem that the delimiter
was included.
> The patch would easily justify itself if it made the callers pass 0 to
> the function to decline the delimiter, and as the result it made the
> codepaths that use the result simpler. But I don't think that is what
> your patch does.
Yes, my patch does not do that, because I think including the delimiter is a
special case of the more general and useful behavior of not including it.
Best regards,
Christian.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split"
2009-03-13 4:48 ` Christian Couder
@ 2009-03-13 5:17 ` Junio C Hamano
2009-03-13 6:02 ` Christian Couder
0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2009-03-13 5:17 UTC (permalink / raw)
To: Christian Couder
Cc: Nanako Shiraishi, git, Ingo Molnar, John Tapsell,
Johannes Schindelin, Pierre Habouzit
Christian Couder <chriscool@tuxfamily.org> writes:
> Yes, my patch does not do that, because I think including the delimiter is a
> special case of the more general and useful behavior of not including it.
You got it backwards.
With the way the returned string is used by the single caller that your
patch adds (which splits at ","), I would agree that lack of delimiter
allows the result to get used directly in the further processing.
But even in that codepath, I have to say that it is just lazy programming
that the caller does not postprocess the returned value from the splitter
function. If it wants not just accept input such as "a,b,c" but also
wants to tolerate things like "a, b, c", it will have to look at the
resulting string, and ignoring the delimiter at the end becomes just a
small part of the general clean-up process [*1*].
Once you start allowing "split at one of these characters" and/or "split
at delimiter that matches this pattern", you cannot just discard the
delimiter if you want to support non-simplistic callers, because they
would want to know what the delimiter was.
Stripping out the delimiter is the special case for simplistic callers
(think "gets()" that strips, and "fgets()" that doesn't). A more general
solution should be by default not to strip it, and I do not think your new
caller, if it were written correctly, needs stripping behaviour either.
That means there is no need for the "optionally strip" flag to the
function in order to support the rest of the series [*2*].
Also comparing this with Perl/Python split() forgets that you are working
in C, where truncating an existing string is quite cheap (just assign '\0').
There is a different trade-off to be made in these language environments.
[Footnote]
*1* and this is not a made-up feature enhancement request. If you tell
people that they can give more than one value with comma separated, some
of them _will_ feed you --option="a, b, c". Your parser can error out by
saying "I do not understand ' b'", but you will be told "What other
possible interpretation is there for that thing, other than 'b'!", and
would grudgingly have to change your code to accept such a list.
*2* In any case, as I told you in a separate review comments, I think
passing a huge list as a command line parameter, be it separated with
comma or whatever, is not an appropriate way to solve the issue of
filter_skipped() your primary topic seems to be trying to address, so I
expect your series would not need strbuf_split() at all. You would most
likely be calling for_each_ref(), looking at the refs under "refs/bisect"
hierarchy, instead of having shell feed you the list.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split"
2009-03-13 5:17 ` Junio C Hamano
@ 2009-03-13 6:02 ` Christian Couder
2009-03-13 6:53 ` Junio C Hamano
2009-03-13 7:06 ` [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split" Junio C Hamano
0 siblings, 2 replies; 9+ messages in thread
From: Christian Couder @ 2009-03-13 6:02 UTC (permalink / raw)
To: Junio C Hamano
Cc: Nanako Shiraishi, git, Ingo Molnar, John Tapsell,
Johannes Schindelin, Pierre Habouzit
Le vendredi 13 mars 2009, Junio C Hamano a écrit :
> Christian Couder <chriscool@tuxfamily.org> writes:
> > Yes, my patch does not do that, because I think including the delimiter
> > is a special case of the more general and useful behavior of not
> > including it.
>
> You got it backwards.
>
> With the way the returned string is used by the single caller that your
> patch adds (which splits at ","), I would agree that lack of delimiter
> allows the result to get used directly in the further processing.
>
> But even in that codepath, I have to say that it is just lazy programming
> that the caller does not postprocess the returned value from the splitter
> function. If it wants not just accept input such as "a,b,c" but also
> wants to tolerate things like "a, b, c", it will have to look at the
> resulting string, and ignoring the delimiter at the end becomes just a
> small part of the general clean-up process [*1*].
I think talking about "lazy programming" in this case is a bit strong,
because first "git rev-list --bisect-skip" is plumbing and will be used
mostly by porcelain and second because there are much more common shell
utilities that don't tolerate things like "a, b, c". Try using "cut"
with -f'1, 2' (instead of -f1,2) for example.
> Once you start allowing "split at one of these characters" and/or "split
> at delimiter that matches this pattern", you cannot just discard the
> delimiter if you want to support non-simplistic callers, because they
> would want to know what the delimiter was.
But I let non simplistic callers use "1" as the last parameter if they want
the delimiter. I just give one more way to use strbuf_split. I don't remove
anything.
> Stripping out the delimiter is the special case for simplistic callers
> (think "gets()" that strips, and "fgets()" that doesn't).
Aren't gets and fgets an example that having the choice to strip out the
delimiter or not is good?
> A more general
> solution should be by default not to strip it, and I do not think your
> new caller, if it were written correctly, needs stripping behaviour
> either. That means there is no need for the "optionally strip" flag to
> the function in order to support the rest of the series [*2*].
I think my patch 8/7, that I just sent, is a good solution and it still uses
the new behavior of strbuf_split introduced in 1/7.
> Also comparing this with Perl/Python split() forgets that you are working
> in C, where truncating an existing string is quite cheap (just assign
> '\0'). There is a different trade-off to be made in these language
> environments.
Sorry but I think the goal of the strbuf API is to be quite high level, so I
think comparing this with Perl/Python is ok.
Best regards,
Christian.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split"
2009-03-13 6:02 ` Christian Couder
@ 2009-03-13 6:53 ` Junio C Hamano
2009-03-14 7:46 ` Migrate bisect to C (was: [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split") Christian Couder
2009-03-13 7:06 ` [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split" Junio C Hamano
1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2009-03-13 6:53 UTC (permalink / raw)
To: Christian Couder
Cc: Nanako Shiraishi, git, Ingo Molnar, John Tapsell,
Johannes Schindelin, Pierre Habouzit
Christian Couder <chriscool@tuxfamily.org> writes:
> Aren't gets and fgets an example that having the choice to strip out the
> delimiter or not is good?
I consider it is an example that an earlier simplistic API (gets()) later
gets corrected when it is redone right (fgets()) with other changes like a
proper bounds check.
>> A more general
>> solution should be by default not to strip it, and I do not think your
>> new caller, if it were written correctly, needs stripping behaviour
>> either. That means there is no need for the "optionally strip" flag to
>> the function in order to support the rest of the series [*2*].
I also do not agree that you have to keep list of skip both in shell and
rev-list when you go the route I suggested. I think a separate bisect.c
you did is a good first step to make not just the bisect machinery but the
whole bisect command into a built-in, and even if we do not do the full
rewrite in C in one go, moving these "shell script reads from refs/bisect
only to feed the result to rev-list --bisect" pattern to "shell script
updates refs/bisect and let rev-list --bisect read from there" pattern
would be a good initial step. Oh, and I did not mean it only for "skip",
but also doing this for "good" and "bad" as well.
For example, you read "refs/bisect/skip-*" and keep that in $skip to:
- feed it to filter_skipped() which you are making built-in with this
series;
- feed it to check_good_are_ancestors_of_bad that in turn calls
check_merge_bases;
and its use is contained in bisect_next() alone. After this series is
done, we can move the logic in check_good_are... to bisect.c and you do
not have to read refs/bisect/skip-* in the shell anymore. IOW, we can
migrate away from the "shell reads from refs/bisect/ and feeds that to
rev-list --bisect" pattern incrementally.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split"
2009-03-13 6:02 ` Christian Couder
2009-03-13 6:53 ` Junio C Hamano
@ 2009-03-13 7:06 ` Junio C Hamano
1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2009-03-13 7:06 UTC (permalink / raw)
To: Christian Couder
Cc: Nanako Shiraishi, git, Ingo Molnar, John Tapsell,
Johannes Schindelin, Pierre Habouzit
Christian Couder <chriscool@tuxfamily.org> writes:
>> Also comparing this with Perl/Python split() forgets that you are working
>> in C, where truncating an existing string is quite cheap (just assign
>> '\0'). There is a different trade-off to be made in these language
>> environments.
>
> Sorry but I think the goal of the strbuf API is to be quite high level, so I
> think comparing this with Perl/Python is ok.
I think you are mistaken again.
The highlevel-ness of strbuf holds true only while you live in the strbuf
world, but you should consider what happens when its end product finally
gets used in the calling function written in C. It will be a good-old NUL
terminated string or just a chunk of memory with known length.
For a user written in C, it is far easier and cheaper to trim excess at
the end, especially when the length of the string is known (strbuf_detach
gives you the length of the string exactly for this reason) than having to
extend it (in order to recover the lost delimiter because the splitter
removed it). Trimming from the front is also cheap (just move the head
pointer and start consuming from the middle).
^ permalink raw reply [flat|nested] 9+ messages in thread
* Migrate bisect to C (was: [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split")
2009-03-13 6:53 ` Junio C Hamano
@ 2009-03-14 7:46 ` Christian Couder
2009-03-14 8:16 ` Migrate bisect to C Junio C Hamano
0 siblings, 1 reply; 9+ messages in thread
From: Christian Couder @ 2009-03-14 7:46 UTC (permalink / raw)
To: Junio C Hamano
Cc: Nanako Shiraishi, git, Ingo Molnar, John Tapsell,
Johannes Schindelin, Pierre Habouzit
Le vendredi 13 mars 2009, Junio C Hamano a écrit :
>
> I also do not agree that you have to keep list of skip both in shell and
> rev-list when you go the route I suggested. I think a separate bisect.c
> you did is a good first step to make not just the bisect machinery but
> the whole bisect command into a built-in, and even if we do not do the
> full rewrite in C in one go, moving these "shell script reads from
> refs/bisect only to feed the result to rev-list --bisect" pattern to
> "shell script updates refs/bisect and let rev-list --bisect read from
> there" pattern would be a good initial step. Oh, and I did not mean it
> only for "skip", but also doing this for "good" and "bad" as well.
>
> For example, you read "refs/bisect/skip-*" and keep that in $skip to:
>
> - feed it to filter_skipped() which you are making built-in with this
> series;
>
> - feed it to check_good_are_ancestors_of_bad that in turn calls
> check_merge_bases;
>
> and its use is contained in bisect_next() alone. After this series is
> done, we can move the logic in check_good_are... to bisect.c and you do
> not have to read refs/bisect/skip-* in the shell anymore. IOW, we can
> migrate away from the "shell reads from refs/bisect/ and feeds that to
> rev-list --bisect" pattern incrementally.
Do you mean that you want this series to migrate both "filter_skipped" and
"check_good_are_ancestors_of_bad" to C? Or is it ok
if "check_good_are_ancestors_of_bad" migrates later?
If it is ok to migrate "check_good_are_ancestors_of_bad" later, then I think
something like the 8/7 patch I posted yesterday might be a good way,
because I think a "--bisect-read-refs" option that read refs
from "refs/bisect/*" would not fit well in "git rev-list".
Because, the "git rev-list" usage is:
git rev-list [OPTION] <commit-id>... [ -- paths... ]
That means that at least one <commit-id> should always be passed to "git
rev-list".
So it would be strange to have to pass a commit on the command line when
using the "--bisect-read-refs" option. And I think it would not be very
consistent to change the usage like this:
git rev-list [OPTION] [ --bisect-read-refs | <commit-id>... ] [ --
paths... ]
Also when we migrate "check_good_are_ancestors_of_bad" to C, we will
probably have to move the code that checks out the source code
("bisect_checkout" shell function),
because "check_good_are_ancestors_of_bad" can call "bisect_checkout".
And I don't think that the checkout behavior would fit well in "git
rev-list".
That's why I suggested to add a new "git rev-bisect" plumbing command that
would read refs from "refs/bisect/*" and that could later be fitted with
the "bisect_checkout" and "check_good_are_ancestors_of_bad" behavior.
Best regards,
Christian.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Migrate bisect to C
2009-03-14 7:46 ` Migrate bisect to C (was: [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split") Christian Couder
@ 2009-03-14 8:16 ` Junio C Hamano
2009-03-14 12:09 ` fetch--tool, was " Johannes Schindelin
0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2009-03-14 8:16 UTC (permalink / raw)
To: Christian Couder
Cc: Nanako Shiraishi, git, Ingo Molnar, John Tapsell,
Johannes Schindelin, Pierre Habouzit
Christian Couder <chriscool@tuxfamily.org> writes:
> Do you mean that you want this series to migrate both "filter_skipped" and
> "check_good_are_ancestors_of_bad" to C? Or is it ok
> if "check_good_are_ancestors_of_bad" migrates later?
One small step at a time. That's the only sane way we can get there.
> If it is ok to migrate "check_good_are_ancestors_of_bad" later, then I think
> something like the 8/7 patch I posted yesterday might be a good way,
In the final shape, you will be reading from refs/bisect namespace using
for_each_ref(), and at that point you won't have anybody feeding the
skipped from the standard input. The code you would add in [8/7] would
have to be removed if you go that route. If you make the filter_skipped
codepath to read from for_each_ref() during this round, you can still keep
that codepath even after you fully migrate everybody to C, no?
> ...
> That means that at least one <commit-id> should always be passed to "git
> rev-list".
But you do not have to even be tied to rev-list. After all, the partial
migration of filter_skipped is not "git bisect in C", but more like the
first subcommand to "git bisect--helper" command that is written in C and
can be called from shell. The next subcommand might be check_good_are...
and eventually you will have all the necessary and complex pieces the
shell version of "git bisect" currently implements as shell function as
the subcommands of "git bisect--helper". Finally, "git bisect in C" will
then make direct calls to the functions that would implement that "git
bisect--helper" command, and gets rid of the "helper" command altogether.
Side note. That was how git-fetch--tool was started; it was a
helper to partially migrate slower parts of git-fetch.sh to C. I
suspect we can almost remove it but not quite yet...
^ permalink raw reply [flat|nested] 9+ messages in thread
* fetch--tool, was Re: Migrate bisect to C
2009-03-14 8:16 ` Migrate bisect to C Junio C Hamano
@ 2009-03-14 12:09 ` Johannes Schindelin
0 siblings, 0 replies; 9+ messages in thread
From: Johannes Schindelin @ 2009-03-14 12:09 UTC (permalink / raw)
To: Junio C Hamano
Cc: Christian Couder, Nanako Shiraishi, git, Ingo Molnar,
John Tapsell, Pierre Habouzit
Hi,
On Sat, 14 Mar 2009, Junio C Hamano wrote:
> Side note. That was how git-fetch--tool was started; it was a
> helper to partially migrate slower parts of git-fetch.sh to C. I
> suspect we can almost remove it but not quite yet...
Funny. I was looking at that last week, and yes, there is still a user.
Through several functions in parse-remote.sh, git-pull.sh uses
fetch--tool.
Now, git-pull.sh is not all that large/complicated...
Ciao,
Dscho
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2009-03-14 12:09 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-03-12 7:51 [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split" Christian Couder
[not found] ` <20090312190846.6117@nanako3.lavabit.com>
2009-03-13 4:48 ` Christian Couder
2009-03-13 5:17 ` Junio C Hamano
2009-03-13 6:02 ` Christian Couder
2009-03-13 6:53 ` Junio C Hamano
2009-03-14 7:46 ` Migrate bisect to C (was: [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split") Christian Couder
2009-03-14 8:16 ` Migrate bisect to C Junio C Hamano
2009-03-14 12:09 ` fetch--tool, was " Johannes Schindelin
2009-03-13 7:06 ` [PATCH 1/7] strbuf: add "include_delim" parameter to "strbuf_split" Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).