* [PATCH 1/4] strbuf_split_buf(): use ALLOC_GROW()
2012-11-04 6:46 [PATCH 0/4] Simplify and document strbuf_split() functions Michael Haggerty
@ 2012-11-04 6:46 ` Michael Haggerty
2012-11-04 11:41 ` Jeff King
2012-11-04 6:46 ` [PATCH 2/4] strbuf_split_buf(): simplify iteration Michael Haggerty
` (3 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Michael Haggerty @ 2012-11-04 6:46 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git, Michael Haggerty
Use ALLOC_GROW() rather than inline code to manage memory in
strbuf_split_buf(). Rename "pos" to "nr" because it better describes
the use of the variable and it better conforms to the "ALLOC_GROW"
idiom.
Also, instead of adding a sentinal NULL value after each entry is
added to the list, only add it once after all of the entries have been
added.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
strbuf.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/strbuf.c b/strbuf.c
index 4b9e30c..5256c2a 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -108,33 +108,30 @@ void strbuf_ltrim(struct strbuf *sb)
struct strbuf **strbuf_split_buf(const char *str, size_t slen, int delim, int max)
{
- int alloc = 2, pos = 0;
+ struct strbuf **ret = NULL;
+ size_t nr = 0, alloc = 0;
const char *n, *p;
- struct strbuf **ret;
struct strbuf *t;
- ret = xcalloc(alloc, sizeof(struct strbuf *));
p = n = str;
while (n < str + slen) {
int len;
- if (max <= 0 || pos + 1 < max)
+ if (max <= 0 || nr + 1 < max)
n = memchr(n, delim, slen - (n - str));
else
n = NULL;
- if (pos + 1 >= alloc) {
- alloc = alloc * 2;
- ret = xrealloc(ret, sizeof(struct strbuf *) * alloc);
- }
if (!n)
n = str + slen - 1;
len = n - p + 1;
t = xmalloc(sizeof(struct strbuf));
strbuf_init(t, len);
strbuf_add(t, p, len);
- ret[pos] = t;
- ret[++pos] = NULL;
+ ALLOC_GROW(ret, nr + 2, alloc);
+ ret[nr++] = t;
p = ++n;
}
+ ALLOC_GROW(ret, nr + 1, alloc); /* In case string was empty */
+ ret[nr] = NULL;
return ret;
}
--
1.8.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/4] strbuf_split_buf(): use ALLOC_GROW()
2012-11-04 6:46 ` [PATCH 1/4] strbuf_split_buf(): use ALLOC_GROW() Michael Haggerty
@ 2012-11-04 11:41 ` Jeff King
2012-11-06 7:54 ` Michael Haggerty
0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2012-11-04 11:41 UTC (permalink / raw)
To: Michael Haggerty; +Cc: Junio C Hamano, git
On Sun, Nov 04, 2012 at 07:46:51AM +0100, Michael Haggerty wrote:
> Use ALLOC_GROW() rather than inline code to manage memory in
> strbuf_split_buf(). Rename "pos" to "nr" because it better describes
> the use of the variable and it better conforms to the "ALLOC_GROW"
> idiom.
I suspect this was not used originally because ALLOC_GROW relies on
alloc_nr, which does fast growth early on. At (x+16)*3/2, we end up with
24 slots for the first allocation. We are typically splitting 1 or 2
values.
It probably doesn't make a big difference in practice, though, as we're
talking about wasting less than 200 bytes on a 64-bit platform, and we
do not tend to keep large numbers of split lists around.
-Peff
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/4] strbuf_split_buf(): use ALLOC_GROW()
2012-11-04 11:41 ` Jeff King
@ 2012-11-06 7:54 ` Michael Haggerty
2012-11-08 16:38 ` Jeff King
0 siblings, 1 reply; 9+ messages in thread
From: Michael Haggerty @ 2012-11-06 7:54 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
On 11/04/2012 12:41 PM, Jeff King wrote:
> On Sun, Nov 04, 2012 at 07:46:51AM +0100, Michael Haggerty wrote:
>
>> Use ALLOC_GROW() rather than inline code to manage memory in
>> strbuf_split_buf(). Rename "pos" to "nr" because it better describes
>> the use of the variable and it better conforms to the "ALLOC_GROW"
>> idiom.
>
> I suspect this was not used originally because ALLOC_GROW relies on
> alloc_nr, which does fast growth early on. At (x+16)*3/2, we end up with
> 24 slots for the first allocation. We are typically splitting 1 or 2
> values.
>
> It probably doesn't make a big difference in practice, though, as we're
> talking about wasting less than 200 bytes on a 64-bit platform, and we
> do not tend to keep large numbers of split lists around.
I did a little bit of archeology, and found out that
* ALLOC_GROW() did indeed exist when this code was developed, so it
*could have* been used.
* OTOH, I didn't find any indication on the mailing list that the
choice not to use ALLOC_GROW() was a conscious decision.
So history doesn't give us much guidance.
If the size of the initial allocation is a concern, then I would suggest
adding a macro like ALLOC_SET_SIZE(ary,nr,alloc) that could be called to
initialize the size to some number less than 24. Such a macro might be
useful elsewhere, too. It wouldn't, of course, slow the growth rate
*after* the first allocation.
FWIW, the "max" parameter of strbuf_split*() is only used in one place,
though strbuf_split*() is used in some other places where not too many
substrings would be expected.
I am working on some patch series that will eliminate even more uses of
strbuf_split*(), so I won't work more on optimizing its resource usage
unless somebody gives me a stronger nudge.
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/4] strbuf_split_buf(): use ALLOC_GROW()
2012-11-06 7:54 ` Michael Haggerty
@ 2012-11-08 16:38 ` Jeff King
0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2012-11-08 16:38 UTC (permalink / raw)
To: Michael Haggerty; +Cc: Junio C Hamano, git
On Tue, Nov 06, 2012 at 08:54:07AM +0100, Michael Haggerty wrote:
> > I suspect this was not used originally because ALLOC_GROW relies on
> > alloc_nr, which does fast growth early on. At (x+16)*3/2, we end up with
> > 24 slots for the first allocation. We are typically splitting 1 or 2
> > values.
> >
> > It probably doesn't make a big difference in practice, though, as we're
> > talking about wasting less than 200 bytes on a 64-bit platform, and we
> > do not tend to keep large numbers of split lists around.
>
> I did a little bit of archeology, and found out that
>
> * ALLOC_GROW() did indeed exist when this code was developed, so it
> *could have* been used.
>
> * OTOH, I didn't find any indication on the mailing list that the
> choice not to use ALLOC_GROW() was a conscious decision.
>
> So history doesn't give us much guidance.
Thanks for digging.
> If the size of the initial allocation is a concern, then I would suggest
> adding a macro like ALLOC_SET_SIZE(ary,nr,alloc) that could be called to
> initialize the size to some number less than 24. Such a macro might be
> useful elsewhere, too. It wouldn't, of course, slow the growth rate
> *after* the first allocation.
I think we are getting into premature optimization territory. Let's
take your series as a cleanup, and we can worry about micro-optimizing
the allocation if and when it ever becomes an issue.
-Peff
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/4] strbuf_split_buf(): simplify iteration
2012-11-04 6:46 [PATCH 0/4] Simplify and document strbuf_split() functions Michael Haggerty
2012-11-04 6:46 ` [PATCH 1/4] strbuf_split_buf(): use ALLOC_GROW() Michael Haggerty
@ 2012-11-04 6:46 ` Michael Haggerty
2012-11-04 6:46 ` [PATCH 3/4] strbuf_split*(): rename "delim" parameter to "terminator" Michael Haggerty
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Michael Haggerty @ 2012-11-04 6:46 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git, Michael Haggerty
While iterating, update str and slen to keep track of the part of the
string that hasn't been processed yet rather than computing things
relative to the start of the original string. This eliminates one
local variable, reduces the scope of another, and reduces the amount
of arithmetic needed within the loop.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
strbuf.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/strbuf.c b/strbuf.c
index 5256c2a..c7cd529 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -110,25 +110,22 @@ struct strbuf **strbuf_split_buf(const char *str, size_t slen, int delim, int ma
{
struct strbuf **ret = NULL;
size_t nr = 0, alloc = 0;
- const char *n, *p;
struct strbuf *t;
- p = n = str;
- while (n < str + slen) {
- int len;
- if (max <= 0 || nr + 1 < max)
- n = memchr(n, delim, slen - (n - str));
- else
- n = NULL;
- if (!n)
- n = str + slen - 1;
- len = n - p + 1;
+ while (slen) {
+ int len = slen;
+ if (max <= 0 || nr + 1 < max) {
+ const char *end = memchr(str, delim, slen);
+ if (end)
+ len = end - str + 1;
+ }
t = xmalloc(sizeof(struct strbuf));
strbuf_init(t, len);
- strbuf_add(t, p, len);
+ strbuf_add(t, str, len);
ALLOC_GROW(ret, nr + 2, alloc);
ret[nr++] = t;
- p = ++n;
+ str += len;
+ slen -= len;
}
ALLOC_GROW(ret, nr + 1, alloc); /* In case string was empty */
ret[nr] = NULL;
--
1.8.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/4] strbuf_split*(): rename "delim" parameter to "terminator"
2012-11-04 6:46 [PATCH 0/4] Simplify and document strbuf_split() functions Michael Haggerty
2012-11-04 6:46 ` [PATCH 1/4] strbuf_split_buf(): use ALLOC_GROW() Michael Haggerty
2012-11-04 6:46 ` [PATCH 2/4] strbuf_split_buf(): simplify iteration Michael Haggerty
@ 2012-11-04 6:46 ` Michael Haggerty
2012-11-04 6:46 ` [PATCH 4/4] strbuf_split*(): document functions Michael Haggerty
2012-11-04 11:50 ` [PATCH 0/4] Simplify and document strbuf_split() functions Jeff King
4 siblings, 0 replies; 9+ messages in thread
From: Michael Haggerty @ 2012-11-04 6:46 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git, Michael Haggerty
The word "delimiter" suggests that the argument separates the
substrings, whereas in fact (1) the delimiter characters are included
in the output, and (2) if the input string ends with the delimiter,
then the output does not include a final empty string. So rename the
"delim" arguments of the strbuf_split() family of functions to
"terminator", which is more suggestive of how it is used.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
strbuf.c | 5 +++--
strbuf.h | 15 ++++++++-------
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/strbuf.c b/strbuf.c
index c7cd529..05d0693 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -106,7 +106,8 @@ void strbuf_ltrim(struct strbuf *sb)
sb->buf[sb->len] = '\0';
}
-struct strbuf **strbuf_split_buf(const char *str, size_t slen, int delim, int max)
+struct strbuf **strbuf_split_buf(const char *str, size_t slen,
+ int terminator, int max)
{
struct strbuf **ret = NULL;
size_t nr = 0, alloc = 0;
@@ -115,7 +116,7 @@ struct strbuf **strbuf_split_buf(const char *str, size_t slen, int delim, int ma
while (slen) {
int len = slen;
if (max <= 0 || nr + 1 < max) {
- const char *end = memchr(str, delim, slen);
+ const char *end = memchr(str, terminator, slen);
if (end)
len = end - str + 1;
}
diff --git a/strbuf.h b/strbuf.h
index be941ee..c896a47 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -45,20 +45,21 @@ extern void strbuf_ltrim(struct strbuf *);
extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
extern struct strbuf **strbuf_split_buf(const char *, size_t,
- int delim, int max);
+ int terminator, int max);
static inline struct strbuf **strbuf_split_str(const char *str,
- int delim, int max)
+ int terminator, int max)
{
- return strbuf_split_buf(str, strlen(str), delim, max);
+ return strbuf_split_buf(str, strlen(str), terminator, max);
}
static inline struct strbuf **strbuf_split_max(const struct strbuf *sb,
- int delim, int max)
+ int terminator, int max)
{
- return strbuf_split_buf(sb->buf, sb->len, delim, max);
+ return strbuf_split_buf(sb->buf, sb->len, terminator, max);
}
-static inline struct strbuf **strbuf_split(const struct strbuf *sb, int delim)
+static inline struct strbuf **strbuf_split(const struct strbuf *sb,
+ int terminator)
{
- return strbuf_split_max(sb, delim, 0);
+ return strbuf_split_max(sb, terminator, 0);
}
extern void strbuf_list_free(struct strbuf **);
--
1.8.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 4/4] strbuf_split*(): document functions
2012-11-04 6:46 [PATCH 0/4] Simplify and document strbuf_split() functions Michael Haggerty
` (2 preceding siblings ...)
2012-11-04 6:46 ` [PATCH 3/4] strbuf_split*(): rename "delim" parameter to "terminator" Michael Haggerty
@ 2012-11-04 6:46 ` Michael Haggerty
2012-11-04 11:50 ` [PATCH 0/4] Simplify and document strbuf_split() functions Jeff King
4 siblings, 0 replies; 9+ messages in thread
From: Michael Haggerty @ 2012-11-04 6:46 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git, Michael Haggerty
Document strbuf_split_buf(), strbuf_split_str(), strbuf_split_max(),
strbuf_split(), and strbuf_list_free() in the header file and in
api-strbuf.txt. (These functions were previously completely
undocumented.)
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
Documentation/technical/api-strbuf.txt | 16 ++++++++++++++++
strbuf.h | 33 +++++++++++++++++++++++++++++++++
2 files changed, 49 insertions(+)
diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
index 95a8bf3..84686b5 100644
--- a/Documentation/technical/api-strbuf.txt
+++ b/Documentation/technical/api-strbuf.txt
@@ -279,6 +279,22 @@ same behaviour as well.
Strip whitespace from a buffer. The second parameter controls if
comments are considered contents to be removed or not.
+`strbuf_split_buf`::
+`strbuf_split_str`::
+`strbuf_split_max`::
+`strbuf_split`::
+
+ Split a string or strbuf into a list of strbufs at a specified
+ terminator character. The returned substrings include the
+ terminator characters. Some of these functions take a `max`
+ parameter, which, if positive, limits the output to that
+ number of substrings.
+
+`strbuf_list_free`::
+
+ Free a list of strbufs (for example, the return values of the
+ `strbuf_split()` functions).
+
`launch_editor`::
Launch the user preferred editor to edit a file and fill the buffer
diff --git a/strbuf.h b/strbuf.h
index c896a47..aa386c6 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -44,23 +44,56 @@ extern void strbuf_rtrim(struct strbuf *);
extern void strbuf_ltrim(struct strbuf *);
extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
+/*
+ * Split str (of length slen) at the specified terminator character.
+ * Return a null-terminated array of pointers to strbuf objects
+ * holding the substrings. The substrings include the terminator,
+ * except for the last substring, which might be unterminated if the
+ * original string did not end with a terminator. If max is positive,
+ * then split the string into at most max substrings (with the last
+ * substring containing everything following the (max-1)th terminator
+ * character).
+ *
+ * For lighter-weight alternatives, see string_list_split() and
+ * string_list_split_in_place().
+ */
extern struct strbuf **strbuf_split_buf(const char *, size_t,
int terminator, int max);
+
+/*
+ * Split a NUL-terminated string at the specified terminator
+ * character. See strbuf_split_buf() for more information.
+ */
static inline struct strbuf **strbuf_split_str(const char *str,
int terminator, int max)
{
return strbuf_split_buf(str, strlen(str), terminator, max);
}
+
+/*
+ * Split a strbuf at the specified terminator character. See
+ * strbuf_split_buf() for more information.
+ */
static inline struct strbuf **strbuf_split_max(const struct strbuf *sb,
int terminator, int max)
{
return strbuf_split_buf(sb->buf, sb->len, terminator, max);
}
+
+/*
+ * Split a strbuf at the specified terminator character. See
+ * strbuf_split_buf() for more information.
+ */
static inline struct strbuf **strbuf_split(const struct strbuf *sb,
int terminator)
{
return strbuf_split_max(sb, terminator, 0);
}
+
+/*
+ * Free a NULL-terminated list of strbufs (for example, the return
+ * values of the strbuf_split*() functions).
+ */
extern void strbuf_list_free(struct strbuf **);
/*----- add data in your buffer -----*/
--
1.8.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 0/4] Simplify and document strbuf_split() functions
2012-11-04 6:46 [PATCH 0/4] Simplify and document strbuf_split() functions Michael Haggerty
` (3 preceding siblings ...)
2012-11-04 6:46 ` [PATCH 4/4] strbuf_split*(): document functions Michael Haggerty
@ 2012-11-04 11:50 ` Jeff King
4 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2012-11-04 11:50 UTC (permalink / raw)
To: Michael Haggerty; +Cc: Junio C Hamano, git
On Sun, Nov 04, 2012 at 07:46:50AM +0100, Michael Haggerty wrote:
> The strbuf_split() family of functions was completely undocumented.
> Add documentation and also simplify the definition of
> strbuf_split_buf().
Thanks. Looks good overall, even with the comments I raised for patch 1
(I think it is not a big deal, and even if it does become a big deal,
the right fix is probably to make alloc_nr smarter about early growth).
-Peff
^ permalink raw reply [flat|nested] 9+ messages in thread