git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Strbuf documentation: document most functions
@ 2008-06-02 22:59 Miklos Vajna
  2008-06-02 23:39 ` Junio C Hamano
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Miklos Vajna @ 2008-06-02 22:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

All functions in strbuf.h are documented, except strbuf_expand() and
launch_editor().

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
---

Actually this is a bit of request for help, I haven't figured out what
strbuf_expand() does, and I don't know the second argument of
launch_editor() is good for, but I *think* I'm familiar with the rest of
the functions, so here is an attempt to document them.

 Documentation/technical/api-strbuf.txt |  173 +++++++++++++++++++++++++++++++-
 1 files changed, 171 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
index a52e4f3..3879e0e 100644
--- a/Documentation/technical/api-strbuf.txt
+++ b/Documentation/technical/api-strbuf.txt
@@ -1,6 +1,175 @@
 strbuf API
 ==========
 
-Talk about <strbuf.h>
+strbuf's can be use in many ways: as a byte array, or to store arbitrary
+long, overflow safe strings.
 
-(Pierre, JC)
+strbufs has some invariants that are very important to keep in mind:
+
+. The `->buf` member is always malloc-ed, hence strbuf's can be used to
+  build complex strings/buffers whose final size isn't easily known.
++
+It is *not* legal to copy the `->buf` pointer away. `strbuf_detach()` is
+the operation that detachs a buffer from its shell while keeping the
+shell valid wrt its invariants.
+
+. The `->buf` member is a byte array that has at least `->len + 1` bytes
+  allocated. The extra byte is used to store a `'\0'`, allowing the
+  `->buf` member to be a valid C-string. Every strbuf function ensure this
+  invariant is preserved.
++
+NOTE: It is OK to "play" with the buffer directly if you work it that
+      way:
++
+----
+strbuf_grow(sb, SOME_SIZE);
+   ... Here, the memory array starting at sb->buf, and of length
+   ... strbuf_avail(sb) is all yours, and you are sure that
+   ... strbuf_avail(sb) is at least SOME_SIZE.
+strbuf_setlen(sb, sb->len + SOME_OTHER_SIZE);
+----
++
+Of course, SOME_OTHER_SIZE must be smaller or equal to strbuf_avail(sb).
++
+Doing so is safe, though if it has to be done in many places, adding the
+missing API to the strbuf module is the way to go.
++
+WARNING: Do _not_ assume that the area that is yours is of size `->alloc
+- 1` even if it's true in the current implementation. Alloc is somehow a
+"private" member that should not be messed with.
+
+Data structures
+---------------
+
+* `struct strbuf`
+
+This is string buffer structure. The `->len` variable can be used to
+determine the current length of the string, and `->buf` provides access
+to the string itself.
+
+Functions
+---------
+
+* Life cycle
+
+`strbuf_init`::
+
+	Initializes the structure. The second parameter can be zero or a bigger
+	number to allocate memory, in case you want to prevent further reallocs.
+
+`strbuf_release`::
+
+	Releases a string buffer and the memory it used. You should not use the
+	string buffer after using this function, unless you initialize it again.
+
+`strbuf_detach`::
+
+	Detaches the string from the string buffer. The function returns a
+	pointer to the old string and empties the buffer.
+
+`strbuf_attach`::
+
+	Attaches a string to a buffer. You should specify the string to attach,
+	the current length of the string and the amount of allocated memory.
+
+`strbuf_swap`::
+
+	Swaps the contents of two string buffers.
+
+* Related to the size of the buffer
+
+`strbuf_avail`::
+
+	Determines the amount of allocated but not used memory.
+
+`strbuf_grow`::
+
+	Allocated extra memory for the buffer.
+
+`strbuf_setlen`::
+
+	Sets the length of the buffer to a given value.
+
+`strbuf_reset`::
+
+	Empties the buffer by setting the size of it to zero.
+
+* Related to the contents of the buffer
+
+`strbuf_rtrim`::
+
+	Strip whitespace from the end of a string.
+
+`strbuf_cmp`::
+
+	Compares two buffers. Returns an integer less than, equal to, or greater
+	than zero if the first buffer is found, respectively, to be less than,
+	to match, or be greater than the second buffer.
+
+* Adding data to the buffer
+
+`strbuf_addch`::
+
+Adds a single character.
+
+`strbuf_insert`::
+
+	Insert data to the given position of the buffer. The remaining contents
+	will be shifted, not overwritten.
+
+`strbuf_remove`::
+
+	Remove given amount of data from a given position of the buffer.
+
+`strbuf_splice`::
+
+	Splice pos..pos+len with given data.
+
+`strbuf_add`::
+
+	Add data of given length to the buffer.
+
+`strbuf_addstr`::
+
+	Add a NULL-terminated string to the buffer.
+
+`strbuf_addbuf`::
+
+	Add an other buffer to the current one.
+
+`strbuf_adddup`::
+
+	Copy part of the buffer from a given position till a given length to the
+	end of the buffer.
+
+`strbuf_expand`::
+
+`strbuf_addf`::
+
+	Add a formatted string to the buffer.
+
+`strbuf_fread`::
+
+	Read a given size of data from a FILE* pointer to the buffer.
+
+`strbuf_read`::
+
+	Read the contents of a given file descriptor. The third argument can be
+	used to give a hint about the file, to avoid reallocs.
+
+`strbuf_read_file`::
+
+	Read the contents of a file, specified by its path. The third argument
+	can be used to give a hint about the file, to avoid reallocs.
+
+`strbuf_getline`::
+
+	Read a line from a FILE* pointer. The second argument specifies the line
+	terminator character, like `'\n'`.
+
+`stripspace`::
+
+	Strips whitespace from a buffer. The second parameter controls if
+	comments are considered contents to be removed or not.
+
+`launch_editor`::
-- 
1.5.6.rc0.dirty

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] Strbuf documentation: document most functions
  2008-06-02 22:59 [PATCH] Strbuf documentation: document most functions Miklos Vajna
@ 2008-06-02 23:39 ` Junio C Hamano
  2008-06-03  0:00 ` Johannes Schindelin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2008-06-02 23:39 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: git, Pierre Habouzit

Miklos Vajna <vmiklos@frugalware.org> writes:

> All functions in strbuf.h are documented, except strbuf_expand() and
> launch_editor().
>
> Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
> ---
>
> Actually this is a bit of request for help,...

In that case,...

> diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
> index a52e4f3..3879e0e 100644
> --- a/Documentation/technical/api-strbuf.txt
> +++ b/Documentation/technical/api-strbuf.txt
> @@ -1,6 +1,175 @@
>  strbuf API
>  ==========
>  
> -Talk about <strbuf.h>
> +strbuf's can be use in many ways: as a byte array, or to store arbitrary
> +long, overflow safe strings.
>  
> -(Pierre, JC)

I expected somebody who touches this file to take hints from these names
before removing them ;-)

I'm still at work so I won't be commenting on them in this message but
I'll take a look at it later.  Pierre Cc'ed.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Strbuf documentation: document most functions
  2008-06-02 22:59 [PATCH] Strbuf documentation: document most functions Miklos Vajna
  2008-06-02 23:39 ` Junio C Hamano
@ 2008-06-03  0:00 ` Johannes Schindelin
  2008-06-03  8:42   ` Pierre Habouzit
  2008-06-03 15:41 ` René Scharfe
  2008-06-04  1:15 ` Miklos Vajna
  3 siblings, 1 reply; 8+ messages in thread
From: Johannes Schindelin @ 2008-06-03  0:00 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: Junio C Hamano, Pierre Habouzit, git

Hi,

I am no author of strbuf, but hey, I thought I'd just give you a few 
comments...

In general, I'd rather leave the "->" from the members, since you have 
many instances where you access them with ".".

On Tue, 3 Jun 2008, Miklos Vajna wrote:

> diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
> index a52e4f3..3879e0e 100644
> --- a/Documentation/technical/api-strbuf.txt
> +++ b/Documentation/technical/api-strbuf.txt
> @@ -1,6 +1,175 @@
>  strbuf API
>  ==========
>  
> -Talk about <strbuf.h>
> +strbuf's can be use in many ways: as a byte array, or to store arbitrary
> +long, overflow safe strings.

I think that you should not suggest using strbufs as byte array, even if 
that is certainly possible.  Rather, you should say something like:

	An strbuf is NUL terminated for convenience, but no function in 
	the strbuf API actually relies on the string being free of NULs.

> +strbufs has some invariants that are very important to keep in mind:
> +
> +. The `->buf` member is always malloc-ed, hence strbuf's can be used to
> +  build complex strings/buffers whose final size isn't easily known.

Is this true?  I thought the initial string is empty, but not alloc'ed.

So I'd rather have something like

	The "buf" member is never NULL, so you can safely strcmp() it.

I'd like to see a comment that strbuf's _have_ to be initialized either by 
strbuf_init() or by "= STRBUF_INIT" before the invariants, though.

> +`strbuf_attach`::
> +
> +	Attaches a string to a buffer. You should specify the string to attach,
> +	the current length of the string and the amount of allocated memory.

... This string _must_ be malloc()ed, and after attaching, the pointer 
cannot be relied upon anymore, and neither be free()d directly.

> +`strbuf_read`::
> +
> +	Read the contents of a given file descriptor. The third argument can be
> +	used to give a hint about the file, to avoid reallocs.

s/about the file/& size/

> +`strbuf_read_file`::
> +
> +	Read the contents of a file, specified by its path. The third argument
> +	can be used to give a hint about the file, to avoid reallocs.

Ditto.

> +`strbuf_getline`::
> +
> +	Read a line from a FILE* pointer. The second argument specifies the line
> +	terminator character, like `'\n'`.

s/like/typically/

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Strbuf documentation: document most functions
  2008-06-03  0:00 ` Johannes Schindelin
@ 2008-06-03  8:42   ` Pierre Habouzit
  0 siblings, 0 replies; 8+ messages in thread
From: Pierre Habouzit @ 2008-06-03  8:42 UTC (permalink / raw)
  To: Miklos Vajna, Johannes Schindelin; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 8437 bytes --]

On Mon, Jun 02, 2008 at 10:59:51PM +0000, Miklos Vajna wrote:
> +strbufs has some invariants that are very important to keep in mind:
> +
> +. The `->buf` member is always malloc-ed, hence strbuf's can be used to
> +  build complex strings/buffers whose final size isn't easily known.
> ++
> +It is *not* legal to copy the `->buf` pointer away. `strbuf_detach()` is
> +the operation that detachs a buffer from its shell while keeping the
> +shell valid wrt its invariants.
> +
> +. The `->buf` member is a byte array that has at least `->len + 1` bytes
> +  allocated. The extra byte is used to store a `'\0'`, allowing the
> +  `->buf` member to be a valid C-string. Every strbuf function ensure this
> +  invariant is preserved.
> ++

  I wouldn't write that part this way at all: I have a personal
implementation of things that look like strbufs that are able to use an
`alloca`-ed buffers and deals with overflows with malloc's, possibility
to stream and so on (implemented with skips, meaning that what is the
->buf member is, isn't always the real start of the allocated memory
block).  And I don't think it's a good idea to carve such informations
into stone. So what I'd says in summary is:

  (1) it is totally safe to touch anything in the buffer pointed by the
      buf member between the index 0 and buf->len excluded.

  (1b) what you write later: it's also possible to write after buf->len
       if not after strbuf_avail() _BUT_ then you have when you're done
       the task to reset the (2) invariant yourself, using
       strbuf_setlen().

  (2) ->buf[->len] == '\0' holds _ALL TIME_.

  (3) ->buf is never ever NULL so it can be used in any usual C string
      ops safely.

  (4) do NOT assume anything on what ->buf really is (allocated memory
      or not e.g.), use strbuf_detach to unwrap a memory buffer from its
      strbuf shell in a safe way. That is the sole supported way. This
      will give you a malloced buffer that you can later free().


> +* Life cycle
> +
> +`strbuf_init`::
> +
> +	Initializes the structure. The second parameter can be zero or a bigger
> +	number to allocate memory, in case you want to prevent further reallocs.

  I'd add that it is _MANDATORY_ to initialize strbufs, and that a
static allocation (for global variables e.g.) can be done using
the STRBUF_INIT static initializer.

> +`strbuf_release`::
> +
> +	Releases a string buffer and the memory it used. You should not use the
> +	string buffer after using this function, unless you initialize it again.

  Actually this is wrong because strbuf_release performs a new init
since init allocates 0 memory and that it's idiot-proof. But it could be
changed in the future and it should not be relied upon.

> +`strbuf_detach`::
> +
> +	Detaches the string from the string buffer. The function returns a
> +	pointer to the old string and empties the buffer.

  Not really strbuf_detach unwraps the embedded buffer for sure, but it
doesn't "empties" the buffer, strbuf_detach is like strbuf_release:
after a release, strbuf should be init-ed again (even if for now
strbuf_release does so).


> +`strbuf_attach`::
> +
> +	Attaches a string to a buffer. You should specify the string to attach,
> +	the current length of the string and the amount of allocated memory.

  In addition to what Johannes said: size must be > len. Because the
string you pass is supposed to be a NUL-terminated string.


> +* Related to the size of the buffer
> +
> +`strbuf_avail`::
> +
> +	Determines the amount of allocated but not used memory.
> +
> +`strbuf_grow`::
> +
> +	Allocated extra memory for the buffer.

  I'd put that this way: ensure that at least this amount of available
memory is available. This is used when you know a typical size for what
you will do and want to avoid repetitive automatic resize of the
underlying buffer. This is never a needed operation, but can be critical
for performance in some cases.

> +`strbuf_setlen`::
> +
> +	Sets the length of the buffer to a given value.

  This function does NOT allocate new memory, so you should not perform
a strbuf_setlen to a length that is larger than sb->len + strbuf_avail(sb).
strbuf_setlen is just meant as a "please fix invariants from this strbuf
I just messed with)"

> +`strbuf_add`::
> +
> +	Add data of given length to the buffer.
> +
> +`strbuf_addstr`::
> +
> +	Add a NULL-terminated string to the buffer.

  Please use NUL, '\0' is NUL (as in its ascii name), NULL is (void *)0.
In addition to that, I'd say that strbuf_addstr will ALWAYS be
implemented as an inline or a macro that expands to:
  strbuf_add(..., s, strlen(s))

Meaning that this is efficient to write things like:
  strbuf_addstr(sb, "immediate string").

> +`strbuf_expand`::

  This function is a pretty printer that expands magic formats string
thanks to callbacks, so that it's done in a generic way. It's what is
used to generate git-log e.g. I'm not its author, so I'm not really best
placed to describe it.

> +`strbuf_fread`::
> +
> +	Read a given size of data from a FILE* pointer to the buffer.
> +
> +`strbuf_read`::
> +
> +	Read the contents of a given file descriptor. The third argument can be
> +	used to give a hint about the file, to avoid reallocs.
> +
> +`strbuf_read_file`::
> +
> +	Read the contents of a file, specified by its path. The third argument
> +	can be used to give a hint about the file, to avoid reallocs.
> +
> +`strbuf_getline`::
> +
> +	Read a line from a FILE* pointer. The second argument specifies the line
> +	terminator character, like `'\n'`.
> +

  For all: the buffer is rewinded if the read fails.
  If -1 is returned, errno must be consulted, like you would do for
read(3).


On Tue, Jun 03, 2008 at 12:00:33AM +0000, Johannes Schindelin wrote:
> Hi,
> 
> I am no author of strbuf, but hey, I thought I'd just give you a few 
> comments...
> 
> In general, I'd rather leave the "->" from the members, since you have 
> many instances where you access them with ".".
> 
> On Tue, 3 Jun 2008, Miklos Vajna wrote:
> 
> > diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
> > index a52e4f3..3879e0e 100644
> > --- a/Documentation/technical/api-strbuf.txt
> > +++ b/Documentation/technical/api-strbuf.txt
> > @@ -1,6 +1,175 @@
> >  strbuf API
> >  ==========
> >  
> > -Talk about <strbuf.h>
> > +strbuf's can be use in many ways: as a byte array, or to store arbitrary
> > +long, overflow safe strings.
> 
> I think that you should not suggest using strbufs as byte array, even if 
> that is certainly possible.  Rather, you should say something like:
> 
> 	An strbuf is NUL terminated for convenience, but no function in 
> 	the strbuf API actually relies on the string being free of NULs.

  ACK. I'd add the fact that strbuf are meant to be used with all the
usual C string and memory APIs. Given that the length of the buffer is
known, it's often better to use the mem* functions than a str* one
(memchr vs. strchr e.g.). Though, one has to be careful about the fact
that str* functions often stop on NULs and that strbufs may have
embedded NULs.


> > +strbufs has some invariants that are very important to keep in mind:
> > +
> > +. The `->buf` member is always malloc-ed, hence strbuf's can be used to
> > +  build complex strings/buffers whose final size isn't easily known.
> 
> Is this true?  I thought the initial string is empty, but not alloc'ed.
> 
> So I'd rather have something like
> 
> 	The "buf" member is never NULL, so you can safely strcmp() it.
> 
> I'd like to see a comment that strbuf's _have_ to be initialized either by 
> strbuf_init() or by "= STRBUF_INIT" before the invariants, though.

  Well I'd go even further like I proposed above.

> > +`strbuf_attach`::
> > +
> > +	Attaches a string to a buffer. You should specify the string to attach,
> > +	the current length of the string and the amount of allocated memory.
> 
> .... This string _must_ be malloc()ed, and after attaching, the pointer 
> cannot be relied upon anymore, and neither be free()d directly.

  ACK.


-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Strbuf documentation: document most functions
  2008-06-02 22:59 [PATCH] Strbuf documentation: document most functions Miklos Vajna
  2008-06-02 23:39 ` Junio C Hamano
  2008-06-03  0:00 ` Johannes Schindelin
@ 2008-06-03 15:41 ` René Scharfe
  2008-06-04  1:15 ` Miklos Vajna
  3 siblings, 0 replies; 8+ messages in thread
From: René Scharfe @ 2008-06-03 15:41 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: Junio C Hamano, git

Miklos Vajna schrieb:
> Actually this is a bit of request for help, I haven't figured out what
> strbuf_expand() does [...]

It can be used to expand a format string containing placeholders.  To
that end, it parses the string and calls the specified function for
every percent sign found.

The callback function is given a pointer to the character after the '%'
and a pointer to the struct strbuf.  It is expected to add the expanded
version of the placeholder to the strbuf, e.g. to add a newline
character if the letter 'n' appears after a '%'.  The function returns
the length of the placeholder recognized and strbuf_expand skips over it.

All other characters (non-percent and not skipped ones) are copied
verbatim to the strbuf.  If the callback returned zero, meaning that the
placeholder is unknown, then the percent sign is copied, too.

In order to facilitate caching and to make it possible to give
parameters to the callback, strbuf_expand passes a context pointer,
which can be used by the programmer of the callback as she sees fit.

You can see it in action in pretty.c, where it expands --pretty=format:
placeholder strings.  The callback may be a bit heavy for a first
encounter, though. ;-)

René

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] Strbuf documentation: document most functions
  2008-06-02 22:59 [PATCH] Strbuf documentation: document most functions Miklos Vajna
                   ` (2 preceding siblings ...)
  2008-06-03 15:41 ` René Scharfe
@ 2008-06-04  1:15 ` Miklos Vajna
  2008-06-04  7:45   ` Junio C Hamano
  3 siblings, 1 reply; 8+ messages in thread
From: Miklos Vajna @ 2008-06-04  1:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Pierre Habouzit, Ren� Scharfe, git

All functions in strbuf.h are documented, except launch_editor().

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
---

Here is an improved version. Details below.

On Tue, Jun 03, 2008 at 01:00:33AM +0100, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> In general, I'd rather leave the "->" from the members, since you have
> many instances where you access them with ".".

Ok, removed, except the code sniplet where the strbuf is obviously a
pointer.

> > +strbuf's can be use in many ways: as a byte array, or to store
> > arbitrary
> > +long, overflow safe strings.
>
> I think that you should not suggest using strbufs as byte array, even
> if
> that is certainly possible.  Rather, you should say something like:
>
>       An strbuf is NUL terminated for convenience, but no function in
>       the strbuf API actually relies on the string being free of NULs.

Changed.

> > +. The `->buf` member is always malloc-ed, hence strbuf's can be
> > used to
> > +  build complex strings/buffers whose final size isn't easily
> > known.
>
> Is this true?  I thought the initial string is empty, but not
> alloc'ed.
>
> So I'd rather have something like
>
>       The "buf" member is never NULL, so you can safely strcmp() it.

Right, corrected.

> I'd like to see a comment that strbuf's _have_ to be initialized
> either by
> strbuf_init() or by "= STRBUF_INIT" before the invariants, though.
>
> > +`strbuf_attach`::
> > +
> > +   Attaches a string to a buffer. You should specify the string to
> > attach,
> > +   the current length of the string and the amount of allocated
> > memory.
>
> ... This string _must_ be malloc()ed, and after attaching, the pointer
> cannot be relied upon anymore, and neither be free()d directly.

Added. Also fixed the rest of the typos you pointed out.

On Tue, Jun 03, 2008 at 10:42:40AM +0200, Pierre Habouzit <madcoder@debian.org> wrote:
>   (1) it is totally safe to touch anything in the buffer pointed by
>   the
>       buf member between the index 0 and buf->len excluded.
>
>   (1b) what you write later: it's also possible to write after
>   buf->len
>        if not after strbuf_avail() _BUT_ then you have when you're
>        done
>        the task to reset the (2) invariant yourself, using
>        strbuf_setlen().
>
>   (2) ->buf[->len] == '\0' holds _ALL TIME_.
>
>   (3) ->buf is never ever NULL so it can be used in any usual C string
>       ops safely.
>
>   (4) do NOT assume anything on what ->buf really is (allocated memory
>       or not e.g.), use strbuf_detach to unwrap a memory buffer from
>       its
>       strbuf shell in a safe way. That is the sole supported way. This
>       will give you a malloced buffer that you can later free().

Thanks, I improved the introduction including these, hopefully I haven't
missing anything.

> > +* Life cycle
> > +
> > +`strbuf_init`::
> > +
> > +   Initializes the structure. The second parameter can be zero or a
> > bigger
> > +   number to allocate memory, in case you want to prevent further
> > reallocs.
>
>   I'd add that it is _MANDATORY_ to initialize strbufs, and that a
> static allocation (for global variables e.g.) can be done using
> the STRBUF_INIT static initializer.

Added.

>
> > +`strbuf_release`::
> > +
> > +   Releases a string buffer and the memory it used. You should not
> > use the
> > +   string buffer after using this function, unless you initialize
> > it again.
>
>   Actually this is wrong because strbuf_release performs a new init
> since init allocates 0 memory and that it's idiot-proof. But it could
> be
> changed in the future and it should not be relied upon.

That's why I think 'should' is the good term and not 'must'. I don't
think the reader should be informed of the current implementation.

> > +`strbuf_detach`::
> > +
> > +   Detaches the string from the string buffer. The function returns
> > a
> > +   pointer to the old string and empties the buffer.
>
>   Not really strbuf_detach unwraps the embedded buffer for sure, but
>   it
> doesn't "empties" the buffer, strbuf_detach is like strbuf_release:
> after a release, strbuf should be init-ed again (even if for now
> strbuf_release does so).

Right, thanks for pointing this out.

> > +`strbuf_attach`::
> > +
> > +   Attaches a string to a buffer. You should specify the string to
> > attach,
> > +   the current length of the string and the amount of allocated
> > memory.
>
>   In addition to what Johannes said: size must be > len. Because the
> string you pass is supposed to be a NUL-terminated string.

Added.

> > +`strbuf_grow`::
> > +
> > +   Allocated extra memory for the buffer.
>
>   I'd put that this way: ensure that at least this amount of available
> memory is available. This is used when you know a typical size for
> what
> you will do and want to avoid repetitive automatic resize of the
> underlying buffer. This is never a needed operation, but can be
> critical
> for performance in some cases.

Changed.

> > +`strbuf_setlen`::
> > +
> > +   Sets the length of the buffer to a given value.
>
>   This function does NOT allocate new memory, so you should not
>   perform
> a strbuf_setlen to a length that is larger than sb->len +
> strbuf_avail(sb).
> strbuf_setlen is just meant as a "please fix invariants from this
> strbuf
> I just messed with)"

Added.

> > +`strbuf_add`::
> > +
> > +   Add data of given length to the buffer.
> > +
> > +`strbuf_addstr`::
> > +
> > +   Add a NULL-terminated string to the buffer.
>
>   Please use NUL, '\0' is NUL (as in its ascii name), NULL is (void
>   *)0.
> In addition to that, I'd say that strbuf_addstr will ALWAYS be
> implemented as an inline or a macro that expands to:
>   strbuf_add(..., s, strlen(s))
>
> Meaning that this is efficient to write things like:
>   strbuf_addstr(sb, "immediate string").

Changed and added.

> > +`strbuf_expand`::
>
>   This function is a pretty printer that expands magic formats string
> thanks to callbacks, so that it's done in a generic way. It's what is
> used to generate git-log e.g. I'm not its author, so I'm not really
> best
> placed to describe it.

OK. ;-)

> > +`strbuf_fread`::
> > +
> > +   Read a given size of data from a FILE* pointer to the buffer.
> > +
> > +`strbuf_read`::
> > +
> > +   Read the contents of a given file descriptor. The third argument
> > can be
> > +   used to give a hint about the file, to avoid reallocs.
> > +
> > +`strbuf_read_file`::
> > +
> > +   Read the contents of a file, specified by its path. The third
> > argument
> > +   can be used to give a hint about the file, to avoid reallocs.
> > +
> > +`strbuf_getline`::
> > +
> > +   Read a line from a FILE* pointer. The second argument specifies
> > the line
> > +   terminator character, like `'\n'`.
> > +
>
>   For all: the buffer is rewinded if the read fails.
>   If -1 is returned, errno must be consulted, like you would do for
> read(3).

Added.

> >     An strbuf is NUL terminated for convenience, but no function in
> >     the strbuf API actually relies on the string being free of NULs.
>
>   ACK. I'd add the fact that strbuf are meant to be used with all the
> usual C string and memory APIs. Given that the length of the buffer is
> known, it's often better to use the mem* functions than a str* one
> (memchr vs. strchr e.g.). Though, one has to be careful about the fact
> that str* functions often stop on NULs and that strbufs may have
> embedded NULs.

Added.

On Tue, Jun 03, 2008 at 05:41:38PM +0200, René Scharfe <rene.scharfe@lsrfire.ath.cx> wrote:
> > Actually this is a bit of request for help, I haven't figured out
> > what
> > strbuf_expand() does [...]
>
> It can be used to expand a format string containing placeholders.  To
> that end, it parses the string and calls the specified function for
> every percent sign found.
>
> The callback function is given a pointer to the character after the
> '%'
> and a pointer to the struct strbuf.  It is expected to add the
> expanded
> version of the placeholder to the strbuf, e.g. to add a newline
> character if the letter 'n' appears after a '%'.  The function returns
> the length of the placeholder recognized and strbuf_expand skips over
> it.
>
> All other characters (non-percent and not skipped ones) are copied
> verbatim to the strbuf.  If the callback returned zero, meaning that
> the
> placeholder is unknown, then the percent sign is copied, too.
>
> In order to facilitate caching and to make it possible to give
> parameters to the callback, strbuf_expand passes a context pointer,
> which can be used by the programmer of the callback as she sees fit.

Thanks, added.

 Documentation/technical/api-strbuf.txt |  234 +++++++++++++++++++++++++++++++-
 1 files changed, 232 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
index a52e4f3..15f66b2 100644
--- a/Documentation/technical/api-strbuf.txt
+++ b/Documentation/technical/api-strbuf.txt
@@ -1,6 +1,236 @@
 strbuf API
 ==========
 
-Talk about <strbuf.h>
+An strbuf is NUL terminated for convenience, but no function in the
+strbuf API actually relies on the string being free of NULs.
 
-(Pierre, JC)
+strbuf's are meant to be used with all the usual C string and memory
+APIs. Given that the length of the buffer is known, it's often better to
+use the mem* functions than a str* one (memchr vs. strchr e.g.).
+Though, one has to be careful about the fact that str* functions often
+stop on NULs and that strbufs may have embedded NULs.
+
+strbufs has some invariants that are very important to keep in mind:
+
+. The `buf` member is never NULL, so you it can be used in any usual C
+string operations safely. strbuf's _have_ to be initialized either by
+`strbuf_init()` or by `= STRBUF_INIT` before the invariants, though.
++
+Do *not* assume anything on what `buf` really is (e.g. if it is
+allocated memory or not), use `strbuf_detach()` to unwrap a memory
+buffer from its strbuf shell in a safe way. That is the sole supported
+way. This will give you a malloced buffer that you can later `free()`.
++
+However, it is totally safe to touch anything in the buffer pointed by
+the `buf` member between the index `0` and `len` excluded.
+
+. The `buf` member is a byte array that has at least `len + 1` bytes
+  allocated. The extra byte is used to store a `'\0'`, allowing the
+  `buf` member to be a valid C-string. Every strbuf function ensure this
+  invariant is preserved.
++
+NOTE: It is OK to "play" with the buffer directly if you work it that
+      way:
++
+----
+strbuf_grow(sb, SOME_SIZE); <1>
+strbuf_setlen(sb, sb->len + SOME_OTHER_SIZE);
+----
+<1> Here, the memory array starting at `buf`, and of length
+`strbuf_avail(sb)` is all yours, and you can be sure that
+`strbuf_avail(sb)` is at least `SOME_SIZE`.
++
+Of course, `SOME_OTHER_SIZE` must be smaller or equal to `strbuf_avail(sb)`.
++
+Doing so is safe, though if it has to be done in many places, adding the
+missing API to the strbuf module is the way to go.
++
+WARNING: Do _not_ assume that the area that is yours is of size `alloc
+- 1` even if it's true in the current implementation. Alloc is somehow a
+"private" member that should not be messed with.
+
+Data structures
+---------------
+
+* `struct strbuf`
+
+This is string buffer structure. The `len` variable can be used to
+determine the current length of the string, and `buf` provides access
+to the string itself.
+
+Functions
+---------
+
+* Life cycle
+
+`strbuf_init`::
+
+	Initializes the structure. The second parameter can be zero or a bigger
+	number to allocate memory, in case you want to prevent further reallocs.
+
+`strbuf_release`::
+
+	Releases a string buffer and the memory it used. You should not use the
+	string buffer after using this function, unless you initialize it again.
+
+`strbuf_detach`::
+
+	Detaches the string from the string buffer. The function returns a
+	pointer to the old string and releases a buffer, so that if you want to
+	use it again, you should initialize it before doing so.
+
+`strbuf_attach`::
+
+	Attaches a string to a buffer. You should specify the string to attach,
+	the current length of the string and the amount of allocated memory.
+	The amount must be larger than the string length, because the string you
+	pass is supposed to be a NUL-terminated string.  This string _must_ be
+	malloc()ed, and after attaching, the pointer cannot be relied upon
+	anymore, and neither be free()d directly.
+
+`strbuf_swap`::
+
+	Swaps the contents of two string buffers.
+
+* Related to the size of the buffer
+
+`strbuf_avail`::
+
+	Determines the amount of allocated but not used memory.
+
+`strbuf_grow`::
+
+	Ensure that at least this amount of available memory is available. This
+	is used when you know a typical size for what you will do and want to
+	avoid repetitive automatic resize of the underlying buffer. This is
+	never a needed operation, but can be critical for performance in some
+	cases.
+
+`strbuf_setlen`::
+
+	Sets the length of the buffer to a given value. This function does *not*
+	allocate new memory, so you should not perform a `strbuf_setlen()` to a
+	length that is larger than `len + strbuf_avail()`. `strbuf_setlen()` is
+	just meant as a 'please fix invariants from this strbuf I just messed
+	with'.
+
+`strbuf_reset`::
+
+	Empties the buffer by setting the size of it to zero.
+
+* Related to the contents of the buffer
+
+`strbuf_rtrim`::
+
+	Strip whitespace from the end of a string.
+
+`strbuf_cmp`::
+
+	Compares two buffers. Returns an integer less than, equal to, or greater
+	than zero if the first buffer is found, respectively, to be less than,
+	to match, or be greater than the second buffer.
+
+* Adding data to the buffer
+
+`strbuf_addch`::
+
+	Adds a single character to the buffer.
+
+`strbuf_insert`::
+
+	Insert data to the given position of the buffer. The remaining contents
+	will be shifted, not overwritten.
+
+`strbuf_remove`::
+
+	Remove given amount of data from a given position of the buffer.
+
+`strbuf_splice`::
+
+	Splice pos..pos+len with given data.
+
+`strbuf_add`::
+
+	Add data of given length to the buffer.
+
+`strbuf_addstr`::
+
+Add a NUL-terminated string to the buffer.
++
+NOTE: This function will *always* be implemented as an inline or a macro
+that expands to:
++
+----
+strbuf_add(..., s, strlen(s));
+----
++
+Meaning that this is efficient to write things like:
++
+----
+strbuf_addstr(sb, "immediate string");
+----
+
+`strbuf_addbuf`::
+
+	Add an other buffer to the current one.
+
+`strbuf_adddup`::
+
+	Copy part of the buffer from a given position till a given length to the
+	end of the buffer.
+
+`strbuf_expand`::
+
+	This function can be used to expand a format string containing
+	placeholders. To that end, it parses the string and calls the specified
+	function for every percent sign found.
++
+The callback function is given a pointer to the character after the `%`
+and a pointer to the struct strbuf.  It is expected to add the expanded
+version of the placeholder to the strbuf, e.g. to add a newline
+character if the letter `n` appears after a `%`.  The function returns
+the length of the placeholder recognized and `strbuf_expand()` skips
+over it.
++
+All other characters (non-percent and not skipped ones) are copied
+verbatim to the strbuf.  If the callback returned zero, meaning that the
+placeholder is unknown, then the percent sign is copied, too.
++
+In order to facilitate caching and to make it possible to give
+parameters to the callback, `strbuf_expand()` passes a context pointer,
+which can be used by the programmer of the callback as she sees fit.
+
+`strbuf_addf`::
+
+	Add a formatted string to the buffer.
+
+`strbuf_fread`::
+
+	Read a given size of data from a FILE* pointer to the buffer.
++
+NOTE: The buffer is rewinded if the read fails. If -1 is returned,
+`errno` must be consulted, like you would do for `read(3)`.
+`strbuf_read()`, `strbuf_read_file()` and `strbuf_getline()` has the
+same behaviour as well.
+
+`strbuf_read`::
+
+	Read the contents of a given file descriptor. The third argument can be
+	used to give a hint about the file size, to avoid reallocs.
+
+`strbuf_read_file`::
+
+	Read the contents of a file, specified by its path. The third argument
+	can be used to give a hint about the file size, to avoid reallocs.
+
+`strbuf_getline`::
+
+	Read a line from a FILE* pointer. The second argument specifies the line
+	terminator character, typically `'\n'`.
+
+`stripspace`::
+
+	Strips whitespace from a buffer. The second parameter controls if
+	comments are considered contents to be removed or not.
+
+`launch_editor`::
-- 
1.5.6.rc0.dirty

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] Strbuf documentation: document most functions
  2008-06-04  1:15 ` Miklos Vajna
@ 2008-06-04  7:45   ` Junio C Hamano
  2008-06-04 21:20     ` Miklos Vajna
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2008-06-04  7:45 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: Johannes Schindelin, Pierre Habouzit, Ren頓charfe, git

Miklos Vajna <vmiklos@frugalware.org> writes:

> diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
> index a52e4f3..15f66b2 100644
> --- a/Documentation/technical/api-strbuf.txt
> +++ b/Documentation/technical/api-strbuf.txt
> @@ -1,6 +1,236 @@
>  strbuf API
>  ==========
>  
> +An strbuf is NUL terminated for convenience, but no function in the
> +strbuf API actually relies on the string being free of NULs.
>  
> +strbuf's are meant to be used with all the usual C string and memory
> +APIs. Given that the length of the buffer is known, it's often better to
> +use the mem* functions than a str* one (memchr vs. strchr e.g.).
> +Though, one has to be careful about the fact that str* functions often
> +stop on NULs and that strbufs may have embedded NULs.

I think these two paragraphs should be swapped.  The first two lines are
only interesting after you establish what it is and what it is used for
with the second paragraph.

> +However, it is totally safe to touch anything in the buffer pointed by
> +the `buf` member between the index `0` and `len` excluded.

Somehow this "excluded" feels not quite right wording.
Also "touch" is a fuzzy word.  I think.

 - "sb->buf[i] = x" is Ok if and only if "0 <= i < sb->len";
   col. "sb->buf[0] = 'x'" is a no-no if !sb->len.

 - "if (!sb->buf[0])" is Ok even if !sb->len.

> +. The `buf` member is a byte array that has at least `len + 1` bytes
> +  allocated. The extra byte is used to store a `'\0'`, allowing the
> +  `buf` member to be a valid C-string. Every strbuf function ensure this
> +  invariant is preserved.
> ++
> +NOTE: It is OK to "play" with the buffer directly if you work it that
> +      way:

If you are refering to the example you talk next, s/that way/this way/.

> ++
> +----
> +strbuf_grow(sb, SOME_SIZE); <1>
> +strbuf_setlen(sb, sb->len + SOME_OTHER_SIZE);
> +----
> +<1> Here, the memory array starting at `buf`, and of length

Here you should say `sb->buf` not just `buf`, as you are explaining this
specific example that uses a concrete instance of strbuf `sb`.

> +`strbuf_avail(sb)` is all yours, and you can be sure that
> +`strbuf_avail(sb)` is at least `SOME_SIZE`.
> ++
> +Of course, `SOME_OTHER_SIZE` must be smaller or equal to `strbuf_avail(sb)`.

It is not clear why this is "Of course" here in this document, because you
haven't talked about what "setlen" is.  It is "Of course" only because
setlen's sole purpose is to truncate and it cannot be used to extend the
buffer (I won't discuss if that is a good API here).

> +Doing so is safe, though if it has to be done in many places, adding the
> +missing API to the strbuf module is the way to go.
> ++
> +WARNING: Do _not_ assume that the area that is yours is of size `alloc
> +- 1` even if it's true in the current implementation. Alloc is somehow a
> +"private" member that should not be messed with.

Talking about "don't" is important, but without guiding the reader with
"instead of that, do this", the document becomes a frustrating read.

> +Data structures
> +---------------
> +
> +* `struct strbuf`
> +
> +This is string buffer structure. The `len` variable can be used to
> +determine the current length of the string, and `buf` provides access
> +to the string itself.

Call both consistently "member".  I.e. The `len` member can be ....

> +Functions
> +---------
> +
> +* Life cycle
> +
> +`strbuf_init`::
> +
> +	Initializes the structure. The second parameter can be zero or a bigger
> +	number to allocate memory, in case you want to prevent further reallocs.

When you start the sentence without a subject, use imperative mood, not
third person singular, for consistency.  I.e. "Initialize the structure".
This comment applies everywhere (you have mixed style).

> +`strbuf_release`::
> +
> +	Releases a string buffer and the memory it used. You should not use the
> +	string buffer after using this function, unless you initialize it again.
> +
> +`strbuf_detach`::
> +
> +	Detaches the string from the string buffer. The function returns a
> +	pointer to the old string and releases a buffer, so that if you want to
> +	use it again, you should initialize it before doing so.

"Detaches the string from the strbuf and returns it; you now own the
storage the string occupies and it is your responsibility from then on to
release it with free(3) when you are done with it."

The strbuf is reinitialized by detach, so there shouldn't be need for
re-initializing it.

> +`strbuf_avail`::
> +
> +	Determines the amount of allocated but not used memory.

s/not used/unused/;

> +`strbuf_grow`::
> +
> +	Ensure that at least this amount of available memory is available. This

"available memory is available"?  "Ensure that at least this amount of
unused memory is available after `len`", perhaps.

> +	is used when you know a typical size for what you will do and want to

s/you will do/you will add/;

> +	avoid repetitive automatic resize of the underlying buffer. This is

s/resize/resizing/;

> +* Adding data to the buffer

You need a blanket comment to say "All of these functions will grow the
buffer as necessary".

> +`strbuf_splice`::
> +
> +	Splice pos..pos+len with given data.

Define "splice".  "Remove the bytes between pos..pos+len and replace it
with the given data".

> +`strbuf_addbuf`::
> +
> +	Add an other buffer to the current one.

Makes the reader wonder what happens to that other buffer after this
operation.  "Copy the contents of the second buffer at the end of the
first buffer", perhaps?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] Strbuf documentation: document most functions
  2008-06-04  7:45   ` Junio C Hamano
@ 2008-06-04 21:20     ` Miklos Vajna
  0 siblings, 0 replies; 8+ messages in thread
From: Miklos Vajna @ 2008-06-04 21:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Pierre Habouzit, Ren� Scharfe, git

All functions in strbuf.h are documented, except launch_editor().

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
---

On Wed, Jun 04, 2008 at 12:45:06AM -0700, Junio C Hamano <gitster@pobox.com> wrote:
> > +An strbuf is NUL terminated for convenience, but no function in the
> > +strbuf API actually relies on the string being free of NULs.
> >
> > +strbuf's are meant to be used with all the usual C string and
> > memory
> > +APIs. Given that the length of the buffer is known, it's often
> > better to
> > +use the mem* functions than a str* one (memchr vs. strchr e.g.).
> > +Though, one has to be careful about the fact that str* functions
> > often
> > +stop on NULs and that strbufs may have embedded NULs.
>
> I think these two paragraphs should be swapped.  The first two lines
> are
> only interesting after you establish what it is and what it is used
> for
> with the second paragraph.

Changed.

>
> > +However, it is totally safe to touch anything in the buffer pointed
> > by
> > +the `buf` member between the index `0` and `len` excluded.
>
> Somehow this "excluded" feels not quite right wording.
> Also "touch" is a fuzzy word.  I think.
>
>  - "sb->buf[i] = x" is Ok if and only if "0 <= i < sb->len";
>    col. "sb->buf[0] = 'x'" is a no-no if !sb->len.
>
>  - "if (!sb->buf[0])" is Ok even if !sb->len.

OK, reworded.

> > +NOTE: It is OK to "play" with the buffer directly if you work it
> > that
> > +      way:
>
> If you are refering to the example you talk next, s/that way/this
> way/.

Changed.

> > +strbuf_grow(sb, SOME_SIZE); <1>
> > +strbuf_setlen(sb, sb->len + SOME_OTHER_SIZE);
> > +----
> > +<1> Here, the memory array starting at `buf`, and of length
>
> Here you should say `sb->buf` not just `buf`, as you are explaining
> this
> specific example that uses a concrete instance of strbuf `sb`.

Ah yes, fixed.

>
> > +`strbuf_avail(sb)` is all yours, and you can be sure that
> > +`strbuf_avail(sb)` is at least `SOME_SIZE`.
> > ++
> > +Of course, `SOME_OTHER_SIZE` must be smaller or equal to
> > `strbuf_avail(sb)`.
>
> It is not clear why this is "Of course" here in this document, because
> you
> haven't talked about what "setlen" is.  It is "Of course" only because
> setlen's sole purpose is to truncate and it cannot be used to extend
> the
> buffer (I won't discuss if that is a good API here).

Ok, changed it to a note.

> > +Doing so is safe, though if it has to be done in many places,
> > adding the
> > +missing API to the strbuf module is the way to go.
> > ++
> > +WARNING: Do _not_ assume that the area that is yours is of size
> > `alloc
> > +- 1` even if it's true in the current implementation. Alloc is
> > somehow a
> > +"private" member that should not be messed with.
>
> Talking about "don't" is important, but without guiding the reader
> with
> "instead of that, do this", the document becomes a frustrating read.

I added a sentence to use strbuf_avail() instead.

> > +This is string buffer structure. The `len` variable can be used to
> > +determine the current length of the string, and `buf` provides
> > access
> > +to the string itself.
>
> Call both consistently "member".  I.e. The `len` member can be ....

Ok, replaced.

> > +`strbuf_init`::
> > +
> > +   Initializes the structure. The second parameter can be zero or a
> > bigger
> > +   number to allocate memory, in case you want to prevent further
> > reallocs.
>
> When you start the sentence without a subject, use imperative mood,
> not
> third person singular, for consistency.  I.e. "Initialize the
> structure".
> This comment applies everywhere (you have mixed style).

Hm yes, fixed.

> > +`strbuf_detach`::
> > +
> > +   Detaches the string from the string buffer. The function returns
> > a
> > +   pointer to the old string and releases a buffer, so that if you
> > want to
> > +   use it again, you should initialize it before doing so.
>
> "Detaches the string from the strbuf and returns it; you now own the
> storage the string occupies and it is your responsibility from then on
> to
> release it with free(3) when you are done with it."
>
> The strbuf is reinitialized by detach, so there shouldn't be need for
> re-initializing it.

OK, changed.

> > +`strbuf_avail`::
> > +
> > +   Determines the amount of allocated but not used memory.
>
> s/not used/unused/;

Changed.

> > +`strbuf_grow`::
> > +
> > +   Ensure that at least this amount of available memory is
> > available. This
>
> "available memory is available"?  "Ensure that at least this amount of
> unused memory is available after `len`", perhaps.

Oh, yes. I wanted to say "this amount of memory is available".

> > +   is used when you know a typical size for what you will do and
> > want to
>
> s/you will do/you will add/;

Fixed.

> > +   avoid repetitive automatic resize of the underlying buffer. This
> > is
>
> s/resize/resizing/;

Replaced.

> > +* Adding data to the buffer
>
> You need a blanket comment to say "All of these functions will grow
> the
> buffer as necessary".

Right, added one.

> > +`strbuf_splice`::
> > +
> > +   Splice pos..pos+len with given data.
>
> Define "splice".  "Remove the bytes between pos..pos+len and replace
> it
> with the given data".

Changed.

>
> > +`strbuf_addbuf`::
> > +
> > +   Add an other buffer to the current one.
>
> Makes the reader wonder what happens to that other buffer after this
> operation.  "Copy the contents of the second buffer at the end of the
> first buffer", perhaps?

Yes, I improved the description to mention this.

 Documentation/technical/api-strbuf.txt |  239 +++++++++++++++++++++++++++++++-
 1 files changed, 237 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
index a52e4f3..5e3d205 100644
--- a/Documentation/technical/api-strbuf.txt
+++ b/Documentation/technical/api-strbuf.txt
@@ -1,6 +1,241 @@
 strbuf API
 ==========
 
-Talk about <strbuf.h>
+strbuf's are meant to be used with all the usual C string and memory
+APIs. Given that the length of the buffer is known, it's often better to
+use the mem* functions than a str* one (memchr vs. strchr e.g.).
+Though, one has to be careful about the fact that str* functions often
+stop on NULs and that strbufs may have embedded NULs.
 
-(Pierre, JC)
+An strbuf is NUL terminated for convenience, but no function in the
+strbuf API actually relies on the string being free of NULs.
+
+strbufs has some invariants that are very important to keep in mind:
+
+. The `buf` member is never NULL, so you it can be used in any usual C
+string operations safely. strbuf's _have_ to be initialized either by
+`strbuf_init()` or by `= STRBUF_INIT` before the invariants, though.
++
+Do *not* assume anything on what `buf` really is (e.g. if it is
+allocated memory or not), use `strbuf_detach()` to unwrap a memory
+buffer from its strbuf shell in a safe way. That is the sole supported
+way. This will give you a malloced buffer that you can later `free()`.
++
+However, it it totally safe to modify anything in the string pointed by
+the `buf` member, between (including) the indexes `0` and `len -1`.
+
+. The `buf` member is a byte array that has at least `len + 1` bytes
+  allocated. The extra byte is used to store a `'\0'`, allowing the
+  `buf` member to be a valid C-string. Every strbuf function ensure this
+  invariant is preserved.
++
+NOTE: It is OK to "play" with the buffer directly if you work it this
+      way:
++
+----
+strbuf_grow(sb, SOME_SIZE); <1>
+strbuf_setlen(sb, sb->len + SOME_OTHER_SIZE);
+----
+<1> Here, the memory array starting at `sb->buf`, and of length
+`strbuf_avail(sb)` is all yours, and you can be sure that
+`strbuf_avail(sb)` is at least `SOME_SIZE`.
++
+NOTE: `SOME_OTHER_SIZE` must be smaller or equal to `strbuf_avail(sb)`.
++
+Doing so is safe, though if it has to be done in many places, adding the
+missing API to the strbuf module is the way to go.
++
+WARNING: Do _not_ assume that the area that is yours is of size `alloc
+- 1` even if it's true in the current implementation. Alloc is somehow a
+"private" member that should not be messed with. Use `strbuf_avail()`
+instead.
+
+Data structures
+---------------
+
+* `struct strbuf`
+
+This is string buffer structure. The `len` member can be used to
+determine the current length of the string, and `buf` provides access to
+the string itself.
+
+Functions
+---------
+
+* Life cycle
+
+`strbuf_init`::
+
+	Initialize the structure. The second parameter can be zero or a bigger
+	number to allocate memory, in case you want to prevent further reallocs.
+
+`strbuf_release`::
+
+	Release a string buffer and the memory it used. You should not use the
+	string buffer after using this function, unless you initialize it again.
+
+`strbuf_detach`::
+
+	Detaches the string from the strbuf and returns it; you now own the
+	storage the string occupies and it is your responsibility from then on
+	to release it with `free(3)` when you are done with it.
+
+`strbuf_attach`::
+
+	Attach a string to a buffer. You should specify the string to attach,
+	the current length of the string and the amount of allocated memory.
+	The amount must be larger than the string length, because the string you
+	pass is supposed to be a NUL-terminated string.  This string _must_ be
+	malloc()ed, and after attaching, the pointer cannot be relied upon
+	anymore, and neither be free()d directly.
+
+`strbuf_swap`::
+
+	Swap the contents of two string buffers.
+
+* Related to the size of the buffer
+
+`strbuf_avail`::
+
+	Determine the amount of allocated but unused memory.
+
+`strbuf_grow`::
+
+	Ensure that at least this amount of unused memory is available after
+	`len`. This is used when you know a typical size for what you will add
+	and want to avoid repetitive automatic resizing of the underlying buffer.
+	This is never a needed operation, but can be critical for performance in
+	some cases.
+
+`strbuf_setlen`::
+
+	Set the length of the buffer to a given value. This function does *not*
+	allocate new memory, so you should not perform a `strbuf_setlen()` to a
+	length that is larger than `len + strbuf_avail()`. `strbuf_setlen()` is
+	just meant as a 'please fix invariants from this strbuf I just messed
+	with'.
+
+`strbuf_reset`::
+
+	Empty the buffer by setting the size of it to zero.
+
+* Related to the contents of the buffer
+
+`strbuf_rtrim`::
+
+	Strip whitespace from the end of a string.
+
+`strbuf_cmp`::
+
+	Compare two buffers. Returns an integer less than, equal to, or greater
+	than zero if the first buffer is found, respectively, to be less than,
+	to match, or be greater than the second buffer.
+
+* Adding data to the buffer
+
+NOTE: All of these functions in this section will grow the buffer as
+      necessary.
+
+`strbuf_addch`::
+
+	Add a single character to the buffer.
+
+`strbuf_insert`::
+
+	Insert data to the given position of the buffer. The remaining contents
+	will be shifted, not overwritten.
+
+`strbuf_remove`::
+
+	Remove given amount of data from a given position of the buffer.
+
+`strbuf_splice`::
+
+	Remove the bytes between pos..pos+len and replace it with the given
+	data.
+
+`strbuf_add`::
+
+	Add data of given length to the buffer.
+
+`strbuf_addstr`::
+
+Add a NUL-terminated string to the buffer.
++
+NOTE: This function will *always* be implemented as an inline or a macro
+that expands to:
++
+----
+strbuf_add(..., s, strlen(s));
+----
++
+Meaning that this is efficient to write things like:
++
+----
+strbuf_addstr(sb, "immediate string");
+----
+
+`strbuf_addbuf`::
+
+	Copy the contents of an other buffer at the end of the current one.
+
+`strbuf_adddup`::
+
+	Copy part of the buffer from a given position till a given length to the
+	end of the buffer.
+
+`strbuf_expand`::
+
+	This function can be used to expand a format string containing
+	placeholders. To that end, it parses the string and calls the specified
+	function for every percent sign found.
++
+The callback function is given a pointer to the character after the `%`
+and a pointer to the struct strbuf.  It is expected to add the expanded
+version of the placeholder to the strbuf, e.g. to add a newline
+character if the letter `n` appears after a `%`.  The function returns
+the length of the placeholder recognized and `strbuf_expand()` skips
+over it.
++
+All other characters (non-percent and not skipped ones) are copied
+verbatim to the strbuf.  If the callback returned zero, meaning that the
+placeholder is unknown, then the percent sign is copied, too.
++
+In order to facilitate caching and to make it possible to give
+parameters to the callback, `strbuf_expand()` passes a context pointer,
+which can be used by the programmer of the callback as she sees fit.
+
+`strbuf_addf`::
+
+	Add a formatted string to the buffer.
+
+`strbuf_fread`::
+
+	Read a given size of data from a FILE* pointer to the buffer.
++
+NOTE: The buffer is rewinded if the read fails. If -1 is returned,
+`errno` must be consulted, like you would do for `read(3)`.
+`strbuf_read()`, `strbuf_read_file()` and `strbuf_getline()` has the
+same behaviour as well.
+
+`strbuf_read`::
+
+	Read the contents of a given file descriptor. The third argument can be
+	used to give a hint about the file size, to avoid reallocs.
+
+`strbuf_read_file`::
+
+	Read the contents of a file, specified by its path. The third argument
+	can be used to give a hint about the file size, to avoid reallocs.
+
+`strbuf_getline`::
+
+	Read a line from a FILE* pointer. The second argument specifies the line
+	terminator character, typically `'\n'`.
+
+`stripspace`::
+
+	Strip whitespace from a buffer. The second parameter controls if
+	comments are considered contents to be removed or not.
+
+`launch_editor`::
-- 
1.5.6.rc0.dirty

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-06-04 21:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-02 22:59 [PATCH] Strbuf documentation: document most functions Miklos Vajna
2008-06-02 23:39 ` Junio C Hamano
2008-06-03  0:00 ` Johannes Schindelin
2008-06-03  8:42   ` Pierre Habouzit
2008-06-03 15:41 ` René Scharfe
2008-06-04  1:15 ` Miklos Vajna
2008-06-04  7:45   ` Junio C Hamano
2008-06-04 21:20     ` Miklos Vajna

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).