Re: [PATCH V2 3/3] strbuf: allow to use preallocated memory

All of lore.kernel.org
 help / color / mirror / Atom feed

From: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, antoine.queru@ensimag.grenoble-inp.fr,
	francois.beutin@ensimag.grenoble-inp.fr, mhagger@alum.mit.edu,
	Johannes.Schindelin@gmx.de, peff@peff.net, mh@glandium.org
Subject: Re: [PATCH V2 3/3] strbuf: allow to use preallocated memory
Date: Mon, 6 Jun 2016 22:39:01 +0200	[thread overview]
Message-ID: <20160606203901.GA7667@Messiaen> (raw)
In-Reply-To: <xmqqvb1mxmk4.fsf@gitster.mtv.corp.google.com>

On Mon, Jun 06, 2016 at 10:19:07AM -0700, Junio C Hamano wrote:
> William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes:
> 
>> +#define MAX_ALLOC(a, b) (((a)>(b))?(a):(b))
> 
> I do not see why this macro is called MAX_ALLOC(); is there anything
> "alloc" specific to what this does?  You may happen to use it only
> for "alloc" related things, but that is not a good reason for such a
> name (if the implementation can only be used for "alloc" specific
> purpose for some reason, then the name is appropriate).

Absolutely right. This macro won't be needed anyway, as you pointed out
in the end of this review
 
>> @@ -20,16 +23,47 @@ char strbuf_slopbuf[1];
>>  
>>  void strbuf_init(struct strbuf *sb, size_t hint)
>>  {
>> +	sb->owns_memory = 0;
>> +	sb->fixed_memory = 0;
>>  	sb->alloc = sb->len = 0;
>>  	sb->buf = strbuf_slopbuf;
>>  	if (hint)
>>  		strbuf_grow(sb, hint);
>>  }
> 
> I'll complain about _grow() again later, but if the implementation
> is updated to address that complaint, slopbuf-using strbuf could
> start it as a special case of "starts as fixed_memory".
> 
>> +static void strbuf_wrap_internal(struct strbuf *sb, char *buf,
>> +				 size_t buf_len, size_t alloc_len)
>> +{
>> +	if (!buf)
>> +		die("the buffer of a strbuf cannot be NULL");
>> +
>> +	strbuf_release(sb);
>> +	sb->len = buf_len;
>> +	sb->alloc = alloc_len;
>> +	sb->buf = buf;
>> +}
>> +
>> +void strbuf_wrap(struct strbuf *sb, char *buf,
>> +		 size_t buf_len, size_t alloc_len)
>> +{
>> +	strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
>> +	sb->owns_memory = 0;
>> +	sb->fixed_memory = 0;
>> +}
>>
>> +void strbuf_wrap_fixed(struct strbuf *sb, char *buf,
>> +		       size_t buf_len, size_t alloc_len)
>> +{
>> +	strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
>> +	sb->owns_memory = 0;
>> +	sb->fixed_memory = 1;
>> +}
> 
> After seeing 2/3, I would say that the pretty side deserves the use
> of word "wrap" more.  What the above two does is an enhancement of
> what strbuf API has called "attach".  You are introducing a few new
> and different ways to attach a piece of memory to a strbuf.

The naming is still debatable indeed. If there is a consensus on using
"attach", fine with me.
That being said, I'm still hesitant on being able to initialize a strbuf
with an "attach" function, as the user shouldn't have to use
`strbuf_init()` or `= STRBUF_INIT` beforhand. The users can't use
`strbuf_attach()` with a non-initialized strbuf, is that a good idea to
allow them to use `strbuf_attach_preallocated()` on a non-initialized
buffer?

>> @@ -37,8 +71,13 @@ void strbuf_release(struct strbuf *sb)
>>  char *strbuf_detach(struct strbuf *sb, size_t *sz)
>>  {
>>  	char *res;
>> +
>>  	strbuf_grow(sb, 0);
>> -	res = sb->buf;
>> +	if (sb->owns_memory)
>> +		res = sb->buf;
>> +	else
>> +		res = xmemdupz(sb->buf, sb->len);
>> +
>>  	if (sz)
>>  		*sz = sb->len;
>>  	strbuf_init(sb, 0);
> 
> Note that the user of the API does not have to care if the memory is
> held by the strbuf or if the strbuf is merely borrowing it from the
> initial allocation made from elsewhere (e.g. via strbuf_wrap_*) when
> finalizing its use of the strbuf API to build the contents and
> utilize the resulting buffer.  Theoretically, if the caller knows it
> let the piece memory _it_ owns (i.e. !owns_memory) and the piece of
> memory it originally gave the strbuf hasn't been moved (i.e. your
> fixed_memory, which I will complain again soon), the caller
> shouldn't have to get a copy via xmemdupz(), which would both save
> an allocation here and free() in the caller once it is done.
> 
> But that is not done, and as the API it is a good thing.  It frees
> the caller from having to worry about _where_ the memory originally
> came from.
> 
>>  void strbuf_grow(struct strbuf *sb, size_t extra)
>>  {
>> -	int new_buf = !sb->alloc;
>>  	if (unsigned_add_overflows(extra, 1) ||
>>  	    unsigned_add_overflows(sb->len, extra + 1))
>>  		die("you want to use way too much memory");
>> -	if (new_buf)
>> -		sb->buf = NULL;
>> -	ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>> -	if (new_buf)
>> -		sb->buf[0] = '\0';
>> +	if ((sb->fixed_memory) &&
>> +	    sb->len + extra + 1 > sb->alloc)
>> +		die("you try to overflow the buffer of a fixed strbuf");
> 
> Having praised how _detach() frees the user of the API from having
> to worry about minute details of allocation, this is an eyesore.  I
> really think this "fixed_memory" support MUST GO.  This "die()" here
> makes the API useless _unless_ the caller always makes sure that any
> operation it performs does not overflow the originally allocated
> size, which makes it no better than using xsnprintf() and friends.

I'm not a fan of this die() either.

> A better design would be to get rid of fixed_memory bit and replace
> it with "owns_memory" bit.  Its meaning would be "The ->buf pointer
> points at something that is not owned by this strbuf, and must not
> be freed when resizing or releasing ->buf."
> 
> And when you detect that a strbuf with unfreeable memory needs to be
> extended (the above condition you call "die()" on), you use the
> "strbuf doesn't own the buffer" logic below.
> 
> The only difference between your version and an improved one that
> gets rid of "fixed_memory" from the API user's point of view is that
> they must pre-allocate absolute maximum amount of memory expected to
> be consumed with your version in order not to crash, while a "start
> with fixed but dynamically allocate from heap if needed" approach
> lets the API user preallocate just sufficient amount of memory
> expected to be consumed in a typical workload and go without calling
> malloc()/free() most of the time.  Oh, and when a rare data comes in
> that exceeds your initial estimate, "fixed_memory" version will just
> die.
> 
> So, as I said in the previous round, I really do not see a point in
> this fixed_memory business.  It does not make the implementation of
> the API any easier as far as I can tell, either.

I'm not sure to follow you. I agree that the "fixed strbuf" feature is
flawed by the presence of this `die()`. But (unless misunderstanding)
the "owns_memory" bit you talk about does exist in this patch, and allow
the exact behavior you describe.
The patch allow to use a preallocated strbuf OR a preallocated-and-fixed
strbuf. Does the "fixed" part is useful, that's very debatable.

>> +	/*
>> +	 * ALLOC_GROW may do a realloc() if needed, so we must not use it on
>> +	 * a buffer the strbuf doesn't own
>> +	 */
>> +	if (sb->owns_memory) {
>> +		ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>> +	} else {
>> +		/*
>> +		 * The strbuf doesn't own the buffer: to avoid to realloc it,
>> +		 * the strbuf needs to use a new buffer without freeing the old
>> +		 */
>> +		if (sb->len + extra + 1 > sb->alloc) {
>> +			size_t new_alloc =
>> +				MAX_ALLOC(sb->len + extra + 1,
>> +					  alloc_nr(sb->alloc));
> 
> It is true that the you cannot simply realloc(->buf), but otherwise
> the growth pattern of the buffer is the same between ->owns_memory
> case and this case.  And I think the above computes it like so, but
> do you really need a one-shot macro MAX_ALLOC that is named in a
> misleading way?
> 
> I'd find it far easier to read if you wrote the core of ALLOC_GROW()
> logic, like this:
> 
> 	size_t new_nr = sb->len + extra + 1;
> 	if (new_nr > sb->alloc) {
> 		size_t new_alloc;
> 		if (alloc_nr(sb->alloc) < new_nr)
>                 	new_alloc = new_nr;
> 		else
> 			new_alloc = alloc_nr(sb->alloc);
> 		... copy and make it "owned" ...
> 	}
> 
> A better way would be to introduce a new macro ALLOC_GROW_COUNT() to
> cache.h immediately before ALLOC_GROW() is defined and use it to
> redo ALLOC_GROW(), perhaps like this:
> 
> #define ALLOC_GROW_COUNT(nr, alloc) \
>         ((alloc_nr(alloc) < (nr)) ? (nr) : alloc_nr(alloc))
> 
> #define ALLOC_GROW(x, nr, alloc) \
> 	do { \
>                 if ((nr) > alloc) { \
>                 	alloc = ALLOC_GROW_COUNT(nr, alloc); \
> 			REALLOC_ARRAY(x, alloc); \
> 		} \
> 	} while (0)			
> 
> Then you can do
> 
> 	if (sb->len + extra + 1 > sb->alloc) {
> 		size_t new_alloc;
>                 char *new_buf;
> 
>         	new_alloc = ALLOC_GROW_COUNT(sb->len + extra + 1, sb->alloc);
> 		new_buf = xmalloc(new_alloc);
> 		memcpy(...);
> 		...

The second way seems better to me too, I don't know why I were hesitant
to edit the ALLOC_GROW macro

next prev parent reply	other threads:[~2016-06-06 20:39 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-06 15:13 [PATCH V2 0/3] strbuf: improve API William Duclot
2016-06-06 15:13 ` [PATCH V2 1/3] strbuf: add tests William Duclot
2016-06-06 16:11   ` Matthieu Moy
2016-06-07  8:44   ` Johannes Schindelin
2016-06-06 15:13 ` [PATCH V2 2/3] pretty.c: rename strbuf_wrap() function William Duclot
2016-06-06 16:12   ` Matthieu Moy
2016-06-07  9:04   ` Johannes Schindelin
2016-06-06 15:13 ` [PATCH V2 3/3] strbuf: allow to use preallocated memory William Duclot
2016-06-06 16:17   ` Matthieu Moy
2016-06-06 17:19   ` Junio C Hamano
2016-06-06 20:39     ` William Duclot [this message]
2016-06-06 22:44       ` Junio C Hamano
2016-06-06 22:58         ` Jeff King
2016-06-06 23:24           ` Junio C Hamano
2016-06-06 23:25             ` Junio C Hamano
2016-06-06 23:30             ` Jeff King
2016-06-07  9:06             ` William Duclot
2016-06-07 18:10               ` Junio C Hamano
2016-06-08 16:20               ` Michael Haggerty
2016-06-08 18:07                 ` Junio C Hamano
2016-06-08 19:19                 ` Jeff King
2016-06-08 19:42                   ` [PATCH] send-pack: use buffered I/O to talk to pack-objects Jeff King
2016-06-09 12:10                     ` Matthieu Moy
2016-06-09 14:34                       ` Ramsay Jones
2016-06-09 17:12                         ` Jeff King
2016-06-09 22:40                           ` Ramsay Jones
2016-06-09 16:40                       ` Junio C Hamano
2016-06-09 17:14                         ` Jeff King
2016-06-09 17:22                         ` Matthieu Moy
2016-06-08 19:48                   ` [PATCH V2 3/3] strbuf: allow to use preallocated memory Junio C Hamano
2016-06-08 19:52                     ` Jeff King
2016-06-08 23:05                       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160606203901.GA7667@Messiaen \
    --to=william.duclot@ensimag.grenoble-inp.fr \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=antoine.queru@ensimag.grenoble-inp.fr \
    --cc=francois.beutin@ensimag.grenoble-inp.fr \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mh@glandium.org \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.