Re: [PATCH] strbuf.c: optimize program logic

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "阿德烈 via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org, 阿德烈 <adlternative@gmail.com>
Subject: Re: [PATCH] strbuf.c: optimize program logic
Date: Tue, 26 Jan 2021 13:23:40 -0500	[thread overview]
Message-ID: <YBBeLIhd+VHS25CE@coredump.intra.peff.net> (raw)
In-Reply-To: <xmqqy2gg2pdm.fsf@gitster.c.googlers.com>

On Mon, Jan 25, 2021 at 10:17:41PM -0800, Junio C Hamano wrote:

> "阿德烈 via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > the usage in strbuf.h tell us"Alloc is somehow a
> > "private" member that should not be messed with.
> > use `strbuf_avail()`instead."
> 
> When we use the word "private", it generally means it is private to
> the implementation of the API.  IOW, it is usually fine for the
> implementation of the API (i.e. for strbuf API, what you see in
> strbuf.c) to use private members.
> 
> In any case, these changes are _not_ optimizations.  

Yeah, I had both of those thoughts, too. :)

Though...

> Replacing (alloc - len - 1) with strbuf_avail() is at best an
> equivalent rewrite (which is a good thing from readability's point
> of view, but not an optimization).  We know sb->alloc during the
> loop is never 0, but the compiler may miss the fact, so the inlined
> implementation of _avail, i.e.
> 
> 	static inline size_t strbuf_avail(const struct strbuf *sb)
> 	{
> 	        return sb->alloc ? sb->alloc - sb->len - 1 : 0;
>         }
> 
> may not incur call overhead, but may be pessimizing the executed
> code.
> 
> If you compare the code in the loop in the second hunk below with
> what _setlen() does, I think you'll see the overhead of _setlen()
> relative to the original code is even higher, so it may also be
> pessimizing, not optimizing.
> 
> So, overall, I am not all that enthused to see this patch.

I would generally value readability/consistency here over trying to
micro-optimize an if-zero check.

However, if strbuf_avail() ever did return 0, I'm not sure the loop
would make forward progress:

          strbuf_grow(sb, hint ? hint : 8192);
          for (;;) {
                  ssize_t want = strbuf_avail(sb);
                  ssize_t got = read_in_full(fd, sb->buf + sb->len, want);
  
                  if (got < 0) {
                          if (oldalloc == 0)
                                  strbuf_release(sb);
                          else
                                  strbuf_setlen(sb, oldlen);
                          return -1;
                  }
                  strbuf_setlen(sb, sb->len + got);
                  if (got < want)
                          break;
                  strbuf_grow(sb, 8192);
          }

we'd just ask to read 0 bytes over and over. That almost makes me want
to add:

  if (!want)
	BUG("strbuf did not actually grow!?");

or possibly to teach the "if (got < want)" condition to check for a zero
return (though I guess that would probably just end up confusing us into
thinking we hit EOF).

> One thing I noticed is that, whether open coded like sb->len += got
> or made into parameter to strbuf_setlen(sb, sb->len + got), we are
> not careful about sb->len growing too large and overflowing with the
> addition.  That may potentially be an interesting thing to look
> into, but at the same time, unlike the usual "compute the number of
> bytes we need to allocate and then call xmalloc()" pattern, where we
> try to be careful in the "compute" step by using st_add() macros,
> this code actually keep growing the buffer, so by the time the size_t
> overflows and wraps around, we'd certainly have exhausted the memory
> already, so it won't be an issue.

I think "len" is OK here. An invariant of strbuf is that "len" is
smaller than "alloc" for obvious reasons. So as long as the actual
strbuf_grow() is safe, then extending "len".

I'm not sure that strbuf_grow() is safe, though. It relies on
ALLOC_GROW, which does not use st_add(), etc.

-Peff

PS The original patch does not seem to have made it to the list for some
   reason (I didn't get a copy, and neither did lore.kernel.org).

next prev parent reply	other threads:[~2021-01-26 22:19 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-26  5:06 [PATCH] strbuf.c: optimize program logic 阿德烈 via GitGitGadget
2021-01-26  6:17 ` Junio C Hamano
2021-01-26 10:44   ` 胡哲宁
2021-01-26 18:47     ` Junio C Hamano
2021-01-26 18:23   ` Jeff King [this message]
2021-01-26 20:15     ` Junio C Hamano
2021-01-29  6:09     ` 胡哲宁
2021-01-30  8:50       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YBBeLIhd+VHS25CE@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=adlternative@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.