From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>, Derrick Stolee <dstolee@microsoft.com>
Cc: git@vger.kernel.org, git@jeffhostetler.com
Subject: Re: [PATCH] sha1_file: use strbuf_add() instead of strbuf_addf()
Date: Fri, 1 Dec 2017 14:50:05 -0500 [thread overview]
Message-ID: <54da89a5-bf20-cb20-9884-6285033f0d95@gmail.com> (raw)
In-Reply-To: <20171201182238.GA27688@sigill.intra.peff.net>
On 12/1/2017 1:22 PM, Jeff King wrote:
> On Fri, Dec 01, 2017 at 12:49:56PM -0500, Derrick Stolee wrote:
[snip]
>> diff --git a/sha1_file.c b/sha1_file.c
>> index 8ae6cb6285..2160323c4a 100644
> This overall looks good, but I noticed one bug and a few cosmetic
> improvements.
Thanks for finding quality improvements to this patch. I'll let it sit
over the weekend for more feedback before sending a v2.
>
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -1914,17 +1914,18 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
>> }
>>
>> oid.hash[0] = subdir_nr;
>> + strbuf_add(path, "/", 1);
> Maybe:
>
> strbuf_addch(path, '/');
>
> would be a bit more readable (it's also a tiny bit faster, but this
> isn't part of the tight loop).
>
>> + baselen = path->len;
> We set this here so that the '/' is included as part of the base. Makes
> sense, but can we now drop the earlier setting of baselen before the
> opendir() call?
Yeah, probably. I had briefly considered just adding the '/' before the
first assignment of "baselen", but didn't want to change the error
output. I also don't know if there are side effects for other platforms
by calling opendir() with a '/'-terminated path.
>> while ((de = readdir(dir))) {
>> if (is_dot_or_dotdot(de->d_name))
>> continue;
>>
>> - strbuf_setlen(path, baselen);
>> - strbuf_addf(path, "/%s", de->d_name);
>> -
>> if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2 &&
>> !hex_to_bytes(oid.hash + 1, de->d_name,
>> GIT_SHA1_RAWSZ - 1)) {
>> + strbuf_setlen(path, baselen);
>> + strbuf_add(path, de->d_name, GIT_SHA1_HEXSZ - 2);
> Moving this code into the conditional makes sense here, since that's
> where we know the actual size. But what about the case when we _don't_
> hit this conditional. We still do:
>
> if (cruft_cb)
> cruft_cb(path->buf);
>
> which is now broken (it will just get the directory name without the
> entry appended).
Good catch! A big reason to pull it inside and use strbuf_add over
strbuf_addstr is to avoid a duplicate strlen() calculation. However, I
can store the length before the conditional.
> I think the optimized versions probably needs to be something more like:
>
> while (de = readdir(...)) {
> strbuf_setlen(path, baselen);
> if (size is HEXSZ - 2) {
> strbuf_add(path, de->d_name, HEXSZ - 2);
> obj_cb(path->buf);
> } else if (cruft_cb) {
> strbuf_addstr(path, de->d_name);
> cruft_cb(path->buf);
> }
> }
Small change by storing the length in advance of the conditional:
while (de = readdir(...)) {
int namelen = strlen(de->d_name);
strbuf_setlen(path, baselen);
strbuf_add(path, de->d_name, namelen);
if (namelen == HEXSZ - 2)
obj_cb(path->buf)
else
cruft_cb(path->buf);
}
> Two other thoughts:
>
> - from past discussions, I suspect most of your performance
> improvement actually comes from avoiding addf(), and the difference
> between addstr() and add() may be negligible here. It might be worth
> timing strbuf_addstr(). If that's similarly fast, that means we
> could keep the logic out of the conditional entirely.
addstr() duplicates the strlen(), which isn't much but we might as well
save cycles where we can if it isn't too hard.
> - there's an extra micro-optimization there, which is that if there's
> no obj_cb, we have no need to assemble the full path at all. I doubt
> it makes much of a difference, as most callers would pass an object
> callback (I'd be surprised if we even have one that doesn't).
After doing a few 'git grep' commands, I found several that include a
NULL cruft_cb but none that have a NULL obj_cb.
Thanks,
-Stolee
next prev parent reply other threads:[~2017-12-01 19:50 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-01 17:49 [PATCH] sha1_file: use strbuf_add() instead of strbuf_addf() Derrick Stolee
2017-12-01 18:22 ` Jeff King
2017-12-01 19:50 ` Derrick Stolee [this message]
2017-12-01 22:39 ` Jeff King
2017-12-04 14:06 ` [PATCH v2] " Derrick Stolee
2017-12-04 16:56 ` Jeff King
2017-12-04 17:12 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54da89a5-bf20-cb20-9884-6285033f0d95@gmail.com \
--to=stolee@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).