From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] fast-import: use hashcmp() for SHA1 hash comparison
Date: Sat, 19 Jul 2014 14:11:30 +0200
Message-ID: <53CA6072.8000009@web.de>
In-Reply-To: <20140718235706.GA11192@peff.net>
On 19.07.2014 01:57, Jeff King wrote:
> On Fri, Jul 18, 2014 at 09:14:05PM +0200, René Scharfe wrote:
>
>> Whether inlining is really better is another matter; I don't understand how
>> 1a812f3a (hashcmp(): inline memcmp() by hand to optimize) could have made
>> git gc 18% faster, as it claimed. I would expect memcmp(), which can
>> compare more than a byte at a time, to be significantly faster -- or at
>> least just as fast as whatever the compiler does with the inlined version.
>
> I looked into this a while ago[1]. I think with glibc 2.13 and up, the
> memcmp is a win. We should consider switching back if that is what is
> common now.
>
> -Peff
>
> [1] http://article.gmane.org/gmane.comp.version-control.git/218396
On Linux this seems to improve rev-list performance by ca. 10% for me, but
on FreeBSD it slows it down by ca. 25%. That's less surprising after
looking at their memcmp() implementation:
http://svnweb.freebsd.org/base/release/10.0.0/lib/libc/string/memcmp.c?revision=260789&view=co
It's basically our one-byte-at-a-time inlined version, sans inlining.
I'd say if a platform doesn't bother optimizing memcmp() then they
deserve the resulting performance. And it's probably not too bad a
penalty because such comparisons probably won't make up a significant
part of most applications.
But perhaps we can do better. First, here are the numbers I got for
"/usr/bin/time git rev-list --all --objects v2.0.0 >/dev/null" (best
of five runs):
Debian testing amd64 with gcc 4.9.0 and libc 2.19 on Hyper-V on Intel i5-4670:

master (398dd4bd)
2.14user 0.02system 0:02.16elapsed 99%CPU (0avgtext+0avgdata 71004maxresident)k
0inputs+0outputs (0major+21985minor)pagefaults 0swaps

using memcmp
1.92user 0.02system 0:01.95elapsed 99%CPU (0avgtext+0avgdata 71032maxresident)k
0inputs+0outputs (0major+21987minor)pagefaults 0swaps

four bytes at a time
1.93user 0.03system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 71240maxresident)k
0inputs+0outputs (0major+22024minor)pagefaults 0swaps

Debian testing amd64 with clang 3.3 and libc 2.19 on Hyper-V on Intel i5-4670:

master (398dd4bd)
2.12user 0.04system 0:02.17elapsed 99%CPU (0avgtext+0avgdata 70976maxresident)k
0inputs+0outputs (0major+21995minor)pagefaults 0swaps

using memcmp
1.94user 0.01system 0:01.95elapsed 100%CPU (0avgtext+0avgdata 71012maxresident)k
0inputs+0outputs (0major+21952minor)pagefaults 0swaps

four bytes at a time
1.86user 0.01system 0:01.88elapsed 99%CPU (0avgtext+0avgdata 70788maxresident)k
0inputs+0outputs (0major+21909minor)pagefaults 0swaps

FreeBSD 10 amd64 with clang 3.3 on Hyper-V on Intel i5-4670:

master (398dd4bd)
1.83 real 1.80 user 0.03 sys

using memcmp
2.32 real 2.32 user 0.00 sys

four bytes at a time
1.56 real 1.53 user 0.03 sys
Performance measurements in virtual machines are not very accurate, so
take them with a bag of salt, but the numbers were relatively stable
across runs.
How come rev-list runs so much faster on FreeBSD? Would it make sense
to apply the four-bytes-at-a-time patch? How would it fare on other
platforms, especially ARM? Can we do even better? And more
importantly: Does it matter in the first place?
The memcmp version just replaces the body of hashcmp() with
"return memcmp(sha1, sha2, 20);". Here's the four-bytes-at-a-time
patch:
---
cache.h | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/cache.h b/cache.h
index 92fc9f1..c6cd28e 100644
--- a/cache.h
+++ b/cache.h
@@ -732,13 +732,18 @@ extern const unsigned char null_sha1[20];
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+	const uint32_t *p1 = (const uint32_t *)sha1;
+	const uint32_t *p2 = (const uint32_t *)sha2;
 	int i;
 
-	for (i = 0; i < 20; i++, sha1++, sha2++) {
-		if (*sha1 != *sha2)
-			return *sha1 - *sha2;
+	for (i = 0; i < 20 / sizeof(uint32_t); i++, p1++, p2++) {
+		uint32_t v1 = htonl(*p1);
+		uint32_t v2 = htonl(*p2);
+		if (v1 < v2)
+			return -1;
+		if (v1 > v2)
+			return 1;
 	}
-
 	return 0;
 }
--
2.0.2
Thread overview: 7+ messages
2014-07-18 16:00 [PATCH] fast-import: use hashcmp() for SHA1 hash comparison René Scharfe
2014-07-18 18:42 ` Jonathan Nieder
2014-07-18 19:14 ` René Scharfe
2014-07-18 23:57 ` Jeff King
2014-07-19 12:11 ` René Scharfe [this message]
2014-07-19 16:43 ` brian m. carlson
2014-07-19 17:53 ` René Scharfe