From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] fast-import: use hashcmp() for SHA1 hash comparison
Date: Sat, 19 Jul 2014 14:11:30 +0200	[thread overview]
Message-ID: <53CA6072.8000009@web.de> (raw)
In-Reply-To: <20140718235706.GA11192@peff.net>

On 19.07.2014 01:57, Jeff King wrote:
> On Fri, Jul 18, 2014 at 09:14:05PM +0200, René Scharfe wrote:
> 
>> Whether inlining is really better is another matter; I don't understand how
>> 1a812f3a (hashcmp(): inline memcmp() by hand to optimize) could have made
>> git gc 18% faster, as it claimed.  I would expect memcmp(), which can
>> compare more than a byte at a time, to be significantly faster -- or at
>> least just as fast as whatever the compiler does with the inlined version.
> 
> I looked into this a while ago[1]. I think with glibc 2.13 and up, the
> memcmp is a win. We should consider switching back if that is what is
> common now.
> 
> -Peff
> 
> [1] http://article.gmane.org/gmane.comp.version-control.git/218396

On Linux this seems to improve rev-list performance by ca. 10% for me, but
on FreeBSD it slows it down by ca. 25%.  That's less surprising after
looking at their memcmp() implementation:

http://svnweb.freebsd.org/base/release/10.0.0/lib/libc/string/memcmp.c?revision=260789&view=co

It's basically our one-byte-at-a-time inlined version, sans inlining.
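
For reference, here's the inlined one-byte-at-a-time version currently
in cache.h (the loop the patch at the end replaces):

  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
  {
  	int i;

  	for (i = 0; i < 20; i++, sha1++, sha2++) {
  		if (*sha1 != *sha2)
  			return *sha1 - *sha2;
  	}

  	return 0;
  }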

I'd say if a platform doesn't bother optimizing memcmp() then it
deserves the resulting performance.  And the penalty is probably not
too bad, because such comparisons are unlikely to make up a significant
part of most applications.

But perhaps we can do better.  First, here are the numbers I got for
"/usr/bin/time git rev-list --all --objects v2.0.0 >/dev/null" (best
of five runs):

Debian testing amd64 with gcc 4.9.0 and libc 2.19 on Hyper-V on Intel i5-4670:

  master (398dd4bd)
  2.14user 0.02system 0:02.16elapsed 99%CPU (0avgtext+0avgdata 71004maxresident)k
  0inputs+0outputs (0major+21985minor)pagefaults 0swaps

  using memcmp
  1.92user 0.02system 0:01.95elapsed 99%CPU (0avgtext+0avgdata 71032maxresident)k
  0inputs+0outputs (0major+21987minor)pagefaults 0swaps

  four bytes at a time
  1.93user 0.03system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 71240maxresident)k
  0inputs+0outputs (0major+22024minor)pagefaults 0swaps

Debian testing amd64 with clang 3.3 and libc 2.19 on Hyper-V on Intel i5-4670:

  master (398dd4bd)
  2.12user 0.04system 0:02.17elapsed 99%CPU (0avgtext+0avgdata 70976maxresident)k
  0inputs+0outputs (0major+21995minor)pagefaults 0swaps

  using memcmp
  1.94user 0.01system 0:01.95elapsed 100%CPU (0avgtext+0avgdata 71012maxresident)k
  0inputs+0outputs (0major+21952minor)pagefaults 0swaps

  four bytes at a time
  1.86user 0.01system 0:01.88elapsed 99%CPU (0avgtext+0avgdata 70788maxresident)k
  0inputs+0outputs (0major+21909minor)pagefaults 0swaps

FreeBSD 10 amd64 with clang 3.3 on Hyper-V on Intel i5-4670:

  master (398dd4bd)
        1.83 real         1.80 user         0.03 sys

  using memcmp
        2.32 real         2.32 user         0.00 sys

  four bytes at a time
        1.56 real         1.53 user         0.03 sys

Performance measurements in virtual machines are not very accurate, so
take them with a bag of salt, but the numbers were relatively stable
across runs.

How come rev-list runs so much faster on FreeBSD?  Would it make sense
to apply the four-bytes-at-a-time patch?  How would it fare on other
platforms, especially ARM?  Can we do even better?  And more
importantly: Does it matter in the first place?

The memcmp version just replaces the body of hashcmp() with
"return memcmp(sha1, sha2, 20);", i.e.:
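
  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
  {
  	return memcmp(sha1, sha2, 20);
  }

Here's the four-bytes-at-a-time patch: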
---
 cache.h | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 92fc9f1..c6cd28e 100644
--- a/cache.h
+++ b/cache.h
@@ -732,13 +732,18 @@ extern const unsigned char null_sha1[20];
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+	const uint32_t *p1 = (const uint32_t *)sha1;
+	const uint32_t *p2 = (const uint32_t *)sha2;
 	int i;
 
-	for (i = 0; i < 20; i++, sha1++, sha2++) {
-		if (*sha1 != *sha2)
-			return *sha1 - *sha2;
+	for (i = 0; i < 20 / sizeof(uint32_t); i++, p1++, p2++) {
+		uint32_t v1 = htonl(*p1);
+		uint32_t v2 = htonl(*p2);
+		if (v1 < v2)
+			return -1;
+		if (v1 > v2)
+			return 1;
 	}
-
 	return 0;
 }
 
-- 
2.0.2
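
One caveat about the four-bytes-at-a-time version above, relevant to
the ARM question: it casts unsigned char pointers to const uint32_t *,
and SHA-1 hashes are not guaranteed to be 4-byte aligned, so on
strict-alignment platforms those loads could fault or get fixed up
slowly.  A variant that loads through memcpy() should sidestep that;
compilers usually turn a fixed-size memcpy() into plain word loads
where the platform allows it.  Just a sketch, not measured:

  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
  {
  	int i;

  	for (i = 0; i < 20; i += 4) {
  		uint32_t v1, v2;
  		memcpy(&v1, sha1 + i, 4);	/* avoids unaligned uint32_t loads */
  		memcpy(&v2, sha2 + i, 4);
  		v1 = htonl(v1);			/* compare in byte (big-endian) order */
  		v2 = htonl(v2);
  		if (v1 != v2)
  			return v1 < v2 ? -1 : 1;
  	}
  	return 0;
  }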
