git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Erik Faye-Lund <kusmabite@gmail.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	git@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Peter Zijlstra" <a.p.zijlstra@chello.nl>,
	"Arnaldo Carvalho de Melo" <acme@redhat.com>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Pekka Enberg" <penberg@cs.helsinki.fi>
Subject: Re: [PATCH] git gc: Speed it up by 18% via faster hash comparisons
Date: Thu, 28 Apr 2011 11:50:06 +0200	[thread overview]
Message-ID: <BANLkTim+Kk_ah_4+pQKCi8bXtA8thRVRjQ@mail.gmail.com> (raw)
In-Reply-To: <20110428093703.GB15349@elte.hu>

2011/4/28 Ingo Molnar <mingo@elte.hu>:
>
> * Erik Faye-Lund <kusmabite@gmail.com> wrote:
>
>> > Secondly, the combined speedup of the cached case with my two patches
>> > appears to be more than 30% on my testbox so it's a very nifty win from two
>> > relatively simple changes.
>>
>> That speed-up was on ONE test vector, no? There are a lot of other uses of
>> hash-comparisons in Git, did you measure those?
>
> I picked this hash function because it showed up in the profile (see the
> profile i posted). There's one other hash that mattered as well in the profile,
> see the lookup_object() patch i sent yesterday.
>

My point was that the 30% improvement was in "git gc", which is not
the only important use-case. How does this affect other git commands?

My suspicion is that it's generally an improvement, but I don't know
for sure. I think something like this might be an acceptable
trade-off, though:

diff --git a/cache.h b/cache.h
index c730c58..50b6f55 100644
--- a/cache.h
+++ b/cache.h
@@ -687,6 +687,10 @@ static inline int is_null_sha1(const unsigned char *sha1)
 }
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+	/* early out for fast mis-match */
+	if (*sha1 != *sha2)
+		return *sha2 - *sha1;
+
 	return memcmp(sha1, sha2, 20);
 }
 static inline void hashcpy(unsigned char *sha_dst, const unsigned
char *sha_src)

>> > I have added some quick debug code and none of the sha1 pointers (in my
>> > admittedly very limited) testing showed misaligned pointers on 64-bit
>> > systems.
>> >
>> > On 32-bit systems the pointer might be 32-bit aligned only - the patch
>> > below implements the function 32-bit comparisons.
>>
>> That's simply wrong. Unsigned char arrays can and will be unaligned, and this
>> causes exceptions on most architectures (x86 is pretty much the exception
>> here). While some systems for these architectures support unaligned reads
>> from the exception handler, others doesn't. So this patch is pretty much
>> guaranteed to cause a crash in some setups.
>
> If unsigned char arrays are allocated unaligned then that's another bug i
> suspect that should be fixed.

We can't. The compiler decides the alignment of variables on the
stack. Some compilers / compiler-setting pairs might align
char-arrays, while others might not.

> Unaligned access on x86 is not free either -
> there's cycle penalties.
>
> Alas, i have not seen these sha1 hash buffers being allocated unaligned (in my
> very limited testing). In which spots are they allocated unaligned?

Like I said above, it can happen when allocated on the stack. But it
can also happen in malloc'ed structs, or in global variables. An array
is aligned to the size of it's base member type. But malloc does
worst-case-allignment, because it happens at run-time without
type-information.

  reply	other threads:[~2011-04-28  9:58 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-27 22:51 [PATCH] git gc: Speed it up by 18% via faster hash comparisons Ingo Molnar
2011-04-27 23:10 ` Ingo Molnar
2011-04-27 23:18 ` Jonathan Nieder
2011-04-28  6:36   ` Ingo Molnar
2011-04-28  9:31     ` Jonathan Nieder
2011-04-28 10:36     ` Ingo Molnar
2011-04-28  9:32   ` Dmitry Potapov
2011-04-27 23:32 ` Junio C Hamano
2011-04-28  0:35   ` Ralf Baechle
2011-04-28  8:18     ` Bernhard R. Link
2011-04-28  9:42       ` Andreas Ericsson
2011-04-28  9:55         ` Erik Faye-Lund
2011-04-28 20:19           ` H. Peter Anvin
2011-04-28  6:27   ` Ingo Molnar
2011-04-28  9:17     ` Erik Faye-Lund
2011-04-28  9:33       ` Ingo Molnar
2011-04-28  9:37       ` Ingo Molnar
2011-04-28  9:50         ` Erik Faye-Lund [this message]
2011-04-28 10:10           ` Pekka Enberg
2011-04-28 10:19             ` Erik Faye-Lund
2011-04-28 10:30               ` Pekka Enberg
2011-04-28 11:59                 ` Erik Faye-Lund
2011-04-28 12:12                   ` Pekka Enberg
2011-04-28 12:36                   ` Jonathan Nieder
2011-04-28 12:40                     ` Erik Faye-Lund
2011-04-28 13:37                     ` Ingo Molnar
2011-04-28 15:14                       ` Ingo Molnar
2011-04-28 16:00                         ` Erik Faye-Lund
2011-04-28 20:32                           ` Ingo Molnar
2011-04-29  7:05                   ` Alex Riesen
2011-04-29 16:24                     ` H. Peter Anvin
2011-04-28 12:16                 ` Tor Arntsen
2011-04-28 20:23                   ` H. Peter Anvin
2011-04-28 12:17                 ` Andreas Ericsson
2011-04-28 12:28                   ` Erik Faye-Lund
2011-04-28 10:19           ` Ingo Molnar
2011-04-28 12:02             ` Nguyen Thai Ngoc Duy
2011-04-28 12:18             ` Erik Faye-Lund
2011-04-28 20:20             ` Junio C Hamano
2011-04-28 16:36         ` Dmitry Potapov
2011-04-28  8:52 ` Dmitry Potapov
2011-04-28  9:11   ` Ingo Molnar
2011-04-28  9:31     ` Dmitry Potapov
2011-04-28  9:44       ` Ingo Molnar
2011-04-28  9:38     ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BANLkTim+Kk_ah_4+pQKCi8bXtA8thRVRjQ@mail.gmail.com \
    --to=kusmabite@gmail.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hpa@zytor.com \
    --cc=mingo@elte.hu \
    --cc=penberg@cs.helsinki.fi \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).