From: Ingo Molnar <mingo@elte.hu>
To: Erik Faye-Lund <kusmabite@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
git@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Peter Zijlstra" <a.p.zijlstra@chello.nl>,
"Arnaldo Carvalho de Melo" <acme@redhat.com>,
"Frédéric Weisbecker" <fweisbec@gmail.com>,
"Pekka Enberg" <penberg@cs.helsinki.fi>
Subject: Re: [PATCH] git gc: Speed it up by 18% via faster hash comparisons
Date: Thu, 28 Apr 2011 12:19:02 +0200 [thread overview]
Message-ID: <20110428101902.GA17257@elte.hu> (raw)
In-Reply-To: <BANLkTim+Kk_ah_4+pQKCi8bXtA8thRVRjQ@mail.gmail.com>
* Erik Faye-Lund <kusmabite@gmail.com> wrote:
> 2011/4/28 Ingo Molnar <mingo@elte.hu>:
> >
> > * Erik Faye-Lund <kusmabite@gmail.com> wrote:
> >
> >> > Secondly, the combined speedup of the cached case with my two patches
> >> > appears to be more than 30% on my testbox so it's a very nifty win from two
> >> > relatively simple changes.
> >>
> >> That speed-up was on ONE test vector, no? There are a lot of other uses of
> >> hash-comparisons in Git, did you measure those?
> >
> > I picked this hash function because it showed up in the profile (see the
> > profile i posted). There's one other hash that mattered as well in the profile,
> > see the lookup_object() patch i sent yesterday.
>
> My point was that the 30% improvement was in "git gc", which is not
> the only important use-case. How does this affect other git commands?
In a followup mail i measured git fsck, which showed a speedup too. (despite
being mostly dependent on external libraries to do most of the processing)
If you'd like to see other things tested please suggest a testcase that you
think uses these hashes extensively, i don't really know what the slowest (and
affected) Git commands are - git gc is the one *i* notice as being pretty slow
(for good reasons).
> >> from the exception handler, others doesn't. So this patch is pretty much
> >> guaranteed to cause a crash in some setups.
> >
> > If unsigned char arrays are allocated unaligned then that's another bug i
> > suspect that should be fixed.
>
> We can't. The compiler decides the alignment of variables on the stack. Some
> compilers / compiler-setting pairs might align char-arrays, while others
> might not.
Even if that were true it can be solved: you'd need to declare the sha1 not as
a char array but as a u32 * array or so. We do have control over the alignment
of data structures, obviously.
> > Unaligned access on x86 is not free either - there's cycle penalties.
> >
> > Alas, i have not seen these sha1 hash buffers being allocated unaligned (in
> > my very limited testing). In which spots are they allocated unaligned?
>
> Like I said above, it can happen when allocated on the stack. But it can also
> happen in malloc'ed structs, or in global variables. An array is aligned to
> the size of it's base member type. But malloc does worst-case-allignment,
> because it happens at run-time without type-information.
Well, should we ready be ready to throw up our hands as if we didnt have
control over the alignment of objects and have to accept suboptimal code as a
result? We do have control over that.
In any case, i'll retract the null case as it really isnt called that often in
the tests i've done - updated patch below - it simply falls back on to hashcmp.
Thanks,
Ingo
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/cache.h b/cache.h
index 2674f4c..39fa9cd 100644
--- a/cache.h
+++ b/cache.h
@@ -675,14 +675,24 @@ extern char *sha1_pack_name(const unsigned char *sha1);
extern char *sha1_pack_index_name(const unsigned char *sha1);
extern const char *find_unique_abbrev(const unsigned char *sha1, int);
extern const unsigned char null_sha1[20];
-static inline int is_null_sha1(const unsigned char *sha1)
+
+static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
- return !memcmp(sha1, null_sha1, 20);
+ int i;
+
+ for (i = 0; i < 20; i++, sha1++, sha2++) {
+ if (*sha1 != *sha2)
+ return *sha1 - *sha2;
+ }
+
+ return 0;
}
-static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
+
+static inline int is_null_sha1(const unsigned char *sha1)
{
- return memcmp(sha1, sha2, 20);
+ return !hashcmp(sha1, null_sha1);
}
+
static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
{
memcpy(sha_dst, sha_src, 20);
next prev parent reply other threads:[~2011-04-28 10:19 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-04-27 22:51 [PATCH] git gc: Speed it up by 18% via faster hash comparisons Ingo Molnar
2011-04-27 23:10 ` Ingo Molnar
2011-04-27 23:18 ` Jonathan Nieder
2011-04-28 6:36 ` Ingo Molnar
2011-04-28 9:31 ` Jonathan Nieder
2011-04-28 10:36 ` Ingo Molnar
2011-04-28 9:32 ` Dmitry Potapov
2011-04-27 23:32 ` Junio C Hamano
2011-04-28 0:35 ` Ralf Baechle
2011-04-28 8:18 ` Bernhard R. Link
2011-04-28 9:42 ` Andreas Ericsson
2011-04-28 9:55 ` Erik Faye-Lund
2011-04-28 20:19 ` H. Peter Anvin
2011-04-28 6:27 ` Ingo Molnar
2011-04-28 9:17 ` Erik Faye-Lund
2011-04-28 9:33 ` Ingo Molnar
2011-04-28 9:37 ` Ingo Molnar
2011-04-28 9:50 ` Erik Faye-Lund
2011-04-28 10:10 ` Pekka Enberg
2011-04-28 10:19 ` Erik Faye-Lund
2011-04-28 10:30 ` Pekka Enberg
2011-04-28 11:59 ` Erik Faye-Lund
2011-04-28 12:12 ` Pekka Enberg
2011-04-28 12:36 ` Jonathan Nieder
2011-04-28 12:40 ` Erik Faye-Lund
2011-04-28 13:37 ` Ingo Molnar
2011-04-28 15:14 ` Ingo Molnar
2011-04-28 16:00 ` Erik Faye-Lund
2011-04-28 20:32 ` Ingo Molnar
2011-04-29 7:05 ` Alex Riesen
2011-04-29 16:24 ` H. Peter Anvin
2011-04-28 12:16 ` Tor Arntsen
2011-04-28 20:23 ` H. Peter Anvin
2011-04-28 12:17 ` Andreas Ericsson
2011-04-28 12:28 ` Erik Faye-Lund
2011-04-28 10:19 ` Ingo Molnar [this message]
2011-04-28 12:02 ` Nguyen Thai Ngoc Duy
2011-04-28 12:18 ` Erik Faye-Lund
2011-04-28 20:20 ` Junio C Hamano
2011-04-28 16:36 ` Dmitry Potapov
2011-04-28 8:52 ` Dmitry Potapov
2011-04-28 9:11 ` Ingo Molnar
2011-04-28 9:31 ` Dmitry Potapov
2011-04-28 9:44 ` Ingo Molnar
2011-04-28 9:38 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110428101902.GA17257@elte.hu \
--to=mingo@elte.hu \
--cc=a.p.zijlstra@chello.nl \
--cc=acme@redhat.com \
--cc=fweisbec@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=hpa@zytor.com \
--cc=kusmabite@gmail.com \
--cc=penberg@cs.helsinki.fi \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.