git.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: Jim Meyering <jim@meyering.net>
Cc: Davide Libenzi <davidel@xmailserver.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: git-diff-tree inordinately (O(M*N)) slow on files with many changes
Date: Mon, 16 Oct 2006 10:56:14 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0610161038200.3962@g5.osdl.org>
In-Reply-To: <87ejt8p5l9.fsf@rho.meyering.net>



On Mon, 16 Oct 2006, Jim Meyering wrote:
> 
> That helps a little.
> Now, instead of taking 63s, my test takes ~30s.
> (32 for XDL_MAX_EQLIMIT = 16, 30 for XDL_MAX_EQLIMIT = 8)

Btw, what architecture is this on?

I'm testing those two files, and I get much more reasonable numbers with 
both ppc32 and x86. Both 32-bit:

	[torvalds@macmini test-perf]$ time git show | wc -l
	25221

	real    0m1.437s
	user    0m1.436s
	sys     0m0.012s

i.e. it generated the diff in less than a second and a half. Not wonderful,
but certainly not your 63s either.

HOWEVER. On x86-64, it takes forever (still not 63 seconds, but it takes 
17 seconds on my 2GHz merom machine).

So I think there's something seriously broken with hashing on 64-bit. 

And I think I know what it is.

Try this patch. And make sure to do a "make clean" first, since I think 
the dependencies on xdiff may be broken.

Davide: there are two things wrong with your old XDL_HASHLONG():

 - the GR_PRIME was just 32-bit, so it wouldn't shift low bits up far 
   enough on a 64-bit architecture, so then shifting things down caused 
   pretty much everything to be very small.

 - The whole idea of shifting up by multiplying and then shifting down to 
   get the high bits is _broken_. Even on 32-bit architectures. Think 
   about what happens when "hashbits" is 16 on a 32-bit architecture: the 
   multiply moves the low bits _up_, but it doesn't move the high bits 
   _down_. And with hashbits being a large fraction of the whole word, you 
   need to shift things down, not up.
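
As a rough illustration of the first point (a sketch, not part of the
patch): the code below pins the word width with fixed-width types.
old_hashlong64() and old_hashlong32() are hypothetical names standing in
for the old macro on a 64-bit and a 32-bit "unsigned long". With hashbits
at 16, any input below 2^16 multiplied by the 32-bit GR_PRIME stays below
2^48, so shifting right by 48 always leaves 0.

```c
#include <assert.h>
#include <stdint.h>

#define GR_PRIME 0x9e370001UL

/*
 * Old XDL_HASHLONG formula with the word width pinned to 64 bits.
 * Assumption: this models "unsigned long" on x86-64.
 */
static uint64_t old_hashlong64(uint64_t v, unsigned int bits)
{
	assert(bits > 0 && bits < 64);	/* keep the shift count in range */
	return (v * (uint64_t)GR_PRIME) >> (64 - bits);
}

/*
 * Same formula with a 32-bit word, for comparison.
 * Assumption: this models "unsigned long" on 32-bit x86 or ppc32.
 */
static uint32_t old_hashlong32(uint32_t v, unsigned int bits)
{
	assert(bits > 0 && bits < 32);	/* keep the shift count in range */
	return (v * (uint32_t)GR_PRIME) >> (32 - bits);
}
```

For example, old_hashlong64(1, 16) and old_hashlong64(65535, 16) both
return 0, while old_hashlong32(1, 16) returns 0x9e37.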

So just making GR_PRIME be a bigger value on a 64-bit architecture would 
not have fixed it. The whole hash was simply broken. Do it the sane and 
obvious way instead: always pick the low bits, but mix in upper bits there 
too.
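
A fixed-width sketch of that idea, assuming a 64-bit word (hashlong64()
is a hypothetical name; the actual function is xdl_hashlong() in the
patch below, which uses "unsigned long"). It shows that two values
differing only above the low "bits" bits still end up in different
buckets:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fold the upper bits into the low bits, then keep the low bits.
 * Assumption: 64-bit word, so 4 * sizeof(word) is 32.
 */
static uint64_t hashlong64(uint64_t val, unsigned int bits)
{
	uint64_t shift;

	assert(bits > 0 && bits < 64);	/* keep the shift counts in range */
	shift = val >> bits;

	/* Mix in the bits just above the low "bits" bits */
	val += shift;

	/* For small "bits", fold in the next group as well */
	if (bits < 32)
		val += shift >> bits;

	/* Return the resulting low bits */
	return val & ((UINT64_C(1) << bits) - 1);
}
```

With bits set to 16, hashlong64(0, 16) is 0 and hashlong64(0x10000, 16)
is 1, so the bit at position 16 does reach the result.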

This patch brings the time down from 17 seconds to 0.8 seconds for me.

		Linus

---
diff --git a/xdiff/xmacros.h b/xdiff/xmacros.h
index 4c2fde8..bb4830b 100644
--- a/xdiff/xmacros.h
+++ b/xdiff/xmacros.h
@@ -24,14 +24,27 @@ #if !defined(XMACROS_H)
 #define XMACROS_H
 
 
-#define GR_PRIME 0x9e370001UL
+static inline unsigned long xdl_hashlong(unsigned long val, unsigned int bits)
+{
+	unsigned long shift = val >> bits;
+
+	/* Shift in the upper bits too */
+	val += shift;
+
+	/* Do it twice for small values of bits */
+	if (bits < 4*sizeof(unsigned long))
+		val += shift >> bits;
+
+	/* Return the resulting low bits */
+	return val & ((1ul << bits)-1);
+}
 
 
 #define XDL_MIN(a, b) ((a) < (b) ? (a): (b))
 #define XDL_MAX(a, b) ((a) > (b) ? (a): (b))
 #define XDL_ABS(v) ((v) >= 0 ? (v): -(v))
 #define XDL_ISDIGIT(c) ((c) >= '0' && (c) <= '9')
-#define XDL_HASHLONG(v, b) (((unsigned long)(v) * GR_PRIME) >> ((CHAR_BIT * sizeof(unsigned long)) - (b)))
+#define XDL_HASHLONG(v, b) xdl_hashlong(v,b)
 #define XDL_PTRFREE(p) do { if (p) { xdl_free(p); (p) = NULL; } } while (0)
 #define XDL_LE32_PUT(p, v) \
 do { \
