From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Fredrik Kuivinen <freku045@student.liu.se>, git@vger.kernel.org
Subject: Re: git-diff-tree -M performance regression in 'next'
Date: Mon, 13 Mar 2006 19:47:41 -0800 (PST) [thread overview]
Message-ID: <Pine.LNX.4.64.0603131941260.3618@g5.osdl.org> (raw)
In-Reply-To: <7vveuhohve.fsf@assigned-by-dhcp.cox.net>
On Mon, 13 Mar 2006, Junio C Hamano wrote:
>
> Here is still an WIP but an insanely fast one (actually this is
> a modified version of what once was in next). I haven't
> verified the sanity of its output fully, but from a cursory look
> what are found look sensible. The same v2.6.12..v2.6.14 test on
> my Duron 750:
Heh. I did something similar, except I wanted mine to work with binary
data too. Not that I know how _well_ it works, but assuming you have
_some_ '\n' characters to fix up offset mismatches, it might do something.
Mine is a bit less hacky than yours, I believe. It doesn't skip
whitespace, instead it just maintains a rolling 64-bit number, where each
character shifts it left by 7 and then adds in the new character value
(overflow in 32 bits just ignored).
Then it uses your old hash function, except it hides the length in the top
byte.
It breaks the hashing on '\n' or on hitting a 64-byte sequence, whichever
comes first.
It's fast and stupid, but doesn't seem to do any worse than your old one.
The speed comes from the fact that it only does the hash comparisons at
the "block boundaries", not at every byte.
Anyway, I don't think something like this is really any good for rename
detection, but it might be good for deciding whether to do a real delta.
Linus
----
diff --git a/diffcore-delta.c b/diffcore-delta.c
index 835d82c..4c6e512 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -127,7 +127,7 @@ static struct spanhash_top *add_spanhash
static struct spanhash_top *hash_chars(unsigned char *buf, unsigned long sz)
{
- int i;
+ int i, n;
unsigned long accum1, accum2, hashval;
struct spanhash_top *hash;
@@ -137,19 +137,21 @@ static struct spanhash_top *hash_chars(u
hash->free = INITIAL_FREE(i);
memset(hash->data, 0, sizeof(struct spanhash) * (1<<i));
- /* an 8-byte shift register made of accum1 and accum2. New
- * bytes come at LSB of accum2, and shifted up to accum1
- */
- for (i = accum1 = accum2 = 0; i < 7; i++, sz--) {
- accum1 = (accum1 << 8) | (accum2 >> 24);
- accum2 = (accum2 << 8) | *buf++;
- }
+ n = 0;
+ accum1 = accum2 = 0;
while (sz) {
- accum1 = (accum1 << 8) | (accum2 >> 24);
- accum2 = (accum2 << 8) | *buf++;
+ unsigned long c = *buf++;
+ sz--;
+ accum1 = (accum1 << 7) | (accum2 >> 25);
+ accum2 = (accum2 << 7) | (accum1 >> 25);
+ accum1 += c;
+ if (++n < 64 && c != '\n')
+ continue;
hashval = (accum1 + accum2 * 0x61) % HASHBASE;
+ hashval |= (n << 24);
hash = add_spanhash(hash, hashval);
- sz--;
+ n = 0;
+ accum1 = accum2 = 0;
}
return hash;
}
@@ -166,9 +168,6 @@ int diffcore_count_changes(void *src, un
struct spanhash_top *src_count, *dst_count;
unsigned long sc, la;
- if (src_size < 8 || dst_size < 8)
- return -1;
-
src_count = dst_count = NULL;
if (src_count_p)
src_count = *src_count_p;
@@ -196,6 +195,8 @@ int diffcore_count_changes(void *src, un
src_cnt = s->cnt;
d = spanhash_find(dst_count, s->hashval);
dst_cnt = d ? d->cnt : 0;
+ dst_cnt *= (d->hashval >> 24);
+ src_cnt *= (d->hashval >> 24);
if (src_cnt < dst_cnt) {
la += dst_cnt - src_cnt;
sc += src_cnt;
next prev parent reply other threads:[~2006-03-14 3:47 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-03-11 17:28 git-diff-tree -M performance regression in 'next' Fredrik Kuivinen
2006-03-12 3:10 ` Junio C Hamano
2006-03-12 12:28 ` Junio C Hamano
2006-03-12 17:00 ` Linus Torvalds
2006-03-12 19:34 ` Junio C Hamano
2006-03-13 0:42 ` Junio C Hamano
2006-03-13 1:09 ` Linus Torvalds
2006-03-13 1:22 ` Junio C Hamano
2006-03-13 1:39 ` Linus Torvalds
2006-03-13 1:29 ` Linus Torvalds
2006-03-13 1:31 ` Linus Torvalds
2006-03-13 2:29 ` Linus Torvalds
2006-03-13 2:53 ` Linus Torvalds
2006-03-13 4:14 ` Junio C Hamano
2006-03-14 2:55 ` Junio C Hamano
2006-03-14 3:47 ` Linus Torvalds [this message]
2006-03-14 10:26 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0603131941260.3618@g5.osdl.org \
--to=torvalds@osdl.org \
--cc=freku045@student.liu.se \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).