git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Marco Costalba" <mcostalba@gmail.com>
To: "Junio C Hamano" <gitster@pobox.com>
Cc: "Git Mailing List" <git@vger.kernel.org>
Subject: [PATCH] Use Peter J. Weinberger's hash function in xdiff
Date: Mon, 23 Jul 2007 13:17:58 +0200	[thread overview]
Message-ID: <e5bfff550707230417j3d4ac7f4sb310baf43a327684@mail.gmail.com> (raw)

It seems a little bit faster then current one:

WITH CURRENT ONE

git tree
$ time git diff v0.99.. > /dev/null
0.91user 0.02system 0:01.03elapsed 91%CPU
(0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3594minor)pagefaults 0swaps

Linux tree
$ time git diff v2.6.22.. > /dev/null
13.61user 0.19system 0:13.90elapsed 99%CPU
(0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27279minor)pagefaults 0swaps

WITH NEW ONE

git tree
$ time git diff v0.99.. > /dev/null
0.87user 0.01system 0:00.98elapsed 90%CPU
(0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3594minor)pagefaults 0swaps

Linux tree
$ time git diff v2.6.22.. > /dev/null
13.26user 0.21system 0:13.57elapsed 99%CPU
(0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27277minor)pagefaults 0swaps

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
---

This patch is not intended for inclusion, the time diff is very small,
but it would be interestiing to understand from where it comes from
possibly from people more versed then me in hashing functions.

BTW this modified Weinberger's hash function is the one used in Qt
library, for the QHash class.

 xdiff/xutils.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 2ade97b..c19ab6e 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -270,24 +270,32 @@ static unsigned long
 	return ha;
 }

+/*
+    This function is based on Peter J. Weinberger's hash function
+    (from the Dragon Book). The constant 24 in the original function
+    was replaced with 23 to produce fewer collisions on input such as
+    "a", "aa", "aaa", "aaaa", ...
+*/

 unsigned long xdl_hash_record(char const **data, char const *top, long flags) {
-	unsigned long ha = 5381;
+
+	unsigned long ha = 0;
+	unsigned long g;
 	char const *ptr = *data;

 	if (flags & XDF_WHITESPACE_FLAGS)
 		return xdl_hash_record_with_whitespace(data, top, flags);

-	for (; ptr < top && *ptr != '\n'; ptr++) {
-		ha += (ha << 5);
-		ha ^= (unsigned long) *ptr;
+	while (ptr < top && *ptr != '\n') {
+		ha = (ha << 4) + *ptr++;
+		if ((g = (ha & 0xf0000000)) != 0)
+			ha ^= g >> 23;
+		ha &= ~g;
 	}
 	*data = ptr < top ? ptr + 1: ptr;
-
 	return ha;
 }

-
 unsigned int xdl_hashbits(unsigned int size) {
 	unsigned int val = 1, bits = 0;

-- 
1.5.3.rc2.29.gc4640f-dirty

                 reply	other threads:[~2007-07-23 11:18 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e5bfff550707230417j3d4ac7f4sb310baf43a327684@mail.gmail.com \
    --to=mcostalba@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).