From: Thomas Rast <trast@student.ethz.ch>
To: Philip Herron <herron.philip@googlemail.com>
Cc: <git@vger.kernel.org>
Subject: Re: Hash Tables
Date: Thu, 6 Aug 2009 10:53:37 +0200 [thread overview]
Message-ID: <200908061053.38739.trast@student.ethz.ch> (raw)
In-Reply-To: <4A7A6756.4010305@googlemail.com>
Philip Herron wrote:
>
> Question still stands is the hashing function [in hash.c], which one and why?
In the spirit of teaching you to fish...
First you'll want to find out where the original users of this code
were. So you run
git blame -- hash.c
and see that most of the lines come from 9027f53 (Do linear-time/space
rename logic for exact renames, 2007-10-25). So you can then look at
this commit:
git show 9027f53c
Aha, it says
In the expectation that we will indeed do the same hashing trick for the
general rename case, this code uses a generic hash-table implementation
that can be used for other things too. In fact, we might be able to
consolidate some of our existing hash tables with the new generic code
in hash.[ch]
and further down in the patch
+ hash = hash_filespec(filespec);
+ pos = insert_hash(hash, entry, table);
and right above that
+static unsigned int hash_filespec(struct diff_filespec *filespec)
+{
+ unsigned int hash;
+ if (!filespec->sha1_valid) {
+ if (diff_populate_filespec(filespec, 0))
+ return 0;
+ hash_sha1_file(filespec->data, filespec->size, "blob", filespec-
+ }
+ memcpy(&hash, filespec->sha1, sizeof(hash));
+ return hash;
+}
See?
As for the *why*, presumably because all of git assumes two objects
with the same SHA1 are indeed the same file; so we can later make the
same optimisation again:
+ if (hashcmp(one->sha1, two->sha1))
+ continue;
And then, as we've already computed the SHA1, any subset of it is as
good a hash as anything else; it'll be uniformly distributed.
--
Thomas Rast
trast@{inf,student}.ethz.ch
prev parent reply other threads:[~2009-08-06 8:54 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-06 4:35 Hash Tables Philip Herron
2009-08-06 5:17 ` Philip Herron
2009-08-06 8:53 ` Thomas Rast [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200908061053.38739.trast@student.ethz.ch \
--to=trast@student.ethz.ch \
--cc=git@vger.kernel.org \
--cc=herron.philip@googlemail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).