git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Rast <trast@student.ethz.ch>
To: Philip Herron <herron.philip@googlemail.com>
Cc: <git@vger.kernel.org>
Subject: Re: Hash Tables
Date: Thu, 6 Aug 2009 10:53:37 +0200	[thread overview]
Message-ID: <200908061053.38739.trast@student.ethz.ch> (raw)
In-Reply-To: <4A7A6756.4010305@googlemail.com>

Philip Herron wrote:
> 
> Question still stands is the hashing function [in hash.c], which one and why?

In the spirit of teaching you to fish...

First you'll want to find out where the original users of this code
were.  So you run

  git blame -- hash.c

and see that most of the lines come from 9027f53 (Do linear-time/space
rename logic for exact renames, 2007-10-25).  So you can then look at
this commit:

  git show 9027f53c

Aha, it says

    In the expectation that we will indeed do the same hashing trick for the
    general rename case, this code uses a generic hash-table implementation
    that can be used for other things too.  In fact, we might be able to
    consolidate some of our existing hash tables with the new generic code
    in hash.[ch]

and further down in the patch

+       hash = hash_filespec(filespec);
+       pos = insert_hash(hash, entry, table);

and right above that

+static unsigned int hash_filespec(struct diff_filespec *filespec)
+{
+       unsigned int hash;
+       if (!filespec->sha1_valid) {
+               if (diff_populate_filespec(filespec, 0))
+                       return 0;
+               hash_sha1_file(filespec->data, filespec->size, "blob", filespec-
+       }
+       memcpy(&hash, filespec->sha1, sizeof(hash));
+       return hash;
+}

See?

As for the *why*, presumably because all of git assumes two objects
with the same SHA1 are indeed the same file; so we can later make the
same optimisation again:

+                       if (hashcmp(one->sha1, two->sha1))
+                               continue;

And then, as we've already computed the SHA1, any subset of it is as
good a hash as anything else; it'll be uniformly distributed.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

      reply	other threads:[~2009-08-06  8:54 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-06  4:35 Hash Tables Philip Herron
2009-08-06  5:17 ` Philip Herron
2009-08-06  8:53   ` Thomas Rast [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200908061053.38739.trast@student.ethz.ch \
    --to=trast@student.ethz.ch \
    --cc=git@vger.kernel.org \
    --cc=herron.philip@googlemail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).