Re: Libgit2 on the Summer of Code

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Vicent Marti <tanoku@gmail.com>
To: Ilari Liusvaara <ilari.liusvaara@elisanet.fi>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
	git@vger.kernel.org, srabbelier@gmail.com
Subject: Re: Libgit2 on the Summer of Code
Date: Fri, 28 May 2010 02:15:43 +0200	[thread overview]
Message-ID: <AANLkTillXInFUGR0tYqlPbdddfQKPbGDbjPEV0voaBev@mail.gmail.com> (raw)
In-Reply-To: <20100527183534.GA10414@LK-Perkele-V2.elisa-laajakaista.fi>

Hey again,

thanks for the bites, Ilari. I've fixed all the formatting issues in
the latest batch of commits [1], and improved the error handling quite
a bit.

Regarding the hashing of OIDs, it's not a random hash, it's an
adaptation of MurmurHash [2], which is made of win -- specially when
it comes to dispersing stuff on hash tables. You are right however
that SHA1 raw bytes are already random enough, and probably not worth
the performance hit of hashing again on top of it -- I've dropped it
for the time being.

More work tomorrow,

Cheers,
Vicent Martí
http://www.bellverde.org

[1]: http://github.com/tanoku/libgit2-gsoc2010/commits/master
[2]: http://sites.google.com/site/murmurhash/

On Thu, May 27, 2010 at 8:35 PM, Ilari Liusvaara
<ilari.liusvaara@elisanet.fi> wrote:
> On Thu, May 27, 2010 at 11:05:54AM -0700, Shawn O. Pearce wrote:
>> Ilari Liusvaara <ilari.liusvaara@elisanet.fi> wrote:
>> > * Where algorithm in git_revpool_table__hash() is from? Since it appears to
>> > hash binary object IDs, wouldn't just simple sum/xor over words be sufficient
>> > (all SHA-1 output bits are very nearly independent). Or do you need to be
>> > compatible with some other implementation (doesn't appear so, because hash
>> > is computed differently depending on endianess)?
>>
>> If you need a hash value for a SHA-1, why not just cast the unsigned
>> char* to unsigned int* and load the first int as the hash code?
>> The output of SHA-1 is pretty evenly distributed, using the first
>> few bytes as an int should yield a sufficient distribution throughout
>> the hashtable.
>
> Yeah, With pseudorandom function[*], all ways of reducing the output to n bits are
> at most as good as just taking first n bits. But if reducing output to [0, m),
> the best way (distribution-wise, not speed-wise) is to take remainder of the whole
> value divided by m...
>
> [*] SHA-1 is not pseudorandom function, but for virtually all practical purposes
> it is.
>
> -Ilari
>

     prev parent reply	other threads:[~2010-05-28  0:16 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-27 16:25 Libgit2 on the Summer of Code Vicent Marti
2010-05-27 17:46 ` Ilari Liusvaara
2010-05-27 18:05   ` Shawn O. Pearce
2010-05-27 18:22     ` Alex Riesen
2010-06-02 18:42       ` Junio C Hamano
2010-05-27 18:35     ` Ilari Liusvaara
2010-05-28  0:15       ` Vicent Marti [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AANLkTillXInFUGR0tYqlPbdddfQKPbGDbjPEV0voaBev@mail.gmail.com \
    --to=tanoku@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=ilari.liusvaara@elisanet.fi \
    --cc=spearce@spearce.org \
    --cc=srabbelier@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).