git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Davide Libenzi <davidel@xmailserver.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Nicolas Pitre <nico@cam.org>, Sergey Vlasov <vsu@altlinux.ru>,
	Junio C Hamano <junkio@cox.net>,
	git@vger.kernel.org
Subject: Re: heads-up: git-index-pack in "next" is broken
Date: Wed, 18 Oct 2006 14:21:58 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.64.0610181407040.18885@alien.or.mcafeemobile.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0610180938540.3962@g5.osdl.org>

On Wed, 18 Oct 2006, Linus Torvalds wrote:

> On Wed, 18 Oct 2006, Davide Libenzi wrote:
> >
> > Speaking in general, seen at the hash function level, of course an interface 
> > should not give different result for different word sizes or word endianess. 
> > Considering the diff algorithm as interface, as I said, the output was 
> > unaffected by the 64 bits word size. It was just very slow.
> 
> Well, even the output may actually be affected, in the case of _real_ hash 
> collisions (as opposed to just the hash _list_ collision that XDL_HASHLONG 
> caused).
> 
> So I actually think it would be better to have "uint32_t" as the hash 
> value - because that would mean that all diffs (or, in the case of the 
> block-algorithm, the deltas) are guaranteed to give the same results 
> regardless of architecture.
> 
> Right now, we actually generate a 64-bit hash value (BUT: for short lines, 
> it's likely only _interesting_ in the low bits, so the high bits tend to 
> have a very high likelihood of being zero). So hash collisions are 
> different: on a 32-bit architecture, two lines may have the same hash, 
> while on a 64-bit one, they are different.
> 
> And together with some of the limiters we have (eg XDL_MAX_EQLIMIT) hash 
> collisions can sometimes affect the output.
> 
> Admittedly, in _practice_ this is really unlikely to affect anything 
> (you'd get a valid diff in either case, they'd just possibly be subtly 
> different, and the input data must be _really_ strange to even see that 
> case), but I do think that the hash algorithm can matter.
> 
> NOTE! I'm not talking about XDL_HASHLONG(), I'm talking about the 
> xdl_hash_record() hash, which returns differently-sized hash results on 
> 32-bit and 64-bit. And there are cases where we _only_ compare the hashes, 
> and don't actually double-check the contents.

The hash value (hence the hash bucket index) simply directs you to the 
bucket where a real record-compare loop is performed. Collisions here 
means only performance loss (you don't spread buckets enough), nothing 
more. So the different behaviour does not apply to the diff algo.
But, it would apply if the knowledge of the hash indexing would be 
"exported", for example inside an external file. Think about a trivial DB 
file where you store hash buckets on file an you want to lookup records 
based on the store hash layout. In that case, of course, if the hash 
function that generated the DB bucket layout is different from the one 
that you use to get the bucket index at lookup time, you're in trouble.
IOW if the hash function result is not "exported" is some way, it doesn't 
really matter if it's an 'unsigned long' or a 'uint32_t'. In the same way 
you cannot export binary structures using 'int' or 'long', and expect to 
be compatible over different architectures.



- Davide

  reply	other threads:[~2006-10-18 21:22 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-17  4:55 heads-up: git-index-pack in "next" is broken Junio C Hamano
2006-10-17 15:39 ` Nicolas Pitre
2006-10-17 16:07   ` Junio C Hamano
2006-10-17 17:00     ` Nicolas Pitre
2006-10-17 18:11       ` Junio C Hamano
2006-10-17 18:47         ` Nicolas Pitre
2006-10-17 19:36           ` Sergey Vlasov
2006-10-17 20:10             ` Junio C Hamano
2006-10-17 20:25               ` Nicolas Pitre
2006-10-17 20:23             ` Nicolas Pitre
2006-10-17 20:51               ` Linus Torvalds
2006-10-17 21:21                 ` Nicolas Pitre
2006-10-17 21:46                   ` Linus Torvalds
2006-10-18  0:20                     ` Nicolas Pitre
2006-10-18  0:57                       ` Linus Torvalds
2006-10-18  2:08                         ` Nicolas Pitre
2006-10-18  3:12                           ` Linus Torvalds
2006-10-18  6:09                             ` Davide Libenzi
2006-10-18 14:56                               ` Linus Torvalds
2006-10-18 16:17                                 ` Davide Libenzi
2006-10-18 16:52                                   ` Linus Torvalds
2006-10-18 21:21                                     ` Davide Libenzi [this message]
2006-10-18 21:48                                       ` Linus Torvalds
2006-10-18 22:34                                         ` Davide Libenzi
2006-10-18  1:30                       ` Junio C Hamano
2006-10-18  2:23                         ` Nicolas Pitre
2006-10-18  4:16                           ` Junio C Hamano
2006-10-18  5:07                             ` Junio C Hamano
2006-10-18 10:00                               ` Johannes Schindelin
2006-10-18 13:13                                 ` Nicolas Pitre
2006-10-18 13:02                               ` Nicolas Pitre
2006-10-17 21:54                 ` Junio C Hamano
2006-10-18  1:38                   ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0610181407040.18885@alien.or.mcafeemobile.com \
    --to=davidel@xmailserver.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=nico@cam.org \
    --cc=torvalds@osdl.org \
    --cc=vsu@altlinux.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).