git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@osdl.org>
To: Davide Libenzi <davidel@xmailserver.org>
Cc: Nicolas Pitre <nico@cam.org>, Sergey Vlasov <vsu@altlinux.ru>,
	Junio C Hamano <junkio@cox.net>,
	git@vger.kernel.org
Subject: Re: heads-up: git-index-pack in "next" is broken
Date: Wed, 18 Oct 2006 09:52:23 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.64.0610180938540.3962@g5.osdl.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0610180845040.18388@alien.or.mcafeemobile.com>



On Wed, 18 Oct 2006, Davide Libenzi wrote:
>
> Speaking in general, seen at the hash function level, of course an interface 
> should not give different result for different word sizes or word endianess. 
> Considering the diff algorithm as interface, as I said, the output was 
> unaffected by the 64 bits word size. It was just very slow.

Well, even the output may actually be affected, in the case of _real_ hash 
collisions (as opposed to just the hash _list_ collision that XDL_HASHLONG 
caused).

So I actually think it would be better to have "uint32_t" as the hash 
value - because that would mean that all diffs (or, in the case of the 
block-algorithm, the deltas) are guaranteed to give the same results 
regardless of architecture.

Right now, we actually generate a 64-bit hash value (BUT: for short lines, 
it's likely only _interesting_ in the low bits, so the high bits tend to 
have a very high likelihood of being zero). So hash collisions are 
different: on a 32-bit architecture, two lines may have the same hash, 
while on a 64-bit one, they are different.

And together with some of the limiters we have (eg XDL_MAX_EQLIMIT) hash 
collisions can sometimes affect the output.

Admittedly, in _practice_ this is really unlikely to affect anything 
(you'd get a valid diff in either case, they'd just possibly be subtly 
different, and the input data must be _really_ strange to even see that 
case), but I do think that the hash algorithm can matter.

NOTE! I'm not talking about XDL_HASHLONG(), I'm talking about the 
xdl_hash_record() hash, which returns differently-sized hash results on 
32-bit and 64-bit. And there are cases where we _only_ compare the hashes, 
and don't actually double-check the contents.

So I think that in _practice_ you can't see differences between a 32-bit 
version and a 64-bit one, but the possibility is there. Using "uint32_t" 
instead of "unsigned long" to keep track of hashes would avoid that 
theoretical problem (and might actually make for better performance on 
64-bit archtiectures, if only because of denser data structures and thus 
better cache behaviour).

			Linus

  reply	other threads:[~2006-10-18 16:52 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-17  4:55 heads-up: git-index-pack in "next" is broken Junio C Hamano
2006-10-17 15:39 ` Nicolas Pitre
2006-10-17 16:07   ` Junio C Hamano
2006-10-17 17:00     ` Nicolas Pitre
2006-10-17 18:11       ` Junio C Hamano
2006-10-17 18:47         ` Nicolas Pitre
2006-10-17 19:36           ` Sergey Vlasov
2006-10-17 20:10             ` Junio C Hamano
2006-10-17 20:25               ` Nicolas Pitre
2006-10-17 20:23             ` Nicolas Pitre
2006-10-17 20:51               ` Linus Torvalds
2006-10-17 21:21                 ` Nicolas Pitre
2006-10-17 21:46                   ` Linus Torvalds
2006-10-18  0:20                     ` Nicolas Pitre
2006-10-18  0:57                       ` Linus Torvalds
2006-10-18  2:08                         ` Nicolas Pitre
2006-10-18  3:12                           ` Linus Torvalds
2006-10-18  6:09                             ` Davide Libenzi
2006-10-18 14:56                               ` Linus Torvalds
2006-10-18 16:17                                 ` Davide Libenzi
2006-10-18 16:52                                   ` Linus Torvalds [this message]
2006-10-18 21:21                                     ` Davide Libenzi
2006-10-18 21:48                                       ` Linus Torvalds
2006-10-18 22:34                                         ` Davide Libenzi
2006-10-18  1:30                       ` Junio C Hamano
2006-10-18  2:23                         ` Nicolas Pitre
2006-10-18  4:16                           ` Junio C Hamano
2006-10-18  5:07                             ` Junio C Hamano
2006-10-18 10:00                               ` Johannes Schindelin
2006-10-18 13:13                                 ` Nicolas Pitre
2006-10-18 13:02                               ` Nicolas Pitre
2006-10-17 21:54                 ` Junio C Hamano
2006-10-18  1:38                   ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0610180938540.3962@g5.osdl.org \
    --to=torvalds@osdl.org \
    --cc=davidel@xmailserver.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=nico@cam.org \
    --cc=vsu@altlinux.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).