git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: Fix object re-hashing
Date: Sun, 12 Feb 2006 11:10:11 -0800 (PST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0602121101040.3691@g5.osdl.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0602121037460.3691@g5.osdl.org>



On Sun, 12 Feb 2006, Linus Torvalds wrote:
> 
>  - the "overflow of the overflow": when the linear probing itself 
>    overflows the size of the hash queue, it will "change direction" by 
>    overflowing back to index zero.
> 
>    Happily, the re-hashing does not need to care about this case, because 
>    the new hash is bigger: the rule we have when doing the re-hashing is 
>    that as we re-hash, the "i" entries we have already re-hashed are all 
>    valid in the new hash, so even if overflow occurs, it will occur the 
>    right way (and if it overflows all the way past the current "i", we'll 
>    re-hash the already re-hashed entry anyway).

Btw, this is only always true if the new hash is at least twice the size 
of the old hash, I think. Otherwise a re-hash can fill up the new entries 
and overflow entirely before we've actually even re-hashed all the old 
entries, and then we'd need to re-hash even the overflowed entries (which 
are now below "i").

If the new size is at least twice the old size, the "upper area" cannot 
overflow completely (there has to be empty room), and we cannot be in the 
situation that we need to move even the overflowed entries when we remove 
an old hash entry.

Anyway, if all this makes you nervous, the conceptually much simpler way 
to do the re-sizing is to not do the in-place re-hashing. Instead of doing 
the xrealloc(), just do a "xmalloc()" of the new area, do the re-hashing 
(which now _must_ re-hash in just the "0..oldcount-1" old area) into the 
new area, and then free the old area after rehashing.

That would make things more obviously correct, and perhaps simpler.

Johannes, do you want to try that?

Btw, as it currently stands, I worry a tiny tiny bit about the

	obj_allocs = (obj_allocs < 32 ? 32 : 2 * obj_allocs)

thing, because I think that second "32" needs to be a "64" to be really 
safe (ie guarantee that the new obj_allocs value is always at least twice 
the old one).

Anyway, I'm pretty sure people smarter than me have already codified 
exactly what needs to be done for a in-place rehash of a linear probe hash 
overflow algorithm. This must all be in some "hashing 101" book. I had to 
think it through from first principles rather than "knowing" what the 
right answer was (which probably means that I slept through some 
fundamental algorithms class in University ;)

		Linus

  reply	other threads:[~2006-02-12 19:10 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-12 18:04 Fix object re-hashing Linus Torvalds
2006-02-12 18:10 ` Linus Torvalds
2006-02-12 18:32   ` Junio C Hamano
2006-02-12 18:53     ` Linus Torvalds
2006-02-12 19:10       ` Linus Torvalds [this message]
2006-02-12 19:21         ` Junio C Hamano
2006-02-12 19:39           ` Linus Torvalds
2006-02-12 23:55         ` Johannes Schindelin
2006-02-13  0:16           ` Linus Torvalds
2006-02-13  0:31             ` Johannes Schindelin
2006-02-12 19:13       ` Junio C Hamano
2006-02-12 18:16 ` Linus Torvalds
2006-02-12 18:18   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0602121101040.3691@g5.osdl.org \
    --to=torvalds@osdl.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).