From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: Fix object re-hashing
Date: Sun, 12 Feb 2006 11:10:11 -0800 (PST) [thread overview]
Message-ID: <Pine.LNX.4.64.0602121101040.3691@g5.osdl.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0602121037460.3691@g5.osdl.org>
On Sun, 12 Feb 2006, Linus Torvalds wrote:
>
> - the "overflow of the overflow": when the linear probing itself
> overflows the size of the hash queue, it will "change direction" by
> overflowing back to index zero.
>
> Happily, the re-hashing does not need to care about this case, because
> the new hash is bigger: the rule we have when doing the re-hashing is
> that as we re-hash, the "i" entries we have already re-hashed are all
> valid in the new hash, so even if overflow occurs, it will occur the
> right way (and if it overflows all the way past the current "i", we'll
> re-hash the already re-hashed entry anyway).
Btw, this is only always true if the new hash is at least twice the size
of the old hash, I think. Otherwise a re-hash can fill up the new entries
and overflow entirely before we've actually even re-hashed all the old
entries, and then we'd need to re-hash even the overflowed entries (which
are now below "i").
If the new size is at least twice the old size, the "upper area" cannot
overflow completely (there has to be empty room), and we cannot be in the
situation that we need to move even the overflowed entries when we remove
an old hash entry.
Anyway, if all this makes you nervous, the conceptually much simpler way
to do the re-sizing is to not do the in-place re-hashing. Instead of doing
the xrealloc(), just do a "xmalloc()" of the new area, do the re-hashing
(which now _must_ re-hash in just the "0..oldcount-1" old area) into the
new area, and then free the old area after rehashing.
That would make things more obviously correct, and perhaps simpler.
Johannes, do you want to try that?
Btw, as it currently stands, I worry a tiny tiny bit about the
obj_allocs = (obj_allocs < 32 ? 32 : 2 * obj_allocs)
thing, because I think that second "32" needs to be a "64" to be really
safe (ie guarantee that the new obj_allocs value is always at least twice
the old one).
Anyway, I'm pretty sure people smarter than me have already codified
exactly what needs to be done for a in-place rehash of a linear probe hash
overflow algorithm. This must all be in some "hashing 101" book. I had to
think it through from first principles rather than "knowing" what the
right answer was (which probably means that I slept through some
fundamental algorithms class in University ;)
Linus
next prev parent reply other threads:[~2006-02-12 19:10 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-02-12 18:04 Fix object re-hashing Linus Torvalds
2006-02-12 18:10 ` Linus Torvalds
2006-02-12 18:32 ` Junio C Hamano
2006-02-12 18:53 ` Linus Torvalds
2006-02-12 19:10 ` Linus Torvalds [this message]
2006-02-12 19:21 ` Junio C Hamano
2006-02-12 19:39 ` Linus Torvalds
2006-02-12 23:55 ` Johannes Schindelin
2006-02-13 0:16 ` Linus Torvalds
2006-02-13 0:31 ` Johannes Schindelin
2006-02-12 19:13 ` Junio C Hamano
2006-02-12 18:16 ` Linus Torvalds
2006-02-12 18:18 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0602121101040.3691@g5.osdl.org \
--to=torvalds@osdl.org \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).