From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: Fix object re-hashing
Date: Sun, 12 Feb 2006 10:53:18 -0800 (PST) [thread overview]
Message-ID: <Pine.LNX.4.64.0602121037460.3691@g5.osdl.org> (raw)
In-Reply-To: <7vaccwbf6v.fsf@assigned-by-dhcp.cox.net>
On Sun, 12 Feb 2006, Junio C Hamano wrote:
>
> This "fix" makes the symptom that me fire two (maybe three)
> Grrrrr messages earlier this morning disappear.
Goodie. I assume that was the fixed fix, not my original "edit out the
useless optimization and then break it totally" fix ;)
> I haven't had my caffeine nor nicotine yet after my short sleep, so I
> need to take some time understanding your explanation first, but I am
> reasonably sure this must be it (not that I do not trust you, not at all
> -- it is that I do not trust *me* applying a patch without understanding
> when I have a bug reproducible).
The basic notion is that this hashing algorithm uses a normal "linear
probing" overflow approach, which basically means that overflows in
a hash bucket always just probe the next few buckets to find an empty one.
That's a really simple (and fairly cache-friendly) approach, and it makes
tons of sense, especially since we always re-size the hash to guarantee
that we'll have empty slots. It's a bit more subtle - especially when
re-hashing - than the probably more common "collission chain" approach,
though.
Now, when we re-hash, the important rule is:
- the re-hashing has to walk in the same direction as the overflow.
This is important, because when we move a hashed entry, that automatically
means that even otherwise _already_correctly_ hashed entries may need to
be moved down (ie even if their "inherent hash" does not change, their
_effective_ hash address changes because their overflow position needs to
be fixed up).
There are two interesting cases:
- the "overflow of the overflow": when the linear probing itself
overflows the size of the hash queue, it will "change direction" by
overflowing back to index zero.
Happily, the re-hashing does not need to care about this case, because
the new hash is bigger: the rule we have when doing the re-hashing is
that as we re-hash, the "i" entries we have already re-hashed are all
valid in the new hash, so even if overflow occurs, it will occur the
right way (and if it overflows all the way past the current "i", we'll
re-hash the already re-hashed entry anyway).
- the old/new border case. In particular, the trivial logic says that we
only need to re-hash entries that were hashed with the old hash. That's
what the broken code did: it only traversed "0..oldcount-1", because
any entries that had an index bigger than or equal to "oldcount" were
obviously _already_ re-hashed.
That logic sounds obvious, but it falls down on exactly the fact that
we may indeed have to re-hash even entries that already were re-hashed
with the new algorithm, exactly because of the overflow changes.
So the boundary for old/new is really: "you need to rehash all entries
that were old, but then you _also_ need to rehash the list of entries that
you rehashed that might need to be moved down to an empty spot vacated by
an old hash".
So the stop condition really ends up being: "stop when you have seen all
old hash entries _and_ at least one empty entry after that", since an
empty entry means that there was no overflow from earlier positions past
that position. But it's just simpler to walk the whole damn new thing and
not worry about it.
Linus
next prev parent reply other threads:[~2006-02-12 18:53 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-02-12 18:04 Fix object re-hashing Linus Torvalds
2006-02-12 18:10 ` Linus Torvalds
2006-02-12 18:32 ` Junio C Hamano
2006-02-12 18:53 ` Linus Torvalds [this message]
2006-02-12 19:10 ` Linus Torvalds
2006-02-12 19:21 ` Junio C Hamano
2006-02-12 19:39 ` Linus Torvalds
2006-02-12 23:55 ` Johannes Schindelin
2006-02-13 0:16 ` Linus Torvalds
2006-02-13 0:31 ` Johannes Schindelin
2006-02-12 19:13 ` Junio C Hamano
2006-02-12 18:16 ` Linus Torvalds
2006-02-12 18:18 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0602121037460.3691@g5.osdl.org \
--to=torvalds@osdl.org \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).