git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: Fix object re-hashing
Date: Sun, 12 Feb 2006 10:53:18 -0800 (PST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0602121037460.3691@g5.osdl.org> (raw)
In-Reply-To: <7vaccwbf6v.fsf@assigned-by-dhcp.cox.net>



On Sun, 12 Feb 2006, Junio C Hamano wrote:
> 
> This "fix" makes the symptom that me fire two (maybe three)
> Grrrrr messages earlier this morning disappear.

Goodie. I assume that was the fixed fix, not my original "edit out the 
useless optimization and then break it totally" fix ;)

> I haven't had my caffeine nor nicotine yet after my short sleep, so I 
> need to take some time understanding your explanation first, but I am 
> reasonably sure this must be it (not that I do not trust you, not at all 
> -- it is that I do not trust *me* applying a patch without understanding 
> when I have a bug reproducible).

The basic notion is that this hashing algorithm uses a normal "linear 
probing" overflow approach, which basically means that overflows in 
a hash bucket always just probe the next few buckets to find an empty one.

That's a really simple (and fairly cache-friendly) approach, and it makes 
tons of sense, especially since we always re-size the hash to guarantee 
that we'll have empty slots. It's a bit more subtle - especially when 
re-hashing - than the probably more common "collission chain" approach, 
though.

Now, when we re-hash, the important rule is:

 - the re-hashing has to walk in the same direction as the overflow.

This is important, because when we move a hashed entry, that automatically 
means that even otherwise _already_correctly_ hashed entries may need to 
be moved down (ie even if their "inherent hash" does not change, their 
_effective_ hash address changes because their overflow position needs to 
be fixed up).

There are two interesting cases:

 - the "overflow of the overflow": when the linear probing itself 
   overflows the size of the hash queue, it will "change direction" by 
   overflowing back to index zero.

   Happily, the re-hashing does not need to care about this case, because 
   the new hash is bigger: the rule we have when doing the re-hashing is 
   that as we re-hash, the "i" entries we have already re-hashed are all 
   valid in the new hash, so even if overflow occurs, it will occur the 
   right way (and if it overflows all the way past the current "i", we'll 
   re-hash the already re-hashed entry anyway).

 - the old/new border case. In particular, the trivial logic says that we 
   only need to re-hash entries that were hashed with the old hash. That's 
   what the broken code did: it only traversed "0..oldcount-1", because 
   any entries that had an index bigger than or equal to "oldcount" were 
   obviously _already_ re-hashed.

   That logic sounds obvious, but it falls down on exactly the fact that 
   we may indeed have to re-hash even entries that already were re-hashed 
   with the new algorithm, exactly because of the overflow changes.

So the boundary for old/new is really: "you need to rehash all entries 
that were old, but then you _also_ need to rehash the list of entries that 
you rehashed that might need to be moved down to an empty spot vacated by 
an old hash".

So the stop condition really ends up being: "stop when you have seen all 
old hash entries _and_ at least one empty entry after that", since an 
empty entry means that there was no overflow from earlier positions past 
that position. But it's just simpler to walk the whole damn new thing and 
not worry about it.

		Linus

  reply	other threads:[~2006-02-12 18:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-12 18:04 Fix object re-hashing Linus Torvalds
2006-02-12 18:10 ` Linus Torvalds
2006-02-12 18:32   ` Junio C Hamano
2006-02-12 18:53     ` Linus Torvalds [this message]
2006-02-12 19:10       ` Linus Torvalds
2006-02-12 19:21         ` Junio C Hamano
2006-02-12 19:39           ` Linus Torvalds
2006-02-12 23:55         ` Johannes Schindelin
2006-02-13  0:16           ` Linus Torvalds
2006-02-13  0:31             ` Johannes Schindelin
2006-02-12 19:13       ` Junio C Hamano
2006-02-12 18:16 ` Linus Torvalds
2006-02-12 18:18   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0602121037460.3691@g5.osdl.org \
    --to=torvalds@osdl.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).