git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "George Spelvin" <linux@horizon.com>
To: gitster@pobox.com, linux@horizon.com
Cc: git@vger.kernel.org
Subject: Re: [PATCH 07/11] object: try naive cuckoo hashing
Date: 13 Aug 2011 15:12:06 -0400	[thread overview]
Message-ID: <20110813191206.25129.qmail@science.horizon.com> (raw)
In-Reply-To: <20110813102244.9033.qmail@science.horizon.com>

I've been doing a lot of reading on Cuckoo hashing.

Yes, the single-table variant is described and used.  However, the
insertion procedure is not the way you do it.

Also, d-ary Cuckoo hashing (also called d-Cuckoo hashing) where you use
more than 2 hash functions is also used.

The insertion algorithm, however, is not really agreed on.

One algorithm (mostly proposed for hardware) uses d separate tables.
Every entry displaced from table i is displaced to table i+1 (mod d).
http://infoscience.epfl.ch/record/164147/files/cuckoo_dir_hpca2011_camera_ready.pdf

In the single-table case, and in general, however, a displaced entry
has more than one possible new location.  This leads to the question of how
to choose.

One proposal is to do a breadth-first search looking for a path to
a free slot.  It's provable that this will succeed with high probability
before the exponential growth of the breadth of the search tree
gets too bad.
See "Space Efficient Hash Tables with Worst Case Constant Access Time"
http://www.itu.dk/people/pagh/papers/d-cuckoo-jour.pdf

Another suggested tehcnique is to just pick an alternative at random
and proceed.  This is recommended in e.g.
"Efficient Hash Probes on Modern Processors"
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.1189&rep=rep1&type=pdf

Both of these lead to rather complex implementations.  The random number
generator is probably simpler than the breadth-first search, but either way
there's a bunch of auxiliary code.

Sticking with two hash functions, but using multi-entry buckets is
definitely an attractive possibility.

  reply	other threads:[~2011-08-13 19:12 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-13 10:22 [PATCH 07/11] object: try naive cuckoo hashing George Spelvin
2011-08-13 19:12 ` George Spelvin [this message]
  -- strict thread matches above, loose matches on Subject: below --
2011-08-11 17:53 [PATCH 00/11] Micro-optimizing lookup_object() Junio C Hamano
2011-08-11 17:53 ` [PATCH 07/11] object: try naive cuckoo hashing Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110813191206.25129.qmail@science.horizon.com \
    --to=linux@horizon.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).