netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josh Triplett <josh@joshtriplett.org>
To: Thomas Graf <tgraf@suug.ch>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	netdev@vger.kernel.org, eric.dumazet@gmail.com,
	paulmck@linux.vnet.ibm.com
Subject: Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
Date: Sat, 15 Nov 2014 11:12:20 -0800	[thread overview]
Message-ID: <20141115191219.GA19060@thin> (raw)
In-Reply-To: <20141115111626.GP19157@casper.infradead.org>

On Sat, Nov 15, 2014 at 11:16:26AM +0000, Thomas Graf wrote:
> On 11/15/14 at 11:25am, Herbert Xu wrote:
> > On Thu, Nov 13, 2014 at 06:43:43PM +0800, Herbert Xu wrote:
> > > On Thu, Nov 13, 2014 at 10:41:24AM +0000, Thomas Graf wrote:
> > > >
> > > > Never mind. You did fix it. I looked at the wrong patch.
> > > 
> > > OK.  Now comes the fun part of shoehorning the xfrm_policy bydst
> > > hash into rhashtable :)
> > 
> > Thomas, it appears that rhashtable as it stands cannot handle
> > changing the random seed because of the way it constructs the
> > new hash table without degrading into a linked list.  Is there
> > something I'm missing?
> > 
> > FWIW my hashtable in net/bridge/br_multicast.c handles rehashing
> > correctly.  Any objections to me converting rhashtable to use my
> > scheme instead?
> 
> Can you elaborate a bit?
> 
> The point of rhashtable is to not require two sets of linked list
> pointers as done by MDB or OVS flow tables to work around the
> increased cache footprint of that approach. The difference of the
> two algos is dicussed in this paper [0].
> 
> The disadvantage of rhashtable is that, AFAIK, the hash function
> cannot change while resizing as it would break the mutual linked
> lists.

You can handle hash function changes with rhashtable without needing a
second set of linked-list pointers, actually.  You'd have to add ~1
unlikely() conditional to the reader common case, and you'd make readers
*during* a hash algorithm change (which I'd hope happens as rarely as a
resize) somewhat less efficient, but you wouldn't use any more memory
than a resize currently does, and you wouldn't use the memory of extra
linked-list pointers in the common case.

Rather than the current approach of switching out the hash table pointer
and having readers only search the new table (which will have
valid-but-crosslinked buckets during a resize), instead keep two hash
table pointers (each with their own hash parameters) and a single
toggleable old/new indicator.  Readers check both hash table pointers,
and if valid, check both tables, first old then new (order important to
make sure they don't miss a node).  Then, the rehashing algorithm can
incrementally move nodes from the old buckets to the new buckets,
*without* disrupting concurrent readers.

Rough rehashing algorithm sketch:

- Set up the new empty table with the new set of hash parameters.
- synchronize_rcu().  Readers will now search both old and new tables.
- Peel nodes off the ends of the old hash table and add them to the new
  hash table (similar to the unzip step in the resize algorithm).
  Because readers search old-then-new, and writers make each node appear
  in the new table before making it disappear from the old, you don't
  need a synchronize_rcu() here, just an smp_wmb() after linking the
  node into the new table and before unlinking the node from the old
  table.  Also, because you remove nodes from the *ends* of old-table
  buckets, you don't have to worry about a reader's linked-list walk
  getting dragged over to the new table and missing nodes from the old
  one.
- Once all nodes have moved to the new table, mark the pointer to the
  old table as NULL, synchronize_rcu(), and free the old table.
- Toggle the old/new flag.

Ordering principles in this algorithm:
- You want readers to see the changes made by the rehasher in order.
- If the reader reads location A then B, and the rehasher writes
  location B then A, the rehasher just needs an smp_wmb() in between.
- If the reader reads location A then B, and the rehasher writes
  location A then B, the rehasher needs a synchronize_rcu() in between.


- Josh Triplett

  parent reply	other threads:[~2014-11-15 19:12 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-13 10:10 [PATCH 0/4] rhashtable: Allow local locks to be used and tested Herbert Xu
2014-11-13 10:11 ` [PATCH 1/4] netlink: Move mutex_is_held under PROVE_LOCKING Herbert Xu
2014-11-13 10:11 ` [PATCH 2/4] netfilter: " Herbert Xu
2014-11-13 10:11 ` [PATCH 3/4] rhashtable: " Herbert Xu
2014-11-13 10:11 ` [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held Herbert Xu
2014-11-13 10:37   ` Thomas Graf
2014-11-13 10:38     ` Herbert Xu
2014-11-13 10:41       ` Thomas Graf
2014-11-13 10:43         ` Herbert Xu
2014-11-15  3:25           ` Herbert Xu
2014-11-15 11:16             ` Thomas Graf
2014-11-15 11:23               ` Herbert Xu
2014-11-15 15:51                 ` Herbert Xu
2014-11-17  6:20                   ` Thomas Graf
2014-11-15 19:12               ` Josh Triplett [this message]
2014-11-16  2:22                 ` Herbert Xu
2014-11-16  2:37                   ` Josh Triplett
2014-11-17  4:46                     ` Herbert Xu
2014-11-17  6:16                       ` Thomas Graf
2014-11-13 10:41 ` [PATCH 0/4] rhashtable: Allow local locks to be used and tested Thomas Graf
2014-11-13 20:13 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141115191219.GA19060@thin \
    --to=josh@joshtriplett.org \
    --cc=eric.dumazet@gmail.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=tgraf@suug.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).