From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: NeilBrown <neilb@suse.com>, Thomas Graf <tgraf@suug.ch>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/5] rhashtable: don't hold lock on first table throughout insertion.
Date: Fri, 20 Jul 2018 07:41:52 -0700 [thread overview]
Message-ID: <20180720144152.GW12945@linux.vnet.ibm.com> (raw)
In-Reply-To: <20180720075409.kfckhodsnvktift7@gondor.apana.org.au>
On Fri, Jul 20, 2018 at 03:54:09PM +0800, Herbert Xu wrote:
> On Fri, Jul 06, 2018 at 05:22:30PM +1000, NeilBrown wrote:
> > rhashtable_try_insert() currently hold a lock on the bucket in
> > the first table, while also locking buckets in subsequent tables.
> > This is unnecessary and looks like a hold-over from some earlier
> > version of the implementation.
> >
> > As insert and remove always lock a bucket in each table in turn, and
> > as insert only inserts in the final table, there cannot be any races
> > that are not covered by simply locking a bucket in each table in turn.
> >
> > When an insert call reaches that last table it can be sure that there
> > is no match entry in any other table as it has searched them all, and
> > insertion never happens anywhere but in the last table. The fact that
> > code tests for the existence of future_tbl while holding a lock on
> > the relevant bucket ensures that two threads inserting the same key
> > will make compatible decisions about which is the "last" table.
> >
> > This simplifies the code and allows the ->rehash field to be
> > discarded.
> >
> > We still need a way to ensure that a dead bucket_table is never
> > re-linked by rhashtable_walk_stop(). This can be achieved by
> > calling call_rcu() inside the locked region, and checking
> > ->rcu.func in rhashtable_walk_stop(). If it is not NULL, then
> > the bucket table is empty and dead.
> >
> > Signed-off-by: NeilBrown <neilb@suse.com>
>
> ...
>
> > @@ -339,13 +338,16 @@ static int rhashtable_rehash_table(struct rhashtable *ht)
> > spin_lock(&ht->lock);
> > list_for_each_entry(walker, &old_tbl->walkers, list)
> > walker->tbl = NULL;
> > - spin_unlock(&ht->lock);
> >
> > /* Wait for readers. All new readers will see the new
> > * table, and thus no references to the old table will
> > * remain.
> > + * We do this inside the locked region so that
> > + * rhashtable_walk_stop() can check ->rcu.func and know
> > + * not to re-link the table.
> > */
> > call_rcu(&old_tbl->rcu, bucket_table_free_rcu);
> > + spin_unlock(&ht->lock);
> >
> > return rht_dereference(new_tbl->future_tbl, ht) ? -EAGAIN : 0;
> > }
>
> ...
>
> > @@ -964,7 +942,7 @@ void rhashtable_walk_stop(struct rhashtable_iter *iter)
> > ht = iter->ht;
> >
> > spin_lock(&ht->lock);
> > - if (tbl->rehash < tbl->size)
> > + if (tbl->rcu.func == NULL)
> > list_add(&iter->walker.list, &tbl->walkers);
> > else
> > iter->walker.tbl = NULL;
>
> This appears to be relying on implementation details within RCU.
> Paul, are you OK with rhashtable doing this trick?
The notion of accessing objects that are already on RCU's callback lists
makes me -very- nervous because this sort of thing is not easy to
get right. After all, if you are accessing something that is already
on one of RCU's callback lists, RCU might invoke the callback it at any
time (thus freeing it in this case), and because it is already on RCU's
callback lists, rcu_read_lock() is going to be of no help whatsoever.
In addition, RCU does no ordering on its store to ->func, but the ht->lock
compensates in this case. But suppose rhashtable_walk_stop() sees the
pointer as non-NULL. What prevents RCU from freeing the bucket table out
from under rhashtable_walk_stop()? In v4.17, bucket_table_free_rcu()
just does some calls to various deallocators, which does not provide
the necessary synchronization.
Does the rhashtable_iter structure use some trick to make this safe?
Or has synchronization been added to bucket_table_free_rcu()? Or is
some other trick in use?
Thanx, Paul
next prev parent reply other threads:[~2018-07-20 15:30 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-06 7:22 [PATCH 0/5] Rhashtable: convert to bit-spin locks NeilBrown
2018-07-06 7:22 ` [PATCH 5/5] rhashtable: add lockdep tracking to bucket bit-spin-locks NeilBrown
2018-07-06 7:22 ` [PATCH 1/5] rhashtable: use cmpxchg() in nested_table_alloc() NeilBrown
2018-07-06 7:22 ` [PATCH 4/5] rhashtable: use bit_spin_locks to protect hash bucket NeilBrown
2018-07-06 7:22 ` [PATCH 3/5] rhashtable: allow rht_bucket_var to return NULL NeilBrown
2018-07-06 7:22 ` [PATCH 2/5] rhashtable: don't hold lock on first table throughout insertion NeilBrown
2018-07-20 7:54 ` Herbert Xu
2018-07-20 14:41 ` Paul E. McKenney [this message]
2018-07-21 2:25 ` NeilBrown
2018-07-22 21:54 ` Paul E. McKenney
2018-07-22 23:13 ` NeilBrown
2018-07-23 20:56 ` Paul E. McKenney
2018-07-23 21:52 ` NeilBrown
2018-07-24 22:58 ` Paul E. McKenney
2018-07-25 4:53 ` NeilBrown
2018-07-25 15:22 ` Paul E. McKenney
2018-07-27 1:04 ` NeilBrown
2018-07-27 3:18 ` Paul E. McKenney
2018-07-27 14:57 ` Paul E. McKenney
2018-07-31 0:45 ` NeilBrown
2018-07-31 4:14 ` Paul E. McKenney
2018-07-31 5:04 ` NeilBrown
2018-07-31 14:44 ` Paul E. McKenney
2019-03-11 15:27 ` Paul E. McKenney
2019-03-11 21:50 ` NeilBrown
2019-03-11 22:10 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180720144152.GW12945@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=herbert@gondor.apana.org.au \
--cc=linux-kernel@vger.kernel.org \
--cc=neilb@suse.com \
--cc=netdev@vger.kernel.org \
--cc=tgraf@suug.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).