From: NeilBrown <neilb@suse.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Thomas Graf <tgraf@suug.ch>,
netdev@vger.kernel.org,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion.
Date: Fri, 15 Mar 2019 17:46:37 +1100 [thread overview]
Message-ID: <87o96cocs2.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20190315054709.mlcctcv3qy57wini@gondor.apana.org.au>
[-- Attachment #1: Type: text/plain, Size: 9732 bytes --]
On Fri, Mar 15 2019, Herbert Xu wrote:
> On Thu, Mar 14, 2019 at 04:05:28PM +1100, NeilBrown wrote:
>> rhashtable_try_insert() currently holds a lock on the bucket in
>> the first table, while also locking buckets in subsequent tables.
>> This is unnecessary and looks like a hold-over from some earlier
>> version of the implementation.
>>
>> As insert and remove always lock a bucket in each table in turn, and
>> as insert only inserts in the final table, there cannot be any races
>> that are not covered by simply locking a bucket in each table in turn.
>>
>> When an insert call reaches that last table it can be sure that there
>> is no matchinf entry in any other table as it has searched them all, and
>> insertion never happens anywhere but in the last table. The fact that
>> code tests for the existence of future_tbl while holding a lock on
>> the relevant bucket ensures that two threads inserting the same key
>> will make compatible decisions about which is the "last" table.
>>
>> This simplifies the code and allows the ->rehash field to be
>> discarded.
>>
>> We still need a way to ensure that a dead bucket_table is never
>> re-linked by rhashtable_walk_stop(). This can be achieved by calling
>> call_rcu() inside the locked region, and checking with
>> rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
>> bucket table is empty and dead.
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>> include/linux/rhashtable.h | 13 -----------
>> lib/rhashtable.c | 50 +++++++++++++-------------------------------
>> 2 files changed, 15 insertions(+), 48 deletions(-)
>
> Thanks! This looks very nice.
>
>> @@ -580,36 +583,14 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
>> struct bucket_table *new_tbl;
>> struct bucket_table *tbl;
>> unsigned int hash;
>> - spinlock_t *lock;
>> void *data;
>>
>> - tbl = rcu_dereference(ht->tbl);
>> -
>> - /* All insertions must grab the oldest table containing
>> - * the hashed bucket that is yet to be rehashed.
>> - */
>> - for (;;) {
>> - hash = rht_head_hashfn(ht, tbl, obj, ht->p);
>> - lock = rht_bucket_lock(tbl, hash);
>> - spin_lock_bh(lock);
>> -
>> - if (tbl->rehash <= hash)
>> - break;
>> -
>> - spin_unlock_bh(lock);
>> - tbl = rht_dereference_rcu(tbl->future_tbl, ht);
>> - }
>> -
>> - data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
>> - new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
>> - if (PTR_ERR(new_tbl) != -EEXIST)
>> - data = ERR_CAST(new_tbl);
>> + new_tbl = rcu_dereference(ht->tbl);
>>
>> - while (!IS_ERR_OR_NULL(new_tbl)) {
>> + do {
>> tbl = new_tbl;
>> hash = rht_head_hashfn(ht, tbl, obj, ht->p);
>> - spin_lock_nested(rht_bucket_lock(tbl, hash),
>> - SINGLE_DEPTH_NESTING);
>> + spin_lock(rht_bucket_lock(tbl, hash));
>
> One small problem. I think this needs to be spin_lock_bh.
Yes, thanks for the catching that - and the match spin_unlock needs
fixing too. See below.
Thanks,
NeilBrown
From: NeilBrown <neilb@suse.com>
Date: Mon, 25 Jun 2018 08:09:19 +1000
Subject: [PATCH] rhashtable: don't hold lock on first table throughout
insertion.
rhashtable_try_insert() currently holds a lock on the bucket in
the first table, while also locking buckets in subsequent tables.
This is unnecessary and looks like a hold-over from some earlier
version of the implementation.
As insert and remove always lock a bucket in each table in turn, and
as insert only inserts in the final table, there cannot be any races
that are not covered by simply locking a bucket in each table in turn.
When an insert call reaches that last table it can be sure that there
is no matchinf entry in any other table as it has searched them all, and
insertion never happens anywhere but in the last table. The fact that
code tests for the existence of future_tbl while holding a lock on
the relevant bucket ensures that two threads inserting the same key
will make compatible decisions about which is the "last" table.
This simplifies the code and allows the ->rehash field to be
discarded.
We still need a way to ensure that a dead bucket_table is never
re-linked by rhashtable_walk_stop(). This can be achieved by calling
call_rcu() inside the locked region, and checking with
rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
bucket table is empty and dead.
Signed-off-by: NeilBrown <neilb@suse.com>
---
include/linux/rhashtable.h | 13 ------------
lib/rhashtable.c | 52 ++++++++++++++--------------------------------
2 files changed, 16 insertions(+), 49 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index ae9c0f71f311..3864193d5e2e 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -63,7 +63,6 @@
struct bucket_table {
unsigned int size;
unsigned int nest;
- unsigned int rehash;
u32 hash_rnd;
unsigned int locks_mask;
spinlock_t *locks;
@@ -776,12 +775,6 @@ static inline int rhltable_insert(
* @obj: pointer to hash head inside object
* @params: hash table parameters
*
- * Locks down the bucket chain in both the old and new table if a resize
- * is in progress to ensure that writers can't remove from the old table
- * and can't insert to the new table during the atomic operation of search
- * and insertion. Searches for duplicates in both the old and new table if
- * a resize is in progress.
- *
* This lookup function may only be used for fixed key hash table (key_len
* parameter set). It will BUG() if used inappropriately.
*
@@ -837,12 +830,6 @@ static inline void *rhashtable_lookup_get_insert_fast(
* @obj: pointer to hash head inside object
* @params: hash table parameters
*
- * Locks down the bucket chain in both the old and new table if a resize
- * is in progress to ensure that writers can't remove from the old table
- * and can't insert to the new table during the atomic operation of search
- * and insertion. Searches for duplicates in both the old and new table if
- * a resize is in progress.
- *
* Lookups may occur in parallel with hashtable mutations and resizing.
*
* Will trigger an automatic deferred table resizing if residency in the
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index c983c0ee15d5..8711c694e359 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -199,6 +199,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
return NULL;
}
+ rcu_head_init(&tbl->rcu);
INIT_LIST_HEAD(&tbl->walkers);
tbl->hash_rnd = get_random_u32();
@@ -282,10 +283,9 @@ static int rhashtable_rehash_chain(struct rhashtable *ht,
while (!(err = rhashtable_rehash_one(ht, old_hash)))
;
- if (err == -ENOENT) {
- old_tbl->rehash++;
+ if (err == -ENOENT)
err = 0;
- }
+
spin_unlock_bh(old_bucket_lock);
return err;
@@ -332,13 +332,16 @@ static int rhashtable_rehash_table(struct rhashtable *ht)
spin_lock(&ht->lock);
list_for_each_entry(walker, &old_tbl->walkers, list)
walker->tbl = NULL;
- spin_unlock(&ht->lock);
/* Wait for readers. All new readers will see the new
* table, and thus no references to the old table will
* remain.
+ * We do this inside the locked region so that
+ * rhashtable_walk_stop() can use rcu_head_after_call_rcu()
+ * to check if it should not re-link the table.
*/
call_rcu(&old_tbl->rcu, bucket_table_free_rcu);
+ spin_unlock(&ht->lock);
return rht_dereference(new_tbl->future_tbl, ht) ? -EAGAIN : 0;
}
@@ -580,46 +583,22 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
struct bucket_table *new_tbl;
struct bucket_table *tbl;
unsigned int hash;
- spinlock_t *lock;
void *data;
- tbl = rcu_dereference(ht->tbl);
-
- /* All insertions must grab the oldest table containing
- * the hashed bucket that is yet to be rehashed.
- */
- for (;;) {
- hash = rht_head_hashfn(ht, tbl, obj, ht->p);
- lock = rht_bucket_lock(tbl, hash);
- spin_lock_bh(lock);
-
- if (tbl->rehash <= hash)
- break;
-
- spin_unlock_bh(lock);
- tbl = rht_dereference_rcu(tbl->future_tbl, ht);
- }
-
- data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
- new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
- if (PTR_ERR(new_tbl) != -EEXIST)
- data = ERR_CAST(new_tbl);
+ new_tbl = rcu_dereference(ht->tbl);
- while (!IS_ERR_OR_NULL(new_tbl)) {
+ do {
tbl = new_tbl;
hash = rht_head_hashfn(ht, tbl, obj, ht->p);
- spin_lock_nested(rht_bucket_lock(tbl, hash),
- SINGLE_DEPTH_NESTING);
+ spin_lock_bh(rht_bucket_lock(tbl, hash));
data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
if (PTR_ERR(new_tbl) != -EEXIST)
data = ERR_CAST(new_tbl);
- spin_unlock(rht_bucket_lock(tbl, hash));
- }
-
- spin_unlock_bh(lock);
+ spin_unlock_bh(rht_bucket_lock(tbl, hash));
+ } while (!IS_ERR_OR_NULL(new_tbl));
if (PTR_ERR(data) == -EAGAIN)
data = ERR_PTR(rhashtable_insert_rehash(ht, tbl) ?:
@@ -941,10 +920,11 @@ void rhashtable_walk_stop(struct rhashtable_iter *iter)
ht = iter->ht;
spin_lock(&ht->lock);
- if (tbl->rehash < tbl->size)
- list_add(&iter->walker.list, &tbl->walkers);
- else
+ if (rcu_head_after_call_rcu(&tbl->rcu, bucket_table_free_rcu))
+ /* This bucket table is being freed, don't re-link it. */
iter->walker.tbl = NULL;
+ else
+ list_add(&iter->walker.list, &tbl->walkers);
spin_unlock(&ht->lock);
out:
--
2.14.0.rc0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
next prev parent reply other threads:[~2019-03-15 6:46 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-14 5:05 [PATCH 0/3] Three rhashtable improvements NeilBrown
2019-03-14 5:05 ` [PATCH 2/3] rhashtable: don't hold lock on first table throughout insertion NeilBrown
2019-03-14 14:58 ` Paul E. McKenney
2019-03-15 5:47 ` Herbert Xu
2019-03-15 6:46 ` NeilBrown [this message]
2019-03-20 5:40 ` Herbert Xu
2019-03-14 5:05 ` [PATCH 1/3] rhashtable: use cmpxchg() in nested_table_alloc() NeilBrown
2019-03-15 5:10 ` Herbert Xu
2019-03-15 6:51 ` NeilBrown
2019-03-20 5:41 ` Herbert Xu
2019-03-14 5:05 ` [PATCH 3/3] rhashtable: rename rht_for_each*continue as *from NeilBrown
2019-03-15 5:49 ` Herbert Xu
2019-03-17 7:50 ` Miguel Ojeda
2019-03-18 0:15 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87o96cocs2.fsf@notabene.neil.brown.name \
--to=neilb@suse.com \
--cc=herbert@gondor.apana.org.au \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=tgraf@suug.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).