From: Eric Dumazet <dada1@cosmosbay.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>,
David Miller <davem@davemloft.net>
Cc: Corey Minyard <minyard@acm.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Stephen Hemminger <shemminger@vyatta.com>,
benny+usenet@amorsen.dk,
Linux Netdev List <netdev@vger.kernel.org>,
Christoph Lameter <cl@linux-foundation.org>,
Evgeniy Polyakov <zbr@ioremap.net>,
Christian Bell <christian@myri.com>
Subject: [PATCH 4/3] rcu: documents rculist_nulls
Date: Thu, 13 Nov 2008 17:02:48 +0100 [thread overview]
Message-ID: <491C4FA8.4020704@cosmosbay.com> (raw)
In-Reply-To: <491C2F27.2000908@cosmosbay.com>
[-- Attachment #1: Type: text/plain, Size: 1088 bytes --]
Eric Dumazet a écrit :
> Peter Zijlstra a écrit :
>> So by not using some memory barriers (would be nice to have it
>> illustrated which ones), we can race and end up on the wrong chain, in
>> case that happens we detect this by using this per-chain terminator and
>> try again.
>>
>> It would be really good to have it explained in the rculist_nulls.h
>> comments what memory barriers are missing, what races they open, and how
>> the this special terminator trick closes that race.
>
> OK, maybe I should add a Documentation/RCU/rculist_nulls.txt file with
> appropriate examples and documentation.
>
> (Say the lookup/insert algorithms, with standard hlist and memory barriers,
> and with hlist_nulls without those two memory barriers.
>
[PATCH 4/3] rcu: documents rculist_nulls
Adds Documentation/RCU/rculist_nulls.txt file to describe how 'nulls'
end-of-list can help in some RCU algos.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
Documentation/RCU/rculist_nulls.txt | 167 ++++++++++++++++++++++++++
1 files changed, 167 insertions(+)
[-- Attachment #2: rculist_nulls_doc.patch --]
[-- Type: text/plain, Size: 5489 bytes --]
diff --git a/Documentation/RCU/rculist_nulls.txt b/Documentation/RCU/rculist_nulls.txt
new file mode 100644
index 0000000..5db5549
--- /dev/null
+++ b/Documentation/RCU/rculist_nulls.txt
@@ -0,0 +1,167 @@
+Using hlist_nulls to protect read-mostly linked lists and
+objects using SLAB_DESTROY_BY_RCU allocations.
+
+Please read the basics in Documentation/RCU/listRCU.txt
+
+Using special makers (called 'nulls') is a convenient way
+to solve following problem :
+
+A typical RCU linked list managing objects which are
+allocated with SLAB_DESTROY_BY_RCU kmem_cache can
+use following algos :
+
+1) Lookup algo
+--------------
+rcu_read_lock()
+begin:
+obj = lockless_lookup(key);
+if (obj) {
+ if (!try_get_ref(obj)) // might fail for free objects
+ goto begin;
+ /*
+ * Because a writer could delete object, and a writer could
+ * reuse these object before the RCU grace period, we
+ * must check key after geting the reference on object
+ */
+ if (obj->key != key) { // not the object we expected
+ put_ref(obj);
+ goto begin;
+ }
+}
+rcu_read_unlock();
+
+Beware that lockless_lookup(key) cannot use traditional hlist_for_each_entry_rcu()
+but a version with an additional memory barrier (smp_rmb())
+
+lockless_lookup(key)
+{
+ struct hlist_node *node, *next;
+ for (pos = rcu_dereference((head)->first);
+ pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
+ ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
+ pos = rcu_dereference(next))
+ if (obj->key == key)
+ return obj;
+ return NULL;
+
+And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb() :
+
+ struct hlist_node *node;
+ for (pos = rcu_dereference((head)->first);
+ pos && ({ prefetch(pos->next); 1; }) &&
+ ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
+ pos = rcu_dereference(pos->next))
+ if (obj->key == key)
+ return obj;
+ return NULL;
+}
+
+Quoting Corey Minyard :
+
+"If the object is moved from one list to another list in-between the
+ time the hash is calculated and the next field is accessed, and the
+ object has moved to the end of a new list, the traversal will not
+ complete properly on the list it should have, since the object will
+ be on the end of the new list and there's not a way to tell it's on a
+ new list and restart the list traversal. I think that this can be
+ solved by pre-fetching the "next" field (with proper barriers) before
+ checking the key."
+
+2) Insert algo :
+----------------
+
+We need to make sure a reader cannot read the new 'obj->obj_next' value
+and previous value of 'obj->key'. Or else, an item could be deleted
+from a chain, and inserted into another chain. If new chain was empty
+before the move, 'next' pointer is NULL, and lockless reader can
+not detect it missed following items in original chain.
+
+/*
+ * Please note that new inserts are done at the head of list,
+ * not in the middle or end.
+ */
+obj = kmem_cache_alloc(...);
+lock_chain(); // typically a spin_lock()
+obj->key = key;
+atomic_inc(&obj->refcnt);
+/*
+ * we need to make sure obj->key is updated before obj->next
+ */
+smp_wmb();
+hlist_add_head_rcu(&obj->obj_node, list);
+unlock_chain(); // typically a spin_unlock()
+
+
+3) Remove algo
+--------------
+Nothing special here, we can use a standard RCU hlist deletion.
+But thanks to SLAB_DESTROY_BY_RCU, beware a deleted object can be reused
+very very fast (before the end of RCU grace period)
+
+if (put_last_reference_on(obj) {
+ lock_chain(); // typically a spin_lock()
+ hlist_del_init_rcu(&obj->obj_node);
+ unlock_chain(); // typically a spin_unlock()
+ kmem_cache_free(cachep, obj);
+}
+
+
+
+--------------------------------------------------------------------------
+With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
+and extra smp_wmb() in insert function.
+
+For example, if we choose to store the slot number as the 'nulls'
+end-of-list marker for each slot of the hash table, we can detect
+a race (some writer did a delete and/or a move of an object
+to another chain) checking the final 'nulls' value if
+the lookup met the end of chain. If final 'nulls' value
+is not the slot number, then we must restart the lookup at
+the begining. If the object was moved to same chain,
+then the reader doesnt care : It might eventually
+scan the list again without harm.
+
+
+1) lookup algo
+
+ head = &table[slot];
+ rcu_read_lock();
+begin:
+ hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
+ if (obj->key == key) {
+ if (!try_get_ref(obj)) // might fail for free objects
+ goto begin;
+ if (obj->key != key) { // not the object we expected
+ put_ref(obj);
+ goto begin;
+ }
+ goto out;
+ }
+/*
+ * if the nulls value we got at the end of this lookup is
+ * not the expected one, we must restart lookup.
+ * We probably met an item that was moved to another chain.
+ */
+ if (get_nulls_value(node) != slot)
+ goto begin;
+ obj = NULL;
+
+out:
+ rcu_read_unlock();
+
+2) Insert function :
+--------------------
+
+/*
+ * Please note that new inserts are done at the head of list,
+ * not in the middle or end.
+ */
+obj = kmem_cache_alloc(cachep);
+lock_chain(); // typically a spin_lock()
+obj->key = key;
+atomic_set(&obj->refcnt, 1);
+/*
+ * insert obj in RCU way (readers might be traversing chain)
+ */
+hlist_nulls_add_head_rcu(&obj->obj_node, list);
+unlock_chain(); // typically a spin_unlock()
next prev parent reply other threads:[~2008-11-13 16:04 UTC|newest]
Thread overview: 134+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-06 18:50 [PATCH 3/3] Convert the UDP hash lock to RCU Corey Minyard
2008-10-06 21:22 ` Eric Dumazet
2008-10-06 21:40 ` David Miller
2008-10-06 23:08 ` Corey Minyard
2008-10-07 8:37 ` Evgeniy Polyakov
2008-10-07 14:16 ` Christoph Lameter
2008-10-07 14:29 ` Evgeniy Polyakov
2008-10-07 14:38 ` Christoph Lameter
2008-10-07 14:33 ` Paul E. McKenney
2008-10-07 14:45 ` Christoph Lameter
2008-10-07 15:07 ` Eric Dumazet
2008-10-07 15:07 ` Paul E. McKenney
2008-10-07 5:24 ` Eric Dumazet
2008-10-07 8:54 ` Benny Amorsen
2008-10-07 12:59 ` Eric Dumazet
2008-10-07 14:07 ` Stephen Hemminger
2008-10-07 20:55 ` David Miller
2008-10-07 21:20 ` Stephen Hemminger
2008-10-08 13:55 ` Eric Dumazet
2008-10-08 18:45 ` David Miller
2008-10-28 20:37 ` [PATCH 1/2] udp: introduce struct udp_table and multiple rwlocks Eric Dumazet
2008-10-28 21:23 ` Christian Bell
2008-10-28 21:31 ` Evgeniy Polyakov
2008-10-28 21:48 ` Eric Dumazet
2008-10-28 21:28 ` Evgeniy Polyakov
2008-10-28 20:42 ` [PATCH 2/2] udp: RCU handling for Unicast packets Eric Dumazet
2008-10-28 22:45 ` Eric Dumazet
2008-10-29 5:05 ` David Miller
2008-10-29 8:23 ` Eric Dumazet
2008-10-29 8:56 ` David Miller
2008-10-29 10:19 ` Eric Dumazet
2008-10-29 18:19 ` David Miller
2008-10-29 9:04 ` Eric Dumazet
2008-10-29 9:17 ` David Miller
2008-10-29 13:17 ` Corey Minyard
2008-10-29 14:36 ` Eric Dumazet
2008-10-29 15:34 ` Corey Minyard
2008-10-29 16:09 ` Eric Dumazet
2008-10-29 16:37 ` Paul E. McKenney
2008-10-29 17:22 ` Corey Minyard
2008-10-29 17:45 ` Eric Dumazet
2008-10-29 18:28 ` Corey Minyard
2008-10-29 18:52 ` Paul E. McKenney
2008-10-29 20:00 ` Eric Dumazet
2008-10-29 20:17 ` Paul E. McKenney
2008-10-29 21:29 ` Corey Minyard
2008-10-29 21:57 ` Eric Dumazet
2008-10-29 21:58 ` Paul E. McKenney
2008-10-29 22:08 ` Eric Dumazet
2008-10-30 3:22 ` Corey Minyard
2008-10-30 5:50 ` Eric Dumazet
2008-11-02 4:19 ` David Miller
2008-10-30 5:40 ` David Miller
2008-10-30 5:51 ` Eric Dumazet
2008-10-30 7:04 ` Eric Dumazet
2008-10-30 7:05 ` David Miller
2008-10-30 15:40 ` [PATCH] udp: Introduce special NULL pointers for hlist termination Eric Dumazet
2008-10-30 15:51 ` Stephen Hemminger
2008-10-30 16:28 ` Corey Minyard
2008-10-31 14:37 ` Eric Dumazet
2008-10-31 14:55 ` Pavel Emelyanov
2008-11-02 4:22 ` David Miller
2008-10-30 17:12 ` Eric Dumazet
2008-10-31 7:51 ` David Miller
2008-10-30 16:01 ` Peter Zijlstra
2008-10-31 0:14 ` Keith Owens
2008-11-13 13:13 ` [PATCH 0/3] net: RCU lookups for UDP, DCCP and TCP protocol Eric Dumazet
2008-11-13 17:20 ` Andi Kleen
2008-11-17 3:41 ` David Miller
2008-11-19 19:52 ` Christoph Lameter
2008-11-13 13:14 ` [PATCH 1/3] rcu: Introduce hlist_nulls variant of hlist Eric Dumazet
2008-11-13 13:29 ` Peter Zijlstra
2008-11-13 13:44 ` Eric Dumazet
2008-11-13 16:02 ` Eric Dumazet [this message]
2008-11-14 15:16 ` [PATCH 4/3] rcu: documents rculist_nulls Peter Zijlstra
2008-11-17 3:36 ` David Miller
2008-11-19 17:07 ` Paul E. McKenney
2008-11-14 15:16 ` [PATCH 1/3] rcu: Introduce hlist_nulls variant of hlist Peter Zijlstra
2008-11-19 17:01 ` Paul E. McKenney
2008-11-19 17:53 ` Eric Dumazet
2008-11-19 18:46 ` Paul E. McKenney
2008-11-19 18:53 ` Arnaldo Carvalho de Melo
2008-11-19 21:17 ` Paul E. McKenney
2008-11-19 20:39 ` Eric Dumazet
2008-11-19 21:21 ` Paul E. McKenney
2008-11-13 13:15 ` [PATCH 2/3] udp: Use hlist_nulls in UDP RCU code Eric Dumazet
2008-11-19 17:29 ` Paul E. McKenney
2008-11-19 17:53 ` Eric Dumazet
2008-11-13 13:15 ` [PATCH 3/3] net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls Eric Dumazet
2008-11-13 13:34 ` Peter Zijlstra
2008-11-13 13:51 ` Eric Dumazet
2008-11-13 14:08 ` Christoph Lameter
2008-11-13 14:22 ` Peter Zijlstra
2008-11-13 14:27 ` Christoph Lameter
2008-11-19 17:53 ` Paul E. McKenney
2008-11-23 9:33 ` [PATCH] net: Convert TCP/DCCP listening hash tables to use RCU Eric Dumazet
2008-11-23 15:59 ` Paul E. McKenney
2008-11-23 18:42 ` Eric Dumazet
2008-11-23 19:17 ` Paul E. McKenney
2008-11-23 20:18 ` Eric Dumazet
2008-11-23 22:33 ` Paul E. McKenney
2008-11-24 1:23 ` David Miller
2008-10-30 11:04 ` [PATCH 2/2] udp: RCU handling for Unicast packets Peter Zijlstra
2008-10-30 11:30 ` Eric Dumazet
2008-10-30 18:25 ` Paul E. McKenney
2008-10-31 16:40 ` Eric Dumazet
2008-11-01 3:10 ` Paul E. McKenney
2008-10-29 17:32 ` Eric Dumazet
2008-10-29 18:11 ` Paul E. McKenney
2008-10-29 18:29 ` David Miller
2008-10-29 18:38 ` Paul E. McKenney
2008-10-29 18:36 ` Eric Dumazet
2008-10-29 18:20 ` David Miller
2008-10-30 11:12 ` Peter Zijlstra
2008-10-30 11:29 ` Eric Dumazet
2008-10-28 20:37 ` [PATCH 0/2] udp: Convert the UDP hash lock to RCU Eric Dumazet
2008-10-28 21:28 ` Stephen Hemminger
2008-10-28 21:50 ` Eric Dumazet
2008-10-07 16:43 ` [PATCH 3/3] " Corey Minyard
2008-10-07 18:26 ` David Miller
2008-10-08 8:35 ` Eric Dumazet
2008-10-08 16:38 ` David Miller
2008-10-07 8:31 ` Peter Zijlstra
2008-10-07 14:36 ` Paul E. McKenney
2008-10-07 18:29 ` David Miller
2008-10-06 22:07 ` Corey Minyard
2008-10-07 8:17 ` Peter Zijlstra
2008-10-07 9:24 ` Eric Dumazet
2008-10-07 14:15 ` Christoph Lameter
2008-10-07 14:38 ` Paul E. McKenney
2008-10-07 14:50 ` Eric Dumazet
2008-10-07 15:05 ` Paul E. McKenney
2008-10-07 15:09 ` Peter Zijlstra
2008-10-07 15:23 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=491C4FA8.4020704@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=a.p.zijlstra@chello.nl \
--cc=benny+usenet@amorsen.dk \
--cc=christian@myri.com \
--cc=cl@linux-foundation.org \
--cc=davem@davemloft.net \
--cc=minyard@acm.org \
--cc=netdev@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=shemminger@vyatta.com \
--cc=zbr@ioremap.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).