netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	David Miller <davem@davemloft.net>,
	Corey Minyard <minyard@acm.org>,
	Stephen Hemminger <shemminger@vyatta.com>,
	benny+usenet@amorsen.dk,
	Linux Netdev List <netdev@vger.kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	Evgeniy Polyakov <zbr@ioremap.net>,
	Christian Bell <christian@myri.com>
Subject: Re: [PATCH 4/3] rcu: documents rculist_nulls
Date: Wed, 19 Nov 2008 09:07:03 -0800	[thread overview]
Message-ID: <20081119170703.GB6753@linux.vnet.ibm.com> (raw)
In-Reply-To: <491C4FA8.4020704@cosmosbay.com>

On Thu, Nov 13, 2008 at 05:02:48PM +0100, Eric Dumazet wrote:
> Eric Dumazet a écrit :
>> Peter Zijlstra a écrit :
>>> So by not using some memory barriers (would be nice to have it
>>> illustrated which ones), we can race and end up on the wrong chain, in
>>> case that happens we detect this by using this per-chain terminator and
>>> try again.
>>>
>>> It would be really good to have it explained in the rculist_nulls.h
>>> comments what memory barriers are missing, what races they open, and how
>>> the this special terminator trick closes that race.
>> OK, maybe I should add a Documentation/RCU/rculist_nulls.txt file with
>> appropriate examples and documentation.
>> (Say the lookup/insert algorithms, with standard hlist and memory 
>> barriers,
>> and with hlist_nulls without those two memory barriers.
>
> [PATCH 4/3] rcu: documents rculist_nulls

Very good -- only one small suggestion below.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Adds Documentation/RCU/rculist_nulls.txt file to describe how 'nulls'
> end-of-list can help in some RCU algos.
>
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
> Documentation/RCU/rculist_nulls.txt |  167 ++++++++++++++++++++++++++
> 1 files changed, 167 insertions(+)

> diff --git a/Documentation/RCU/rculist_nulls.txt b/Documentation/RCU/rculist_nulls.txt
> new file mode 100644
> index 0000000..5db5549
> --- /dev/null
> +++ b/Documentation/RCU/rculist_nulls.txt
> @@ -0,0 +1,167 @@
> +Using hlist_nulls to protect read-mostly linked lists and
> +objects using SLAB_DESTROY_BY_RCU allocations.
> +
> +Please read the basics in Documentation/RCU/listRCU.txt
> +
> +Using special makers (called 'nulls') is a convenient way
> +to solve following problem :
> +
> +A typical RCU linked list managing objects which are
> +allocated with SLAB_DESTROY_BY_RCU kmem_cache can
> +use following algos :
> +
> +1) Lookup algo
> +--------------
> +rcu_read_lock()
> +begin:
> +obj = lockless_lookup(key);
> +if (obj) {
> +  if (!try_get_ref(obj)) // might fail for free objects
> +    goto begin;
> +  /*
> +   * Because a writer could delete object, and a writer could
> +   * reuse these object before the RCU grace period, we
> +   * must check key after geting the reference on object
> +   */
> +  if (obj->key != key) { // not the object we expected

In some cases, a "generation number" will be needed.  For example, there
are algorithms that must detect when an object has been removed and then
re-inserted with the same key.  One increments the generation number
on each free and sometimes also on each allocation.

> +     put_ref(obj);
> +     goto begin;
> +   }
> +}
> +rcu_read_unlock();
> +
> +Beware that lockless_lookup(key) cannot use traditional hlist_for_each_entry_rcu()
> +but a version with an additional memory barrier (smp_rmb())
> +
> +lockless_lookup(key)
> +{
> +   struct hlist_node *node, *next;
> +   for (pos = rcu_dereference((head)->first);
> +          pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
> +          ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); 
> +          pos = rcu_dereference(next))
> +      if (obj->key == key)
> +         return obj;
> +   return NULL;
> +
> +And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb() :
> +
> +   struct hlist_node *node;
> +   for (pos = rcu_dereference((head)->first);
> +		pos && ({ prefetch(pos->next); 1; }) &&
> +		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
> +		pos = rcu_dereference(pos->next))
> +      if (obj->key == key)
> +         return obj;
> +   return NULL;
> +}
> +
> +Quoting Corey Minyard :
> +
> +"If the object is moved from one list to another list in-between the
> + time the hash is calculated and the next field is accessed, and the
> + object has moved to the end of a new list, the traversal will not
> + complete properly on the list it should have, since the object will
> + be on the end of the new list and there's not a way to tell it's on a
> + new list and restart the list traversal.  I think that this can be
> + solved by pre-fetching the "next" field (with proper barriers) before
> + checking the key."
> +
> +2) Insert algo :
> +----------------
> +
> +We need to make sure a reader cannot read the new 'obj->obj_next' value
> +and previous value of 'obj->key'. Or else, an item could be deleted
> +from a chain, and inserted into another chain. If new chain was empty
> +before the move, 'next' pointer is NULL, and lockless reader can
> +not detect it missed following items in original chain.
> +
> +/*
> + * Please note that new inserts are done at the head of list,
> + * not in the middle or end.
> + */
> +obj = kmem_cache_alloc(...);
> +lock_chain(); // typically a spin_lock()
> +obj->key = key;
> +atomic_inc(&obj->refcnt);
> +/*
> + * we need to make sure obj->key is updated before obj->next
> + */
> +smp_wmb();
> +hlist_add_head_rcu(&obj->obj_node, list);
> +unlock_chain(); // typically a spin_unlock()
> +
> +
> +3) Remove algo
> +--------------
> +Nothing special here, we can use a standard RCU hlist deletion.
> +But thanks to SLAB_DESTROY_BY_RCU, beware a deleted object can be reused 
> +very very fast (before the end of RCU grace period)
> +
> +if (put_last_reference_on(obj) {
> +   lock_chain(); // typically a spin_lock()
> +   hlist_del_init_rcu(&obj->obj_node);
> +   unlock_chain(); // typically a spin_unlock()
> +   kmem_cache_free(cachep, obj);
> +}
> +
> +
> +
> +--------------------------------------------------------------------------
> +With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
> +and extra smp_wmb() in insert function.
> +
> +For example, if we choose to store the slot number as the 'nulls'
> +end-of-list marker for each slot of the hash table, we can detect
> +a race (some writer did a delete and/or a move of an object
> +to another chain) checking the final 'nulls' value if
> +the lookup met the end of chain. If final 'nulls' value
> +is not the slot number, then we must restart the lookup at
> +the begining. If the object was moved to same chain,
> +then the reader doesnt care : It might eventually
> +scan the list again without harm.
> +
> +
> +1) lookup algo
> +
> + head = &table[slot];
> + rcu_read_lock();
> +begin:
> + hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
> +   if (obj->key == key) {
> +      if (!try_get_ref(obj)) // might fail for free objects
> +         goto begin;
> +      if (obj->key != key) { // not the object we expected
> +         put_ref(obj);
> +         goto begin;
> +      }
> +  goto out;
> + }
> +/*
> + * if the nulls value we got at the end of this lookup is
> + * not the expected one, we must restart lookup.
> + * We probably met an item that was moved to another chain.
> + */
> + if (get_nulls_value(node) != slot)
> +   goto begin;
> + obj = NULL;
> +
> +out:
> + rcu_read_unlock();
> +
> +2) Insert function :
> +--------------------
> +
> +/*
> + * Please note that new inserts are done at the head of list,
> + * not in the middle or end.
> + */
> +obj = kmem_cache_alloc(cachep);
> +lock_chain(); // typically a spin_lock()
> +obj->key = key;
> +atomic_set(&obj->refcnt, 1);
> +/*
> + * insert obj in RCU way (readers might be traversing chain)
> + */
> +hlist_nulls_add_head_rcu(&obj->obj_node, list);
> +unlock_chain(); // typically a spin_unlock()


  parent reply	other threads:[~2008-11-19 17:07 UTC|newest]

Thread overview: 134+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-06 18:50 [PATCH 3/3] Convert the UDP hash lock to RCU Corey Minyard
2008-10-06 21:22 ` Eric Dumazet
2008-10-06 21:40   ` David Miller
2008-10-06 23:08     ` Corey Minyard
2008-10-07  8:37       ` Evgeniy Polyakov
2008-10-07 14:16         ` Christoph Lameter
2008-10-07 14:29           ` Evgeniy Polyakov
2008-10-07 14:38             ` Christoph Lameter
2008-10-07 14:33           ` Paul E. McKenney
2008-10-07 14:45             ` Christoph Lameter
2008-10-07 15:07               ` Eric Dumazet
2008-10-07 15:07               ` Paul E. McKenney
2008-10-07  5:24     ` Eric Dumazet
2008-10-07  8:54       ` Benny Amorsen
2008-10-07 12:59         ` Eric Dumazet
2008-10-07 14:07           ` Stephen Hemminger
2008-10-07 20:55             ` David Miller
2008-10-07 21:20               ` Stephen Hemminger
2008-10-08 13:55               ` Eric Dumazet
2008-10-08 18:45                 ` David Miller
2008-10-28 20:37                   ` [PATCH 1/2] udp: introduce struct udp_table and multiple rwlocks Eric Dumazet
2008-10-28 21:23                     ` Christian Bell
2008-10-28 21:31                       ` Evgeniy Polyakov
2008-10-28 21:48                       ` Eric Dumazet
2008-10-28 21:28                     ` Evgeniy Polyakov
2008-10-28 20:42                   ` [PATCH 2/2] udp: RCU handling for Unicast packets Eric Dumazet
2008-10-28 22:45                     ` Eric Dumazet
2008-10-29  5:05                       ` David Miller
2008-10-29  8:23                         ` Eric Dumazet
2008-10-29  8:56                           ` David Miller
2008-10-29 10:19                             ` Eric Dumazet
2008-10-29 18:19                               ` David Miller
2008-10-29  9:04                           ` Eric Dumazet
2008-10-29  9:17                             ` David Miller
2008-10-29 13:17                             ` Corey Minyard
2008-10-29 14:36                               ` Eric Dumazet
2008-10-29 15:34                                 ` Corey Minyard
2008-10-29 16:09                                   ` Eric Dumazet
2008-10-29 16:37                                     ` Paul E. McKenney
2008-10-29 17:22                                       ` Corey Minyard
2008-10-29 17:45                                         ` Eric Dumazet
2008-10-29 18:28                                           ` Corey Minyard
2008-10-29 18:52                                             ` Paul E. McKenney
2008-10-29 20:00                                               ` Eric Dumazet
2008-10-29 20:17                                                 ` Paul E. McKenney
2008-10-29 21:29                                                   ` Corey Minyard
2008-10-29 21:57                                                     ` Eric Dumazet
2008-10-29 21:58                                                     ` Paul E. McKenney
2008-10-29 22:08                                                   ` Eric Dumazet
2008-10-30  3:22                                                     ` Corey Minyard
2008-10-30  5:50                                                       ` Eric Dumazet
2008-11-02  4:19                                                         ` David Miller
2008-10-30  5:40                                                     ` David Miller
2008-10-30  5:51                                                       ` Eric Dumazet
2008-10-30  7:04                                                         ` Eric Dumazet
2008-10-30  7:05                                                           ` David Miller
2008-10-30 15:40                                                     ` [PATCH] udp: Introduce special NULL pointers for hlist termination Eric Dumazet
2008-10-30 15:51                                                       ` Stephen Hemminger
2008-10-30 16:28                                                         ` Corey Minyard
2008-10-31 14:37                                                           ` Eric Dumazet
2008-10-31 14:55                                                             ` Pavel Emelyanov
2008-11-02  4:22                                                               ` David Miller
2008-10-30 17:12                                                         ` Eric Dumazet
2008-10-31  7:51                                                           ` David Miller
2008-10-30 16:01                                                       ` Peter Zijlstra
2008-10-31  0:14                                                       ` Keith Owens
2008-11-13 13:13                                                       ` [PATCH 0/3] net: RCU lookups for UDP, DCCP and TCP protocol Eric Dumazet
2008-11-13 17:20                                                         ` Andi Kleen
2008-11-17  3:41                                                         ` David Miller
2008-11-19 19:52                                                           ` Christoph Lameter
2008-11-13 13:14                                                       ` [PATCH 1/3] rcu: Introduce hlist_nulls variant of hlist Eric Dumazet
2008-11-13 13:29                                                         ` Peter Zijlstra
2008-11-13 13:44                                                           ` Eric Dumazet
2008-11-13 16:02                                                             ` [PATCH 4/3] rcu: documents rculist_nulls Eric Dumazet
2008-11-14 15:16                                                               ` Peter Zijlstra
2008-11-17  3:36                                                                 ` David Miller
2008-11-19 17:07                                                               ` Paul E. McKenney [this message]
2008-11-14 15:16                                                         ` [PATCH 1/3] rcu: Introduce hlist_nulls variant of hlist Peter Zijlstra
2008-11-19 17:01                                                         ` Paul E. McKenney
2008-11-19 17:53                                                           ` Eric Dumazet
2008-11-19 18:46                                                             ` Paul E. McKenney
2008-11-19 18:53                                                               ` Arnaldo Carvalho de Melo
2008-11-19 21:17                                                                 ` Paul E. McKenney
2008-11-19 20:39                                                               ` Eric Dumazet
2008-11-19 21:21                                                                 ` Paul E. McKenney
2008-11-13 13:15                                                       ` [PATCH 2/3] udp: Use hlist_nulls in UDP RCU code Eric Dumazet
2008-11-19 17:29                                                         ` Paul E. McKenney
2008-11-19 17:53                                                           ` Eric Dumazet
2008-11-13 13:15                                                       ` [PATCH 3/3] net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls Eric Dumazet
2008-11-13 13:34                                                         ` Peter Zijlstra
2008-11-13 13:51                                                           ` Eric Dumazet
2008-11-13 14:08                                                             ` Christoph Lameter
2008-11-13 14:22                                                             ` Peter Zijlstra
2008-11-13 14:27                                                               ` Christoph Lameter
2008-11-19 17:53                                                         ` Paul E. McKenney
2008-11-23  9:33                                                         ` [PATCH] net: Convert TCP/DCCP listening hash tables to use RCU Eric Dumazet
2008-11-23 15:59                                                           ` Paul E. McKenney
2008-11-23 18:42                                                             ` Eric Dumazet
2008-11-23 19:17                                                               ` Paul E. McKenney
2008-11-23 20:18                                                                 ` Eric Dumazet
2008-11-23 22:33                                                                   ` Paul E. McKenney
2008-11-24  1:23                                                           ` David Miller
2008-10-30 11:04                                                 ` [PATCH 2/2] udp: RCU handling for Unicast packets Peter Zijlstra
2008-10-30 11:30                                                   ` Eric Dumazet
2008-10-30 18:25                                                     ` Paul E. McKenney
2008-10-31 16:40                                                       ` Eric Dumazet
2008-11-01  3:10                                                         ` Paul E. McKenney
2008-10-29 17:32                                       ` Eric Dumazet
2008-10-29 18:11                                         ` Paul E. McKenney
2008-10-29 18:29                                           ` David Miller
2008-10-29 18:38                                             ` Paul E. McKenney
2008-10-29 18:36                                           ` Eric Dumazet
2008-10-29 18:20                                 ` David Miller
2008-10-30 11:12                                 ` Peter Zijlstra
2008-10-30 11:29                                   ` Eric Dumazet
2008-10-28 20:37                 ` [PATCH 0/2] udp: Convert the UDP hash lock to RCU Eric Dumazet
2008-10-28 21:28                   ` Stephen Hemminger
2008-10-28 21:50                     ` Eric Dumazet
2008-10-07 16:43           ` [PATCH 3/3] " Corey Minyard
2008-10-07 18:26       ` David Miller
2008-10-08  8:35         ` Eric Dumazet
2008-10-08 16:38           ` David Miller
2008-10-07  8:31     ` Peter Zijlstra
2008-10-07 14:36       ` Paul E. McKenney
2008-10-07 18:29       ` David Miller
2008-10-06 22:07   ` Corey Minyard
2008-10-07  8:17   ` Peter Zijlstra
2008-10-07  9:24     ` Eric Dumazet
2008-10-07 14:15       ` Christoph Lameter
2008-10-07 14:38         ` Paul E. McKenney
2008-10-07 14:50         ` Eric Dumazet
2008-10-07 15:05           ` Paul E. McKenney
2008-10-07 15:09           ` Peter Zijlstra
2008-10-07 15:23           ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081119170703.GB6753@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=benny+usenet@amorsen.dk \
    --cc=christian@myri.com \
    --cc=cl@linux-foundation.org \
    --cc=dada1@cosmosbay.com \
    --cc=davem@davemloft.net \
    --cc=minyard@acm.org \
    --cc=netdev@vger.kernel.org \
    --cc=shemminger@vyatta.com \
    --cc=zbr@ioremap.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).