public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org, rusty@rustcorp.com.au, oleg@redhat.com,
	paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
	andi@firstfloor.org, rostedt@goodmis.org, tglx@linutronix.de,
	Michel Lespinasse <walken@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	David Woodhouse <David.Woodhouse@intel.com>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
Date: Sun, 1 Mar 2015 13:52:09 +0000 (UTC)	[thread overview]
Message-ID: <249901737.193802.1425217929395.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20150228213110.129097991@infradead.org>

----- Original Message -----
> From: "Peter Zijlstra" <peterz@infradead.org>
> To: mingo@kernel.org, rusty@rustcorp.com.au, "mathieu desnoyers" <mathieu.desnoyers@efficios.com>, oleg@redhat.com,
> paulmck@linux.vnet.ibm.com
> Cc: linux-kernel@vger.kernel.org, andi@firstfloor.org, rostedt@goodmis.org, tglx@linutronix.de, peterz@infradead.org,
> "Michel Lespinasse" <walken@google.com>, "Andrea Arcangeli" <aarcange@redhat.com>, "David Woodhouse"
> <David.Woodhouse@intel.com>, "Rik van Riel" <riel@redhat.com>
> Sent: Saturday, February 28, 2015 4:24:52 PM
> Subject: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
> 
> Change the insert and erase code such that lockless searches are
> non-fatal.
> 
> In and of itself an rbtree cannot be correctly searched while
> in-modification, we can however provide weaker guarantees that will
> allow the rbtree to be used in conjunction with other techniques, such
> as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").
> 
> For this to work we need the following guarantees from the rbtree
> code:
> 
>  1) a lockless reader must not see partial stores, this would allow it
>     to observe nodes that are invalid memory.
> 
>  2) there must not be (temporary) loops in the tree structure in the
>     modifier's program order, this would cause a lookup which
>     interrupts the modifier to get stuck indefinitely.

For (2), I don't think this is how the situation should be described.

Let's consider a scenario where an interrupt nests over the modification.
First, the modification will switch the latch to the other version of the
tree. Therefore, the interrupt will see a fully consistent tree, not the
tree being modified. Therefore, a temporary loop in the tree should not
be an issue for that peculiar situation.

However, if we have another thread traversing the tree while we
concurrently switch the latch and modify one version of the tree,
creating a temporary loop in the tree, this thread could possibly:

A) deadlock with the modification if there is a locking dependency
   between tree modification, tree read, and another lock (transitive
   dependency).
B) have the modifier starved by the other thread, if that thread has
   a higher scheduling priority (e.g. RT) than the modifier. The high
   priority thread would then use all its CPU time to perform the
   temporary loop.

So I agree that loops are unwanted there: it allows us to never have
to care about situations A and B. However, the explanation about why
should not involve, AFAIU, an interrupt handler nesting over the tree
modification, because this is precisely one scenario that should not
care about loops.

Thoughs ?

Thanks!

Mathieu


> 
> For 1) we must use WRITE_ONCE() for all updates to the tree structure;
> in particular this patch only does rb_{left,right} as those are the
> only element required for simple searches.
> 
> It generates slightly worse code, probably because gcc stinks at
> volatile. But in pointer chasing heavy code a few instructions more
> should not matter.
> 
> For 2) I have carefully audited the code and drawn every intermediate
> link state and not found a loop.
> 
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: David Woodhouse <David.Woodhouse@intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/rbtree.h           |   10 +++++
>  include/linux/rbtree_augmented.h |   21 +++++++----
>  lib/rbtree.c                     |   71
>  ++++++++++++++++++++++++++-------------
>  3 files changed, 73 insertions(+), 29 deletions(-)
> 
> --- a/include/linux/rbtree.h
> +++ b/include/linux/rbtree.h
> @@ -31,6 +31,7 @@
>  
>  #include <linux/kernel.h>
>  #include <linux/stddef.h>
> +#include <linux/rcupdate.h>
>  
>  struct rb_node {
>  	unsigned long  __rb_parent_color;
> @@ -85,6 +86,15 @@ static inline void rb_link_node(struct r
>  	*rb_link = node;
>  }
>  
> +static inline void rb_link_node_rcu(struct rb_node * node, struct rb_node *
> parent,
> +				    struct rb_node ** rb_link)
> +{
> +	node->__rb_parent_color = (unsigned long)parent;
> +	node->rb_left = node->rb_right = NULL;
> +
> +	rcu_assign_pointer(*rb_link, node);
> +}
> +
>  #define rb_entry_safe(ptr, type, member) \
>  	({ typeof(ptr) ____ptr = (ptr); \
>  	   ____ptr ? rb_entry(____ptr, type, member) : NULL; \
> --- a/include/linux/rbtree_augmented.h
> +++ b/include/linux/rbtree_augmented.h
> @@ -123,11 +123,11 @@ __rb_change_child(struct rb_node *old, s
>  {
>  	if (parent) {
>  		if (parent->rb_left == old)
> -			parent->rb_left = new;
> +			WRITE_ONCE(parent->rb_left, new);
>  		else
> -			parent->rb_right = new;
> +			WRITE_ONCE(parent->rb_right, new);
>  	} else
> -		root->rb_node = new;
> +		WRITE_ONCE(root->rb_node, new);
>  }
>  
>  extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
> @@ -137,7 +137,8 @@ static __always_inline struct rb_node *
>  __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
>  		     const struct rb_augment_callbacks *augment)
>  {
> -	struct rb_node *child = node->rb_right, *tmp = node->rb_left;
> +	struct rb_node *child = node->rb_right;
> +	struct rb_node *tmp = node->rb_left;
>  	struct rb_node *parent, *rebalance;
>  	unsigned long pc;
>  
> @@ -167,6 +168,7 @@ __rb_erase_augmented(struct rb_node *nod
>  		tmp = parent;
>  	} else {
>  		struct rb_node *successor = child, *child2;
> +
>  		tmp = child->rb_left;
>  		if (!tmp) {
>  			/*
> @@ -180,6 +182,7 @@ __rb_erase_augmented(struct rb_node *nod
>  			 */
>  			parent = successor;
>  			child2 = successor->rb_right;
> +
>  			augment->copy(node, successor);
>  		} else {
>  			/*
> @@ -201,19 +204,23 @@ __rb_erase_augmented(struct rb_node *nod
>  				successor = tmp;
>  				tmp = tmp->rb_left;
>  			} while (tmp);
> -			parent->rb_left = child2 = successor->rb_right;
> -			successor->rb_right = child;
> +			child2 = successor->rb_right;
> +			WRITE_ONCE(parent->rb_left, child2);
> +			WRITE_ONCE(successor->rb_right, child);
>  			rb_set_parent(child, successor);
> +
>  			augment->copy(node, successor);
>  			augment->propagate(parent, successor);
>  		}
>  
> -		successor->rb_left = tmp = node->rb_left;
> +		tmp = node->rb_left;
> +		WRITE_ONCE(successor->rb_left, tmp);
>  		rb_set_parent(tmp, successor);
>  
>  		pc = node->__rb_parent_color;
>  		tmp = __rb_parent(pc);
>  		__rb_change_child(node, successor, tmp, root);
> +
>  		if (child2) {
>  			successor->__rb_parent_color = pc;
>  			rb_set_parent_color(child2, parent, RB_BLACK);
> --- a/lib/rbtree.c
> +++ b/lib/rbtree.c
> @@ -44,6 +44,25 @@
>   *  parentheses and have some accompanying text comment.
>   */
>  
> +/*
> + * All stores to the tree structure (rb_left and rb_right) must be done
> using
> + * WRITE_ONCE(). And we must not inadvertently cause (temporary) loops in
> the
> + * tree structure as seen in program order.
> + *
> + * These two requirements will allow lockless iteration of the tree -- not
> + * correct iteration mind you, tree rotations are not atomic so a lookup
> might
> + * miss entire subtrees.
> + *
> + * But they do guarantee that any such traversal will only see valid
> elements
> + * and that it will indeed complete -- does not get stuck in a loop.
> + *
> + * NOTE:
> + *
> + * Stores to __rb_parent_color are not important for simple lookups so those
> + * are left undone as of now. Nor did I check for loops involving parent
> + * pointers.
> + */
> +
>  static inline void rb_set_black(struct rb_node *rb)
>  {
>  	rb->__rb_parent_color |= RB_BLACK;
> @@ -129,8 +148,9 @@ __rb_insert(struct rb_node *node, struct
>  				 * This still leaves us in violation of 4), the
>  				 * continuation into Case 3 will fix that.
>  				 */
> -				parent->rb_right = tmp = node->rb_left;
> -				node->rb_left = parent;
> +				tmp = node->rb_left;
> +				WRITE_ONCE(parent->rb_right, tmp);
> +				WRITE_ONCE(node->rb_left, parent);
>  				if (tmp)
>  					rb_set_parent_color(tmp, parent,
>  							    RB_BLACK);
> @@ -149,8 +169,8 @@ __rb_insert(struct rb_node *node, struct
>  			 *     /                 \
>  			 *    n                   U
>  			 */
> -			gparent->rb_left = tmp;  /* == parent->rb_right */
> -			parent->rb_right = gparent;
> +			WRITE_ONCE(gparent->rb_left, tmp); /* == parent->rb_right */
> +			WRITE_ONCE(parent->rb_right, gparent);
>  			if (tmp)
>  				rb_set_parent_color(tmp, gparent, RB_BLACK);
>  			__rb_rotate_set_parents(gparent, parent, root, RB_RED);
> @@ -171,8 +191,9 @@ __rb_insert(struct rb_node *node, struct
>  			tmp = parent->rb_left;
>  			if (node == tmp) {
>  				/* Case 2 - right rotate at parent */
> -				parent->rb_left = tmp = node->rb_right;
> -				node->rb_right = parent;
> +				tmp = node->rb_right;
> +				WRITE_ONCE(parent->rb_left, tmp);
> +				WRITE_ONCE(node->rb_right, parent);
>  				if (tmp)
>  					rb_set_parent_color(tmp, parent,
>  							    RB_BLACK);
> @@ -183,8 +204,8 @@ __rb_insert(struct rb_node *node, struct
>  			}
>  
>  			/* Case 3 - left rotate at gparent */
> -			gparent->rb_right = tmp;  /* == parent->rb_left */
> -			parent->rb_left = gparent;
> +			WRITE_ONCE(gparent->rb_right, tmp); /* == parent->rb_left */
> +			WRITE_ONCE(parent->rb_left, gparent);
>  			if (tmp)
>  				rb_set_parent_color(tmp, gparent, RB_BLACK);
>  			__rb_rotate_set_parents(gparent, parent, root, RB_RED);
> @@ -224,8 +245,9 @@ ____rb_erase_color(struct rb_node *paren
>  				 *      / \         / \
>  				 *     Sl  Sr      N   Sl
>  				 */
> -				parent->rb_right = tmp1 = sibling->rb_left;
> -				sibling->rb_left = parent;
> +				tmp1 = sibling->rb_left;
> +				WRITE_ONCE(parent->rb_right, tmp1);
> +				WRITE_ONCE(sibling->rb_left, parent);
>  				rb_set_parent_color(tmp1, parent, RB_BLACK);
>  				__rb_rotate_set_parents(parent, sibling, root,
>  							RB_RED);
> @@ -275,9 +297,10 @@ ____rb_erase_color(struct rb_node *paren
>  				 *                       \
>  				 *                        Sr
>  				 */
> -				sibling->rb_left = tmp1 = tmp2->rb_right;
> -				tmp2->rb_right = sibling;
> -				parent->rb_right = tmp2;
> +				tmp1 = tmp2->rb_right;
> +				WRITE_ONCE(sibling->rb_left, tmp1);
> +				WRITE_ONCE(tmp2->rb_right, sibling);
> +				WRITE_ONCE(parent->rb_right, tmp2);
>  				if (tmp1)
>  					rb_set_parent_color(tmp1, sibling,
>  							    RB_BLACK);
> @@ -297,8 +320,9 @@ ____rb_erase_color(struct rb_node *paren
>  			 *        / \         / \
>  			 *      (sl) sr      N  (sl)
>  			 */
> -			parent->rb_right = tmp2 = sibling->rb_left;
> -			sibling->rb_left = parent;
> +			tmp2 = sibling->rb_left;
> +			WRITE_ONCE(parent->rb_right, tmp2);
> +			WRITE_ONCE(sibling->rb_left, parent);
>  			rb_set_parent_color(tmp1, sibling, RB_BLACK);
>  			if (tmp2)
>  				rb_set_parent(tmp2, parent);
> @@ -310,8 +334,9 @@ ____rb_erase_color(struct rb_node *paren
>  			sibling = parent->rb_left;
>  			if (rb_is_red(sibling)) {
>  				/* Case 1 - right rotate at parent */
> -				parent->rb_left = tmp1 = sibling->rb_right;
> -				sibling->rb_right = parent;
> +				tmp1 = sibling->rb_right;
> +				WRITE_ONCE(parent->rb_left, tmp1);
> +				WRITE_ONCE(sibling->rb_right, parent);
>  				rb_set_parent_color(tmp1, parent, RB_BLACK);
>  				__rb_rotate_set_parents(parent, sibling, root,
>  							RB_RED);
> @@ -336,9 +361,10 @@ ____rb_erase_color(struct rb_node *paren
>  					break;
>  				}
>  				/* Case 3 - right rotate at sibling */
> -				sibling->rb_right = tmp1 = tmp2->rb_left;
> -				tmp2->rb_left = sibling;
> -				parent->rb_left = tmp2;
> +				tmp1 = tmp2->rb_left;
> +				WRITE_ONCE(sibling->rb_right, tmp1);
> +				WRITE_ONCE(tmp2->rb_left, sibling);
> +				WRITE_ONCE(parent->rb_left, tmp2);
>  				if (tmp1)
>  					rb_set_parent_color(tmp1, sibling,
>  							    RB_BLACK);
> @@ -347,8 +373,9 @@ ____rb_erase_color(struct rb_node *paren
>  				sibling = tmp2;
>  			}
>  			/* Case 4 - left rotate at parent + color flips */
> -			parent->rb_left = tmp2 = sibling->rb_right;
> -			sibling->rb_right = parent;
> +			tmp2 = sibling->rb_right;
> +			WRITE_ONCE(parent->rb_left, tmp2);
> +			WRITE_ONCE(sibling->rb_right, parent);
>  			rb_set_parent_color(tmp1, sibling, RB_BLACK);
>  			if (tmp2)
>  				rb_set_parent(tmp2, parent);
> 
> 
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  reply	other threads:[~2015-03-01 13:52 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-28 21:24 [RFC][PATCH 0/9] latched RB-trees and __module_address() Peter Zijlstra
2015-02-28 21:24 ` [RFC][PATCH 1/9] klp: Fix obvious RCU fail Peter Zijlstra
2015-03-01 20:09   ` Jiri Kosina
2015-03-02  8:35     ` Peter Zijlstra
2015-03-02  9:13       ` Jiri Kosina
2015-03-02 10:00         ` Peter Zijlstra
2015-03-02  9:21       ` Petr Mladek
2015-03-02  1:31   ` Masami Hiramatsu
2015-03-02 19:21   ` Paul E. McKenney
2015-03-02 21:07   ` Josh Poimboeuf
2015-02-28 21:24 ` [RFC][PATCH 2/9] module: Sanitize RCU usage and locking Peter Zijlstra
2015-03-02 11:16   ` Rusty Russell
2015-03-02 12:37     ` Peter Zijlstra
2015-03-02 19:37   ` Paul E. McKenney
2015-03-17 17:13     ` Peter Zijlstra
2015-02-28 21:24 ` [RFC][PATCH 3/9] module: Annotate module version magic Peter Zijlstra
2015-03-02 19:38   ` Paul E. McKenney
2015-02-28 21:24 ` [RFC][PATCH 4/9] module, jump_label: Fix module locking Peter Zijlstra
2015-03-02 19:39   ` Paul E. McKenney
2015-02-28 21:24 ` [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal Peter Zijlstra
2015-03-01 13:52   ` Mathieu Desnoyers [this message]
2015-03-02  8:27     ` Peter Zijlstra
2015-03-01 21:11   ` Michel Lespinasse
2015-03-02  7:46     ` Ingo Molnar
2015-03-02  8:23     ` Peter Zijlstra
2015-03-02  9:53       ` Peter Zijlstra
2015-02-28 21:24 ` [RFC][PATCH 6/9] seqlock: Better document raw_write_seqcount_latch() Peter Zijlstra
2015-03-01 14:02   ` Mathieu Desnoyers
2015-03-02  8:33     ` Peter Zijlstra
2015-03-02  8:51       ` Peter Zijlstra
2015-03-02 19:46         ` Paul E. McKenney
2015-03-01 21:12   ` Michel Lespinasse
2015-02-28 21:24 ` [RFC][PATCH 7/9] rbtree: Implement generic latch_tree Peter Zijlstra
2015-03-01 21:17   ` Michel Lespinasse
2015-03-02  8:05     ` Peter Zijlstra
2015-03-02 19:53   ` Paul E. McKenney
2015-03-17 17:24     ` Peter Zijlstra
2015-02-28 21:24 ` [RFC][PATCH 8/9] module: Optimize __module_address() using a latched RB-tree Peter Zijlstra
2015-02-28 21:24 ` [RFC][PATCH 9/9] module: Use __module_address() for module_address_lookup() Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=249901737.193802.1425217929395.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=David.Woodhouse@intel.com \
    --cc=aarcange@redhat.com \
    --cc=andi@firstfloor.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox