linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: David Howells <dhowells@redhat.com>
Cc: torvalds@osdl.org, akpm@linux-foundation.org,
	paulmck@linux.vnet.ibm.com, corbet@lwn.net,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH] radix_tree_tag_get() is not as safe as the docs make out [ver #2]
Date: Wed, 7 Apr 2010 18:17:33 +1000	[thread overview]
Message-ID: <20100407081733.GC5433@laptop> (raw)
In-Reply-To: <20100406213620.28251.90764.stgit@warthog.procyon.org.uk>

On Tue, Apr 06, 2010 at 10:36:20PM +0100, David Howells wrote:
> radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
> or radix_tree_tag_clear().  The problem is that the double tag_get() in
> radix_tree_tag_get():
> 
> 		if (!tag_get(node, tag, offset))
> 			saw_unset_tag = 1;
> 		if (height == 1) {
> 			int ret = tag_get(node, tag, offset);
> 
> may see the value change due to the action of set/clear.  RCU is no protection
> against this as no pointers are being changed, no nodes are being replaced
> according to a COW protocol - set/clear alter the node directly.
> 
> The documentation in linux/radix-tree.h, however, says that
> radix_tree_tag_get() is an exception to the rule that "any function modifying
> the tree or tags (...) must exclude other modifications, and exclude any
> functions reading the tree".
> 
> The problem is that the next statement in radix_tree_tag_get() checks that the
> tag doesn't vary over time:
> 
> 			BUG_ON(ret && saw_unset_tag);

Yep, it's just a simple bug because the pagecache hadn't been using
radix_tree_tag_get.

> 
> This has been seen happening in FS-Cache:
> 
> 	https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html
> 
> To this end, remove the BUG_ON() from radix_tree_tag_get() and note in various
> comments that the value of the tag may change whilst the RCU read lock is held,

I was thinking more of adding it to the radix-tree.h comment, because
this is true of all radix-tree functions when called under RCU. I don't
know it is considered "unreliable", it is just that concurrent
modifications may run unless the caller has excluded them. So the
value returned is reliably a value that has been observed.

Thanks for spotting this bug.

> and thus that the return value of radix_tree_tag_get() may not be relied upon
> unless radix_tree_tag_set/clear() and radix_tree_delete() are excluded from
> running concurrently with it.
> 
> Reported-by: Romain DEGEZ <romain.degez@smartjog.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  include/linux/radix-tree.h |    7 +++++++
>  lib/radix-tree.c           |   12 ++++++------
>  2 files changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index c5da749..55ca73c 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -121,6 +121,13 @@ do {									\
>   * (Note, rcu_assign_pointer and rcu_dereference are not needed to control
>   * access to data items when inserting into or looking up from the radix tree)
>   *
> + * Note that the value returned by radix_tree_tag_get() may not be relied upon
> + * if only the RCU read lock is held.  Functions to set/clear tags and to
> + * delete nodes running concurrently with it may affect its result such that
> + * two consecutive reads in the same locked section may return different
> + * values.  If reliability is required, modification functions must also be
> + * excluded from concurrency.
> + *
>   * radix_tree_tagged is able to be called without locking or RCU.
>   */
>  
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 6b9670d..6a75327 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -556,6 +556,10 @@ EXPORT_SYMBOL(radix_tree_tag_clear);
>   *
>   *  0: tag not present or not set
>   *  1: tag set
> + *
> + * Note that the return value of this function may not be relied on, even if
> + * the RCU lock is held, unless tag modification and node deletion are excluded
> + * from concurrency.
>   */
>  int radix_tree_tag_get(struct radix_tree_root *root,
>  			unsigned long index, unsigned int tag)
> @@ -596,12 +600,8 @@ int radix_tree_tag_get(struct radix_tree_root *root,
>  		 */
>  		if (!tag_get(node, tag, offset))
>  			saw_unset_tag = 1;
> -		if (height == 1) {
> -			int ret = tag_get(node, tag, offset);
> -
> -			BUG_ON(ret && saw_unset_tag);
> -			return !!ret;
> -		}
> +		if (height == 1)
> +			return !!tag_get(node, tag, offset);
>  		node = rcu_dereference_raw(node->slots[offset]);
>  		shift -= RADIX_TREE_MAP_SHIFT;
>  		height--;

      parent reply	other threads:[~2010-04-07  8:17 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-06 21:36 [PATCH] radix_tree_tag_get() is not as safe as the docs make out [ver #2] David Howells
2010-04-06 21:38 ` David Howells
2010-04-07  8:17 ` Nick Piggin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100407081733.GC5433@laptop \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=dhowells@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).