* [PATCH] radix_tree_tag_get() is not as safe as the docs make out
@ 2010-04-06 19:31 David Howells
2010-04-06 19:39 ` Linus Torvalds
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: David Howells @ 2010-04-06 19:31 UTC (permalink / raw)
To: torvalds, akpm, npiggin
Cc: paulmck, corbet, linux-kernel, linux-fsdevel, linux-mm,
David Howells
radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
or radix_tree_tag_clear(). The problem is that the double tag_get() in
radix_tree_tag_get():
if (!tag_get(node, tag, offset))
saw_unset_tag = 1;
if (height == 1) {
int ret = tag_get(node, tag, offset);
may see the value change due to the action of set/clear. RCU is no protection
against this as no pointers are being changed, no nodes are being replaced
according to a COW protocol - set/clear alter the node directly.
The documentation in linux/radix-tree.h, however, proclaims that
radix_tree_tag_get() is an exception to the rule that "any function modifying
the tree or tags (...) must exclude other modifications, and exclude any
functions reading the tree".
To this end, remove radix_tree_tag_get() from that list, and comment on its
definition that the caller is responsible for preventing concurrent access with
set/clear.
Furthermore, radix_tree_tag_get() is not safe with respect to
radix_tree_delete() either as that also modifies the tags directly.
An alternative would be to drop the BUG_ON() from radix_tree_tag_get() and note
that it may produce an untrustworthy answer if not so protected.
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/linux/radix-tree.h | 3 +--
lib/radix-tree.c | 4 ++++
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index c5da749..33daa70 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -100,14 +100,13 @@ do { \
* The notable exceptions to this rule are the following functions:
* radix_tree_lookup
* radix_tree_lookup_slot
- * radix_tree_tag_get
* radix_tree_gang_lookup
* radix_tree_gang_lookup_slot
* radix_tree_gang_lookup_tag
* radix_tree_gang_lookup_tag_slot
* radix_tree_tagged
*
- * The first 7 functions are able to be called locklessly, using RCU. The
+ * The first 6 functions are able to be called locklessly, using RCU. The
* caller must ensure calls to these functions are made within rcu_read_lock()
* regions. Other readers (lock-free or otherwise) and modifications may be
* running concurrently.
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 6b9670d..795a3bb 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -556,6 +556,10 @@ EXPORT_SYMBOL(radix_tree_tag_clear);
*
* 0: tag not present or not set
* 1: tag set
+ *
+ * The caller must make sure this function does not run concurrently with
+ * radix_tree_tag_set/clear() or radix_tree_delete() as these modify the nodes
+ * directly to alter the tags.
*/
int radix_tree_tag_get(struct radix_tree_root *root,
unsigned long index, unsigned int tag)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] radix_tree_tag_get() is not as safe as the docs make out
2010-04-06 19:31 [PATCH] radix_tree_tag_get() is not as safe as the docs make out David Howells
@ 2010-04-06 19:39 ` Linus Torvalds
2010-04-06 19:48 ` Nick Piggin
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2010-04-06 19:39 UTC (permalink / raw)
To: David Howells
Cc: akpm, npiggin, paulmck, corbet, linux-kernel, linux-fsdevel,
linux-mm
On Tue, 6 Apr 2010, David Howells wrote:
>
> radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
> or radix_tree_tag_clear(). The problem is that the double tag_get() in
> radix_tree_tag_get():
[ snip snip ]
Looks like a reasonable patch, but the one thing you didn't say is whether
there is any code that relies on the incorrectly documented behavior?
How did you find this? Do we need to fix actual code too? The only user
seems to be your fscache/page.c thing, and I'm not seeing any locking
except for the rcu locking that is apparently not sufficient.
Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] radix_tree_tag_get() is not as safe as the docs make out
2010-04-06 19:31 [PATCH] radix_tree_tag_get() is not as safe as the docs make out David Howells
2010-04-06 19:39 ` Linus Torvalds
@ 2010-04-06 19:48 ` Nick Piggin
2010-04-06 21:12 ` David Howells
2010-04-06 21:18 ` David Howells
3 siblings, 0 replies; 5+ messages in thread
From: Nick Piggin @ 2010-04-06 19:48 UTC (permalink / raw)
To: David Howells
Cc: torvalds, akpm, paulmck, corbet, linux-kernel, linux-fsdevel,
linux-mm
On Tue, Apr 06, 2010 at 08:31:34PM +0100, David Howells wrote:
> radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
> or radix_tree_tag_clear(). The problem is that the double tag_get() in
> radix_tree_tag_get():
>
> if (!tag_get(node, tag, offset))
> saw_unset_tag = 1;
> if (height == 1) {
> int ret = tag_get(node, tag, offset);
>
> may see the value change due to the action of set/clear. RCU is no protection
> against this as no pointers are being changed, no nodes are being replaced
> according to a COW protocol - set/clear alter the node directly.
>
> The documentation in linux/radix-tree.h, however, proclaims that
> radix_tree_tag_get() is an exception to the rule that "any function modifying
> the tree or tags (...) must exclude other modifications, and exclude any
> functions reading the tree".
>
> To this end, remove radix_tree_tag_get() from that list, and comment on its
> definition that the caller is responsible for preventing concurrent access with
> set/clear.
>
> Furthermore, radix_tree_tag_get() is not safe with respect to
> radix_tree_delete() either as that also modifies the tags directly.
>
> An alternative would be to drop the BUG_ON() from radix_tree_tag_get() and note
> that it may produce an untrustworthy answer if not so protected.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
Nack, just drop the BUG_ON.
I don't know what you mean by "untrustworthy answer".
radix_tree_tag_get, when called under RCU, will always return 1 if the
tag is guaranteed to have been set. And always 0 if it was clear. If it
can have been flipped at some point, then radix_tree_tag_get might
return either.
This would be the same whether you are using "COW protocol" or whatever
to do the RCU protection. And it is the same for all other RCU radix
tree operations.
No, rcu_read_lock does not give atomicity over multiple operations
because it can't prevent writers from modifying the data structure. This
shouldn't be surprising, but feel free to add a comment that a lock or
seqlock is required to give such atomicity.
> ---
>
> include/linux/radix-tree.h | 3 +--
> lib/radix-tree.c | 4 ++++
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index c5da749..33daa70 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -100,14 +100,13 @@ do { \
> * The notable exceptions to this rule are the following functions:
> * radix_tree_lookup
> * radix_tree_lookup_slot
> - * radix_tree_tag_get
> * radix_tree_gang_lookup
> * radix_tree_gang_lookup_slot
> * radix_tree_gang_lookup_tag
> * radix_tree_gang_lookup_tag_slot
> * radix_tree_tagged
> *
> - * The first 7 functions are able to be called locklessly, using RCU. The
> + * The first 6 functions are able to be called locklessly, using RCU. The
> * caller must ensure calls to these functions are made within rcu_read_lock()
> * regions. Other readers (lock-free or otherwise) and modifications may be
> * running concurrently.
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 6b9670d..795a3bb 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -556,6 +556,10 @@ EXPORT_SYMBOL(radix_tree_tag_clear);
> *
> * 0: tag not present or not set
> * 1: tag set
> + *
> + * The caller must make sure this function does not run concurrently with
> + * radix_tree_tag_set/clear() or radix_tree_delete() as these modify the nodes
> + * directly to alter the tags.
> */
> int radix_tree_tag_get(struct radix_tree_root *root,
> unsigned long index, unsigned int tag)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] radix_tree_tag_get() is not as safe as the docs make out
2010-04-06 19:31 [PATCH] radix_tree_tag_get() is not as safe as the docs make out David Howells
2010-04-06 19:39 ` Linus Torvalds
2010-04-06 19:48 ` Nick Piggin
@ 2010-04-06 21:12 ` David Howells
2010-04-06 21:18 ` David Howells
3 siblings, 0 replies; 5+ messages in thread
From: David Howells @ 2010-04-06 21:12 UTC (permalink / raw)
To: Linus Torvalds
Cc: dhowells, akpm, npiggin, paulmck, corbet, linux-kernel,
linux-fsdevel, linux-mm
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Looks like a reasonable patch, but the one thing you didn't say is whether
> there is any code that relies on the incorrectly documented behavior?
Sorry, yes. I've made an assumption in FS-Cache that I can rely on the result
of radix_tree_tag_get() simply by wrapping it in an rcu_read_lock()'d section.
This has proven not to be so, since the BUG_ON() at line 602 in
lib/radix-tree.c triggered.
I was protecting set/clear/delete from each other, but not protecting get from
set/clear/delete.
> How did you find this? Do we need to fix actual code too? The only user
> seems to be your fscache/page.c thing, and I'm not seeing any locking
> except for the rcu locking that is apparently not sufficient.
As mentioned above, someone reported a bug in fscache that led me to this:
https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html
I may need to fix fscache, but I wanted to see if anyone would suggest an
alternate patch that would continue to let me make a test without having to
grab the spinlock first.
I'll update the patch to reflect this, whatever the final patch ends up being.
David
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] radix_tree_tag_get() is not as safe as the docs make out
2010-04-06 19:31 [PATCH] radix_tree_tag_get() is not as safe as the docs make out David Howells
` (2 preceding siblings ...)
2010-04-06 21:12 ` David Howells
@ 2010-04-06 21:18 ` David Howells
3 siblings, 0 replies; 5+ messages in thread
From: David Howells @ 2010-04-06 21:18 UTC (permalink / raw)
To: Nick Piggin
Cc: dhowells, torvalds, akpm, paulmck, corbet, linux-kernel,
linux-fsdevel, linux-mm
Nick Piggin <npiggin@suse.de> wrote:
> Nack, just drop the BUG_ON.
I can do that.
> I don't know what you mean by "untrustworthy answer".
I was thinking that the answer you get from radix_tree_tag_get() may be invalid
if the tag chain is being modified as you read it. So if you do:
rcu_read_lock()
...
x = radix_tree_tag_get(r, i, t);
...
y = radix_tree_tag_get(r, i, t);
...
rcu_read_unlock()
Then you can't guarantee that x == y, even though you were holding the RCU read
lock.
As you suggested, I'll try and come up with a comment modification to this
effect.
David
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2010-04-06 21:18 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-06 19:31 [PATCH] radix_tree_tag_get() is not as safe as the docs make out David Howells
2010-04-06 19:39 ` Linus Torvalds
2010-04-06 19:48 ` Nick Piggin
2010-04-06 21:12 ` David Howells
2010-04-06 21:18 ` David Howells
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).