* [PATCH 1/5] radix-tree: Fix race in gang lookup
2016-01-27 21:17 [PATCH 0/5] Fix races & improve the radix tree iterator patterns Matthew Wilcox
@ 2016-01-27 21:17 ` Matthew Wilcox
2016-02-03 21:37 ` Konstantin Khlebnikov
2016-03-04 13:21 ` zhong jiang
2016-01-27 21:17 ` [PATCH 2/5] hwspinlock: Fix race between radix tree insertion and lookup Matthew Wilcox
` (4 subsequent siblings)
5 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
To: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen
Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
linux-fsdevel, linux-mm, stable
From: Matthew Wilcox <willy@linux.intel.com>
If the indirect_ptr bit is set on a slot, that indicates we need to
redo the lookup. Introduce a new function radix_tree_iter_retry()
which forces the loop to retry the lookup by setting 'slot' to NULL and
turning the iterator back to point at the problematic entry.
This is a pretty rare problem to hit at the moment; the lookup has to
race with a grow of the radix tree from a height of 0. The consequences
of hitting this race are that gang lookup could return a pointer to a
radix_tree_node instead of a pointer to whatever the user had inserted
in the tree.
Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: stable@vger.kernel.org
---
include/linux/radix-tree.h | 16 ++++++++++++++++
lib/radix-tree.c | 12 ++++++++++--
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index f9a3da5bf892..db0ed595749b 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
struct radix_tree_iter *iter, unsigned flags);
/**
+ * radix_tree_iter_retry - retry this chunk of the iteration
+ * @iter: iterator state
+ *
+ * If we iterate over a tree protected only by the RCU lock, a race
+ * against deletion or creation may result in seeing a slot for which
+ * radix_tree_deref_retry() returns true. If so, call this function
+ * and continue the iteration.
+ */
+static inline __must_check
+void **radix_tree_iter_retry(struct radix_tree_iter *iter)
+{
+ iter->next_index = iter->index;
+ return NULL;
+}
+
+/**
* radix_tree_chunk_size - get current chunk size
*
* @iter: pointer to radix tree iterator
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index a25f635dcc56..65422ac17114 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
return 0;
radix_tree_for_each_slot(slot, root, &iter, first_index) {
- results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
+ results[ret] = rcu_dereference_raw(*slot);
if (!results[ret])
continue;
+ if (radix_tree_is_indirect_ptr(results[ret])) {
+ slot = radix_tree_iter_retry(&iter);
+ continue;
+ }
if (++ret == max_items)
break;
}
@@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
return 0;
radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) {
- results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
+ results[ret] = rcu_dereference_raw(*slot);
if (!results[ret])
continue;
+ if (radix_tree_is_indirect_ptr(results[ret])) {
+ slot = radix_tree_iter_retry(&iter);
+ continue;
+ }
if (++ret == max_items)
break;
}
--
2.7.0.rc3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/5] radix-tree: Fix race in gang lookup
2016-01-27 21:17 ` [PATCH 1/5] radix-tree: Fix race in gang lookup Matthew Wilcox
@ 2016-02-03 21:37 ` Konstantin Khlebnikov
2016-02-04 8:44 ` Konstantin Khlebnikov
2016-03-04 13:21 ` zhong jiang
1 sibling, 1 reply; 16+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-03 21:37 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
Linux Kernel Mailing List, linux-fsdevel, linux-mm@kvack.org,
Stable
On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
<matthew.r.wilcox@intel.com> wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> If the indirect_ptr bit is set on a slot, that indicates we need to
> redo the lookup. Introduce a new function radix_tree_iter_retry()
> which forces the loop to retry the lookup by setting 'slot' to NULL and
> turning the iterator back to point at the problematic entry.
>
> This is a pretty rare problem to hit at the moment; the lookup has to
> race with a grow of the radix tree from a height of 0. The consequences
> of hitting this race are that gang lookup could return a pointer to a
> radix_tree_node instead of a pointer to whatever the user had inserted
> in the tree.
>
> Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
> include/linux/radix-tree.h | 16 ++++++++++++++++
> lib/radix-tree.c | 12 ++++++++++--
> 2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index f9a3da5bf892..db0ed595749b 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
> struct radix_tree_iter *iter, unsigned flags);
>
> /**
> + * radix_tree_iter_retry - retry this chunk of the iteration
> + * @iter: iterator state
> + *
> + * If we iterate over a tree protected only by the RCU lock, a race
> + * against deletion or creation may result in seeing a slot for which
> + * radix_tree_deref_retry() returns true. If so, call this function
> + * and continue the iteration.
> + */
> +static inline __must_check
> +void **radix_tree_iter_retry(struct radix_tree_iter *iter)
> +{
> + iter->next_index = iter->index;
> + return NULL;
> +}
> +
> +/**
> * radix_tree_chunk_size - get current chunk size
> *
> * @iter: pointer to radix tree iterator
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index a25f635dcc56..65422ac17114 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
> return 0;
>
> radix_tree_for_each_slot(slot, root, &iter, first_index) {
> - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> + results[ret] = rcu_dereference_raw(*slot);
> if (!results[ret])
> continue;
> + if (radix_tree_is_indirect_ptr(results[ret])) {
> + slot = radix_tree_iter_retry(&iter);
> + continue;
> + }
> if (++ret == max_items)
> break;
> }
Looks like your fix doesn't work.
After radix_tree_iter_retry: radix_tree_for_each_slot will call
radix_tree_next_slot which isn't safe to call for NULL slot.
#define radix_tree_for_each_slot(slot, root, iter, start) \
for (slot = radix_tree_iter_init(iter, start) ; \
slot || (slot = radix_tree_next_chunk(root, iter, 0)) ; \
slot = radix_tree_next_slot(slot, iter, 0))
tagged iterator works becase restart happens only at root - tags
filled with single bit.
quick (untested) fix for that
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -457,9 +457,9 @@ radix_tree_next_slot(void **slot, struct
radix_tree_iter *iter, unsigned flags)
return slot + offset + 1;
}
} else {
- unsigned size = radix_tree_chunk_size(iter) - 1;
+ int size = radix_tree_chunk_size(iter) - 1;
- while (size--) {
+ while (size-- > 0) {
slot++;
iter->index++;
if (likely(*slot))
> @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
> return 0;
>
> radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) {
> - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> + results[ret] = rcu_dereference_raw(*slot);
> if (!results[ret])
> continue;
> + if (radix_tree_is_indirect_ptr(results[ret])) {
> + slot = radix_tree_iter_retry(&iter);
> + continue;
> + }
> if (++ret == max_items)
> break;
> }
> --
> 2.7.0.rc3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/5] radix-tree: Fix race in gang lookup
2016-02-03 21:37 ` Konstantin Khlebnikov
@ 2016-02-04 8:44 ` Konstantin Khlebnikov
0 siblings, 0 replies; 16+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-04 8:44 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
Linux Kernel Mailing List, linux-fsdevel, linux-mm@kvack.org,
Stable
[-- Attachment #1: Type: text/plain, Size: 5224 bytes --]
On Thu, Feb 4, 2016 at 12:37 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
> <matthew.r.wilcox@intel.com> wrote:
>> From: Matthew Wilcox <willy@linux.intel.com>
>>
>> If the indirect_ptr bit is set on a slot, that indicates we need to
>> redo the lookup. Introduce a new function radix_tree_iter_retry()
>> which forces the loop to retry the lookup by setting 'slot' to NULL and
>> turning the iterator back to point at the problematic entry.
>>
>> This is a pretty rare problem to hit at the moment; the lookup has to
>> race with a grow of the radix tree from a height of 0. The consequences
>> of hitting this race are that gang lookup could return a pointer to a
>> radix_tree_node instead of a pointer to whatever the user had inserted
>> in the tree.
>>
>> Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
>> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
>> Cc: stable@vger.kernel.org
>> ---
>> include/linux/radix-tree.h | 16 ++++++++++++++++
>> lib/radix-tree.c | 12 ++++++++++--
>> 2 files changed, 26 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
>> index f9a3da5bf892..db0ed595749b 100644
>> --- a/include/linux/radix-tree.h
>> +++ b/include/linux/radix-tree.h
>> @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
>> struct radix_tree_iter *iter, unsigned flags);
>>
>> /**
>> + * radix_tree_iter_retry - retry this chunk of the iteration
>> + * @iter: iterator state
>> + *
>> + * If we iterate over a tree protected only by the RCU lock, a race
>> + * against deletion or creation may result in seeing a slot for which
>> + * radix_tree_deref_retry() returns true. If so, call this function
>> + * and continue the iteration.
>> + */
>> +static inline __must_check
>> +void **radix_tree_iter_retry(struct radix_tree_iter *iter)
>> +{
>> + iter->next_index = iter->index;
>> + return NULL;
>> +}
>> +
>> +/**
>> * radix_tree_chunk_size - get current chunk size
>> *
>> * @iter: pointer to radix tree iterator
>> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
>> index a25f635dcc56..65422ac17114 100644
>> --- a/lib/radix-tree.c
>> +++ b/lib/radix-tree.c
>> @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
>> return 0;
>>
>> radix_tree_for_each_slot(slot, root, &iter, first_index) {
>> - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
>> + results[ret] = rcu_dereference_raw(*slot);
>> if (!results[ret])
>> continue;
>> + if (radix_tree_is_indirect_ptr(results[ret])) {
>> + slot = radix_tree_iter_retry(&iter);
>> + continue;
>> + }
>> if (++ret == max_items)
>> break;
>> }
>
> Looks like your fix doesn't work.
>
> After radix_tree_iter_retry: radix_tree_for_each_slot will call
> radix_tree_next_slot which isn't safe to call for NULL slot.
>
> #define radix_tree_for_each_slot(slot, root, iter, start) \
> for (slot = radix_tree_iter_init(iter, start) ; \
> slot || (slot = radix_tree_next_chunk(root, iter, 0)) ; \
> slot = radix_tree_next_slot(slot, iter, 0))
>
> tagged iterator works becase restart happens only at root - tags
> filled with single bit.
>
> quick (untested) fix for that
>
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -457,9 +457,9 @@ radix_tree_next_slot(void **slot, struct
> radix_tree_iter *iter, unsigned flags)
> return slot + offset + 1;
> }
> } else {
> - unsigned size = radix_tree_chunk_size(iter) - 1;
> + int size = radix_tree_chunk_size(iter) - 1;
>
> - while (size--) {
> + while (size-- > 0) {
> slot++;
> iter->index++;
> if (likely(*slot))
>
>
Yep. Kernel crashes. Test in attachment.
fix: https://lkml.kernel.org/r/145457528789.31321.4441662473067711123.stgit@zurg
>> @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
>> return 0;
>>
>> radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) {
>> - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
>> + results[ret] = rcu_dereference_raw(*slot);
>> if (!results[ret])
>> continue;
>> + if (radix_tree_is_indirect_ptr(results[ret])) {
>> + slot = radix_tree_iter_retry(&iter);
>> + continue;
>> + }
>> if (++ret == max_items)
>> break;
>> }
>> --
>> 2.7.0.rc3
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
[-- Attachment #2: radix-tree-test-radix_tree_iter_retry --]
[-- Type: application/octet-stream, Size: 2218 bytes --]
radix-tree: test radix_tree_iter_retry
From: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
---
lib/radix-tree.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 6b79e9026e24..f489334b9cb7 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1491,6 +1491,66 @@ static int radix_tree_callback(struct notifier_block *nfb,
return NOTIFY_OK;
}
+static void test_iter_retry(void)
+{
+ RADIX_TREE(root, GFP_KERNEL);
+ void *ptr = (void *)4ul;
+ struct radix_tree_iter iter;
+ void **slot;
+ bool first;
+
+ radix_tree_insert(&root, 0, ptr);
+ radix_tree_tag_set(&root, 0, 0);
+
+ first = true;
+ radix_tree_for_each_tagged(slot, &root, &iter, 0, 0) {
+ printk("tagged %ld %p\n", iter.index, *slot);
+ if (first) {
+ radix_tree_insert(&root, 1, ptr);
+ radix_tree_tag_set(&root, 1, 0);
+ first = false;
+ }
+ if (radix_tree_deref_retry(*slot)) {
+ printk("retry %ld\n", iter.index);
+ slot = radix_tree_iter_retry(&iter);
+ continue;
+ }
+ }
+ radix_tree_delete(&root, 1);
+
+ first = true;
+ radix_tree_for_each_slot(slot, &root, &iter, 0) {
+ printk("slot %ld %p\n", iter.index, *slot);
+ if (first) {
+ radix_tree_insert(&root, 1, ptr);
+ first = false;
+ }
+ if (radix_tree_deref_retry(*slot)) {
+ printk("retry %ld\n", iter.index);
+ slot = radix_tree_iter_retry(&iter);
+ continue;
+ }
+ }
+ radix_tree_delete(&root, 1);
+
+ first = true;
+ radix_tree_for_each_contig(slot, &root, &iter, 0) {
+ printk("contig %ld %p\n", iter.index, *slot);
+ if (first) {
+ radix_tree_insert(&root, 1, ptr);
+ first = false;
+ }
+ if (radix_tree_deref_retry(*slot)) {
+ printk("retry %ld\n", iter.index);
+ slot = radix_tree_iter_retry(&iter);
+ continue;
+ }
+ }
+
+ radix_tree_delete(&root, 0);
+ radix_tree_delete(&root, 1);
+}
+
void __init radix_tree_init(void)
{
radix_tree_node_cachep = kmem_cache_create("radix_tree_node",
@@ -1499,4 +1559,6 @@ void __init radix_tree_init(void)
radix_tree_node_ctor);
radix_tree_init_maxindex();
hotcpu_notifier(radix_tree_callback, 0);
+
+ test_iter_retry();
}
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/5] radix-tree: Fix race in gang lookup
2016-01-27 21:17 ` [PATCH 1/5] radix-tree: Fix race in gang lookup Matthew Wilcox
2016-02-03 21:37 ` Konstantin Khlebnikov
@ 2016-03-04 13:21 ` zhong jiang
1 sibling, 0 replies; 16+ messages in thread
From: zhong jiang @ 2016-03-04 13:21 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
Konstantin Khlebnikov, linux-kernel, linux-fsdevel, linux-mm,
stable
On 2016/1/28 5:17, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> If the indirect_ptr bit is set on a slot, that indicates we need to
> redo the lookup. Introduce a new function radix_tree_iter_retry()
> which forces the loop to retry the lookup by setting 'slot' to NULL and
> turning the iterator back to point at the problematic entry.
>
> This is a pretty rare problem to hit at the moment; the lookup has to
> race with a grow of the radix tree from a height of 0. The consequences
> of hitting this race are that gang lookup could return a pointer to a
> radix_tree_node instead of a pointer to whatever the user had inserted
> in the tree.
>
> Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
> include/linux/radix-tree.h | 16 ++++++++++++++++
> lib/radix-tree.c | 12 ++++++++++--
> 2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index f9a3da5bf892..db0ed595749b 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
> struct radix_tree_iter *iter, unsigned flags);
>
> /**
> + * radix_tree_iter_retry - retry this chunk of the iteration
> + * @iter: iterator state
> + *
> + * If we iterate over a tree protected only by the RCU lock, a race
> + * against deletion or creation may result in seeing a slot for which
> + * radix_tree_deref_retry() returns true. If so, call this function
> + * and continue the iteration.
> + */
> +static inline __must_check
> +void **radix_tree_iter_retry(struct radix_tree_iter *iter)
> +{
> + iter->next_index = iter->index;
> + return NULL;
> +}
> +
> +/**
> * radix_tree_chunk_size - get current chunk size
> *
> * @iter: pointer to radix tree iterator
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index a25f635dcc56..65422ac17114 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
> return 0;
>
> radix_tree_for_each_slot(slot, root, &iter, first_index) {
> - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> + results[ret] = rcu_dereference_raw(*slot);
> if (!results[ret])
> continue;
> + if (radix_tree_is_indirect_ptr(results[ret])) {
> + slot = radix_tree_iter_retry(&iter);
> + continue;
> + }
> if (++ret == max_items)
> break;
> }
according to your patch, after A race occur, slot equals to null. radix_tree_next_slot() will continue
to work. Therefore, it will not return the problematic entry.
> @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
> return 0;
>
> radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) {
> - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> + results[ret] = rcu_dereference_raw(*slot);
> if (!results[ret])
> continue;
> + if (radix_tree_is_indirect_ptr(results[ret])) {
> + slot = radix_tree_iter_retry(&iter);
> + continue;
> + }
> if (++ret == max_items)
> break;
> }
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/5] hwspinlock: Fix race between radix tree insertion and lookup
2016-01-27 21:17 [PATCH 0/5] Fix races & improve the radix tree iterator patterns Matthew Wilcox
2016-01-27 21:17 ` [PATCH 1/5] radix-tree: Fix race in gang lookup Matthew Wilcox
@ 2016-01-27 21:17 ` Matthew Wilcox
2016-01-27 21:17 ` [PATCH 3/5] btrfs: Use radix_tree_iter_retry() Matthew Wilcox
` (3 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
To: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen
Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
linux-fsdevel, linux-mm, stable
From: Matthew Wilcox <willy@linux.intel.com>
of_hwspin_lock_get_id() is protected by the RCU lock, which means that
insertions can occur simultaneously with the lookup. If the radix tree
transitions from a height of 0, we can see a slot with the indirect_ptr
bit set, which will cause us to at least read random memory, and could
cause other havoc.
Fix this by using the newly introduced radix_tree_iter_retry().
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: stable@vger.kernel.org
---
drivers/hwspinlock/hwspinlock_core.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/hwspinlock/hwspinlock_core.c b/drivers/hwspinlock/hwspinlock_core.c
index 52f708bcf77f..d50c701b19d6 100644
--- a/drivers/hwspinlock/hwspinlock_core.c
+++ b/drivers/hwspinlock/hwspinlock_core.c
@@ -313,6 +313,10 @@ int of_hwspin_lock_get_id(struct device_node *np, int index)
hwlock = radix_tree_deref_slot(slot);
if (unlikely(!hwlock))
continue;
+ if (radix_tree_is_indirect_ptr(hwlock)) {
+ slot = radix_tree_iter_retry(&iter);
+ continue;
+ }
if (hwlock->bank->dev->of_node == args.np) {
ret = 0;
--
2.7.0.rc3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/5] btrfs: Use radix_tree_iter_retry()
2016-01-27 21:17 [PATCH 0/5] Fix races & improve the radix tree iterator patterns Matthew Wilcox
2016-01-27 21:17 ` [PATCH 1/5] radix-tree: Fix race in gang lookup Matthew Wilcox
2016-01-27 21:17 ` [PATCH 2/5] hwspinlock: Fix race between radix tree insertion and lookup Matthew Wilcox
@ 2016-01-27 21:17 ` Matthew Wilcox
2016-02-01 14:34 ` David Sterba
2016-01-27 21:17 ` [PATCH 4/5] mm: " Matthew Wilcox
` (2 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
To: Andrew Morton, Hugh Dickins
Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
linux-fsdevel, linux-mm
From: Matthew Wilcox <willy@linux.intel.com>
Even though this is a 'can't happen' situation, use the new
radix_tree_iter_retry() pattern to eliminate a goto.
---
fs/btrfs/tests/btrfs-tests.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index b1d920b30070..0da78c54317a 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -137,7 +137,6 @@ static void btrfs_free_dummy_fs_info(struct btrfs_fs_info *fs_info)
void **slot;
spin_lock(&fs_info->buffer_lock);
-restart:
radix_tree_for_each_slot(slot, &fs_info->buffer_radix, &iter, 0) {
struct extent_buffer *eb;
@@ -147,7 +146,7 @@ restart:
/* Shouldn't happen but that kind of thinking creates CVE's */
if (radix_tree_exception(eb)) {
if (radix_tree_deref_retry(eb))
- goto restart;
+ slot = radix_tree_iter_retry(iter);
continue;
}
spin_unlock(&fs_info->buffer_lock);
--
2.7.0.rc3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 3/5] btrfs: Use radix_tree_iter_retry()
2016-01-27 21:17 ` [PATCH 3/5] btrfs: Use radix_tree_iter_retry() Matthew Wilcox
@ 2016-02-01 14:34 ` David Sterba
0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2016-02-01 14:34 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Hugh Dickins, Matthew Wilcox,
Konstantin Khlebnikov, linux-kernel, linux-fsdevel, linux-mm
On Wed, Jan 27, 2016 at 04:17:50PM -0500, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> Even though this is a 'can't happen' situation, use the new
> radix_tree_iter_retry() pattern to eliminate a goto.
Andrew's tree contains a fixup for a build failure
> @@ -147,7 +146,7 @@ restart:
> /* Shouldn't happen but that kind of thinking creates CVE's */
> if (radix_tree_exception(eb)) {
> if (radix_tree_deref_retry(eb))
> - goto restart;
> + slot = radix_tree_iter_retry(iter);
slot = radix_tree_iter_retry(&iter);
http://ozlabs.org/~akpm/mmots/broken-out/btrfs-use-radix_tree_iter_retry-fix.patch
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 4/5] mm: Use radix_tree_iter_retry()
2016-01-27 21:17 [PATCH 0/5] Fix races & improve the radix tree iterator patterns Matthew Wilcox
` (2 preceding siblings ...)
2016-01-27 21:17 ` [PATCH 3/5] btrfs: Use radix_tree_iter_retry() Matthew Wilcox
@ 2016-01-27 21:17 ` Matthew Wilcox
2016-01-29 14:45 ` Vlastimil Babka
2016-02-19 18:02 ` Sasha Levin
2016-01-27 21:17 ` [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next() Matthew Wilcox
2016-01-28 7:17 ` [PATCH 0/5] Fix races & improve the radix tree iterator patterns Konstantin Khlebnikov
5 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
To: Andrew Morton, Hugh Dickins
Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
linux-fsdevel, linux-mm
From: Matthew Wilcox <willy@linux.intel.com>
Instead of a 'goto restart', we can now use radix_tree_iter_retry()
to restart from our current position. This will make a difference
when there are more ways to happen across an indirect pointer. And it
eliminates some confusing gotos.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
mm/filemap.c | 53 +++++++++++++++++------------------------------------
mm/shmem.c | 18 ++++++++++++------
2 files changed, 29 insertions(+), 42 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 7705dac561ba..e0c4b905fe1c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1245,7 +1245,6 @@ unsigned find_get_entries(struct address_space *mapping,
return 0;
rcu_read_lock();
-restart:
radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
struct page *page;
repeat:
@@ -1253,8 +1252,10 @@ repeat:
if (unlikely(!page))
continue;
if (radix_tree_exception(page)) {
- if (radix_tree_deref_retry(page))
- goto restart;
+ if (radix_tree_deref_retry(page)) {
+ slot = radix_tree_iter_retry(&iter);
+ continue;
+ }
/*
* A shadow entry of a recently evicted page, a swap
* entry from shmem/tmpfs or a DAX entry. Return it
@@ -1307,7 +1308,6 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
return 0;
rcu_read_lock();
-restart:
radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
struct page *page;
repeat:
@@ -1317,13 +1317,8 @@ repeat:
if (radix_tree_exception(page)) {
if (radix_tree_deref_retry(page)) {
- /*
- * Transient condition which can only trigger
- * when entry at index 0 moves out of or back
- * to root: none yet gotten, safe to restart.
- */
- WARN_ON(iter.index);
- goto restart;
+ slot = radix_tree_iter_retry(&iter);
+ continue;
}
/*
* A shadow entry of a recently evicted page,
@@ -1374,7 +1369,6 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index,
return 0;
rcu_read_lock();
-restart:
radix_tree_for_each_contig(slot, &mapping->page_tree, &iter, index) {
struct page *page;
repeat:
@@ -1385,12 +1379,8 @@ repeat:
if (radix_tree_exception(page)) {
if (radix_tree_deref_retry(page)) {
- /*
- * Transient condition which can only trigger
- * when entry at index 0 moves out of or back
- * to root: none yet gotten, safe to restart.
- */
- goto restart;
+ slot = radix_tree_iter_retry(&iter);
+ continue;
}
/*
* A shadow entry of a recently evicted page,
@@ -1450,7 +1440,6 @@ unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index,
return 0;
rcu_read_lock();
-restart:
radix_tree_for_each_tagged(slot, &mapping->page_tree,
&iter, *index, tag) {
struct page *page;
@@ -1461,12 +1450,8 @@ repeat:
if (radix_tree_exception(page)) {
if (radix_tree_deref_retry(page)) {
- /*
- * Transient condition which can only trigger
- * when entry at index 0 moves out of or back
- * to root: none yet gotten, safe to restart.
- */
- goto restart;
+ slot = radix_tree_iter_retry(&iter);
+ continue;
}
/*
* A shadow entry of a recently evicted page.
@@ -1529,7 +1514,6 @@ unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start,
return 0;
rcu_read_lock();
-restart:
radix_tree_for_each_tagged(slot, &mapping->page_tree,
&iter, start, tag) {
struct page *page;
@@ -1539,12 +1523,8 @@ repeat:
continue;
if (radix_tree_exception(page)) {
if (radix_tree_deref_retry(page)) {
- /*
- * Transient condition which can only trigger
- * when entry at index 0 moves out of or back
- * to root: none yet gotten, safe to restart.
- */
- goto restart;
+ slot = radix_tree_iter_retry(&iter);
+ continue;
}
/*
@@ -2151,10 +2131,11 @@ repeat:
if (unlikely(!page))
goto next;
if (radix_tree_exception(page)) {
- if (radix_tree_deref_retry(page))
- break;
- else
- goto next;
+ if (radix_tree_deref_retry(page)) {
+ slot = radix_tree_iter_retry(&iter);
+ continue;
+ }
+ goto next;
}
if (!page_cache_get_speculative(page))
diff --git a/mm/shmem.c b/mm/shmem.c
index fa2ceb2d2655..6ec14b70d82d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -388,8 +388,10 @@ restart:
* don't need to reset the counter, nor do we risk infinite
* restarts.
*/
- if (radix_tree_deref_retry(page))
- goto restart;
+ if (radix_tree_deref_retry(page)) {
+ slot = radix_tree_iter_retry(&iter);
+ continue;
+ }
if (radix_tree_exceptional_entry(page))
swapped++;
@@ -1952,8 +1954,10 @@ restart:
radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
page = radix_tree_deref_slot(slot);
if (!page || radix_tree_exception(page)) {
- if (radix_tree_deref_retry(page))
- goto restart;
+ if (radix_tree_deref_retry(page)) {
+ slot = radix_tree_iter_retry(&iter);
+ continue;
+ }
} else if (page_count(page) - page_mapcount(page) > 1) {
spin_lock_irq(&mapping->tree_lock);
radix_tree_tag_set(&mapping->page_tree, iter.index,
@@ -2007,8 +2011,10 @@ restart:
page = radix_tree_deref_slot(slot);
if (radix_tree_exception(page)) {
- if (radix_tree_deref_retry(page))
- goto restart;
+ if (radix_tree_deref_retry(page)) {
+ slot = radix_tree_iter_retry(&iter);
+ continue;
+ }
page = NULL;
}
--
2.7.0.rc3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 4/5] mm: Use radix_tree_iter_retry()
2016-01-27 21:17 ` [PATCH 4/5] mm: " Matthew Wilcox
@ 2016-01-29 14:45 ` Vlastimil Babka
2016-01-29 14:50 ` Matthew Wilcox
2016-02-19 18:02 ` Sasha Levin
1 sibling, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2016-01-29 14:45 UTC (permalink / raw)
To: Matthew Wilcox, Andrew Morton, Hugh Dickins
Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
linux-fsdevel, linux-mm
On 01/27/2016 10:17 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> Instead of a 'goto restart', we can now use radix_tree_iter_retry()
> to restart from our current position. This will make a difference
> when there are more ways to happen across an indirect pointer. And it
> eliminates some confusing gotos.
>
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
[...]
> diff --git a/mm/shmem.c b/mm/shmem.c
> index fa2ceb2d2655..6ec14b70d82d 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -388,8 +388,10 @@ restart:
> * don't need to reset the counter, nor do we risk infinite
> * restarts.
> */
> - if (radix_tree_deref_retry(page))
> - goto restart;
> + if (radix_tree_deref_retry(page)) {
> + slot = radix_tree_iter_retry(&iter);
> + continue;
> + }
>
> if (radix_tree_exceptional_entry(page))
> swapped++;
This should be applied on top. There are no restarts anymore.
----8<----
>From 3b0bdd370b57fb6d83b213e140cd1fb0e8962af8 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Fri, 29 Jan 2016 15:41:31 +0100
Subject: [PATCH] mm: Use radix_tree_iter_retry()-fix
Remove now-obsolete-and-misleading comment.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/shmem.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 8f89abd4eaee..4d758938340c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -382,11 +382,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
page = radix_tree_deref_slot(slot);
- /*
- * This should only be possible to happen at index 0, so we
- * don't need to reset the counter, nor do we risk infinite
- * restarts.
- */
if (radix_tree_deref_retry(page)) {
slot = radix_tree_iter_retry(&iter);
continue;
--
2.7.0
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 4/5] mm: Use radix_tree_iter_retry()
2016-01-29 14:45 ` Vlastimil Babka
@ 2016-01-29 14:50 ` Matthew Wilcox
0 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2016-01-29 14:50 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Matthew Wilcox, Andrew Morton, Hugh Dickins,
Konstantin Khlebnikov, linux-kernel, linux-fsdevel, linux-mm
On Fri, Jan 29, 2016 at 03:45:59PM +0100, Vlastimil Babka wrote:
> This should be applied on top. There are no restarts anymore.
Quite right. Sorry I missed the comment.
Acked-by: Matthwe Wilcox <willy@linux.intel.com>
> ----8<----
> >From 3b0bdd370b57fb6d83b213e140cd1fb0e8962af8 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Fri, 29 Jan 2016 15:41:31 +0100
> Subject: [PATCH] mm: Use radix_tree_iter_retry()-fix
>
> Remove now-obsolete-and-misleading comment.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/shmem.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 8f89abd4eaee..4d758938340c 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -382,11 +382,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
>
> page = radix_tree_deref_slot(slot);
>
> - /*
> - * This should only be possible to happen at index 0, so we
> - * don't need to reset the counter, nor do we risk infinite
> - * restarts.
> - */
> if (radix_tree_deref_retry(page)) {
> slot = radix_tree_iter_retry(&iter);
> continue;
> --
> 2.7.0
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/5] mm: Use radix_tree_iter_retry()
2016-01-27 21:17 ` [PATCH 4/5] mm: " Matthew Wilcox
2016-01-29 14:45 ` Vlastimil Babka
@ 2016-02-19 18:02 ` Sasha Levin
1 sibling, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2016-02-19 18:02 UTC (permalink / raw)
To: Matthew Wilcox, Andrew Morton, Hugh Dickins
Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
linux-fsdevel, linux-mm
On 01/27/2016 04:17 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> Instead of a 'goto restart', we can now use radix_tree_iter_retry()
> to restart from our current position. This will make a difference
> when there are more ways to happen across an indirect pointer. And it
> eliminates some confusing gotos.
Hey Matthew,
I'm seeing the following NULL ptr deref while fuzzing:
[ 3380.120501] general protection fault: 0000 [#1] SMP KASAN
[ 3380.120529] Modules linked in:
[ 3380.120555] CPU: 2 PID: 23271 Comm: syz-executor Not tainted 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978
[ 3380.120569] task: ffff8800a5181000 ti: ffff8801a63b8000 task.ti: ffff8801a63b8000
[ 3380.120681] RIP: shmem_add_seals (include/linux/compiler.h:222 include/linux/radix-tree.h:206 mm/shmem.c:2001 mm/shmem.c:2100)
[ 3380.120692] RSP: 0018:ffff8801a63bfd58 EFLAGS: 00010202
[ 3380.120703] RAX: dffffc0000000000 RBX: 0000000000000001 RCX: 0000000000940000
[ 3380.120714] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff8800a5181b3c
[ 3380.120725] RBP: ffff8801a63bfe58 R08: ffff8800a5181b40 R09: 0000000000000001
[ 3380.120736] R10: fffff44e6f425fff R11: ffffffffbdb0a420 R12: 0000000000000008
[ 3380.120745] R13: 0000000000000001 R14: 0000000000000001 R15: ffffea0002ad1660
[ 3380.120759] FS: 00007fbc71e9c700(0000) GS:ffff8801d3c00000(0000) knlGS:0000000000000000
[ 3380.120769] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3380.120780] CR2: 0000000020010ff7 CR3: 00000001a0728000 CR4: 00000000000406e0
[ 3380.120794] Stack:
[ 3380.120815] ffffffffa2738239 00000008a63bfdc8 ffff8800a499e740 1ffff10034c77fba
[ 3380.120834] ffff8801ac446da0 ffff8800a499e8f0 0000000000000000 1ffff10034c77001
[ 3380.120852] ffff8801a63b8000 ffff8801a63b8008 ffff8801ac446f90 ffff8801ac446f98
[ 3380.120856] Call Trace:
[ 3380.120929] shmem_fcntl (mm/shmem.c:2135)
[ 3380.120963] SyS_fcntl (fs/fcntl.c:336 fs/fcntl.c:372 fs/fcntl.c:357)
[ 3380.121112] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:200)
[ 3380.122294] Code: c7 45 a0 00 00 00 00 e9 86 02 00 00 e8 cf a8 ee ff 4d 85 e4 0f 84 b2 07 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 d4 08 00 00 49 8b 1c 24 e8 12 34 de ff 85 c0
All code
========
0: c7 45 a0 00 00 00 00 movl $0x0,-0x60(%rbp)
7: e9 86 02 00 00 jmpq 0x292
c: e8 cf a8 ee ff callq 0xffffffffffeea8e0
11: 4d 85 e4 test %r12,%r12
14: 0f 84 b2 07 00 00 je 0x7cc
1a: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
21: fc ff df
24: 4c 89 e2 mov %r12,%rdx
27: 48 c1 ea 03 shr $0x3,%rdx
2b:* 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
2f: 0f 85 d4 08 00 00 jne 0x909
35: 49 8b 1c 24 mov (%r12),%rbx
39: e8 12 34 de ff callq 0xffffffffffde3450
3e: 85 c0 test %eax,%eax
...
Code starting with the faulting instruction
===========================================
0: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
4: 0f 85 d4 08 00 00 jne 0x8de
a: 49 8b 1c 24 mov (%r12),%rbx
e: e8 12 34 de ff callq 0xffffffffffde3425
13: 85 c0 test %eax,%eax
...
[ 3380.122312] RIP shmem_add_seals (include/linux/compiler.h:222 include/linux/radix-tree.h:206 mm/shmem.c:2001 mm/shmem.c:2100)
[ 3380.122317] RSP <ffff8801a63bfd58>
Thanks,
Sasha
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next()
2016-01-27 21:17 [PATCH 0/5] Fix races & improve the radix tree iterator patterns Matthew Wilcox
` (3 preceding siblings ...)
2016-01-27 21:17 ` [PATCH 4/5] mm: " Matthew Wilcox
@ 2016-01-27 21:17 ` Matthew Wilcox
2016-02-04 8:50 ` Konstantin Khlebnikov
2016-01-28 7:17 ` [PATCH 0/5] Fix races & improve the radix tree iterator patterns Konstantin Khlebnikov
5 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
To: Andrew Morton, Hugh Dickins
Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
linux-fsdevel, linux-mm
From: Matthew Wilcox <willy@linux.intel.com>
shmem likes to occasionally drop the lock, schedule, then reacqire
the lock and continue with the iteration from the last place it
left off. This is currently done with a pretty ugly goto. Introduce
radix_tree_iter_next() and use it throughout shmem.c.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
include/linux/radix-tree.h | 15 +++++++++++++++
mm/shmem.c | 12 +++---------
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index db0ed595749b..dec2c6c77eea 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -403,6 +403,21 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter)
}
/**
+ * radix_tree_iter_next - resume iterating when the chunk may be invalid
+ * @iter: iterator state
+ *
+ * If the iterator needs to release then reacquire a lock, the chunk may
+ * have been invalidated by an insertion or deletion. Call this function
+ * to continue the iteration from the next index.
+ */
+static inline __must_check
+void **radix_tree_iter_next(struct radix_tree_iter *iter)
+{
+ iter->next_index = iter->index + 1;
+ return NULL;
+}
+
+/**
* radix_tree_chunk_size - get current chunk size
*
* @iter: pointer to radix tree iterator
diff --git a/mm/shmem.c b/mm/shmem.c
index 6ec14b70d82d..438ea8004c26 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -376,7 +376,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
rcu_read_lock();
-restart:
radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
if (iter.index >= end)
break;
@@ -398,8 +397,7 @@ restart:
if (need_resched()) {
cond_resched_rcu();
- start = iter.index + 1;
- goto restart;
+ slot = radix_tree_iter_next(&iter);
}
}
@@ -1950,7 +1948,6 @@ static void shmem_tag_pins(struct address_space *mapping)
start = 0;
rcu_read_lock();
-restart:
radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
page = radix_tree_deref_slot(slot);
if (!page || radix_tree_exception(page)) {
@@ -1967,8 +1964,7 @@ restart:
if (need_resched()) {
cond_resched_rcu();
- start = iter.index + 1;
- goto restart;
+ slot = radix_tree_iter_next(&iter);
}
}
rcu_read_unlock();
@@ -2005,7 +2001,6 @@ static int shmem_wait_for_pins(struct address_space *mapping)
start = 0;
rcu_read_lock();
-restart:
radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter,
start, SHMEM_TAG_PINNED) {
@@ -2039,8 +2034,7 @@ restart:
continue_resched:
if (need_resched()) {
cond_resched_rcu();
- start = iter.index + 1;
- goto restart;
+ slot = radix_tree_iter_next(&iter);
}
}
rcu_read_unlock();
--
2.7.0.rc3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next()
2016-01-27 21:17 ` [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next() Matthew Wilcox
@ 2016-02-04 8:50 ` Konstantin Khlebnikov
0 siblings, 0 replies; 16+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-04 8:50 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Hugh Dickins, Matthew Wilcox,
Linux Kernel Mailing List, linux-fsdevel, linux-mm@kvack.org
On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
<matthew.r.wilcox@intel.com> wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> shmem likes to occasionally drop the lock, schedule, then reacqire
> the lock and continue with the iteration from the last place it
> left off. This is currently done with a pretty ugly goto. Introduce
> radix_tree_iter_next() and use it throughout shmem.c.
>
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> ---
> include/linux/radix-tree.h | 15 +++++++++++++++
> mm/shmem.c | 12 +++---------
> 2 files changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index db0ed595749b..dec2c6c77eea 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -403,6 +403,21 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter)
> }
>
> /**
> + * radix_tree_iter_next - resume iterating when the chunk may be invalid
> + * @iter: iterator state
> + *
> + * If the iterator needs to release then reacquire a lock, the chunk may
> + * have been invalidated by an insertion or deletion. Call this function
> + * to continue the iteration from the next index.
> + */
> +static inline __must_check
> +void **radix_tree_iter_next(struct radix_tree_iter *iter)
> +{
> + iter->next_index = iter->index + 1;
> + return NULL;
> +}
> +
> +/**
This works for normal iterator but not for tagged.
It must also reset iter->tags to zero.
> * radix_tree_chunk_size - get current chunk size
> *
> * @iter: pointer to radix tree iterator
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 6ec14b70d82d..438ea8004c26 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -376,7 +376,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
>
> rcu_read_lock();
>
> -restart:
> radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
> if (iter.index >= end)
> break;
> @@ -398,8 +397,7 @@ restart:
>
> if (need_resched()) {
> cond_resched_rcu();
> - start = iter.index + 1;
> - goto restart;
> + slot = radix_tree_iter_next(&iter);
> }
> }
>
> @@ -1950,7 +1948,6 @@ static void shmem_tag_pins(struct address_space *mapping)
> start = 0;
> rcu_read_lock();
>
> -restart:
> radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
> page = radix_tree_deref_slot(slot);
> if (!page || radix_tree_exception(page)) {
> @@ -1967,8 +1964,7 @@ restart:
>
> if (need_resched()) {
> cond_resched_rcu();
> - start = iter.index + 1;
> - goto restart;
> + slot = radix_tree_iter_next(&iter);
> }
> }
> rcu_read_unlock();
> @@ -2005,7 +2001,6 @@ static int shmem_wait_for_pins(struct address_space *mapping)
>
> start = 0;
> rcu_read_lock();
> -restart:
> radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter,
> start, SHMEM_TAG_PINNED) {
>
> @@ -2039,8 +2034,7 @@ restart:
> continue_resched:
> if (need_resched()) {
> cond_resched_rcu();
> - start = iter.index + 1;
> - goto restart;
> + slot = radix_tree_iter_next(&iter);
> }
> }
> rcu_read_unlock();
> --
> 2.7.0.rc3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/5] Fix races & improve the radix tree iterator patterns
2016-01-27 21:17 [PATCH 0/5] Fix races & improve the radix tree iterator patterns Matthew Wilcox
` (4 preceding siblings ...)
2016-01-27 21:17 ` [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next() Matthew Wilcox
@ 2016-01-28 7:17 ` Konstantin Khlebnikov
2016-02-03 6:27 ` Konstantin Khlebnikov
5 siblings, 1 reply; 16+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-28 7:17 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
Linux Kernel Mailing List, linux-fsdevel, linux-mm@kvack.org
On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
<matthew.r.wilcox@intel.com> wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> The first two patches here are bugfixes, and I would like to see them
> make their way into stable ASAP since they can lead to data corruption
> (very low probabilty).
>
> The last three patches do not qualify as bugfixes. They simply improve
> the standard pattern used to do radix tree iterations by removing the
> 'goto restart' part. Partially this is because this is an ugly &
> confusing goto, and partially because with multi-order entries in the
> tree, it'll be more likely that we'll see an indirect_ptr bit, and
> it's more efficient to kep going from the point of the iteration we're
> currently in than restart from the beginning each time.
Ack whole set.
I think we should go deeper in hide dereference/retry inside iterator.
Something like radix_tree_for_each_data(data, slot, root, iter, start).
I'll prepare patch for that.
>
> Matthew Wilcox (5):
> radix-tree: Fix race in gang lookup
> hwspinlock: Fix race between radix tree insertion and lookup
> btrfs: Use radix_tree_iter_retry()
> mm: Use radix_tree_iter_retry()
> radix-tree,shmem: Introduce radix_tree_iter_next()
>
> drivers/hwspinlock/hwspinlock_core.c | 4 +++
> fs/btrfs/tests/btrfs-tests.c | 3 +-
> include/linux/radix-tree.h | 31 +++++++++++++++++++++
> lib/radix-tree.c | 12 ++++++--
> mm/filemap.c | 53 ++++++++++++------------------------
> mm/shmem.c | 30 ++++++++++----------
> 6 files changed, 78 insertions(+), 55 deletions(-)
>
> --
> 2.7.0.rc3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/5] Fix races & improve the radix tree iterator patterns
2016-01-28 7:17 ` [PATCH 0/5] Fix races & improve the radix tree iterator patterns Konstantin Khlebnikov
@ 2016-02-03 6:27 ` Konstantin Khlebnikov
0 siblings, 0 replies; 16+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-03 6:27 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
Linux Kernel Mailing List, linux-fsdevel, linux-mm@kvack.org
On Thu, Jan 28, 2016 at 10:17 AM, Konstantin Khlebnikov
<koct9i@gmail.com> wrote:
> On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
> <matthew.r.wilcox@intel.com> wrote:
>> From: Matthew Wilcox <willy@linux.intel.com>
>>
>> The first two patches here are bugfixes, and I would like to see them
>> make their way into stable ASAP since they can lead to data corruption
>> (very low probabilty).
>>
>> The last three patches do not qualify as bugfixes. They simply improve
>> the standard pattern used to do radix tree iterations by removing the
>> 'goto restart' part. Partially this is because this is an ugly &
>> confusing goto, and partially because with multi-order entries in the
>> tree, it'll be more likely that we'll see an indirect_ptr bit, and
>> it's more efficient to kep going from the point of the iteration we're
>> currently in than restart from the beginning each time.
>
> Ack whole set.
>
> I think we should go deeper in hide dereference/retry inside iterator.
> Something like radix_tree_for_each_data(data, slot, root, iter, start).
> I'll prepare patch for that.
After second thought: there'ra not so many users for new sugar.
This scheme with radix_tree_deref_retry - radix_tree_iter_retry
complicated but fine.
>
>>
>> Matthew Wilcox (5):
>> radix-tree: Fix race in gang lookup
>> hwspinlock: Fix race between radix tree insertion and lookup
>> btrfs: Use radix_tree_iter_retry()
>> mm: Use radix_tree_iter_retry()
>> radix-tree,shmem: Introduce radix_tree_iter_next()
>>
>> drivers/hwspinlock/hwspinlock_core.c | 4 +++
>> fs/btrfs/tests/btrfs-tests.c | 3 +-
>> include/linux/radix-tree.h | 31 +++++++++++++++++++++
>> lib/radix-tree.c | 12 ++++++--
>> mm/filemap.c | 53 ++++++++++++------------------------
>> mm/shmem.c | 30 ++++++++++----------
>> 6 files changed, 78 insertions(+), 55 deletions(-)
>>
>> --
>> 2.7.0.rc3
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread