From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 29 Jan 2016 09:50:40 -0500 From: Matthew Wilcox To: Vlastimil Babka Cc: Matthew Wilcox , Andrew Morton , Hugh Dickins , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 4/5] mm: Use radix_tree_iter_retry() Message-ID: <20160129145040.GW2948@linux.intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> <56AB7B27.3090805@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <56AB7B27.3090805@suse.cz> Sender: owner-linux-mm@kvack.org List-ID: On Fri, Jan 29, 2016 at 03:45:59PM +0100, Vlastimil Babka wrote: > This should be applied on top. There are no restarts anymore. Quite right. Sorry I missed the comment. Acked-by: Matthwe Wilcox > ----8<---- > >From 3b0bdd370b57fb6d83b213e140cd1fb0e8962af8 Mon Sep 17 00:00:00 2001 > From: Vlastimil Babka > Date: Fri, 29 Jan 2016 15:41:31 +0100 > Subject: [PATCH] mm: Use radix_tree_iter_retry()-fix > > Remove now-obsolete-and-misleading comment. > > Signed-off-by: Vlastimil Babka > --- > mm/shmem.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/mm/shmem.c b/mm/shmem.c > index 8f89abd4eaee..4d758938340c 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -382,11 +382,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping, > > page = radix_tree_deref_slot(slot); > > - /* > - * This should only be possible to happen at index 0, so we > - * don't need to reset the counter, nor do we risk infinite > - * restarts. > - */ > if (radix_tree_deref_retry(page)) { > slot = radix_tree_iter_retry(&iter); > continue; > -- > 2.7.0 > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: Re: [PATCH 4/5] mm: Use radix_tree_iter_retry() To: Matthew Wilcox , Andrew Morton , Hugh Dickins References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org From: Vlastimil Babka Message-ID: <56AB7B27.3090805@suse.cz> Date: Fri, 29 Jan 2016 15:45:59 +0100 MIME-Version: 1.0 In-Reply-To: <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: On 01/27/2016 10:17 PM, Matthew Wilcox wrote: > From: Matthew Wilcox > > Instead of a 'goto restart', we can now use radix_tree_iter_retry() > to restart from our current position. This will make a difference > when there are more ways to happen across an indirect pointer. And it > eliminates some confusing gotos. > > Signed-off-by: Matthew Wilcox [...] > diff --git a/mm/shmem.c b/mm/shmem.c > index fa2ceb2d2655..6ec14b70d82d 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -388,8 +388,10 @@ restart: > * don't need to reset the counter, nor do we risk infinite > * restarts. > */ > - if (radix_tree_deref_retry(page)) > - goto restart; > + if (radix_tree_deref_retry(page)) { > + slot = radix_tree_iter_retry(&iter); > + continue; > + } > > if (radix_tree_exceptional_entry(page)) > swapped++; This should be applied on top. There are no restarts anymore. ----8<---- >>From 3b0bdd370b57fb6d83b213e140cd1fb0e8962af8 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Fri, 29 Jan 2016 15:41:31 +0100 Subject: [PATCH] mm: Use radix_tree_iter_retry()-fix Remove now-obsolete-and-misleading comment. Signed-off-by: Vlastimil Babka --- mm/shmem.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 8f89abd4eaee..4d758938340c 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -382,11 +382,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping, page = radix_tree_deref_slot(slot); - /* - * This should only be possible to happen at index 0, so we - * don't need to reset the counter, nor do we risk infinite - * restarts. - */ if (radix_tree_deref_retry(page)) { slot = radix_tree_iter_retry(&iter); continue; -- 2.7.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Matthew Wilcox To: Andrew Morton , Hugh Dickins Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 4/5] mm: Use radix_tree_iter_retry() Date: Wed, 27 Jan 2016 16:17:51 -0500 Message-Id: <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Sender: owner-linux-mm@kvack.org List-ID: From: Matthew Wilcox Instead of a 'goto restart', we can now use radix_tree_iter_retry() to restart from our current position. This will make a difference when there are more ways to happen across an indirect pointer. And it eliminates some confusing gotos. Signed-off-by: Matthew Wilcox --- mm/filemap.c | 53 +++++++++++++++++------------------------------------ mm/shmem.c | 18 ++++++++++++------ 2 files changed, 29 insertions(+), 42 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 7705dac561ba..e0c4b905fe1c 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1245,7 +1245,6 @@ unsigned find_get_entries(struct address_space *mapping, return 0; rcu_read_lock(); -restart: radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { struct page *page; repeat: @@ -1253,8 +1252,10 @@ repeat: if (unlikely(!page)) continue; if (radix_tree_exception(page)) { - if (radix_tree_deref_retry(page)) - goto restart; + if (radix_tree_deref_retry(page)) { + slot = radix_tree_iter_retry(&iter); + continue; + } /* * A shadow entry of a recently evicted page, a swap * entry from shmem/tmpfs or a DAX entry. Return it @@ -1307,7 +1308,6 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start, return 0; rcu_read_lock(); -restart: radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { struct page *page; repeat: @@ -1317,13 +1317,8 @@ repeat: if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { - /* - * Transient condition which can only trigger - * when entry at index 0 moves out of or back - * to root: none yet gotten, safe to restart. - */ - WARN_ON(iter.index); - goto restart; + slot = radix_tree_iter_retry(&iter); + continue; } /* * A shadow entry of a recently evicted page, @@ -1374,7 +1369,6 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index, return 0; rcu_read_lock(); -restart: radix_tree_for_each_contig(slot, &mapping->page_tree, &iter, index) { struct page *page; repeat: @@ -1385,12 +1379,8 @@ repeat: if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { - /* - * Transient condition which can only trigger - * when entry at index 0 moves out of or back - * to root: none yet gotten, safe to restart. - */ - goto restart; + slot = radix_tree_iter_retry(&iter); + continue; } /* * A shadow entry of a recently evicted page, @@ -1450,7 +1440,6 @@ unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index, return 0; rcu_read_lock(); -restart: radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, *index, tag) { struct page *page; @@ -1461,12 +1450,8 @@ repeat: if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { - /* - * Transient condition which can only trigger - * when entry at index 0 moves out of or back - * to root: none yet gotten, safe to restart. - */ - goto restart; + slot = radix_tree_iter_retry(&iter); + continue; } /* * A shadow entry of a recently evicted page. @@ -1529,7 +1514,6 @@ unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start, return 0; rcu_read_lock(); -restart: radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, start, tag) { struct page *page; @@ -1539,12 +1523,8 @@ repeat: continue; if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { - /* - * Transient condition which can only trigger - * when entry at index 0 moves out of or back - * to root: none yet gotten, safe to restart. - */ - goto restart; + slot = radix_tree_iter_retry(&iter); + continue; } /* @@ -2151,10 +2131,11 @@ repeat: if (unlikely(!page)) goto next; if (radix_tree_exception(page)) { - if (radix_tree_deref_retry(page)) - break; - else - goto next; + if (radix_tree_deref_retry(page)) { + slot = radix_tree_iter_retry(&iter); + continue; + } + goto next; } if (!page_cache_get_speculative(page)) diff --git a/mm/shmem.c b/mm/shmem.c index fa2ceb2d2655..6ec14b70d82d 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -388,8 +388,10 @@ restart: * don't need to reset the counter, nor do we risk infinite * restarts. */ - if (radix_tree_deref_retry(page)) - goto restart; + if (radix_tree_deref_retry(page)) { + slot = radix_tree_iter_retry(&iter); + continue; + } if (radix_tree_exceptional_entry(page)) swapped++; @@ -1952,8 +1954,10 @@ restart: radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { page = radix_tree_deref_slot(slot); if (!page || radix_tree_exception(page)) { - if (radix_tree_deref_retry(page)) - goto restart; + if (radix_tree_deref_retry(page)) { + slot = radix_tree_iter_retry(&iter); + continue; + } } else if (page_count(page) - page_mapcount(page) > 1) { spin_lock_irq(&mapping->tree_lock); radix_tree_tag_set(&mapping->page_tree, iter.index, @@ -2007,8 +2011,10 @@ restart: page = radix_tree_deref_slot(slot); if (radix_tree_exception(page)) { - if (radix_tree_deref_retry(page)) - goto restart; + if (radix_tree_deref_retry(page)) { + slot = radix_tree_iter_retry(&iter); + continue; + } page = NULL; } -- 2.7.0.rc3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Matthew Wilcox To: Andrew Morton , Hugh Dickins Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 3/5] btrfs: Use radix_tree_iter_retry() Date: Wed, 27 Jan 2016 16:17:50 -0500 Message-Id: <1453929472-25566-4-git-send-email-matthew.r.wilcox@intel.com> In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Sender: owner-linux-mm@kvack.org List-ID: From: Matthew Wilcox Even though this is a 'can't happen' situation, use the new radix_tree_iter_retry() pattern to eliminate a goto. --- fs/btrfs/tests/btrfs-tests.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c index b1d920b30070..0da78c54317a 100644 --- a/fs/btrfs/tests/btrfs-tests.c +++ b/fs/btrfs/tests/btrfs-tests.c @@ -137,7 +137,6 @@ static void btrfs_free_dummy_fs_info(struct btrfs_fs_info *fs_info) void **slot; spin_lock(&fs_info->buffer_lock); -restart: radix_tree_for_each_slot(slot, &fs_info->buffer_radix, &iter, 0) { struct extent_buffer *eb; @@ -147,7 +146,7 @@ restart: /* Shouldn't happen but that kind of thinking creates CVE's */ if (radix_tree_exception(eb)) { if (radix_tree_deref_retry(eb)) - goto restart; + slot = radix_tree_iter_retry(iter); continue; } spin_unlock(&fs_info->buffer_lock); -- 2.7.0.rc3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Matthew Wilcox To: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org Subject: [PATCH 2/5] hwspinlock: Fix race between radix tree insertion and lookup Date: Wed, 27 Jan 2016 16:17:49 -0500 Message-Id: <1453929472-25566-3-git-send-email-matthew.r.wilcox@intel.com> In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Sender: owner-linux-mm@kvack.org List-ID: From: Matthew Wilcox of_hwspin_lock_get_id() is protected by the RCU lock, which means that insertions can occur simultaneously with the lookup. If the radix tree transitions from a height of 0, we can see a slot with the indirect_ptr bit set, which will cause us to at least read random memory, and could cause other havoc. Fix this by using the newly introduced radix_tree_iter_retry(). Signed-off-by: Matthew Wilcox Cc: stable@vger.kernel.org --- drivers/hwspinlock/hwspinlock_core.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/hwspinlock/hwspinlock_core.c b/drivers/hwspinlock/hwspinlock_core.c index 52f708bcf77f..d50c701b19d6 100644 --- a/drivers/hwspinlock/hwspinlock_core.c +++ b/drivers/hwspinlock/hwspinlock_core.c @@ -313,6 +313,10 @@ int of_hwspin_lock_get_id(struct device_node *np, int index) hwlock = radix_tree_deref_slot(slot); if (unlikely(!hwlock)) continue; + if (radix_tree_is_indirect_ptr(hwlock)) { + slot = radix_tree_iter_retry(&iter); + continue; + } if (hwlock->bank->dev->of_node == args.np) { ret = 0; -- 2.7.0.rc3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Date: Thu, 28 Jan 2016 10:17:41 +0300 Message-ID: Subject: Re: [PATCH 0/5] Fix races & improve the radix tree iterator patterns From: Konstantin Khlebnikov To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen , Matthew Wilcox , Linux Kernel Mailing List , linux-fsdevel , "linux-mm@kvack.org" Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox wrote: > From: Matthew Wilcox > > The first two patches here are bugfixes, and I would like to see them > make their way into stable ASAP since they can lead to data corruption > (very low probabilty). > > The last three patches do not qualify as bugfixes. They simply improve > the standard pattern used to do radix tree iterations by removing the > 'goto restart' part. Partially this is because this is an ugly & > confusing goto, and partially because with multi-order entries in the > tree, it'll be more likely that we'll see an indirect_ptr bit, and > it's more efficient to kep going from the point of the iteration we're > currently in than restart from the beginning each time. Ack whole set. I think we should go deeper in hide dereference/retry inside iterator. Something like radix_tree_for_each_data(data, slot, root, iter, start). I'll prepare patch for that. > > Matthew Wilcox (5): > radix-tree: Fix race in gang lookup > hwspinlock: Fix race between radix tree insertion and lookup > btrfs: Use radix_tree_iter_retry() > mm: Use radix_tree_iter_retry() > radix-tree,shmem: Introduce radix_tree_iter_next() > > drivers/hwspinlock/hwspinlock_core.c | 4 +++ > fs/btrfs/tests/btrfs-tests.c | 3 +- > include/linux/radix-tree.h | 31 +++++++++++++++++++++ > lib/radix-tree.c | 12 ++++++-- > mm/filemap.c | 53 ++++++++++++------------------------ > mm/shmem.c | 30 ++++++++++---------- > 6 files changed, 78 insertions(+), 55 deletions(-) > > -- > 2.7.0.rc3 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Matthew Wilcox To: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 0/5] Fix races & improve the radix tree iterator patterns Date: Wed, 27 Jan 2016 16:17:47 -0500 Message-Id: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Sender: owner-linux-mm@kvack.org List-ID: From: Matthew Wilcox The first two patches here are bugfixes, and I would like to see them make their way into stable ASAP since they can lead to data corruption (very low probabilty). The last three patches do not qualify as bugfixes. They simply improve the standard pattern used to do radix tree iterations by removing the 'goto restart' part. Partially this is because this is an ugly & confusing goto, and partially because with multi-order entries in the tree, it'll be more likely that we'll see an indirect_ptr bit, and it's more efficient to kep going from the point of the iteration we're currently in than restart from the beginning each time. Matthew Wilcox (5): radix-tree: Fix race in gang lookup hwspinlock: Fix race between radix tree insertion and lookup btrfs: Use radix_tree_iter_retry() mm: Use radix_tree_iter_retry() radix-tree,shmem: Introduce radix_tree_iter_next() drivers/hwspinlock/hwspinlock_core.c | 4 +++ fs/btrfs/tests/btrfs-tests.c | 3 +- include/linux/radix-tree.h | 31 +++++++++++++++++++++ lib/radix-tree.c | 12 ++++++-- mm/filemap.c | 53 ++++++++++++------------------------ mm/shmem.c | 30 ++++++++++---------- 6 files changed, 78 insertions(+), 55 deletions(-) -- 2.7.0.rc3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Matthew Wilcox To: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org Subject: [PATCH 1/5] radix-tree: Fix race in gang lookup Date: Wed, 27 Jan 2016 16:17:48 -0500 Message-Id: <1453929472-25566-2-git-send-email-matthew.r.wilcox@intel.com> In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Sender: owner-linux-mm@kvack.org List-ID: From: Matthew Wilcox If the indirect_ptr bit is set on a slot, that indicates we need to redo the lookup. Introduce a new function radix_tree_iter_retry() which forces the loop to retry the lookup by setting 'slot' to NULL and turning the iterator back to point at the problematic entry. This is a pretty rare problem to hit at the moment; the lookup has to race with a grow of the radix tree from a height of 0. The consequences of hitting this race are that gang lookup could return a pointer to a radix_tree_node instead of a pointer to whatever the user had inserted in the tree. Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator") Signed-off-by: Matthew Wilcox Cc: stable@vger.kernel.org --- include/linux/radix-tree.h | 16 ++++++++++++++++ lib/radix-tree.c | 12 ++++++++++-- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index f9a3da5bf892..db0ed595749b 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root, struct radix_tree_iter *iter, unsigned flags); /** + * radix_tree_iter_retry - retry this chunk of the iteration + * @iter: iterator state + * + * If we iterate over a tree protected only by the RCU lock, a race + * against deletion or creation may result in seeing a slot for which + * radix_tree_deref_retry() returns true. If so, call this function + * and continue the iteration. + */ +static inline __must_check +void **radix_tree_iter_retry(struct radix_tree_iter *iter) +{ + iter->next_index = iter->index; + return NULL; +} + +/** * radix_tree_chunk_size - get current chunk size * * @iter: pointer to radix tree iterator diff --git a/lib/radix-tree.c b/lib/radix-tree.c index a25f635dcc56..65422ac17114 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results, return 0; radix_tree_for_each_slot(slot, root, &iter, first_index) { - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); + results[ret] = rcu_dereference_raw(*slot); if (!results[ret]) continue; + if (radix_tree_is_indirect_ptr(results[ret])) { + slot = radix_tree_iter_retry(&iter); + continue; + } if (++ret == max_items) break; } @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results, return 0; radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) { - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); + results[ret] = rcu_dereference_raw(*slot); if (!results[ret]) continue; + if (radix_tree_is_indirect_ptr(results[ret])) { + slot = radix_tree_iter_retry(&iter); + continue; + } if (++ret == max_items) break; } -- 2.7.0.rc3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Matthew Wilcox To: Andrew Morton , Hugh Dickins Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next() Date: Wed, 27 Jan 2016 16:17:52 -0500 Message-Id: <1453929472-25566-6-git-send-email-matthew.r.wilcox@intel.com> In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Sender: owner-linux-mm@kvack.org List-ID: From: Matthew Wilcox shmem likes to occasionally drop the lock, schedule, then reacqire the lock and continue with the iteration from the last place it left off. This is currently done with a pretty ugly goto. Introduce radix_tree_iter_next() and use it throughout shmem.c. Signed-off-by: Matthew Wilcox --- include/linux/radix-tree.h | 15 +++++++++++++++ mm/shmem.c | 12 +++--------- 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index db0ed595749b..dec2c6c77eea 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -403,6 +403,21 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter) } /** + * radix_tree_iter_next - resume iterating when the chunk may be invalid + * @iter: iterator state + * + * If the iterator needs to release then reacquire a lock, the chunk may + * have been invalidated by an insertion or deletion. Call this function + * to continue the iteration from the next index. + */ +static inline __must_check +void **radix_tree_iter_next(struct radix_tree_iter *iter) +{ + iter->next_index = iter->index + 1; + return NULL; +} + +/** * radix_tree_chunk_size - get current chunk size * * @iter: pointer to radix tree iterator diff --git a/mm/shmem.c b/mm/shmem.c index 6ec14b70d82d..438ea8004c26 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -376,7 +376,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping, rcu_read_lock(); -restart: radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { if (iter.index >= end) break; @@ -398,8 +397,7 @@ restart: if (need_resched()) { cond_resched_rcu(); - start = iter.index + 1; - goto restart; + slot = radix_tree_iter_next(&iter); } } @@ -1950,7 +1948,6 @@ static void shmem_tag_pins(struct address_space *mapping) start = 0; rcu_read_lock(); -restart: radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { page = radix_tree_deref_slot(slot); if (!page || radix_tree_exception(page)) { @@ -1967,8 +1964,7 @@ restart: if (need_resched()) { cond_resched_rcu(); - start = iter.index + 1; - goto restart; + slot = radix_tree_iter_next(&iter); } } rcu_read_unlock(); @@ -2005,7 +2001,6 @@ static int shmem_wait_for_pins(struct address_space *mapping) start = 0; rcu_read_lock(); -restart: radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, start, SHMEM_TAG_PINNED) { @@ -2039,8 +2034,7 @@ restart: continue_resched: if (need_resched()) { cond_resched_rcu(); - start = iter.index + 1; - goto restart; + slot = radix_tree_iter_next(&iter); } } rcu_read_unlock(); -- 2.7.0.rc3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: Re: [PATCH 4/5] mm: Use radix_tree_iter_retry() To: Matthew Wilcox , Andrew Morton , Hugh Dickins References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org From: Sasha Levin Message-ID: <56C758A0.4060600@oracle.com> Date: Fri, 19 Feb 2016 13:02:08 -0500 MIME-Version: 1.0 In-Reply-To: <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: On 01/27/2016 04:17 PM, Matthew Wilcox wrote: > From: Matthew Wilcox > > Instead of a 'goto restart', we can now use radix_tree_iter_retry() > to restart from our current position. This will make a difference > when there are more ways to happen across an indirect pointer. And it > eliminates some confusing gotos. Hey Matthew, I'm seeing the following NULL ptr deref while fuzzing: [ 3380.120501] general protection fault: 0000 [#1] SMP KASAN [ 3380.120529] Modules linked in: [ 3380.120555] CPU: 2 PID: 23271 Comm: syz-executor Not tainted 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978 [ 3380.120569] task: ffff8800a5181000 ti: ffff8801a63b8000 task.ti: ffff8801a63b8000 [ 3380.120681] RIP: shmem_add_seals (include/linux/compiler.h:222 include/linux/radix-tree.h:206 mm/shmem.c:2001 mm/shmem.c:2100) [ 3380.120692] RSP: 0018:ffff8801a63bfd58 EFLAGS: 00010202 [ 3380.120703] RAX: dffffc0000000000 RBX: 0000000000000001 RCX: 0000000000940000 [ 3380.120714] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff8800a5181b3c [ 3380.120725] RBP: ffff8801a63bfe58 R08: ffff8800a5181b40 R09: 0000000000000001 [ 3380.120736] R10: fffff44e6f425fff R11: ffffffffbdb0a420 R12: 0000000000000008 [ 3380.120745] R13: 0000000000000001 R14: 0000000000000001 R15: ffffea0002ad1660 [ 3380.120759] FS: 00007fbc71e9c700(0000) GS:ffff8801d3c00000(0000) knlGS:0000000000000000 [ 3380.120769] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3380.120780] CR2: 0000000020010ff7 CR3: 00000001a0728000 CR4: 00000000000406e0 [ 3380.120794] Stack: [ 3380.120815] ffffffffa2738239 00000008a63bfdc8 ffff8800a499e740 1ffff10034c77fba [ 3380.120834] ffff8801ac446da0 ffff8800a499e8f0 0000000000000000 1ffff10034c77001 [ 3380.120852] ffff8801a63b8000 ffff8801a63b8008 ffff8801ac446f90 ffff8801ac446f98 [ 3380.120856] Call Trace: [ 3380.120929] shmem_fcntl (mm/shmem.c:2135) [ 3380.120963] SyS_fcntl (fs/fcntl.c:336 fs/fcntl.c:372 fs/fcntl.c:357) [ 3380.121112] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:200) [ 3380.122294] Code: c7 45 a0 00 00 00 00 e9 86 02 00 00 e8 cf a8 ee ff 4d 85 e4 0f 84 b2 07 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 d4 08 00 00 49 8b 1c 24 e8 12 34 de ff 85 c0 All code ======== 0: c7 45 a0 00 00 00 00 movl $0x0,-0x60(%rbp) 7: e9 86 02 00 00 jmpq 0x292 c: e8 cf a8 ee ff callq 0xffffffffffeea8e0 11: 4d 85 e4 test %r12,%r12 14: 0f 84 b2 07 00 00 je 0x7cc 1a: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax 21: fc ff df 24: 4c 89 e2 mov %r12,%rdx 27: 48 c1 ea 03 shr $0x3,%rdx 2b:* 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction 2f: 0f 85 d4 08 00 00 jne 0x909 35: 49 8b 1c 24 mov (%r12),%rbx 39: e8 12 34 de ff callq 0xffffffffffde3450 3e: 85 c0 test %eax,%eax ... Code starting with the faulting instruction =========================================== 0: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) 4: 0f 85 d4 08 00 00 jne 0x8de a: 49 8b 1c 24 mov (%r12),%rbx e: e8 12 34 de ff callq 0xffffffffffde3425 13: 85 c0 test %eax,%eax ... [ 3380.122312] RIP shmem_add_seals (include/linux/compiler.h:222 include/linux/radix-tree.h:206 mm/shmem.c:2001 mm/shmem.c:2100) [ 3380.122317] RSP Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Mon, 1 Feb 2016 15:34:32 +0100 From: David Sterba To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 3/5] btrfs: Use radix_tree_iter_retry() Message-ID: <20160201143432.GH31992@twin.jikos.cz> Reply-To: dsterba@suse.cz References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-4-git-send-email-matthew.r.wilcox@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1453929472-25566-4-git-send-email-matthew.r.wilcox@intel.com> Sender: owner-linux-mm@kvack.org List-ID: On Wed, Jan 27, 2016 at 04:17:50PM -0500, Matthew Wilcox wrote: > From: Matthew Wilcox > > Even though this is a 'can't happen' situation, use the new > radix_tree_iter_retry() pattern to eliminate a goto. Andrew's tree contains a fixup for a build failure > @@ -147,7 +146,7 @@ restart: > /* Shouldn't happen but that kind of thinking creates CVE's */ > if (radix_tree_exception(eb)) { > if (radix_tree_deref_retry(eb)) > - goto restart; > + slot = radix_tree_iter_retry(iter); slot = radix_tree_iter_retry(&iter); http://ozlabs.org/~akpm/mmots/broken-out/btrfs-use-radix_tree_iter_retry-fix.patch -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 In-Reply-To: References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Date: Wed, 3 Feb 2016 09:27:03 +0300 Message-ID: Subject: Re: [PATCH 0/5] Fix races & improve the radix tree iterator patterns From: Konstantin Khlebnikov To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen , Matthew Wilcox , Linux Kernel Mailing List , linux-fsdevel , "linux-mm@kvack.org" Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: On Thu, Jan 28, 2016 at 10:17 AM, Konstantin Khlebnikov wrote: > On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox > wrote: >> From: Matthew Wilcox >> >> The first two patches here are bugfixes, and I would like to see them >> make their way into stable ASAP since they can lead to data corruption >> (very low probabilty). >> >> The last three patches do not qualify as bugfixes. They simply improve >> the standard pattern used to do radix tree iterations by removing the >> 'goto restart' part. Partially this is because this is an ugly & >> confusing goto, and partially because with multi-order entries in the >> tree, it'll be more likely that we'll see an indirect_ptr bit, and >> it's more efficient to kep going from the point of the iteration we're >> currently in than restart from the beginning each time. > > Ack whole set. > > I think we should go deeper in hide dereference/retry inside iterator. > Something like radix_tree_for_each_data(data, slot, root, iter, start). > I'll prepare patch for that. After second thought: there'ra not so many users for new sugar. This scheme with radix_tree_deref_retry - radix_tree_iter_retry complicated but fine. > >> >> Matthew Wilcox (5): >> radix-tree: Fix race in gang lookup >> hwspinlock: Fix race between radix tree insertion and lookup >> btrfs: Use radix_tree_iter_retry() >> mm: Use radix_tree_iter_retry() >> radix-tree,shmem: Introduce radix_tree_iter_next() >> >> drivers/hwspinlock/hwspinlock_core.c | 4 +++ >> fs/btrfs/tests/btrfs-tests.c | 3 +- >> include/linux/radix-tree.h | 31 +++++++++++++++++++++ >> lib/radix-tree.c | 12 ++++++-- >> mm/filemap.c | 53 ++++++++++++------------------------ >> mm/shmem.c | 30 ++++++++++---------- >> 6 files changed, 78 insertions(+), 55 deletions(-) >> >> -- >> 2.7.0.rc3 >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 In-Reply-To: References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-2-git-send-email-matthew.r.wilcox@intel.com> Date: Thu, 4 Feb 2016 11:44:02 +0300 Message-ID: Subject: Re: [PATCH 1/5] radix-tree: Fix race in gang lookup From: Konstantin Khlebnikov To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen , Matthew Wilcox , Linux Kernel Mailing List , linux-fsdevel , "linux-mm@kvack.org" , Stable Content-Type: multipart/mixed; boundary=001a113dc134935644052aedbcc2 Sender: owner-linux-mm@kvack.org List-ID: --001a113dc134935644052aedbcc2 Content-Type: text/plain; charset=UTF-8 On Thu, Feb 4, 2016 at 12:37 AM, Konstantin Khlebnikov wrote: > On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox > wrote: >> From: Matthew Wilcox >> >> If the indirect_ptr bit is set on a slot, that indicates we need to >> redo the lookup. Introduce a new function radix_tree_iter_retry() >> which forces the loop to retry the lookup by setting 'slot' to NULL and >> turning the iterator back to point at the problematic entry. >> >> This is a pretty rare problem to hit at the moment; the lookup has to >> race with a grow of the radix tree from a height of 0. The consequences >> of hitting this race are that gang lookup could return a pointer to a >> radix_tree_node instead of a pointer to whatever the user had inserted >> in the tree. >> >> Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator") >> Signed-off-by: Matthew Wilcox >> Cc: stable@vger.kernel.org >> --- >> include/linux/radix-tree.h | 16 ++++++++++++++++ >> lib/radix-tree.c | 12 ++++++++++-- >> 2 files changed, 26 insertions(+), 2 deletions(-) >> >> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h >> index f9a3da5bf892..db0ed595749b 100644 >> --- a/include/linux/radix-tree.h >> +++ b/include/linux/radix-tree.h >> @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root, >> struct radix_tree_iter *iter, unsigned flags); >> >> /** >> + * radix_tree_iter_retry - retry this chunk of the iteration >> + * @iter: iterator state >> + * >> + * If we iterate over a tree protected only by the RCU lock, a race >> + * against deletion or creation may result in seeing a slot for which >> + * radix_tree_deref_retry() returns true. If so, call this function >> + * and continue the iteration. >> + */ >> +static inline __must_check >> +void **radix_tree_iter_retry(struct radix_tree_iter *iter) >> +{ >> + iter->next_index = iter->index; >> + return NULL; >> +} >> + >> +/** >> * radix_tree_chunk_size - get current chunk size >> * >> * @iter: pointer to radix tree iterator >> diff --git a/lib/radix-tree.c b/lib/radix-tree.c >> index a25f635dcc56..65422ac17114 100644 >> --- a/lib/radix-tree.c >> +++ b/lib/radix-tree.c >> @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results, >> return 0; >> >> radix_tree_for_each_slot(slot, root, &iter, first_index) { >> - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); >> + results[ret] = rcu_dereference_raw(*slot); >> if (!results[ret]) >> continue; >> + if (radix_tree_is_indirect_ptr(results[ret])) { >> + slot = radix_tree_iter_retry(&iter); >> + continue; >> + } >> if (++ret == max_items) >> break; >> } > > Looks like your fix doesn't work. > > After radix_tree_iter_retry: radix_tree_for_each_slot will call > radix_tree_next_slot which isn't safe to call for NULL slot. > > #define radix_tree_for_each_slot(slot, root, iter, start) \ > for (slot = radix_tree_iter_init(iter, start) ; \ > slot || (slot = radix_tree_next_chunk(root, iter, 0)) ; \ > slot = radix_tree_next_slot(slot, iter, 0)) > > tagged iterator works becase restart happens only at root - tags > filled with single bit. > > quick (untested) fix for that > > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -457,9 +457,9 @@ radix_tree_next_slot(void **slot, struct > radix_tree_iter *iter, unsigned flags) > return slot + offset + 1; > } > } else { > - unsigned size = radix_tree_chunk_size(iter) - 1; > + int size = radix_tree_chunk_size(iter) - 1; > > - while (size--) { > + while (size-- > 0) { > slot++; > iter->index++; > if (likely(*slot)) > > Yep. Kernel crashes. Test in attachment. fix: https://lkml.kernel.org/r/145457528789.31321.4441662473067711123.stgit@zurg >> @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results, >> return 0; >> >> radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) { >> - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); >> + results[ret] = rcu_dereference_raw(*slot); >> if (!results[ret]) >> continue; >> + if (radix_tree_is_indirect_ptr(results[ret])) { >> + slot = radix_tree_iter_retry(&iter); >> + continue; >> + } >> if (++ret == max_items) >> break; >> } >> -- >> 2.7.0.rc3 >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: email@kvack.org --001a113dc134935644052aedbcc2 Content-Type: application/octet-stream; name=radix-tree-test-radix_tree_iter_retry Content-Disposition: attachment; filename=radix-tree-test-radix_tree_iter_retry Content-Transfer-Encoding: base64 X-Attachment-Id: f_ik80mj2u0 cmFkaXgtdHJlZTogdGVzdCByYWRpeF90cmVlX2l0ZXJfcmV0cnkKCkZyb206IEtvbnN0YW50aW4g S2hsZWJuaWtvdiA8a29jdDlpQGdtYWlsLmNvbT4KClNpZ25lZC1vZmYtYnk6IEtvbnN0YW50aW4g S2hsZWJuaWtvdiA8a29jdDlpQGdtYWlsLmNvbT4KLS0tCiBsaWIvcmFkaXgtdHJlZS5jIHwgICA2 MiArKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysK IDEgZmlsZSBjaGFuZ2VkLCA2MiBpbnNlcnRpb25zKCspCgpkaWZmIC0tZ2l0IGEvbGliL3JhZGl4 LXRyZWUuYyBiL2xpYi9yYWRpeC10cmVlLmMKaW5kZXggNmI3OWU5MDI2ZTI0Li5mNDg5MzM0Yjlj YjcgMTAwNjQ0Ci0tLSBhL2xpYi9yYWRpeC10cmVlLmMKKysrIGIvbGliL3JhZGl4LXRyZWUuYwpA QCAtMTQ5MSw2ICsxNDkxLDY2IEBAIHN0YXRpYyBpbnQgcmFkaXhfdHJlZV9jYWxsYmFjayhzdHJ1 Y3Qgbm90aWZpZXJfYmxvY2sgKm5mYiwKICAgICAgICByZXR1cm4gTk9USUZZX09LOwogfQogCitz dGF0aWMgdm9pZCB0ZXN0X2l0ZXJfcmV0cnkodm9pZCkKK3sKKwlSQURJWF9UUkVFKHJvb3QsIEdG UF9LRVJORUwpOworCXZvaWQgKnB0ciA9ICh2b2lkICopNHVsOworCXN0cnVjdCByYWRpeF90cmVl X2l0ZXIgaXRlcjsKKwl2b2lkICoqc2xvdDsKKwlib29sIGZpcnN0OworCisJcmFkaXhfdHJlZV9p bnNlcnQoJnJvb3QsIDAsIHB0cik7CisJcmFkaXhfdHJlZV90YWdfc2V0KCZyb290LCAwLCAwKTsK KworCWZpcnN0ID0gdHJ1ZTsKKwlyYWRpeF90cmVlX2Zvcl9lYWNoX3RhZ2dlZChzbG90LCAmcm9v dCwgJml0ZXIsIDAsIDApIHsKKwkJcHJpbnRrKCJ0YWdnZWQgJWxkICVwXG4iLCBpdGVyLmluZGV4 LCAqc2xvdCk7CisJCWlmIChmaXJzdCkgeworCQkJcmFkaXhfdHJlZV9pbnNlcnQoJnJvb3QsIDEs IHB0cik7CisJCQlyYWRpeF90cmVlX3RhZ19zZXQoJnJvb3QsIDEsIDApOworCQkJZmlyc3QgPSBm YWxzZTsKKwkJfQorCQlpZiAocmFkaXhfdHJlZV9kZXJlZl9yZXRyeSgqc2xvdCkpIHsKKwkJCXBy aW50aygicmV0cnkgJWxkXG4iLCBpdGVyLmluZGV4KTsKKwkJCXNsb3QgPSByYWRpeF90cmVlX2l0 ZXJfcmV0cnkoJml0ZXIpOworCQkJY29udGludWU7CisJCX0KKwl9CisJcmFkaXhfdHJlZV9kZWxl dGUoJnJvb3QsIDEpOworCisJZmlyc3QgPSB0cnVlOworCXJhZGl4X3RyZWVfZm9yX2VhY2hfc2xv dChzbG90LCAmcm9vdCwgJml0ZXIsIDApIHsKKwkJcHJpbnRrKCJzbG90ICVsZCAlcFxuIiwgaXRl ci5pbmRleCwgKnNsb3QpOworCQlpZiAoZmlyc3QpIHsKKwkJCXJhZGl4X3RyZWVfaW5zZXJ0KCZy b290LCAxLCBwdHIpOworCQkJZmlyc3QgPSBmYWxzZTsKKwkJfQorCQlpZiAocmFkaXhfdHJlZV9k ZXJlZl9yZXRyeSgqc2xvdCkpIHsKKwkJCXByaW50aygicmV0cnkgJWxkXG4iLCBpdGVyLmluZGV4 KTsKKwkJCXNsb3QgPSByYWRpeF90cmVlX2l0ZXJfcmV0cnkoJml0ZXIpOworCQkJY29udGludWU7 CisJCX0KKwl9CisJcmFkaXhfdHJlZV9kZWxldGUoJnJvb3QsIDEpOworCisJZmlyc3QgPSB0cnVl OworCXJhZGl4X3RyZWVfZm9yX2VhY2hfY29udGlnKHNsb3QsICZyb290LCAmaXRlciwgMCkgewor CQlwcmludGsoImNvbnRpZyAlbGQgJXBcbiIsIGl0ZXIuaW5kZXgsICpzbG90KTsKKwkJaWYgKGZp cnN0KSB7CisJCQlyYWRpeF90cmVlX2luc2VydCgmcm9vdCwgMSwgcHRyKTsKKwkJCWZpcnN0ID0g ZmFsc2U7CisJCX0KKwkJaWYgKHJhZGl4X3RyZWVfZGVyZWZfcmV0cnkoKnNsb3QpKSB7CisJCQlw cmludGsoInJldHJ5ICVsZFxuIiwgaXRlci5pbmRleCk7CisJCQlzbG90ID0gcmFkaXhfdHJlZV9p dGVyX3JldHJ5KCZpdGVyKTsKKwkJCWNvbnRpbnVlOworCQl9CisJfQorCisJcmFkaXhfdHJlZV9k ZWxldGUoJnJvb3QsIDApOworCXJhZGl4X3RyZWVfZGVsZXRlKCZyb290LCAxKTsKK30KKwogdm9p ZCBfX2luaXQgcmFkaXhfdHJlZV9pbml0KHZvaWQpCiB7CiAJcmFkaXhfdHJlZV9ub2RlX2NhY2hl cCA9IGttZW1fY2FjaGVfY3JlYXRlKCJyYWRpeF90cmVlX25vZGUiLApAQCAtMTQ5OSw0ICsxNTU5 LDYgQEAgdm9pZCBfX2luaXQgcmFkaXhfdHJlZV9pbml0KHZvaWQpCiAJCQlyYWRpeF90cmVlX25v ZGVfY3Rvcik7CiAJcmFkaXhfdHJlZV9pbml0X21heGluZGV4KCk7CiAJaG90Y3B1X25vdGlmaWVy KHJhZGl4X3RyZWVfY2FsbGJhY2ssIDApOworCisJdGVzdF9pdGVyX3JldHJ5KCk7CiB9Cg== --001a113dc134935644052aedbcc2-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 In-Reply-To: <1453929472-25566-2-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-2-git-send-email-matthew.r.wilcox@intel.com> Date: Thu, 4 Feb 2016 00:37:17 +0300 Message-ID: Subject: Re: [PATCH 1/5] radix-tree: Fix race in gang lookup From: Konstantin Khlebnikov To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen , Matthew Wilcox , Linux Kernel Mailing List , linux-fsdevel , "linux-mm@kvack.org" , Stable Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox wrote: > From: Matthew Wilcox > > If the indirect_ptr bit is set on a slot, that indicates we need to > redo the lookup. Introduce a new function radix_tree_iter_retry() > which forces the loop to retry the lookup by setting 'slot' to NULL and > turning the iterator back to point at the problematic entry. > > This is a pretty rare problem to hit at the moment; the lookup has to > race with a grow of the radix tree from a height of 0. The consequences > of hitting this race are that gang lookup could return a pointer to a > radix_tree_node instead of a pointer to whatever the user had inserted > in the tree. > > Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator") > Signed-off-by: Matthew Wilcox > Cc: stable@vger.kernel.org > --- > include/linux/radix-tree.h | 16 ++++++++++++++++ > lib/radix-tree.c | 12 ++++++++++-- > 2 files changed, 26 insertions(+), 2 deletions(-) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index f9a3da5bf892..db0ed595749b 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root, > struct radix_tree_iter *iter, unsigned flags); > > /** > + * radix_tree_iter_retry - retry this chunk of the iteration > + * @iter: iterator state > + * > + * If we iterate over a tree protected only by the RCU lock, a race > + * against deletion or creation may result in seeing a slot for which > + * radix_tree_deref_retry() returns true. If so, call this function > + * and continue the iteration. > + */ > +static inline __must_check > +void **radix_tree_iter_retry(struct radix_tree_iter *iter) > +{ > + iter->next_index = iter->index; > + return NULL; > +} > + > +/** > * radix_tree_chunk_size - get current chunk size > * > * @iter: pointer to radix tree iterator > diff --git a/lib/radix-tree.c b/lib/radix-tree.c > index a25f635dcc56..65422ac17114 100644 > --- a/lib/radix-tree.c > +++ b/lib/radix-tree.c > @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results, > return 0; > > radix_tree_for_each_slot(slot, root, &iter, first_index) { > - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); > + results[ret] = rcu_dereference_raw(*slot); > if (!results[ret]) > continue; > + if (radix_tree_is_indirect_ptr(results[ret])) { > + slot = radix_tree_iter_retry(&iter); > + continue; > + } > if (++ret == max_items) > break; > } Looks like your fix doesn't work. After radix_tree_iter_retry: radix_tree_for_each_slot will call radix_tree_next_slot which isn't safe to call for NULL slot. #define radix_tree_for_each_slot(slot, root, iter, start) \ for (slot = radix_tree_iter_init(iter, start) ; \ slot || (slot = radix_tree_next_chunk(root, iter, 0)) ; \ slot = radix_tree_next_slot(slot, iter, 0)) tagged iterator works becase restart happens only at root - tags filled with single bit. quick (untested) fix for that --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -457,9 +457,9 @@ radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags) return slot + offset + 1; } } else { - unsigned size = radix_tree_chunk_size(iter) - 1; + int size = radix_tree_chunk_size(iter) - 1; - while (size--) { + while (size-- > 0) { slot++; iter->index++; if (likely(*slot)) > @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results, > return 0; > > radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) { > - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); > + results[ret] = rcu_dereference_raw(*slot); > if (!results[ret]) > continue; > + if (radix_tree_is_indirect_ptr(results[ret])) { > + slot = radix_tree_iter_retry(&iter); > + continue; > + } > if (++ret == max_items) > break; > } > -- > 2.7.0.rc3 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 In-Reply-To: <1453929472-25566-6-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-6-git-send-email-matthew.r.wilcox@intel.com> Date: Thu, 4 Feb 2016 11:50:35 +0300 Message-ID: Subject: Re: [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next() From: Konstantin Khlebnikov To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Matthew Wilcox , Linux Kernel Mailing List , linux-fsdevel , "linux-mm@kvack.org" Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox wrote: > From: Matthew Wilcox > > shmem likes to occasionally drop the lock, schedule, then reacqire > the lock and continue with the iteration from the last place it > left off. This is currently done with a pretty ugly goto. Introduce > radix_tree_iter_next() and use it throughout shmem.c. > > Signed-off-by: Matthew Wilcox > --- > include/linux/radix-tree.h | 15 +++++++++++++++ > mm/shmem.c | 12 +++--------- > 2 files changed, 18 insertions(+), 9 deletions(-) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index db0ed595749b..dec2c6c77eea 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -403,6 +403,21 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter) > } > > /** > + * radix_tree_iter_next - resume iterating when the chunk may be invalid > + * @iter: iterator state > + * > + * If the iterator needs to release then reacquire a lock, the chunk may > + * have been invalidated by an insertion or deletion. Call this function > + * to continue the iteration from the next index. > + */ > +static inline __must_check > +void **radix_tree_iter_next(struct radix_tree_iter *iter) > +{ > + iter->next_index = iter->index + 1; > + return NULL; > +} > + > +/** This works for normal iterator but not for tagged. It must also reset iter->tags to zero. > * radix_tree_chunk_size - get current chunk size > * > * @iter: pointer to radix tree iterator > diff --git a/mm/shmem.c b/mm/shmem.c > index 6ec14b70d82d..438ea8004c26 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -376,7 +376,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping, > > rcu_read_lock(); > > -restart: > radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { > if (iter.index >= end) > break; > @@ -398,8 +397,7 @@ restart: > > if (need_resched()) { > cond_resched_rcu(); > - start = iter.index + 1; > - goto restart; > + slot = radix_tree_iter_next(&iter); > } > } > > @@ -1950,7 +1948,6 @@ static void shmem_tag_pins(struct address_space *mapping) > start = 0; > rcu_read_lock(); > > -restart: > radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { > page = radix_tree_deref_slot(slot); > if (!page || radix_tree_exception(page)) { > @@ -1967,8 +1964,7 @@ restart: > > if (need_resched()) { > cond_resched_rcu(); > - start = iter.index + 1; > - goto restart; > + slot = radix_tree_iter_next(&iter); > } > } > rcu_read_unlock(); > @@ -2005,7 +2001,6 @@ static int shmem_wait_for_pins(struct address_space *mapping) > > start = 0; > rcu_read_lock(); > -restart: > radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, > start, SHMEM_TAG_PINNED) { > > @@ -2039,8 +2034,7 @@ restart: > continue_resched: > if (need_resched()) { > cond_resched_rcu(); > - start = iter.index + 1; > - goto restart; > + slot = radix_tree_iter_next(&iter); > } > } > rcu_read_unlock(); > -- > 2.7.0.rc3 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Message-ID: <56D98BDD.3060806@huawei.com> Date: Fri, 4 Mar 2016 21:21:33 +0800 From: zhong jiang MIME-Version: 1.0 To: Matthew Wilcox CC: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen , Matthew Wilcox , Konstantin Khlebnikov , , , , Subject: Re: [PATCH 1/5] radix-tree: Fix race in gang lookup References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-2-git-send-email-matthew.r.wilcox@intel.com> In-Reply-To: <1453929472-25566-2-git-send-email-matthew.r.wilcox@intel.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: stable-owner@vger.kernel.org List-ID: On 2016/1/28 5:17, Matthew Wilcox wrote: > From: Matthew Wilcox > > If the indirect_ptr bit is set on a slot, that indicates we need to > redo the lookup. Introduce a new function radix_tree_iter_retry() > which forces the loop to retry the lookup by setting 'slot' to NULL and > turning the iterator back to point at the problematic entry. > > This is a pretty rare problem to hit at the moment; the lookup has to > race with a grow of the radix tree from a height of 0. The consequences > of hitting this race are that gang lookup could return a pointer to a > radix_tree_node instead of a pointer to whatever the user had inserted > in the tree. > > Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator") > Signed-off-by: Matthew Wilcox > Cc: stable@vger.kernel.org > --- > include/linux/radix-tree.h | 16 ++++++++++++++++ > lib/radix-tree.c | 12 ++++++++++-- > 2 files changed, 26 insertions(+), 2 deletions(-) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index f9a3da5bf892..db0ed595749b 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root, > struct radix_tree_iter *iter, unsigned flags); > > /** > + * radix_tree_iter_retry - retry this chunk of the iteration > + * @iter: iterator state > + * > + * If we iterate over a tree protected only by the RCU lock, a race > + * against deletion or creation may result in seeing a slot for which > + * radix_tree_deref_retry() returns true. If so, call this function > + * and continue the iteration. > + */ > +static inline __must_check > +void **radix_tree_iter_retry(struct radix_tree_iter *iter) > +{ > + iter->next_index = iter->index; > + return NULL; > +} > + > +/** > * radix_tree_chunk_size - get current chunk size > * > * @iter: pointer to radix tree iterator > diff --git a/lib/radix-tree.c b/lib/radix-tree.c > index a25f635dcc56..65422ac17114 100644 > --- a/lib/radix-tree.c > +++ b/lib/radix-tree.c > @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results, > return 0; > > radix_tree_for_each_slot(slot, root, &iter, first_index) { > - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); > + results[ret] = rcu_dereference_raw(*slot); > if (!results[ret]) > continue; > + if (radix_tree_is_indirect_ptr(results[ret])) { > + slot = radix_tree_iter_retry(&iter); > + continue; > + } > if (++ret == max_items) > break; > } according to your patch, after A race occur, slot equals to null. radix_tree_next_slot() will continue to work. Therefore, it will not return the problematic entry. > @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results, > return 0; > > radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) { > - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); > + results[ret] = rcu_dereference_raw(*slot); > if (!results[ret]) > continue; > + if (radix_tree_is_indirect_ptr(results[ret])) { > + slot = radix_tree_iter_retry(&iter); > + continue; > + } > if (++ret == max_items) > break; > } From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f53.google.com (mail-wm0-f53.google.com [74.125.82.53]) by kanga.kvack.org (Postfix) with ESMTP id A0CE26B0253 for ; Fri, 29 Jan 2016 09:46:02 -0500 (EST) Received: by mail-wm0-f53.google.com with SMTP id l66so57534448wml.0 for ; Fri, 29 Jan 2016 06:46:02 -0800 (PST) Received: from mx2.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id it4si22516894wjb.239.2016.01.29.06.46.01 for (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 29 Jan 2016 06:46:01 -0800 (PST) Subject: Re: [PATCH 4/5] mm: Use radix_tree_iter_retry() References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> From: Vlastimil Babka Message-ID: <56AB7B27.3090805@suse.cz> Date: Fri, 29 Jan 2016 15:45:59 +0100 MIME-Version: 1.0 In-Reply-To: <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Matthew Wilcox , Andrew Morton , Hugh Dickins Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org On 01/27/2016 10:17 PM, Matthew Wilcox wrote: > From: Matthew Wilcox > > Instead of a 'goto restart', we can now use radix_tree_iter_retry() > to restart from our current position. This will make a difference > when there are more ways to happen across an indirect pointer. And it > eliminates some confusing gotos. > > Signed-off-by: Matthew Wilcox [...] > diff --git a/mm/shmem.c b/mm/shmem.c > index fa2ceb2d2655..6ec14b70d82d 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -388,8 +388,10 @@ restart: > * don't need to reset the counter, nor do we risk infinite > * restarts. > */ > - if (radix_tree_deref_retry(page)) > - goto restart; > + if (radix_tree_deref_retry(page)) { > + slot = radix_tree_iter_retry(&iter); > + continue; > + } > > if (radix_tree_exceptional_entry(page)) > swapped++; This should be applied on top. There are no restarts anymore. ----8<---- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f175.google.com (mail-io0-f175.google.com [209.85.223.175]) by kanga.kvack.org (Postfix) with ESMTP id 80C2C6B007E for ; Fri, 4 Mar 2016 08:33:38 -0500 (EST) Received: by mail-io0-f175.google.com with SMTP id g203so62042228iof.2 for ; Fri, 04 Mar 2016 05:33:38 -0800 (PST) Received: from szxga03-in.huawei.com (szxga03-in.huawei.com. [119.145.14.66]) by mx.google.com with ESMTPS id m8si4246122igx.42.2016.03.04.05.33.31 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 04 Mar 2016 05:33:37 -0800 (PST) Message-ID: <56D98BDD.3060806@huawei.com> Date: Fri, 4 Mar 2016 21:21:33 +0800 From: zhong jiang MIME-Version: 1.0 Subject: Re: [PATCH 1/5] radix-tree: Fix race in gang lookup References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-2-git-send-email-matthew.r.wilcox@intel.com> In-Reply-To: <1453929472-25566-2-git-send-email-matthew.r.wilcox@intel.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen , Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org On 2016/1/28 5:17, Matthew Wilcox wrote: > From: Matthew Wilcox > > If the indirect_ptr bit is set on a slot, that indicates we need to > redo the lookup. Introduce a new function radix_tree_iter_retry() > which forces the loop to retry the lookup by setting 'slot' to NULL and > turning the iterator back to point at the problematic entry. > > This is a pretty rare problem to hit at the moment; the lookup has to > race with a grow of the radix tree from a height of 0. The consequences > of hitting this race are that gang lookup could return a pointer to a > radix_tree_node instead of a pointer to whatever the user had inserted > in the tree. > > Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator") > Signed-off-by: Matthew Wilcox > Cc: stable@vger.kernel.org > --- > include/linux/radix-tree.h | 16 ++++++++++++++++ > lib/radix-tree.c | 12 ++++++++++-- > 2 files changed, 26 insertions(+), 2 deletions(-) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index f9a3da5bf892..db0ed595749b 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root, > struct radix_tree_iter *iter, unsigned flags); > > /** > + * radix_tree_iter_retry - retry this chunk of the iteration > + * @iter: iterator state > + * > + * If we iterate over a tree protected only by the RCU lock, a race > + * against deletion or creation may result in seeing a slot for which > + * radix_tree_deref_retry() returns true. If so, call this function > + * and continue the iteration. > + */ > +static inline __must_check > +void **radix_tree_iter_retry(struct radix_tree_iter *iter) > +{ > + iter->next_index = iter->index; > + return NULL; > +} > + > +/** > * radix_tree_chunk_size - get current chunk size > * > * @iter: pointer to radix tree iterator > diff --git a/lib/radix-tree.c b/lib/radix-tree.c > index a25f635dcc56..65422ac17114 100644 > --- a/lib/radix-tree.c > +++ b/lib/radix-tree.c > @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results, > return 0; > > radix_tree_for_each_slot(slot, root, &iter, first_index) { > - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); > + results[ret] = rcu_dereference_raw(*slot); > if (!results[ret]) > continue; > + if (radix_tree_is_indirect_ptr(results[ret])) { > + slot = radix_tree_iter_retry(&iter); > + continue; > + } > if (++ret == max_items) > break; > } according to your patch, after A race occur, slot equals to null. radix_tree_next_slot() will continue to work. Therefore, it will not return the problematic entry. > @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results, > return 0; > > radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) { > - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); > + results[ret] = rcu_dereference_raw(*slot); > if (!results[ret]) > continue; > + if (radix_tree_is_indirect_ptr(results[ret])) { > + slot = radix_tree_iter_retry(&iter); > + continue; > + } > if (++ret == max_items) > break; > } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161838AbcA0VSK (ORCPT ); Wed, 27 Jan 2016 16:18:10 -0500 Received: from mga14.intel.com ([192.55.52.115]:34207 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161756AbcA0VSE (ORCPT ); Wed, 27 Jan 2016 16:18:04 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,356,1449561600"; d="scan'208";a="899609425" From: Matthew Wilcox To: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org Subject: [PATCH 1/5] radix-tree: Fix race in gang lookup Date: Wed, 27 Jan 2016 16:17:48 -0500 Message-Id: <1453929472-25566-2-git-send-email-matthew.r.wilcox@intel.com> X-Mailer: git-send-email 2.7.0.rc3 In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Matthew Wilcox If the indirect_ptr bit is set on a slot, that indicates we need to redo the lookup. Introduce a new function radix_tree_iter_retry() which forces the loop to retry the lookup by setting 'slot' to NULL and turning the iterator back to point at the problematic entry. This is a pretty rare problem to hit at the moment; the lookup has to race with a grow of the radix tree from a height of 0. The consequences of hitting this race are that gang lookup could return a pointer to a radix_tree_node instead of a pointer to whatever the user had inserted in the tree. Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator") Signed-off-by: Matthew Wilcox Cc: stable@vger.kernel.org --- include/linux/radix-tree.h | 16 ++++++++++++++++ lib/radix-tree.c | 12 ++++++++++-- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index f9a3da5bf892..db0ed595749b 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root, struct radix_tree_iter *iter, unsigned flags); /** + * radix_tree_iter_retry - retry this chunk of the iteration + * @iter: iterator state + * + * If we iterate over a tree protected only by the RCU lock, a race + * against deletion or creation may result in seeing a slot for which + * radix_tree_deref_retry() returns true. If so, call this function + * and continue the iteration. + */ +static inline __must_check +void **radix_tree_iter_retry(struct radix_tree_iter *iter) +{ + iter->next_index = iter->index; + return NULL; +} + +/** * radix_tree_chunk_size - get current chunk size * * @iter: pointer to radix tree iterator diff --git a/lib/radix-tree.c b/lib/radix-tree.c index a25f635dcc56..65422ac17114 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results, return 0; radix_tree_for_each_slot(slot, root, &iter, first_index) { - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); + results[ret] = rcu_dereference_raw(*slot); if (!results[ret]) continue; + if (radix_tree_is_indirect_ptr(results[ret])) { + slot = radix_tree_iter_retry(&iter); + continue; + } if (++ret == max_items) break; } @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results, return 0; radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) { - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); + results[ret] = rcu_dereference_raw(*slot); if (!results[ret]) continue; + if (radix_tree_is_indirect_ptr(results[ret])) { + slot = radix_tree_iter_retry(&iter); + continue; + } if (++ret == max_items) break; } -- 2.7.0.rc3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161809AbcA0VSH (ORCPT ); Wed, 27 Jan 2016 16:18:07 -0500 Received: from mga01.intel.com ([192.55.52.88]:18548 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161751AbcA0VSB (ORCPT ); Wed, 27 Jan 2016 16:18:01 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,356,1449561600"; d="scan'208";a="735380264" From: Matthew Wilcox To: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 0/5] Fix races & improve the radix tree iterator patterns Date: Wed, 27 Jan 2016 16:17:47 -0500 Message-Id: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> X-Mailer: git-send-email 2.7.0.rc3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Matthew Wilcox The first two patches here are bugfixes, and I would like to see them make their way into stable ASAP since they can lead to data corruption (very low probabilty). The last three patches do not qualify as bugfixes. They simply improve the standard pattern used to do radix tree iterations by removing the 'goto restart' part. Partially this is because this is an ugly & confusing goto, and partially because with multi-order entries in the tree, it'll be more likely that we'll see an indirect_ptr bit, and it's more efficient to kep going from the point of the iteration we're currently in than restart from the beginning each time. Matthew Wilcox (5): radix-tree: Fix race in gang lookup hwspinlock: Fix race between radix tree insertion and lookup btrfs: Use radix_tree_iter_retry() mm: Use radix_tree_iter_retry() radix-tree,shmem: Introduce radix_tree_iter_next() drivers/hwspinlock/hwspinlock_core.c | 4 +++ fs/btrfs/tests/btrfs-tests.c | 3 +- include/linux/radix-tree.h | 31 +++++++++++++++++++++ lib/radix-tree.c | 12 ++++++-- mm/filemap.c | 53 ++++++++++++------------------------ mm/shmem.c | 30 ++++++++++---------- 6 files changed, 78 insertions(+), 55 deletions(-) -- 2.7.0.rc3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S968050AbcA0VSr (ORCPT ); Wed, 27 Jan 2016 16:18:47 -0500 Received: from mga04.intel.com ([192.55.52.120]:39448 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161759AbcA0VSF (ORCPT ); Wed, 27 Jan 2016 16:18:05 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,356,1449561600"; d="scan'208";a="902446018" From: Matthew Wilcox To: Andrew Morton , Hugh Dickins Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next() Date: Wed, 27 Jan 2016 16:17:52 -0500 Message-Id: <1453929472-25566-6-git-send-email-matthew.r.wilcox@intel.com> X-Mailer: git-send-email 2.7.0.rc3 In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Matthew Wilcox shmem likes to occasionally drop the lock, schedule, then reacqire the lock and continue with the iteration from the last place it left off. This is currently done with a pretty ugly goto. Introduce radix_tree_iter_next() and use it throughout shmem.c. Signed-off-by: Matthew Wilcox --- include/linux/radix-tree.h | 15 +++++++++++++++ mm/shmem.c | 12 +++--------- 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index db0ed595749b..dec2c6c77eea 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -403,6 +403,21 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter) } /** + * radix_tree_iter_next - resume iterating when the chunk may be invalid + * @iter: iterator state + * + * If the iterator needs to release then reacquire a lock, the chunk may + * have been invalidated by an insertion or deletion. Call this function + * to continue the iteration from the next index. + */ +static inline __must_check +void **radix_tree_iter_next(struct radix_tree_iter *iter) +{ + iter->next_index = iter->index + 1; + return NULL; +} + +/** * radix_tree_chunk_size - get current chunk size * * @iter: pointer to radix tree iterator diff --git a/mm/shmem.c b/mm/shmem.c index 6ec14b70d82d..438ea8004c26 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -376,7 +376,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping, rcu_read_lock(); -restart: radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { if (iter.index >= end) break; @@ -398,8 +397,7 @@ restart: if (need_resched()) { cond_resched_rcu(); - start = iter.index + 1; - goto restart; + slot = radix_tree_iter_next(&iter); } } @@ -1950,7 +1948,6 @@ static void shmem_tag_pins(struct address_space *mapping) start = 0; rcu_read_lock(); -restart: radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { page = radix_tree_deref_slot(slot); if (!page || radix_tree_exception(page)) { @@ -1967,8 +1964,7 @@ restart: if (need_resched()) { cond_resched_rcu(); - start = iter.index + 1; - goto restart; + slot = radix_tree_iter_next(&iter); } } rcu_read_unlock(); @@ -2005,7 +2001,6 @@ static int shmem_wait_for_pins(struct address_space *mapping) start = 0; rcu_read_lock(); -restart: radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, start, SHMEM_TAG_PINNED) { @@ -2039,8 +2034,7 @@ restart: continue_resched: if (need_resched()) { cond_resched_rcu(); - start = iter.index + 1; - goto restart; + slot = radix_tree_iter_next(&iter); } } rcu_read_unlock(); -- 2.7.0.rc3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S968263AbcA0VS4 (ORCPT ); Wed, 27 Jan 2016 16:18:56 -0500 Received: from mga04.intel.com ([192.55.52.120]:39448 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161202AbcA0VSB (ORCPT ); Wed, 27 Jan 2016 16:18:01 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,356,1449561600"; d="scan'208";a="902446007" From: Matthew Wilcox To: Andrew Morton , Hugh Dickins Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 4/5] mm: Use radix_tree_iter_retry() Date: Wed, 27 Jan 2016 16:17:51 -0500 Message-Id: <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> X-Mailer: git-send-email 2.7.0.rc3 In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Matthew Wilcox Instead of a 'goto restart', we can now use radix_tree_iter_retry() to restart from our current position. This will make a difference when there are more ways to happen across an indirect pointer. And it eliminates some confusing gotos. Signed-off-by: Matthew Wilcox --- mm/filemap.c | 53 +++++++++++++++++------------------------------------ mm/shmem.c | 18 ++++++++++++------ 2 files changed, 29 insertions(+), 42 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 7705dac561ba..e0c4b905fe1c 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1245,7 +1245,6 @@ unsigned find_get_entries(struct address_space *mapping, return 0; rcu_read_lock(); -restart: radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { struct page *page; repeat: @@ -1253,8 +1252,10 @@ repeat: if (unlikely(!page)) continue; if (radix_tree_exception(page)) { - if (radix_tree_deref_retry(page)) - goto restart; + if (radix_tree_deref_retry(page)) { + slot = radix_tree_iter_retry(&iter); + continue; + } /* * A shadow entry of a recently evicted page, a swap * entry from shmem/tmpfs or a DAX entry. Return it @@ -1307,7 +1308,6 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start, return 0; rcu_read_lock(); -restart: radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { struct page *page; repeat: @@ -1317,13 +1317,8 @@ repeat: if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { - /* - * Transient condition which can only trigger - * when entry at index 0 moves out of or back - * to root: none yet gotten, safe to restart. - */ - WARN_ON(iter.index); - goto restart; + slot = radix_tree_iter_retry(&iter); + continue; } /* * A shadow entry of a recently evicted page, @@ -1374,7 +1369,6 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index, return 0; rcu_read_lock(); -restart: radix_tree_for_each_contig(slot, &mapping->page_tree, &iter, index) { struct page *page; repeat: @@ -1385,12 +1379,8 @@ repeat: if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { - /* - * Transient condition which can only trigger - * when entry at index 0 moves out of or back - * to root: none yet gotten, safe to restart. - */ - goto restart; + slot = radix_tree_iter_retry(&iter); + continue; } /* * A shadow entry of a recently evicted page, @@ -1450,7 +1440,6 @@ unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index, return 0; rcu_read_lock(); -restart: radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, *index, tag) { struct page *page; @@ -1461,12 +1450,8 @@ repeat: if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { - /* - * Transient condition which can only trigger - * when entry at index 0 moves out of or back - * to root: none yet gotten, safe to restart. - */ - goto restart; + slot = radix_tree_iter_retry(&iter); + continue; } /* * A shadow entry of a recently evicted page. @@ -1529,7 +1514,6 @@ unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start, return 0; rcu_read_lock(); -restart: radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, start, tag) { struct page *page; @@ -1539,12 +1523,8 @@ repeat: continue; if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { - /* - * Transient condition which can only trigger - * when entry at index 0 moves out of or back - * to root: none yet gotten, safe to restart. - */ - goto restart; + slot = radix_tree_iter_retry(&iter); + continue; } /* @@ -2151,10 +2131,11 @@ repeat: if (unlikely(!page)) goto next; if (radix_tree_exception(page)) { - if (radix_tree_deref_retry(page)) - break; - else - goto next; + if (radix_tree_deref_retry(page)) { + slot = radix_tree_iter_retry(&iter); + continue; + } + goto next; } if (!page_cache_get_speculative(page)) diff --git a/mm/shmem.c b/mm/shmem.c index fa2ceb2d2655..6ec14b70d82d 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -388,8 +388,10 @@ restart: * don't need to reset the counter, nor do we risk infinite * restarts. */ - if (radix_tree_deref_retry(page)) - goto restart; + if (radix_tree_deref_retry(page)) { + slot = radix_tree_iter_retry(&iter); + continue; + } if (radix_tree_exceptional_entry(page)) swapped++; @@ -1952,8 +1954,10 @@ restart: radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { page = radix_tree_deref_slot(slot); if (!page || radix_tree_exception(page)) { - if (radix_tree_deref_retry(page)) - goto restart; + if (radix_tree_deref_retry(page)) { + slot = radix_tree_iter_retry(&iter); + continue; + } } else if (page_count(page) - page_mapcount(page) > 1) { spin_lock_irq(&mapping->tree_lock); radix_tree_tag_set(&mapping->page_tree, iter.index, @@ -2007,8 +2011,10 @@ restart: page = radix_tree_deref_slot(slot); if (radix_tree_exception(page)) { - if (radix_tree_deref_retry(page)) - goto restart; + if (radix_tree_deref_retry(page)) { + slot = radix_tree_iter_retry(&iter); + continue; + } page = NULL; } -- 2.7.0.rc3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S968221AbcA0VSy (ORCPT ); Wed, 27 Jan 2016 16:18:54 -0500 Received: from mga02.intel.com ([134.134.136.20]:22107 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161753AbcA0VSC (ORCPT ); Wed, 27 Jan 2016 16:18:02 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,356,1449561600"; d="scan'208";a="890505856" From: Matthew Wilcox To: Andrew Morton , Hugh Dickins Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 3/5] btrfs: Use radix_tree_iter_retry() Date: Wed, 27 Jan 2016 16:17:50 -0500 Message-Id: <1453929472-25566-4-git-send-email-matthew.r.wilcox@intel.com> X-Mailer: git-send-email 2.7.0.rc3 In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Matthew Wilcox Even though this is a 'can't happen' situation, use the new radix_tree_iter_retry() pattern to eliminate a goto. --- fs/btrfs/tests/btrfs-tests.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c index b1d920b30070..0da78c54317a 100644 --- a/fs/btrfs/tests/btrfs-tests.c +++ b/fs/btrfs/tests/btrfs-tests.c @@ -137,7 +137,6 @@ static void btrfs_free_dummy_fs_info(struct btrfs_fs_info *fs_info) void **slot; spin_lock(&fs_info->buffer_lock); -restart: radix_tree_for_each_slot(slot, &fs_info->buffer_radix, &iter, 0) { struct extent_buffer *eb; @@ -147,7 +146,7 @@ restart: /* Shouldn't happen but that kind of thinking creates CVE's */ if (radix_tree_exception(eb)) { if (radix_tree_deref_retry(eb)) - goto restart; + slot = radix_tree_iter_retry(iter); continue; } spin_unlock(&fs_info->buffer_lock); -- 2.7.0.rc3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S968152AbcA0VSu (ORCPT ); Wed, 27 Jan 2016 16:18:50 -0500 Received: from mga04.intel.com ([192.55.52.120]:39451 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161755AbcA0VSC (ORCPT ); Wed, 27 Jan 2016 16:18:02 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,356,1449561600"; d="scan'208";a="902446006" From: Matthew Wilcox To: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org Subject: [PATCH 2/5] hwspinlock: Fix race between radix tree insertion and lookup Date: Wed, 27 Jan 2016 16:17:49 -0500 Message-Id: <1453929472-25566-3-git-send-email-matthew.r.wilcox@intel.com> X-Mailer: git-send-email 2.7.0.rc3 In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Matthew Wilcox of_hwspin_lock_get_id() is protected by the RCU lock, which means that insertions can occur simultaneously with the lookup. If the radix tree transitions from a height of 0, we can see a slot with the indirect_ptr bit set, which will cause us to at least read random memory, and could cause other havoc. Fix this by using the newly introduced radix_tree_iter_retry(). Signed-off-by: Matthew Wilcox Cc: stable@vger.kernel.org --- drivers/hwspinlock/hwspinlock_core.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/hwspinlock/hwspinlock_core.c b/drivers/hwspinlock/hwspinlock_core.c index 52f708bcf77f..d50c701b19d6 100644 --- a/drivers/hwspinlock/hwspinlock_core.c +++ b/drivers/hwspinlock/hwspinlock_core.c @@ -313,6 +313,10 @@ int of_hwspin_lock_get_id(struct device_node *np, int index) hwlock = radix_tree_deref_slot(slot); if (unlikely(!hwlock)) continue; + if (radix_tree_is_indirect_ptr(hwlock)) { + slot = radix_tree_iter_retry(&iter); + continue; + } if (hwlock->bank->dev->of_node == args.np) { ret = 0; -- 2.7.0.rc3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754736AbcA1HRp (ORCPT ); Thu, 28 Jan 2016 02:17:45 -0500 Received: from mail-lf0-f66.google.com ([209.85.215.66]:33265 "EHLO mail-lf0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752148AbcA1HRn (ORCPT ); Thu, 28 Jan 2016 02:17:43 -0500 MIME-Version: 1.0 In-Reply-To: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Date: Thu, 28 Jan 2016 10:17:41 +0300 Message-ID: Subject: Re: [PATCH 0/5] Fix races & improve the radix tree iterator patterns From: Konstantin Khlebnikov To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen , Matthew Wilcox , Linux Kernel Mailing List , linux-fsdevel , "linux-mm@kvack.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox wrote: > From: Matthew Wilcox > > The first two patches here are bugfixes, and I would like to see them > make their way into stable ASAP since they can lead to data corruption > (very low probabilty). > > The last three patches do not qualify as bugfixes. They simply improve > the standard pattern used to do radix tree iterations by removing the > 'goto restart' part. Partially this is because this is an ugly & > confusing goto, and partially because with multi-order entries in the > tree, it'll be more likely that we'll see an indirect_ptr bit, and > it's more efficient to kep going from the point of the iteration we're > currently in than restart from the beginning each time. Ack whole set. I think we should go deeper in hide dereference/retry inside iterator. Something like radix_tree_for_each_data(data, slot, root, iter, start). I'll prepare patch for that. > > Matthew Wilcox (5): > radix-tree: Fix race in gang lookup > hwspinlock: Fix race between radix tree insertion and lookup > btrfs: Use radix_tree_iter_retry() > mm: Use radix_tree_iter_retry() > radix-tree,shmem: Introduce radix_tree_iter_next() > > drivers/hwspinlock/hwspinlock_core.c | 4 +++ > fs/btrfs/tests/btrfs-tests.c | 3 +- > include/linux/radix-tree.h | 31 +++++++++++++++++++++ > lib/radix-tree.c | 12 ++++++-- > mm/filemap.c | 53 ++++++++++++------------------------ > mm/shmem.c | 30 ++++++++++---------- > 6 files changed, 78 insertions(+), 55 deletions(-) > > -- > 2.7.0.rc3 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756412AbcA2OqF (ORCPT ); Fri, 29 Jan 2016 09:46:05 -0500 Received: from mx2.suse.de ([195.135.220.15]:36567 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756389AbcA2OqC (ORCPT ); Fri, 29 Jan 2016 09:46:02 -0500 Subject: Re: [PATCH 4/5] mm: Use radix_tree_iter_retry() To: Matthew Wilcox , Andrew Morton , Hugh Dickins References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org From: Vlastimil Babka Message-ID: <56AB7B27.3090805@suse.cz> Date: Fri, 29 Jan 2016 15:45:59 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01/27/2016 10:17 PM, Matthew Wilcox wrote: > From: Matthew Wilcox > > Instead of a 'goto restart', we can now use radix_tree_iter_retry() > to restart from our current position. This will make a difference > when there are more ways to happen across an indirect pointer. And it > eliminates some confusing gotos. > > Signed-off-by: Matthew Wilcox [...] > diff --git a/mm/shmem.c b/mm/shmem.c > index fa2ceb2d2655..6ec14b70d82d 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -388,8 +388,10 @@ restart: > * don't need to reset the counter, nor do we risk infinite > * restarts. > */ > - if (radix_tree_deref_retry(page)) > - goto restart; > + if (radix_tree_deref_retry(page)) { > + slot = radix_tree_iter_retry(&iter); > + continue; > + } > > if (radix_tree_exceptional_entry(page)) > swapped++; This should be applied on top. There are no restarts anymore. ----8<---- >>From 3b0bdd370b57fb6d83b213e140cd1fb0e8962af8 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Fri, 29 Jan 2016 15:41:31 +0100 Subject: [PATCH] mm: Use radix_tree_iter_retry()-fix Remove now-obsolete-and-misleading comment. Signed-off-by: Vlastimil Babka --- mm/shmem.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 8f89abd4eaee..4d758938340c 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -382,11 +382,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping, page = radix_tree_deref_slot(slot); - /* - * This should only be possible to happen at index 0, so we - * don't need to reset the counter, nor do we risk infinite - * restarts. - */ if (radix_tree_deref_retry(page)) { slot = radix_tree_iter_retry(&iter); continue; -- 2.7.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756329AbcA2Oup (ORCPT ); Fri, 29 Jan 2016 09:50:45 -0500 Received: from mga01.intel.com ([192.55.52.88]:42067 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756092AbcA2Oun (ORCPT ); Fri, 29 Jan 2016 09:50:43 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,364,1449561600"; d="scan'208";a="891892329" Date: Fri, 29 Jan 2016 09:50:40 -0500 From: Matthew Wilcox To: Vlastimil Babka Cc: Matthew Wilcox , Andrew Morton , Hugh Dickins , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 4/5] mm: Use radix_tree_iter_retry() Message-ID: <20160129145040.GW2948@linux.intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> <56AB7B27.3090805@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <56AB7B27.3090805@suse.cz> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jan 29, 2016 at 03:45:59PM +0100, Vlastimil Babka wrote: > This should be applied on top. There are no restarts anymore. Quite right. Sorry I missed the comment. Acked-by: Matthwe Wilcox > ----8<---- > >From 3b0bdd370b57fb6d83b213e140cd1fb0e8962af8 Mon Sep 17 00:00:00 2001 > From: Vlastimil Babka > Date: Fri, 29 Jan 2016 15:41:31 +0100 > Subject: [PATCH] mm: Use radix_tree_iter_retry()-fix > > Remove now-obsolete-and-misleading comment. > > Signed-off-by: Vlastimil Babka > --- > mm/shmem.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/mm/shmem.c b/mm/shmem.c > index 8f89abd4eaee..4d758938340c 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -382,11 +382,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping, > > page = radix_tree_deref_slot(slot); > > - /* > - * This should only be possible to happen at index 0, so we > - * don't need to reset the counter, nor do we risk infinite > - * restarts. > - */ > if (radix_tree_deref_retry(page)) { > slot = radix_tree_iter_retry(&iter); > continue; > -- > 2.7.0 > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932990AbcBAOew (ORCPT ); Mon, 1 Feb 2016 09:34:52 -0500 Received: from mx2.suse.de ([195.135.220.15]:40976 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932418AbcBAOet (ORCPT ); Mon, 1 Feb 2016 09:34:49 -0500 Date: Mon, 1 Feb 2016 15:34:32 +0100 From: David Sterba To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 3/5] btrfs: Use radix_tree_iter_retry() Message-ID: <20160201143432.GH31992@twin.jikos.cz> Reply-To: dsterba@suse.cz Mail-Followup-To: dsterba@suse.cz, Matthew Wilcox , Andrew Morton , Hugh Dickins , Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-4-git-send-email-matthew.r.wilcox@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1453929472-25566-4-git-send-email-matthew.r.wilcox@intel.com> User-Agent: Mutt/1.5.23.1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jan 27, 2016 at 04:17:50PM -0500, Matthew Wilcox wrote: > From: Matthew Wilcox > > Even though this is a 'can't happen' situation, use the new > radix_tree_iter_retry() pattern to eliminate a goto. Andrew's tree contains a fixup for a build failure > @@ -147,7 +146,7 @@ restart: > /* Shouldn't happen but that kind of thinking creates CVE's */ > if (radix_tree_exception(eb)) { > if (radix_tree_deref_retry(eb)) > - goto restart; > + slot = radix_tree_iter_retry(iter); slot = radix_tree_iter_retry(&iter); http://ozlabs.org/~akpm/mmots/broken-out/btrfs-use-radix_tree_iter_retry-fix.patch From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756255AbcBCG1I (ORCPT ); Wed, 3 Feb 2016 01:27:08 -0500 Received: from mail-lb0-f195.google.com ([209.85.217.195]:34421 "EHLO mail-lb0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751274AbcBCG1G (ORCPT ); Wed, 3 Feb 2016 01:27:06 -0500 MIME-Version: 1.0 In-Reply-To: References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> Date: Wed, 3 Feb 2016 09:27:03 +0300 Message-ID: Subject: Re: [PATCH 0/5] Fix races & improve the radix tree iterator patterns From: Konstantin Khlebnikov To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen , Matthew Wilcox , Linux Kernel Mailing List , linux-fsdevel , "linux-mm@kvack.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 28, 2016 at 10:17 AM, Konstantin Khlebnikov wrote: > On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox > wrote: >> From: Matthew Wilcox >> >> The first two patches here are bugfixes, and I would like to see them >> make their way into stable ASAP since they can lead to data corruption >> (very low probabilty). >> >> The last three patches do not qualify as bugfixes. They simply improve >> the standard pattern used to do radix tree iterations by removing the >> 'goto restart' part. Partially this is because this is an ugly & >> confusing goto, and partially because with multi-order entries in the >> tree, it'll be more likely that we'll see an indirect_ptr bit, and >> it's more efficient to kep going from the point of the iteration we're >> currently in than restart from the beginning each time. > > Ack whole set. > > I think we should go deeper in hide dereference/retry inside iterator. > Something like radix_tree_for_each_data(data, slot, root, iter, start). > I'll prepare patch for that. After second thought: there'ra not so many users for new sugar. This scheme with radix_tree_deref_retry - radix_tree_iter_retry complicated but fine. > >> >> Matthew Wilcox (5): >> radix-tree: Fix race in gang lookup >> hwspinlock: Fix race between radix tree insertion and lookup >> btrfs: Use radix_tree_iter_retry() >> mm: Use radix_tree_iter_retry() >> radix-tree,shmem: Introduce radix_tree_iter_next() >> >> drivers/hwspinlock/hwspinlock_core.c | 4 +++ >> fs/btrfs/tests/btrfs-tests.c | 3 +- >> include/linux/radix-tree.h | 31 +++++++++++++++++++++ >> lib/radix-tree.c | 12 ++++++-- >> mm/filemap.c | 53 ++++++++++++------------------------ >> mm/shmem.c | 30 ++++++++++---------- >> 6 files changed, 78 insertions(+), 55 deletions(-) >> >> -- >> 2.7.0.rc3 >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756192AbcBCVhX (ORCPT ); Wed, 3 Feb 2016 16:37:23 -0500 Received: from mail-lf0-f53.google.com ([209.85.215.53]:36609 "EHLO mail-lf0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756160AbcBCVhT (ORCPT ); Wed, 3 Feb 2016 16:37:19 -0500 MIME-Version: 1.0 In-Reply-To: <1453929472-25566-2-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-2-git-send-email-matthew.r.wilcox@intel.com> Date: Thu, 4 Feb 2016 00:37:17 +0300 Message-ID: Subject: Re: [PATCH 1/5] radix-tree: Fix race in gang lookup From: Konstantin Khlebnikov To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Ohad Ben-Cohen , Matthew Wilcox , Linux Kernel Mailing List , linux-fsdevel , "linux-mm@kvack.org" , Stable Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox wrote: > From: Matthew Wilcox > > If the indirect_ptr bit is set on a slot, that indicates we need to > redo the lookup. Introduce a new function radix_tree_iter_retry() > which forces the loop to retry the lookup by setting 'slot' to NULL and > turning the iterator back to point at the problematic entry. > > This is a pretty rare problem to hit at the moment; the lookup has to > race with a grow of the radix tree from a height of 0. The consequences > of hitting this race are that gang lookup could return a pointer to a > radix_tree_node instead of a pointer to whatever the user had inserted > in the tree. > > Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator") > Signed-off-by: Matthew Wilcox > Cc: stable@vger.kernel.org > --- > include/linux/radix-tree.h | 16 ++++++++++++++++ > lib/radix-tree.c | 12 ++++++++++-- > 2 files changed, 26 insertions(+), 2 deletions(-) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index f9a3da5bf892..db0ed595749b 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root, > struct radix_tree_iter *iter, unsigned flags); > > /** > + * radix_tree_iter_retry - retry this chunk of the iteration > + * @iter: iterator state > + * > + * If we iterate over a tree protected only by the RCU lock, a race > + * against deletion or creation may result in seeing a slot for which > + * radix_tree_deref_retry() returns true. If so, call this function > + * and continue the iteration. > + */ > +static inline __must_check > +void **radix_tree_iter_retry(struct radix_tree_iter *iter) > +{ > + iter->next_index = iter->index; > + return NULL; > +} > + > +/** > * radix_tree_chunk_size - get current chunk size > * > * @iter: pointer to radix tree iterator > diff --git a/lib/radix-tree.c b/lib/radix-tree.c > index a25f635dcc56..65422ac17114 100644 > --- a/lib/radix-tree.c > +++ b/lib/radix-tree.c > @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results, > return 0; > > radix_tree_for_each_slot(slot, root, &iter, first_index) { > - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); > + results[ret] = rcu_dereference_raw(*slot); > if (!results[ret]) > continue; > + if (radix_tree_is_indirect_ptr(results[ret])) { > + slot = radix_tree_iter_retry(&iter); > + continue; > + } > if (++ret == max_items) > break; > } Looks like your fix doesn't work. After radix_tree_iter_retry: radix_tree_for_each_slot will call radix_tree_next_slot which isn't safe to call for NULL slot. #define radix_tree_for_each_slot(slot, root, iter, start) \ for (slot = radix_tree_iter_init(iter, start) ; \ slot || (slot = radix_tree_next_chunk(root, iter, 0)) ; \ slot = radix_tree_next_slot(slot, iter, 0)) tagged iterator works becase restart happens only at root - tags filled with single bit. quick (untested) fix for that --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -457,9 +457,9 @@ radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags) return slot + offset + 1; } } else { - unsigned size = radix_tree_chunk_size(iter) - 1; + int size = radix_tree_chunk_size(iter) - 1; - while (size--) { + while (size-- > 0) { slot++; iter->index++; if (likely(*slot)) > @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results, > return 0; > > radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) { > - results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot)); > + results[ret] = rcu_dereference_raw(*slot); > if (!results[ret]) > continue; > + if (radix_tree_is_indirect_ptr(results[ret])) { > + slot = radix_tree_iter_retry(&iter); > + continue; > + } > if (++ret == max_items) > break; > } > -- > 2.7.0.rc3 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755618AbcBDIui (ORCPT ); Thu, 4 Feb 2016 03:50:38 -0500 Received: from mail-lf0-f44.google.com ([209.85.215.44]:33463 "EHLO mail-lf0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751127AbcBDIuh (ORCPT ); Thu, 4 Feb 2016 03:50:37 -0500 MIME-Version: 1.0 In-Reply-To: <1453929472-25566-6-git-send-email-matthew.r.wilcox@intel.com> References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-6-git-send-email-matthew.r.wilcox@intel.com> Date: Thu, 4 Feb 2016 11:50:35 +0300 Message-ID: Subject: Re: [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next() From: Konstantin Khlebnikov To: Matthew Wilcox Cc: Andrew Morton , Hugh Dickins , Matthew Wilcox , Linux Kernel Mailing List , linux-fsdevel , "linux-mm@kvack.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox wrote: > From: Matthew Wilcox > > shmem likes to occasionally drop the lock, schedule, then reacqire > the lock and continue with the iteration from the last place it > left off. This is currently done with a pretty ugly goto. Introduce > radix_tree_iter_next() and use it throughout shmem.c. > > Signed-off-by: Matthew Wilcox > --- > include/linux/radix-tree.h | 15 +++++++++++++++ > mm/shmem.c | 12 +++--------- > 2 files changed, 18 insertions(+), 9 deletions(-) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index db0ed595749b..dec2c6c77eea 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -403,6 +403,21 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter) > } > > /** > + * radix_tree_iter_next - resume iterating when the chunk may be invalid > + * @iter: iterator state > + * > + * If the iterator needs to release then reacquire a lock, the chunk may > + * have been invalidated by an insertion or deletion. Call this function > + * to continue the iteration from the next index. > + */ > +static inline __must_check > +void **radix_tree_iter_next(struct radix_tree_iter *iter) > +{ > + iter->next_index = iter->index + 1; > + return NULL; > +} > + > +/** This works for normal iterator but not for tagged. It must also reset iter->tags to zero. > * radix_tree_chunk_size - get current chunk size > * > * @iter: pointer to radix tree iterator > diff --git a/mm/shmem.c b/mm/shmem.c > index 6ec14b70d82d..438ea8004c26 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -376,7 +376,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping, > > rcu_read_lock(); > > -restart: > radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { > if (iter.index >= end) > break; > @@ -398,8 +397,7 @@ restart: > > if (need_resched()) { > cond_resched_rcu(); > - start = iter.index + 1; > - goto restart; > + slot = radix_tree_iter_next(&iter); > } > } > > @@ -1950,7 +1948,6 @@ static void shmem_tag_pins(struct address_space *mapping) > start = 0; > rcu_read_lock(); > > -restart: > radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { > page = radix_tree_deref_slot(slot); > if (!page || radix_tree_exception(page)) { > @@ -1967,8 +1964,7 @@ restart: > > if (need_resched()) { > cond_resched_rcu(); > - start = iter.index + 1; > - goto restart; > + slot = radix_tree_iter_next(&iter); > } > } > rcu_read_unlock(); > @@ -2005,7 +2001,6 @@ static int shmem_wait_for_pins(struct address_space *mapping) > > start = 0; > rcu_read_lock(); > -restart: > radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, > start, SHMEM_TAG_PINNED) { > > @@ -2039,8 +2034,7 @@ restart: > continue_resched: > if (need_resched()) { > cond_resched_rcu(); > - start = iter.index + 1; > - goto restart; > + slot = radix_tree_iter_next(&iter); > } > } > rcu_read_unlock(); > -- > 2.7.0.rc3 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1428413AbcBSSCV (ORCPT ); Fri, 19 Feb 2016 13:02:21 -0500 Received: from userp1040.oracle.com ([156.151.31.81]:30134 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752332AbcBSSCS (ORCPT ); Fri, 19 Feb 2016 13:02:18 -0500 Subject: Re: [PATCH 4/5] mm: Use radix_tree_iter_retry() To: Matthew Wilcox , Andrew Morton , Hugh Dickins References: <1453929472-25566-1-git-send-email-matthew.r.wilcox@intel.com> <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> Cc: Matthew Wilcox , Konstantin Khlebnikov , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org From: Sasha Levin Message-ID: <56C758A0.4060600@oracle.com> Date: Fri, 19 Feb 2016 13:02:08 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: <1453929472-25566-5-git-send-email-matthew.r.wilcox@intel.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Source-IP: userv0022.oracle.com [156.151.31.74] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01/27/2016 04:17 PM, Matthew Wilcox wrote: > From: Matthew Wilcox > > Instead of a 'goto restart', we can now use radix_tree_iter_retry() > to restart from our current position. This will make a difference > when there are more ways to happen across an indirect pointer. And it > eliminates some confusing gotos. Hey Matthew, I'm seeing the following NULL ptr deref while fuzzing: [ 3380.120501] general protection fault: 0000 [#1] SMP KASAN [ 3380.120529] Modules linked in: [ 3380.120555] CPU: 2 PID: 23271 Comm: syz-executor Not tainted 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978 [ 3380.120569] task: ffff8800a5181000 ti: ffff8801a63b8000 task.ti: ffff8801a63b8000 [ 3380.120681] RIP: shmem_add_seals (include/linux/compiler.h:222 include/linux/radix-tree.h:206 mm/shmem.c:2001 mm/shmem.c:2100) [ 3380.120692] RSP: 0018:ffff8801a63bfd58 EFLAGS: 00010202 [ 3380.120703] RAX: dffffc0000000000 RBX: 0000000000000001 RCX: 0000000000940000 [ 3380.120714] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff8800a5181b3c [ 3380.120725] RBP: ffff8801a63bfe58 R08: ffff8800a5181b40 R09: 0000000000000001 [ 3380.120736] R10: fffff44e6f425fff R11: ffffffffbdb0a420 R12: 0000000000000008 [ 3380.120745] R13: 0000000000000001 R14: 0000000000000001 R15: ffffea0002ad1660 [ 3380.120759] FS: 00007fbc71e9c700(0000) GS:ffff8801d3c00000(0000) knlGS:0000000000000000 [ 3380.120769] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3380.120780] CR2: 0000000020010ff7 CR3: 00000001a0728000 CR4: 00000000000406e0 [ 3380.120794] Stack: [ 3380.120815] ffffffffa2738239 00000008a63bfdc8 ffff8800a499e740 1ffff10034c77fba [ 3380.120834] ffff8801ac446da0 ffff8800a499e8f0 0000000000000000 1ffff10034c77001 [ 3380.120852] ffff8801a63b8000 ffff8801a63b8008 ffff8801ac446f90 ffff8801ac446f98 [ 3380.120856] Call Trace: [ 3380.120929] shmem_fcntl (mm/shmem.c:2135) [ 3380.120963] SyS_fcntl (fs/fcntl.c:336 fs/fcntl.c:372 fs/fcntl.c:357) [ 3380.121112] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:200) [ 3380.122294] Code: c7 45 a0 00 00 00 00 e9 86 02 00 00 e8 cf a8 ee ff 4d 85 e4 0f 84 b2 07 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 d4 08 00 00 49 8b 1c 24 e8 12 34 de ff 85 c0 All code ======== 0: c7 45 a0 00 00 00 00 movl $0x0,-0x60(%rbp) 7: e9 86 02 00 00 jmpq 0x292 c: e8 cf a8 ee ff callq 0xffffffffffeea8e0 11: 4d 85 e4 test %r12,%r12 14: 0f 84 b2 07 00 00 je 0x7cc 1a: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax 21: fc ff df 24: 4c 89 e2 mov %r12,%rdx 27: 48 c1 ea 03 shr $0x3,%rdx 2b:* 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction 2f: 0f 85 d4 08 00 00 jne 0x909 35: 49 8b 1c 24 mov (%r12),%rbx 39: e8 12 34 de ff callq 0xffffffffffde3450 3e: 85 c0 test %eax,%eax ... Code starting with the faulting instruction =========================================== 0: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) 4: 0f 85 d4 08 00 00 jne 0x8de a: 49 8b 1c 24 mov (%r12),%rbx e: e8 12 34 de ff callq 0xffffffffffde3425 13: 85 c0 test %eax,%eax ... [ 3380.122312] RIP shmem_add_seals (include/linux/compiler.h:222 include/linux/radix-tree.h:206 mm/shmem.c:2001 mm/shmem.c:2100) [ 3380.122317] RSP Thanks, Sasha