* [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration
@ 2010-12-20 15:23 Mel Gorman
  2010-12-20 17:01 ` Mel Gorman
  2010-12-22  8:56 ` KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 10+ messages in thread
From: Mel Gorman @ 2010-12-20 15:23 UTC
To: Andrew Morton
Cc: Minchan Kim, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller,
	linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Heiko Carstens, Martin Schwidefsky

migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous
pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d.
The point of the RCU protection there is part of getting a stable reference
to anon_vma and is only held for anon pages as file pages are locked
which is sufficient protection against freeing.

However, while a file page's mapping is being migrated, the radix
tree is double checked to ensure it is the expected page. This uses
radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held
triggering the following warning under CONFIG_PROVE_RCU.

[  173.674290] ===================================================
[  173.676016] [ INFO: suspicious rcu_dereference_check() usage. ]
[  173.676016] ---------------------------------------------------
[  173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
[  173.676016]
[  173.676016] other info that might help us debug this:
[  173.676016]
[  173.676016]
[  173.676016] rcu_scheduler_active = 1, debug_locks = 0
[  173.676016] 1 lock held by hugeadm/2899:
[  173.676016]  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab
[  173.676016]
[  173.676016] stack backtrace:
[  173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild
[  173.676016] Call Trace:
[  173.676016]  [<c128cc01>] ? printk+0x14/0x1b
[  173.676016]  [<c1063502>] lockdep_rcu_dereference+0x7d/0x86
[  173.676016]  [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab
[  173.676016]  [<c10e41ad>] migrate_page+0x23/0x39
[  173.676016]  [<c10e491b>] buffer_migrate_page+0x22/0x107
[  173.676016]  [<c10e48f9>] ? buffer_migrate_page+0x0/0x107
[  173.676016]  [<c10e425d>] move_to_new_page+0x9a/0x1ae
[  173.676016]  [<c10e47e6>] migrate_pages+0x1e7/0x2fa

This patch introduces radix_tree_deref_slot_protected() which calls
rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock
that is protecting this dereference. Holding the tree lock protects against
parallel updaters of the radix tree meaning that rcu_dereference_protected
is allowable.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/radix-tree.h |   17 +++++++++++++++++
 mm/migrate.c               |    4 ++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index ab2baa5..a1f1672 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -146,6 +146,23 @@ static inline void *radix_tree_deref_slot(void **pslot)
 }
 
 /**
+ * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held
+ * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
+ * Returns:	item that was stored in that slot with any direct pointer flag
+ *		removed.
+ *
+ * Similar to radix_tree_deref_slot but only used during migration when a pages
+ * mapping is being moved. The caller does not hold the RCU read lock but it
+ * must hold the tree lock to prevent parallel updates.
+ */
+static inline void *radix_tree_deref_slot_protected(void **pslot,
+							spinlock_t *treelock)
+{
+	BUG_ON(rcu_read_lock_held());
+	return rcu_dereference_protected(*pslot, lockdep_is_held(treelock));
+}
+
+/**
  * radix_tree_deref_retry	- check radix_tree_deref_slot
  * @arg:	pointer returned by radix_tree_deref_slot
  * Returns:	0 if retry is not required, otherwise retry is required
diff --git a/mm/migrate.c b/mm/migrate.c
index fe5a3c6..7d4686a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -244,7 +244,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 
 	expected_count = 2 + page_has_private(page);
 	if (page_count(page) != expected_count ||
-		(struct page *)radix_tree_deref_slot(pslot) != page) {
+		(struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
 		spin_unlock_irq(&mapping->tree_lock);
 		return -EAGAIN;
 	}
@@ -316,7 +316,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
 
 	expected_count = 2 + page_has_private(page);
 	if (page_count(page) != expected_count ||
-		(struct page *)radix_tree_deref_slot(pslot) != page) {
+		(struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
 		spin_unlock_irq(&mapping->tree_lock);
 		return -EAGAIN;
 	}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
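The warning in the patch description can be illustrated with a small userspace model. This is only a sketch: struct model_ctx, model_deref(), model_deref_protected() and model_file_page_warnings() are hypothetical names invented for illustration, not kernel APIs, and the "warning" is reduced to a counter. It assumes the file page case described above, where the tree lock is held but rcu_read_lock() is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_ctx {
	bool rcu_read_locked;	/* models rcu_read_lock() being held */
	bool tree_locked;	/* models mapping->tree_lock being held */
	int warnings;		/* counts "suspicious usage" reports */
};

/* Models rcu_dereference(): complains unless the RCU read lock is held. */
static void *model_deref(struct model_ctx *c, void **pslot)
{
	if (!c->rcu_read_locked)
		c->warnings++;
	return *pslot;
}

/*
 * Models rcu_dereference_protected(): complains unless the update-side
 * lock named by the caller is held. The RCU read lock is irrelevant here.
 */
static void *model_deref_protected(struct model_ctx *c, void **pslot)
{
	if (!c->tree_locked)
		c->warnings++;
	return *pslot;
}

/* File page migration: tree lock held, RCU read lock not held. */
static int model_file_page_warnings(bool use_protected)
{
	int page = 0;
	void *slot = &page;
	struct model_ctx c = {
		.rcu_read_locked = false,
		.tree_locked = true,
		.warnings = 0,
	};
	void *p = use_protected ? model_deref_protected(&c, &slot)
				: model_deref(&c, &slot);

	assert(p == &page);
	return c.warnings;
}
```

In this model, model_file_page_warnings(false) reports one warning and model_file_page_warnings(true) reports none, which mirrors the reasoning in the patch description: the tree lock, not the RCU read lock, is what protects the slot in this path.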
* Re: [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration 2010-12-20 15:23 [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration Mel Gorman @ 2010-12-20 17:01 ` Mel Gorman 2010-12-20 23:48 ` Minchan Kim ` (2 more replies) 2010-12-22 8:56 ` KAMEZAWA Hiroyuki 1 sibling, 3 replies; 10+ messages in thread From: Mel Gorman @ 2010-12-20 17:01 UTC (permalink / raw) To: Andrew Morton Cc: Minchan Kim, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Heiko Carstens, Martin Schwidefsky On Mon, Dec 20, 2010 at 03:23:36PM +0000, Mel Gorman wrote: > migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous > pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. > The point of the RCU protection there is part of getting a stable reference > to anon_vma and is only held for anon pages as file pages are locked > which is sufficient protection against freeing. > > However, while a file page's mapping is being migrated, the radix > tree is double checked to ensure it is the expected page. This uses > radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held > triggering the following warning under CONFIG_PROVE_RCU. > > [ 173.674290] =================================================== > [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] > [ 173.676016] --------------------------------------------------- > [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 
> [ 173.676016] > [ 173.676016] other info that might help us debug this: > [ 173.676016] > [ 173.676016] > [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 > [ 173.676016] 1 lock held by hugeadm/2899: > [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab > [ 173.676016] > [ 173.676016] stack backtrace: > [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild > [ 173.676016] Call Trace: > [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b > [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 > [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab > [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 > [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 > [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 > [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae > [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa > > This patch introduces radix_tree_deref_slot_protected() which calls > rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock > that is protecting this dereference. Holding the tree lock protects against > parallel updaters of the radix tree meaning that rcu_dereference_protected > is allowable. > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > --- > include/linux/radix-tree.h | 17 +++++++++++++++++ > mm/migrate.c | 4 ++-- > 2 files changed, 19 insertions(+), 2 deletions(-) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index ab2baa5..a1f1672 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -146,6 +146,23 @@ static inline void *radix_tree_deref_slot(void **pslot) > } > > /** > + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > + * Returns: item that was stored in that slot with any direct pointer flag > + * removed. 
> + *
> + * Similar to radix_tree_deref_slot but only used during migration when a pages
> + * mapping is being moved. The caller does not hold the RCU read lock but it
> + * must hold the tree lock to prevent parallel updates.
> + */
> +static inline void *radix_tree_deref_slot_protected(void **pslot,
> +							spinlock_t *treelock)
> +{
> +	BUG_ON(rcu_read_lock_held());

This was a bad idea. After some extended testing, it was obvious that
this function can be called for swapcache pages with the RCU lock held.
Paul, is it still permissible to use rcu_dereference_protected() or must
the RCU read lock not be held?

> +	return rcu_dereference_protected(*pslot, lockdep_is_held(treelock));
> +}
> +
> +/**
>   * radix_tree_deref_retry	- check radix_tree_deref_slot
>   * @arg:	pointer returned by radix_tree_deref_slot
>   * Returns:	0 if retry is not required, otherwise retry is required
> diff --git a/mm/migrate.c b/mm/migrate.c
> index fe5a3c6..7d4686a 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -244,7 +244,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
>  
>  	expected_count = 2 + page_has_private(page);
>  	if (page_count(page) != expected_count ||
> -		(struct page *)radix_tree_deref_slot(pslot) != page) {
> +		(struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
>  		spin_unlock_irq(&mapping->tree_lock);
>  		return -EAGAIN;
>  	}
> @@ -316,7 +316,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
>  
>  	expected_count = 2 + page_has_private(page);
>  	if (page_count(page) != expected_count ||
> -		(struct page *)radix_tree_deref_slot(pslot) != page) {
> +		(struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
>  		spin_unlock_irq(&mapping->tree_lock);
>  		return -EAGAIN;
>  	}
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration 2010-12-20 17:01 ` Mel Gorman @ 2010-12-20 23:48 ` Minchan Kim 2010-12-21 10:49 ` Mel Gorman 2010-12-21 7:16 ` Milton Miller 2011-01-12 23:21 ` Paul E. McKenney 2 siblings, 1 reply; 10+ messages in thread From: Minchan Kim @ 2010-12-20 23:48 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Heiko Carstens, Martin Schwidefsky On Tue, Dec 21, 2010 at 2:01 AM, Mel Gorman <mel@csn.ul.ie> wrote: > On Mon, Dec 20, 2010 at 03:23:36PM +0000, Mel Gorman wrote: >> migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous >> pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. >> The point of the RCU protection there is part of getting a stable reference >> to anon_vma and is only held for anon pages as file pages are locked >> which is sufficient protection against freeing. >> >> However, while a file page's mapping is being migrated, the radix >> tree is double checked to ensure it is the expected page. This uses >> radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held >> triggering the following warning under CONFIG_PROVE_RCU. >> >> [ 173.674290] =================================================== >> [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] >> [ 173.676016] --------------------------------------------------- >> [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 
>> [ 173.676016] >> [ 173.676016] other info that might help us debug this: >> [ 173.676016] >> [ 173.676016] >> [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 >> [ 173.676016] 1 lock held by hugeadm/2899: >> [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab >> [ 173.676016] >> [ 173.676016] stack backtrace: >> [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild >> [ 173.676016] Call Trace: >> [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b >> [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 >> [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab >> [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 >> [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 >> [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 >> [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae >> [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa >> >> This patch introduces radix_tree_deref_slot_protected() which calls >> rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock >> that is protecting this dereference. Holding the tree lock protects against >> parallel updaters of the radix tree meaning that rcu_dereference_protected >> is allowable. 
>>
>> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
>> ---
>>  include/linux/radix-tree.h |   17 +++++++++++++++++
>>  mm/migrate.c               |    4 ++--
>>  2 files changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
>> index ab2baa5..a1f1672 100644
>> --- a/include/linux/radix-tree.h
>> +++ b/include/linux/radix-tree.h
>> @@ -146,6 +146,23 @@ static inline void *radix_tree_deref_slot(void **pslot)
>>  }
>>
>>  /**
>> + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held
>> + * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
>> + * Returns:	item that was stored in that slot with any direct pointer flag
>> + *		removed.
>> + *
>> + * Similar to radix_tree_deref_slot but only used during migration when a pages
>> + * mapping is being moved. The caller does not hold the RCU read lock but it
>> + * must hold the tree lock to prevent parallel updates.
>> + */
>> +static inline void *radix_tree_deref_slot_protected(void **pslot,
>> +							spinlock_t *treelock)
>> +{
>> +	BUG_ON(rcu_read_lock_held());

Hmm.. Why did you add the check?
If rcu_read_lock were already held, we wouldn't need this new API.

>
> This was a bad idea. After some extended testing, it was obvious that
> this function can be called for swapcache pages with the RCU lock held.
> Paul, is it still permissible to use rcu_dereference_protected() or must

I guess has no problem.

> the RCU read lock not be held?
>

-- 
Kind regards,
Minchan Kim
* Re: [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration 2010-12-20 23:48 ` Minchan Kim @ 2010-12-21 10:49 ` Mel Gorman 0 siblings, 0 replies; 10+ messages in thread From: Mel Gorman @ 2010-12-21 10:49 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Heiko Carstens, Martin Schwidefsky On Tue, Dec 21, 2010 at 08:48:50AM +0900, Minchan Kim wrote: > On Tue, Dec 21, 2010 at 2:01 AM, Mel Gorman <mel@csn.ul.ie> wrote: > > On Mon, Dec 20, 2010 at 03:23:36PM +0000, Mel Gorman wrote: > >> migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous > >> pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. > >> The point of the RCU protection there is part of getting a stable reference > >> to anon_vma and is only held for anon pages as file pages are locked > >> which is sufficient protection against freeing. > >> > >> However, while a file page's mapping is being migrated, the radix > >> tree is double checked to ensure it is the expected page. This uses > >> radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held > >> triggering the following warning under CONFIG_PROVE_RCU. > >> > >> [ 173.674290] =================================================== > >> [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] > >> [ 173.676016] --------------------------------------------------- > >> [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 
> >> [ 173.676016] > >> [ 173.676016] other info that might help us debug this: > >> [ 173.676016] > >> [ 173.676016] > >> [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 > >> [ 173.676016] 1 lock held by hugeadm/2899: > >> [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab > >> [ 173.676016] > >> [ 173.676016] stack backtrace: > >> [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild > >> [ 173.676016] Call Trace: > >> [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b > >> [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 > >> [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab > >> [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 > >> [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 > >> [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 > >> [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae > >> [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa > >> > >> This patch introduces radix_tree_deref_slot_protected() which calls > >> rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock > >> that is protecting this dereference. Holding the tree lock protects against > >> parallel updaters of the radix tree meaning that rcu_dereference_protected > >> is allowable. 
> >>
> >> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> >> ---
> >>  include/linux/radix-tree.h |   17 +++++++++++++++++
> >>  mm/migrate.c               |    4 ++--
> >>  2 files changed, 19 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> >> index ab2baa5..a1f1672 100644
> >> --- a/include/linux/radix-tree.h
> >> +++ b/include/linux/radix-tree.h
> >> @@ -146,6 +146,23 @@ static inline void *radix_tree_deref_slot(void **pslot)
> >>  }
> >>
> >>  /**
> >> + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held
> >> + * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
> >> + * Returns:	item that was stored in that slot with any direct pointer flag
> >> + *		removed.
> >> + *
> >> + * Similar to radix_tree_deref_slot but only used during migration when a pages
> >> + * mapping is being moved. The caller does not hold the RCU read lock but it
> >> + * must hold the tree lock to prevent parallel updates.
> >> + */
> >> +static inline void *radix_tree_deref_slot_protected(void **pslot,
> >> +							spinlock_t *treelock)
> >> +{
> >> +	BUG_ON(rcu_read_lock_held());
>
> Hmm.. Why did you add the check?
> If rcu_read_lock were already held, we wouldn't need this new API.
>

Because our earlier discussions assumed that the RCU read lock was not
held in this path. The check was added to verify that assumption, and
the assumption turned out to be wrong.

> >
> > This was a bad idea. After some extended testing, it was obvious that
> > this function can be called for swapcache pages with the RCU lock held.
> > Paul, is it still permissible to use rcu_dereference_protected() or must
>
> I guess has no problem.
>
> > the RCU read lock not be held?
> >
>
> --
> Kind regards,
> Minchan Kim
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration 2010-12-20 17:01 ` Mel Gorman 2010-12-20 23:48 ` Minchan Kim @ 2010-12-21 7:16 ` Milton Miller 2010-12-21 12:26 ` Mel Gorman 2011-01-12 23:21 ` Paul E. McKenney 2 siblings, 1 reply; 10+ messages in thread From: Milton Miller @ 2010-12-21 7:16 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Heiko Carstens, Martin Schwidefsky, Paul E. McKenney, Mel Gorman, Dipankar Sarma [ Add Paul back to the CC list, and also Dipankar. Hopefully I killed the mime encodings correctly ] On Tue, 21 Dec 2010 at 08:48:50 +0900, Minchan Kim wrote: > On Tue, Dec 21, 2010 at 2:01 AM, Mel Gorman <mel@csn.ul.ie> wrote: > > On Mon, Dec 20, 2010 at 03:23:36PM +0000, Mel Gorman wrote: > > > migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous > > > pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. > > > The point of the RCU protection there is part of getting a stable reference > > > to anon_vma and is only held for anon pages as file pages are locked > > > which is sufficient protection against freeing. > > > > > > However, while a file page's mapping is being migrated, the radix > > > tree is double checked to ensure it is the expected page. This uses > > > radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held > > > triggering the following warning under CONFIG_PROVE_RCU. > > > > > > [ 173.674290] =================================================== > > > [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] > > > [ 173.676016] --------------------------------------------------- > > > [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 
> > > [ 173.676016] > > > [ 173.676016] other info that might help us debug this: > > > [ 173.676016] > > > [ 173.676016] > > > [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 > > > [ 173.676016] 1 lock held by hugeadm/2899: > > > [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.},at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab > > > [ 173.676016] > > > [ 173.676016] stack backtrace: > > > [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild > > > [ 173.676016] Call Trace: > > > [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b > > > [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 > > > [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab > > > [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 > > > [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 > > > [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 > > > [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae > > > [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa > > > > > > This patch introduces radix_tree_deref_slot_protected() which calls > > > rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock > > > that is protecting this dereference. Holding the tree lock protects against > > > parallel updaters of the radix tree meaning that rcu_dereference_protected > > > is allowable. 
> > >
> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > > ---
> > >  include/linux/radix-tree.h |   17 +++++++++++++++++
> > >  mm/migrate.c               |    4 ++--
> > >  2 files changed, 19 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > > index ab2baa5..a1f1672 100644
> > > --- a/include/linux/radix-tree.h
> > > +++ b/include/linux/radix-tree.h
> > > @@ -146,6 +146,23 @@ static inline void *radix_tree_deref_slot(void **pslot)
> > >  }
> > >
> > >  /**
> > > + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held
> > > + * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
> > > + * Returns:	item that was stored in that slot with any direct pointer flag
> > > + *		removed.
> > > + *
> > > + * Similar to radix_tree_deref_slot but only used during migration when a pages
> > > + * mapping is being moved. The caller does not hold the RCU read lock but it
> > > + * must hold the tree lock to prevent parallel updates.
> > > + */
> > > +static inline void *radix_tree_deref_slot_protected(void **pslot,
> > > +							spinlock_t *treelock)
> > > +{
> > > +	BUG_ON(rcu_read_lock_held());
>
> Hmm.. Why did you add the check?
> If rcu_read_lock were already held, we wouldn't need this new API.

I'm not Paul but I can read the code in include/linux/rcupdate.h.

Holding rcu_read_lock() isn't a problem, but using _protected with
just the read lock is.

> >
> > This was a bad idea. After some extended testing, it was obvious that
> > this function can be called for swapcache pages with the RCU lock held.
> > Paul, is it still permissible to use rcu_dereference_protected() or must
>
> I guess has no problem.

No, this is a problem, because __rcu_dereference_protected() doesn't
include the smp_read_barrier_depends() that is needed in the RCU-only
dereference path.

Either we need two helpers, one for when the tree is write locked
and one for when the tree is only RCU read locked, or we use
__rcu_dereference_check() with check = 1.

>
> > the RCU read lock not be held?
> >
>
> Minchan Kim

milton
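The "two helpers" option can be sketched in userspace with C11 atomics. This is a model under stated assumptions, not kernel code: struct toy_tree and every toy_* function are hypothetical names, a memory_order_consume load stands in for the dependency barrier that the reader-side dereference needs, and a relaxed load stands in for the barrier-free _protected variant, which is only safe because the lock excludes concurrent updaters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct toy_tree {
	atomic_flag lock;	/* toy spinlock standing in for tree_lock */
	bool lock_held;		/* toy stand-in for lockdep_is_held() */
	_Atomic(void *) slot;	/* a single radix tree slot */
};

static void toy_tree_init(struct toy_tree *t, void *item)
{
	atomic_flag_clear(&t->lock);
	t->lock_held = false;
	atomic_init(&t->slot, item);
}

static void toy_lock(struct toy_tree *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin until the flag is released */
	t->lock_held = true;
}

static void toy_unlock(struct toy_tree *t)
{
	t->lock_held = false;
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/*
 * Helper 1: reader holds only the (modelled) RCU read lock. Updaters may
 * run concurrently, so the load must order dependent accesses.
 */
static void *toy_deref_slot(struct toy_tree *t)
{
	return atomic_load_explicit(&t->slot, memory_order_consume);
}

/*
 * Helper 2: caller holds the tree lock, so no updater can race with the
 * load and no dependency barrier is needed.
 */
static void *toy_deref_slot_protected(struct toy_tree *t)
{
	assert(t->lock_held);
	return atomic_load_explicit(&t->slot, memory_order_relaxed);
}

static int toy_demo(void)
{
	static int item = 42;
	struct toy_tree t;
	int *via_reader, *via_locked;

	toy_tree_init(&t, &item);

	/* Lockless reader path. */
	via_reader = toy_deref_slot(&t);

	/* Updater-side check, as in migration, with the tree lock held. */
	toy_lock(&t);
	via_locked = toy_deref_slot_protected(&t);
	toy_unlock(&t);

	return (via_reader == via_locked) ? *via_locked : -1;
}
```

The alternative mentioned above, a single helper built on the checking variant with the condition forced true, would correspond to always using the ordered load in toy_deref_slot() regardless of which lock is held.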
* Re: [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration 2010-12-21 7:16 ` Milton Miller @ 2010-12-21 12:26 ` Mel Gorman 2010-12-22 2:10 ` Minchan Kim 0 siblings, 1 reply; 10+ messages in thread From: Mel Gorman @ 2010-12-21 12:26 UTC (permalink / raw) To: Milton Miller Cc: Minchan Kim, Andrew Morton, gerald.schaefer, KAMEZAWA Hiroyuki, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Heiko Carstens, Martin Schwidefsky, Paul E. McKenney, Dipankar Sarma On Tue, Dec 21, 2010 at 01:16:23AM -0600, Milton Miller wrote: > > [ Add Paul back to the CC list, and also Dipankar. > Hopefully I killed the mime encodings correctly ] > > On Tue, 21 Dec 2010 at 08:48:50 +0900, Minchan Kim wrote: > > On Tue, Dec 21, 2010 at 2:01 AM, Mel Gorman <mel@csn.ul.ie> wrote: > > > On Mon, Dec 20, 2010 at 03:23:36PM +0000, Mel Gorman wrote: > > > > migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous > > > > pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. > > > > The point of the RCU protection there is part of getting a stable reference > > > > to anon_vma and is only held for anon pages as file pages are locked > > > > which is sufficient protection against freeing. > > > > > > > > However, while a file page's mapping is being migrated, the radix > > > > tree is double checked to ensure it is the expected page. This uses > > > > radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held > > > > triggering the following warning under CONFIG_PROVE_RCU. > > > > > > > > [ 173.674290] =================================================== > > > > [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] > > > > [ 173.676016] --------------------------------------------------- > > > > [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 
> > > > [ 173.676016] > > > > [ 173.676016] other info that might help us debug this: > > > > [ 173.676016] > > > > [ 173.676016] > > > > [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 > > > > [ 173.676016] 1 lock held by hugeadm/2899: > > > > [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.},at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab > > > > [ 173.676016] > > > > [ 173.676016] stack backtrace: > > > > [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild > > > > [ 173.676016] Call Trace: > > > > [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b > > > > [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 > > > > [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab > > > > [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 > > > > [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 > > > > [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 > > > > [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae > > > > [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa > > > > > > > > This patch introduces radix_tree_deref_slot_protected() which calls > > > > rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock > > > > that is protecting this dereference. Holding the tree lock protects against > > > > parallel updaters of the radix tree meaning that rcu_dereference_protected > > > > is allowable. 
> > > > > > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > > > --- > > > > include/linux/radix-tree.h | 17 +++++++++++++++++ > > > > mm/migrate.c | 4 ++-- > > > > 2 files changed, 19 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > > > > index ab2baa5..a1f1672 100644 > > > > --- a/include/linux/radix-tree.h > > > > +++ b/include/linux/radix-tree.h > > > > @@ -146,6 +146,23 @@ static inline void *radix_tree_deref_slot(void **pslot) > > > > } > > > > > > > > /** > > > > + * radix_tree_deref_slot_protected - dereference a slot without RCUlock but with tree lock held > > > > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > > > > + * Returns: item that was stored in that slot with any direct pointer flag > > > > + * removed. > > > > + * > > > > + * Similar to radix_tree_deref_slot but only used during migration when a pages > > > > + * mapping is being moved. The caller does not hold the RCU read lock but it > > > > + * must hold the tree lock to prevent parallel updates. > > > > + */ > > > > +static inline void *radix_tree_deref_slot_protected(void **pslot, > > > > + spinlock_t *treelock) > > > > +{ > > > > + BUG_ON(rcu_read_lock_held()); > > > > Hmm.. Why did you add the check? > > If rcu_read_lock were already held, we wouldn't need this new API. > > I'm not Paul but I can read the code in include/linux/rcuupdate.h. > > Holding rcu_read_lock_held isn't a problem, but using protected with > just the read lock is. > Bah, this was extremely careless of me as it's even written in teh documentation. In this specific case, it's simply allowed to ignore whether the RCU read lock is held or not and the BUG_ON check was unnecessary. The tree lock protects against parallel updaters which is what we really care about for using _protected. In a later cycle, I should look at reducing the RCU read lock hold time in migration. 
The main thing it's protecting is getting a stable reference to anon_vma and it's held longer than is necessary for that. In the meantime, can anyone spot a problem with this patch? ==== CUT HERE ==== mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. The point of the RCU protection there is part of getting a stable reference to anon_vma and is only held for anon pages as file pages are locked which is sufficient protection against freeing. However, while a file page's mapping is being migrated, the radix tree is double checked to ensure it is the expected page. This uses radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held triggering the following warning. [ 173.674290] =================================================== [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] [ 173.676016] --------------------------------------------------- [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! [ 173.676016] [ 173.676016] other info that might help us debug this: [ 173.676016] [ 173.676016] [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 [ 173.676016] 1 lock held by hugeadm/2899: [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab [ 173.676016] [ 173.676016] stack backtrace: [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild [ 173.676016] Call Trace: [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 [ 173.676016] [<c10e48f9>] ? 
buffer_migrate_page+0x0/0x107 [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa This patch introduces radix_tree_deref_slot_protected() which calls rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock that is protecting this dereference. Holding the tree lock protects against parallel updaters of the radix tree meaning that rcu_dereference_protected is allowable. Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- include/linux/radix-tree.h | 16 ++++++++++++++++ mm/migrate.c | 4 ++-- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index ab2baa5..23241c2 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -146,6 +146,22 @@ static inline void *radix_tree_deref_slot(void **pslot) } /** + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held + * @pslot: pointer to slot, returned by radix_tree_lookup_slot + * Returns: item that was stored in that slot with any direct pointer flag + * removed. + * + * Similar to radix_tree_deref_slot but only used during migration when a pages + * mapping is being moved. The caller does not hold the RCU read lock but it + * must hold the tree lock to prevent parallel updates. 
+ */ +static inline void *radix_tree_deref_slot_protected(void **pslot, + spinlock_t *treelock) +{ + return rcu_dereference_protected(*pslot, lockdep_is_held(treelock)); +} + +/** * radix_tree_deref_retry - check radix_tree_deref_slot * @arg: pointer returned by radix_tree_deref_slot * Returns: 0 if retry is not required, otherwise retry is required diff --git a/mm/migrate.c b/mm/migrate.c index fe5a3c6..7d4686a 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -244,7 +244,7 @@ static int migrate_page_move_mapping(struct address_space *mapping, expected_count = 2 + page_has_private(page); if (page_count(page) != expected_count || - (struct page *)radix_tree_deref_slot(pslot) != page) { + (struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) { spin_unlock_irq(&mapping->tree_lock); return -EAGAIN; } @@ -316,7 +316,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping, expected_count = 2 + page_has_private(page); if (page_count(page) != expected_count || - (struct page *)radix_tree_deref_slot(pslot) != page) { + (struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) { spin_unlock_irq(&mapping->tree_lock); return -EAGAIN; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 10+ messages in thread
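A minimal userspace sketch of the pattern in the revised patch above, offered as an illustration only and not as kernel code. It assumes a single-threaded toy model: `struct toy_lock`, `toy_lock_is_held()`, `toy_deref_slot_protected()` and `toy_check_slot()` are hypothetical stand-ins for `mapping->tree_lock`, `lockdep_is_held()`, `radix_tree_deref_slot_protected()` and the slot re-check in `migrate_page_move_mapping()`. The point it tries to show is that the dereference is justified by the update-side lock alone, and that the helper verifies exactly that condition and nothing else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mapping->tree_lock: tracks only "held or not". */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* toy model: no recursion, no contention */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Toy analogue of lockdep_is_held(). */
static bool toy_lock_is_held(const struct toy_lock *l)
{
	return l->held;
}

/*
 * Toy analogue of radix_tree_deref_slot_protected(): a plain load that is
 * only legal while the update-side lock is held. Whether a read-side
 * critical section is also active is deliberately not checked.
 */
static void *toy_deref_slot_protected(void **pslot, const struct toy_lock *l)
{
	assert(toy_lock_is_held(l));
	return *pslot;
}

/*
 * Models the migration re-check: returns 0 if the slot still holds the
 * expected item, -1 (standing in for -EAGAIN) otherwise.
 */
static int toy_check_slot(void **pslot, struct toy_lock *l, void *expected)
{
	int ret;

	toy_lock_acquire(l);
	ret = (toy_deref_slot_protected(pslot, l) == expected) ? 0 : -1;
	toy_lock_release(l);
	return ret;
}
```

In this model a caller would invoke `toy_check_slot(&slot, &lock, page)` and retry on a non-zero result, mirroring the `-EAGAIN` path in the patch.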
* Re: [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration 2010-12-21 12:26 ` Mel Gorman @ 2010-12-22 2:10 ` Minchan Kim 0 siblings, 0 replies; 10+ messages in thread From: Minchan Kim @ 2010-12-22 2:10 UTC (permalink / raw) To: Mel Gorman Cc: Milton Miller, Andrew Morton, gerald.schaefer, KAMEZAWA Hiroyuki, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Heiko Carstens, Martin Schwidefsky, Paul E. McKenney, Dipankar Sarma On Tue, Dec 21, 2010 at 9:26 PM, Mel Gorman <mel@csn.ul.ie> wrote: > On Tue, Dec 21, 2010 at 01:16:23AM -0600, Milton Miller wrote: >> >> [ Add Paul back to the CC list, and also Dipankar. >> Hopefully I killed the mime encodings correctly ] >> >> On Tue, 21 Dec 2010 at 08:48:50 +0900, Minchan Kim wrote: >> > On Tue, Dec 21, 2010 at 2:01 AM, Mel Gorman <mel@csn.ul.ie> wrote: >> > > On Mon, Dec 20, 2010 at 03:23:36PM +0000, Mel Gorman wrote: >> > > > migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous >> > > > pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. >> > > > The point of the RCU protection there is part of getting a stable reference >> > > > to anon_vma and is only held for anon pages as file pages are locked >> > > > which is sufficient protection against freeing. >> > > > >> > > > However, while a file page's mapping is being migrated, the radix >> > > > tree is double checked to ensure it is the expected page. This uses >> > > > radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held >> > > > triggering the following warning under CONFIG_PROVE_RCU. >> > > > >> > > > [ 173.674290] =================================================== >> > > > [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] >> > > > [ 173.676016] --------------------------------------------------- >> > > > [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 
>> > > > [ 173.676016] >> > > > [ 173.676016] other info that might help us debug this: >> > > > [ 173.676016] >> > > > [ 173.676016] >> > > > [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 >> > > > [ 173.676016] 1 lock held by hugeadm/2899: >> > > > [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.},at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab >> > > > [ 173.676016] >> > > > [ 173.676016] stack backtrace: >> > > > [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild >> > > > [ 173.676016] Call Trace: >> > > > [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b >> > > > [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 >> > > > [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab >> > > > [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 >> > > > [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 >> > > > [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 >> > > > [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae >> > > > [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa >> > > > >> > > > This patch introduces radix_tree_deref_slot_protected() which calls >> > > > rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock >> > > > that is protecting this dereference. Holding the tree lock protects against >> > > > parallel updaters of the radix tree meaning that rcu_dereference_protected >> > > > is allowable. 
>> > > > >> > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> >> > > > --- >> > > > include/linux/radix-tree.h | 17 +++++++++++++++++ >> > > > mm/migrate.c | 4 ++-- >> > > > 2 files changed, 19 insertions(+), 2 deletions(-) >> > > > >> > > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h >> > > > index ab2baa5..a1f1672 100644 >> > > > --- a/include/linux/radix-tree.h >> > > > +++ b/include/linux/radix-tree.h >> > > > @@ -146,6 +146,23 @@ static inline void *radix_tree_deref_slot(void **pslot) >> > > > } >> > > > >> > > > /** >> > > > + * radix_tree_deref_slot_protected - dereference a slot without RCUlock but with tree lock held >> > > > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot >> > > > + * Returns: item that was stored in that slot with any direct pointer flag >> > > > + * removed. >> > > > + * >> > > > + * Similar to radix_tree_deref_slot but only used during migration when a pages >> > > > + * mapping is being moved. The caller does not hold the RCU read lock but it >> > > > + * must hold the tree lock to prevent parallel updates. >> > > > + */ >> > > > +static inline void *radix_tree_deref_slot_protected(void **pslot, >> > > > + spinlock_t *treelock) >> > > > +{ >> > > > + BUG_ON(rcu_read_lock_held()); >> > >> > Hmm.. Why did you add the check? >> > If rcu_read_lock were already held, we wouldn't need this new API. >> >> I'm not Paul but I can read the code in include/linux/rcuupdate.h. >> >> Holding rcu_read_lock_held isn't a problem, but using protected with >> just the read lock is. >> > > Bah, this was extremely careless of me as it's even written in teh > documentation. In this specific case, it's simply allowed to ignore whether > the RCU read lock is held or not and the BUG_ON check was unnecessary. The > tree lock protects against parallel updaters which is what we really care > about for using _protected. > > In a later cycle, I should look at reducing the RCU read lock hold time > in migration. 
The main thing it's protecting is getting a stable
> reference to anon_vma and it's held longer than is necessary for that.

Yes. I think if we want to reduce the RCU read lock hold time, we should look
at unmap_and_move() in the anon page case. Once we hold a reference via
anon_vma->external_refcount, the anon_vma is stable, so we can call
rcu_read_unlock() at that point. That could save a lot of time.

> In the meantime, can anyone spot a problem with this patch?
>
> ==== CUT HERE ====
> mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration
>
> migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous
> pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d.
> The point of the RCU protection there is part of getting a stable reference
> to anon_vma and is only held for anon pages as file pages are locked
> which is sufficient protection against freeing.
>
> However, while a file page's mapping is being migrated, the radix tree
> is double checked to ensure it is the expected page. This uses
> radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held
> triggering the following warning.
>
> [ 173.674290] ===================================================
> [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ]
> [ 173.676016] ---------------------------------------------------
> [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> [ 173.676016]
> [ 173.676016] other info that might help us debug this:
> [ 173.676016]
> [ 173.676016]
> [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0
> [ 173.676016] 1 lock held by hugeadm/2899:
> [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab
> [ 173.676016]
> [ 173.676016] stack backtrace:
> [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild
> [ 173.676016] Call Trace:
> [ 173.676016] [<c128cc01>] ?
printk+0x14/0x1b > [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 > [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab > [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 > [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 > [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 > [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae > [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa > > This patch introduces radix_tree_deref_slot_protected() which calls > rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock > that is protecting this dereference. Holding the tree lock protects > against parallel updaters of the radix tree meaning that > rcu_dereference_protected is allowable. > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> This is what I want. Thanks. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration 2010-12-20 17:01 ` Mel Gorman 2010-12-20 23:48 ` Minchan Kim 2010-12-21 7:16 ` Milton Miller @ 2011-01-12 23:21 ` Paul E. McKenney 2011-01-13 10:07 ` Mel Gorman 2 siblings, 1 reply; 10+ messages in thread From: Paul E. McKenney @ 2011-01-12 23:21 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Minchan Kim, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Heiko Carstens, Martin Schwidefsky On Mon, Dec 20, 2010 at 05:01:46PM +0000, Mel Gorman wrote: > On Mon, Dec 20, 2010 at 03:23:36PM +0000, Mel Gorman wrote: > > migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous > > pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. > > The point of the RCU protection there is part of getting a stable reference > > to anon_vma and is only held for anon pages as file pages are locked > > which is sufficient protection against freeing. > > > > However, while a file page's mapping is being migrated, the radix > > tree is double checked to ensure it is the expected page. This uses > > radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held > > triggering the following warning under CONFIG_PROVE_RCU. > > > > [ 173.674290] =================================================== > > [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] > > [ 173.676016] --------------------------------------------------- > > [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 
> > [ 173.676016] > > [ 173.676016] other info that might help us debug this: > > [ 173.676016] > > [ 173.676016] > > [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 > > [ 173.676016] 1 lock held by hugeadm/2899: > > [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab > > [ 173.676016] > > [ 173.676016] stack backtrace: > > [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild > > [ 173.676016] Call Trace: > > [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b > > [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 > > [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab > > [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 > > [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 > > [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 > > [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae > > [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa > > > > This patch introduces radix_tree_deref_slot_protected() which calls > > rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock > > that is protecting this dereference. Holding the tree lock protects against > > parallel updaters of the radix tree meaning that rcu_dereference_protected > > is allowable. 
> > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > --- > > include/linux/radix-tree.h | 17 +++++++++++++++++ > > mm/migrate.c | 4 ++-- > > 2 files changed, 19 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > > index ab2baa5..a1f1672 100644 > > --- a/include/linux/radix-tree.h > > +++ b/include/linux/radix-tree.h > > @@ -146,6 +146,23 @@ static inline void *radix_tree_deref_slot(void **pslot) > > } > > > > /** > > + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held > > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > > + * Returns: item that was stored in that slot with any direct pointer flag > > + * removed. > > + * > > + * Similar to radix_tree_deref_slot but only used during migration when a pages > > + * mapping is being moved. The caller does not hold the RCU read lock but it > > + * must hold the tree lock to prevent parallel updates. > > + */ > > +static inline void *radix_tree_deref_slot_protected(void **pslot, > > + spinlock_t *treelock) > > +{ > > + BUG_ON(rcu_read_lock_held()); > > This was a bad idea. After some extended testing, it was obvious that > this function can be called for swapcache pages with the RCU lock held. > Paul, is it still permissible to use rcu_dereference_protected() or must > the RCU read lock not be held? Apologies for the late reply! It is OK to call rcu_dereference_protected() with rcu_read_lock() held, but -only- if updates are somehow blocked -- for example, the treelock being held as below. It is OK to have extra protection, at least in this case. 
;-) > > + return rcu_dereference_protected(*pslot, lockdep_is_held(treelock)); > > +} > > + > > +/** > > * radix_tree_deref_retry - check radix_tree_deref_slot > > * @arg: pointer returned by radix_tree_deref_slot > > * Returns: 0 if retry is not required, otherwise retry is required > > diff --git a/mm/migrate.c b/mm/migrate.c > > index fe5a3c6..7d4686a 100644 > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -244,7 +244,7 @@ static int migrate_page_move_mapping(struct address_space *mapping, > > > > expected_count = 2 + page_has_private(page); > > if (page_count(page) != expected_count || > > - (struct page *)radix_tree_deref_slot(pslot) != page) { > > + (struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) { > > spin_unlock_irq(&mapping->tree_lock); > > return -EAGAIN; > > } > > @@ -316,7 +316,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping, > > > > expected_count = 2 + page_has_private(page); > > if (page_count(page) != expected_count || > > - (struct page *)radix_tree_deref_slot(pslot) != page) { > > + (struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) { > > spin_unlock_irq(&mapping->tree_lock); > > return -EAGAIN; > > } > > > > -- > Mel Gorman > Part-time Phd Student Linux Technology Center > University of Limerick IBM Dublin Software Lab > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
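The rule discussed in this message can be sketched as a small truth table in C. This is a toy model under stated assumptions, not the kernel's implementation: `struct toy_ctx`, `toy_plain_deref_ok()` and `toy_protected_deref_ok()` are hypothetical names. The first predicate models the check that fired in the warning quoted in this thread (the plain `radix_tree_deref_slot()` path wants the RCU read lock), and the second models `rcu_dereference_protected(..., lockdep_is_held(treelock))`, where only the update-side lock matters and an extra read lock is harmless.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the caller's locking context. */
struct toy_ctx {
	int  rcu_read_depth;	/* > 0 models being inside rcu_read_lock() */
	bool tree_lock_held;	/* models holding mapping->tree_lock */
};

/* Models the check behind the plain dereference: needs the RCU read lock. */
static bool toy_plain_deref_ok(const struct toy_ctx *c)
{
	return c->rcu_read_depth > 0;
}

/*
 * Models the protected dereference: legal only when updates are blocked by
 * the tree lock. The read lock is neither required nor forbidden, which is
 * why a BUG_ON(rcu_read_lock_held()) style check is not needed here.
 */
static bool toy_protected_deref_ok(const struct toy_ctx *c)
{
	return c->tree_lock_held;
}
```

Three contexts are worth checking against this model: file page migration (tree lock only), the swapcache case mentioned above (read lock and tree lock), and a reader holding just the read lock, for which the protected variant is not justified.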
* Re: [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration 2011-01-12 23:21 ` Paul E. McKenney @ 2011-01-13 10:07 ` Mel Gorman 0 siblings, 0 replies; 10+ messages in thread From: Mel Gorman @ 2011-01-13 10:07 UTC (permalink / raw) To: Paul E. McKenney Cc: Andrew Morton, Minchan Kim, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Heiko Carstens, Martin Schwidefsky On Wed, Jan 12, 2011 at 03:21:13PM -0800, Paul E. McKenney wrote: > On Mon, Dec 20, 2010 at 05:01:46PM +0000, Mel Gorman wrote: > > On Mon, Dec 20, 2010 at 03:23:36PM +0000, Mel Gorman wrote: > > > migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous > > > pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. > > > The point of the RCU protection there is part of getting a stable reference > > > to anon_vma and is only held for anon pages as file pages are locked > > > which is sufficient protection against freeing. > > > > > > However, while a file page's mapping is being migrated, the radix > > > tree is double checked to ensure it is the expected page. This uses > > > radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held > > > triggering the following warning under CONFIG_PROVE_RCU. > > > > > > [ 173.674290] =================================================== > > > [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] > > > [ 173.676016] --------------------------------------------------- > > > [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 
> > > [ 173.676016] > > > [ 173.676016] other info that might help us debug this: > > > [ 173.676016] > > > [ 173.676016] > > > [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 > > > [ 173.676016] 1 lock held by hugeadm/2899: > > > [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab > > > [ 173.676016] > > > [ 173.676016] stack backtrace: > > > [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild > > > [ 173.676016] Call Trace: > > > [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b > > > [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 > > > [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab > > > [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 > > > [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 > > > [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 > > > [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae > > > [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa > > > > > > This patch introduces radix_tree_deref_slot_protected() which calls > > > rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock > > > that is protecting this dereference. Holding the tree lock protects against > > > parallel updaters of the radix tree meaning that rcu_dereference_protected > > > is allowable. 
> > > > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > > --- > > > include/linux/radix-tree.h | 17 +++++++++++++++++ > > > mm/migrate.c | 4 ++-- > > > 2 files changed, 19 insertions(+), 2 deletions(-) > > > > > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > > > index ab2baa5..a1f1672 100644 > > > --- a/include/linux/radix-tree.h > > > +++ b/include/linux/radix-tree.h > > > @@ -146,6 +146,23 @@ static inline void *radix_tree_deref_slot(void **pslot) > > > } > > > > > > /** > > > + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held > > > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > > > + * Returns: item that was stored in that slot with any direct pointer flag > > > + * removed. > > > + * > > > + * Similar to radix_tree_deref_slot but only used during migration when a pages > > > + * mapping is being moved. The caller does not hold the RCU read lock but it > > > + * must hold the tree lock to prevent parallel updates. > > > + */ > > > +static inline void *radix_tree_deref_slot_protected(void **pslot, > > > + spinlock_t *treelock) > > > +{ > > > + BUG_ON(rcu_read_lock_held()); > > > > This was a bad idea. After some extended testing, it was obvious that > > this function can be called for swapcache pages with the RCU lock held. > > Paul, is it still permissible to use rcu_dereference_protected() or must > > the RCU read lock not be held? > > Apologies for the late reply! > > It is OK to call rcu_dereference_protected() with rcu_read_lock() held, > but -only- if updates are somehow blocked -- for example, the treelock > being held as below. > > It is OK to have extra protection, at least in this case. ;-) > Thanks for the clarification Paul. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
* Re: [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration 2010-12-20 15:23 [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration Mel Gorman 2010-12-20 17:01 ` Mel Gorman @ 2010-12-22 8:56 ` KAMEZAWA Hiroyuki 1 sibling, 0 replies; 10+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-12-22 8:56 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Minchan Kim, gerald.schaefer, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Heiko Carstens, Martin Schwidefsky On Mon, 20 Dec 2010 15:23:36 +0000 Mel Gorman <mel@csn.ul.ie> wrote: > migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous > pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. > The point of the RCU protection there is part of getting a stable reference > to anon_vma and is only held for anon pages as file pages are locked > which is sufficient protection against freeing. > > However, while a file page's mapping is being migrated, the radix > tree is double checked to ensure it is the expected page. This uses > radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held > triggering the following warning under CONFIG_PROVE_RCU. > > [ 173.674290] =================================================== > [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] > [ 173.676016] --------------------------------------------------- > [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 
> [ 173.676016] > [ 173.676016] other info that might help us debug this: > [ 173.676016] > [ 173.676016] > [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 > [ 173.676016] 1 lock held by hugeadm/2899: > [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab > [ 173.676016] > [ 173.676016] stack backtrace: > [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild > [ 173.676016] Call Trace: > [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b > [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 > [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab > [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 > [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 > [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 > [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae > [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa > > This patch introduces radix_tree_deref_slot_protected() which calls > rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock > that is protecting this dereference. Holding the tree lock protects against > parallel updaters of the radix tree meaning that rcu_dereference_protected > is allowable. > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> Thank you for fixing. 
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > --- > include/linux/radix-tree.h | 17 +++++++++++++++++ > mm/migrate.c | 4 ++-- > 2 files changed, 19 insertions(+), 2 deletions(-) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index ab2baa5..a1f1672 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -146,6 +146,23 @@ static inline void *radix_tree_deref_slot(void **pslot) > } > > /** > + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > + * Returns: item that was stored in that slot with any direct pointer flag > + * removed. > + * > + * Similar to radix_tree_deref_slot but only used during migration when a pages > + * mapping is being moved. The caller does not hold the RCU read lock but it > + * must hold the tree lock to prevent parallel updates. > + */ > +static inline void *radix_tree_deref_slot_protected(void **pslot, > + spinlock_t *treelock) > +{ > + BUG_ON(rcu_read_lock_held()); > + return rcu_dereference_protected(*pslot, lockdep_is_held(treelock)); > +} > + > +/** > * radix_tree_deref_retry - check radix_tree_deref_slot > * @arg: pointer returned by radix_tree_deref_slot > * Returns: 0 if retry is not required, otherwise retry is required > diff --git a/mm/migrate.c b/mm/migrate.c > index fe5a3c6..7d4686a 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -244,7 +244,7 @@ static int migrate_page_move_mapping(struct address_space *mapping, > > expected_count = 2 + page_has_private(page); > if (page_count(page) != expected_count || > - (struct page *)radix_tree_deref_slot(pslot) != page) { > + (struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) { > spin_unlock_irq(&mapping->tree_lock); > return -EAGAIN; > } > @@ -316,7 +316,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping, > > expected_count = 2 + 
page_has_private(page); > if (page_count(page) != expected_count || > - (struct page *)radix_tree_deref_slot(pslot) != page) { > + (struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) { > spin_unlock_irq(&mapping->tree_lock); > return -EAGAIN; > } > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, newest message 2011-01-13 10:07 UTC

Thread overview: 10+ messages
2010-12-20 15:23 [PATCH] mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration Mel Gorman
2010-12-20 17:01 ` Mel Gorman
2010-12-20 23:48   ` Minchan Kim
2010-12-21 10:49     ` Mel Gorman
2010-12-21  7:16   ` Milton Miller
2010-12-21 12:26     ` Mel Gorman
2010-12-22  2:10       ` Minchan Kim
2011-01-12 23:21   ` Paul E. McKenney
2011-01-13 10:07     ` Mel Gorman
2010-12-22  8:56 ` KAMEZAWA Hiroyuki