* [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage @ 2010-11-21 11:26 Arun Bhanu 2010-11-21 13:30 ` Ted Ts'o 0 siblings, 1 reply; 21+ messages in thread From: Arun Bhanu @ 2010-11-21 11:26 UTC (permalink / raw) To: linux-ext4, linux-kernel Cc: Theodore Ts'o, Andreas Dilger, Paul McKenney, Eric Sandeen I saw this in kernel log messages while testing 2.6.37-rc2. I think it appeared while mounting an external hard-disk. I can't seem to reproduce it. Please let me know if you need more info. [47064.272151] =================================================== [47064.273474] [ INFO: suspicious rcu_dereference_check() usage. ] [47064.273956] --------------------------------------------------- [47064.274431] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! [47064.274905] [47064.274906] other info that might help us debug this: [47064.274907] [47064.276303] [47064.276303] rcu_scheduler_active = 1, debug_locks = 0 [47064.277202] 2 locks held by mount/21199: [47064.277635] #0: (&type->s_umount_key#20/1){+.+.+.}, at: [<c05007f0>] sget+0x21f/0x35d [47064.278078] #1: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c04f5e67>] migrate_page_move_mapping+0x3a/0x120 [47064.278529] [47064.278529] stack backtrace: [47064.279409] Pid: 21199, comm: mount Not tainted 2.6.37-rc2-ab2-589136bfa784a4558b397f017ca2f06f0ca9080e+ #1 [47064.279864] Call Trace: [47064.280313] [<c0822e61>] ? printk+0x2d/0x34 [47064.280765] [<c04709c8>] lockdep_rcu_dereference+0x97/0x9f [47064.281220] [<c04f5e28>] radix_tree_deref_slot+0x4a/0x4f [47064.281680] [<c04f5ea7>] migrate_page_move_mapping+0x7a/0x120 [47064.282129] [<c04f6323>] migrate_page+0x1f/0x35 [47064.282573] [<c04f6464>] move_to_new_page+0x12b/0x164 [47064.283017] [<c04f6773>] migrate_pages+0x1e1/0x2ee [47064.283463] [<c04eed89>] ? compaction_alloc+0x0/0x1ef [47064.283918] [<c04eebdc>] compact_zone+0x24f/0x3fc [47064.284363] [<c04ef0f4>] try_to_compact_pages+0x17c/0x1e6 [47064.284820] [<c04cac33>] __alloc_pages_nodemask+0x397/0x6b7 [47064.285276] [<c04caf75>] __get_free_pages+0x22/0x33 [47064.285729] [<c04f4304>] __kmalloc+0x2f/0x112 [47064.286193] [<c0435adc>] ? should_resched+0xd/0x28 [47064.286655] [<c0578ca9>] kzalloc.clone.58+0x12/0x14 [47064.287118] [<c057c7f1>] ext4_fill_super+0x1090/0x2521 [47064.287573] [<c040ea02>] ? native_sched_clock+0x14/0x52 [47064.288029] [<c054774a>] ? disk_name+0x86/0x90 [47064.288477] [<c0523bfa>] ? set_blocksize+0x33/0x78 [47064.288930] [<c0500aed>] mount_bdev+0x123/0x16d [47064.289385] [<c057b761>] ? ext4_fill_super+0x0/0x2521 [47064.289842] [<c05776fd>] ? ext4_mount+0x0/0x24 [47064.290300] [<c057771c>] ext4_mount+0x1f/0x24 [47064.290760] [<c057b761>] ? ext4_fill_super+0x0/0x2521 [47064.291222] [<c05003e1>] vfs_kern_mount+0xa1/0x1ad [47064.291687] [<c0511c36>] ? get_fs_type+0x38/0x91 [47064.292149] [<c0500546>] do_kern_mount+0x3d/0xc8 [47064.292615] [<c05142b2>] do_mount+0x614/0x640 [47064.293056] [<c04d69f0>] ? strndup_user+0x2e/0x3f [47064.293489] [<c05144a1>] sys_mount+0x6d/0x99 [47064.293925] [<c040971f>] sysenter_do_call+0x12/0x38 [47064.368116] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null) -Arun ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage 2010-11-21 11:26 [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage Arun Bhanu @ 2010-11-21 13:30 ` Ted Ts'o 2010-11-21 15:39 ` Minchan Kim 0 siblings, 1 reply; 21+ messages in thread From: Ted Ts'o @ 2010-11-21 13:30 UTC (permalink / raw) To: linux-ext4, linux-mm, linux-kernel, Andreas Dilger, Paul McKenney, Eric Sandeen <san On Sun, Nov 21, 2010 at 07:26:11PM +0800, Arun Bhanu wrote: > I saw this in kernel log messages while testing 2.6.37-rc2. I think it > appeared while mounting an external hard-disk. I can't seem to > reproduce it. I could be wrong but this looks like it's a bug in mm/migrate.c in migrate_page_move_mapping(): it is calling radix_tree_lookup_slot() without first taking an rcu_read_lock(). It was triggered by a memory allocation out of ext4_fill_super(), which then triggered a memory compaction/migration, but I don't believe it's otherwise related to the ext4 code. Over to the linux-mm folks for confirmation... - Ted > Please let me know if you need more info. > > [47064.272151] =================================================== > [47064.273474] [ INFO: suspicious rcu_dereference_check() usage. ] > [47064.273956] --------------------------------------------------- > [47064.274431] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! > [47064.274905] > [47064.274906] other info that might help us debug this: > [47064.274907] > [47064.276303] > [47064.276303] rcu_scheduler_active = 1, debug_locks = 0 > [47064.277202] 2 locks held by mount/21199: > [47064.277635] #0: (&type->s_umount_key#20/1){+.+.+.}, at: [<c05007f0>] sget+0x21f/0x35d > [47064.278078] #1: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c04f5e67>] migrate_page_move_mapping+0x3a/0x120 > [47064.278529] > [47064.278529] stack backtrace: > [47064.279409] Pid: 21199, comm: mount Not tainted 2.6.37-rc2-ab2-589136bfa784a4558b397f017ca2f06f0ca9080e+ #1 > [47064.279864] Call Trace: > [47064.280313] [<c0822e61>] ? printk+0x2d/0x34 > [47064.280765] [<c04709c8>] lockdep_rcu_dereference+0x97/0x9f > [47064.281220] [<c04f5e28>] radix_tree_deref_slot+0x4a/0x4f > [47064.281680] [<c04f5ea7>] migrate_page_move_mapping+0x7a/0x120 > [47064.282129] [<c04f6323>] migrate_page+0x1f/0x35 > [47064.282573] [<c04f6464>] move_to_new_page+0x12b/0x164 > [47064.283017] [<c04f6773>] migrate_pages+0x1e1/0x2ee > [47064.283463] [<c04eed89>] ? compaction_alloc+0x0/0x1ef > [47064.283918] [<c04eebdc>] compact_zone+0x24f/0x3fc > [47064.284363] [<c04ef0f4>] try_to_compact_pages+0x17c/0x1e6 > [47064.284820] [<c04cac33>] __alloc_pages_nodemask+0x397/0x6b7 > [47064.285276] [<c04caf75>] __get_free_pages+0x22/0x33 > [47064.285729] [<c04f4304>] __kmalloc+0x2f/0x112 > [47064.286193] [<c0435adc>] ? should_resched+0xd/0x28 > [47064.286655] [<c0578ca9>] kzalloc.clone.58+0x12/0x14 > [47064.287118] [<c057c7f1>] ext4_fill_super+0x1090/0x2521 > [47064.287573] [<c040ea02>] ? native_sched_clock+0x14/0x52 > [47064.288029] [<c054774a>] ? disk_name+0x86/0x90 > [47064.288477] [<c0523bfa>] ? set_blocksize+0x33/0x78 > [47064.288930] [<c0500aed>] mount_bdev+0x123/0x16d > [47064.289385] [<c057b761>] ? ext4_fill_super+0x0/0x2521 > [47064.289842] [<c05776fd>] ? ext4_mount+0x0/0x24 > [47064.290300] [<c057771c>] ext4_mount+0x1f/0x24 > [47064.290760] [<c057b761>] ? ext4_fill_super+0x0/0x2521 > [47064.291222] [<c05003e1>] vfs_kern_mount+0xa1/0x1ad > [47064.291687] [<c0511c36>] ? get_fs_type+0x38/0x91 > [47064.292149] [<c0500546>] do_kern_mount+0x3d/0xc8 > [47064.292615] [<c05142b2>] do_mount+0x614/0x640 > [47064.293056] [<c04d69f0>] ? strndup_user+0x2e/0x3f > [47064.293489] [<c05144a1>] sys_mount+0x6d/0x99 > [47064.293925] [<c040971f>] sysenter_do_call+0x12/0x38 > [47064.368116] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null) > > -Arun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage 2010-11-21 13:30 ` Ted Ts'o @ 2010-11-21 15:39 ` Minchan Kim 2010-11-21 17:37 ` Ted Ts'o 0 siblings, 1 reply; 21+ messages in thread From: Minchan Kim @ 2010-11-21 15:39 UTC (permalink / raw) To: Ted Ts'o, linux-ext4, linux-mm, linux-kernel, Andreas Dilger, Paul McKenney On Sun, Nov 21, 2010 at 08:30:24AM -0500, Ted Ts'o wrote: > On Sun, Nov 21, 2010 at 07:26:11PM +0800, Arun Bhanu wrote: > > I saw this in kernel log messages while testing 2.6.37-rc2. I think it > > appeared while mounting an external hard-disk. I can't seem to > > reproduce it. > > I could be wrong but this looks like it's a bug in mm/migrate.c in > migrate_page_move_mapping(): it is calling radix_tree_lookup_slot() > without first taking an rcu_read_lock(). > > It was triggered by a memory allocation out of ext4_fill_super(), > which then triggered a memory compaction/migration, but I don't > believe it's otherwise related to the ext4 code. > > Over to the linux-mm folks for confirmation... I think it's no problem. That's because migration always holds lock_page on the file page. So the page couldn't remove from radix. -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage 2010-11-21 15:39 ` Minchan Kim @ 2010-11-21 17:37 ` Ted Ts'o 2010-11-22 0:38 ` Minchan Kim 2010-11-23 7:16 ` [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage KOSAKI Motohiro 0 siblings, 2 replies; 21+ messages in thread From: Ted Ts'o @ 2010-11-21 17:37 UTC (permalink / raw) To: Minchan Kim Cc: linux-ext4, linux-mm, linux-kernel, Andreas Dilger, Paul McKenney, Eric Sandeen On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote: > > I think it's no problem. > > That's because migration always holds lock_page on the file page. > So the page couldn't remove from radix. It may be "ok" in that it won't cause a race, but it still leaves an unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will cause /proc_lock_stat to be disabled. So I think it still needs to be fixed by adding rcu_read_lock()/rcu_read_unlock() to migrate_page_move_mapping(). - Ted -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage 2010-11-21 17:37 ` Ted Ts'o @ 2010-11-22 0:38 ` Minchan Kim 2010-11-22 3:31 ` Milton Miller 2010-11-23 7:16 ` [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage KOSAKI Motohiro 1 sibling, 1 reply; 21+ messages in thread From: Minchan Kim @ 2010-11-22 0:38 UTC (permalink / raw) To: Ted Ts'o, Minchan Kim, linux-ext4, linux-mm, linux-kernel, Andreas Dilger On Mon, Nov 22, 2010 at 2:37 AM, Ted Ts'o <tytso@mit.edu> wrote: > On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote: >> >> I think it's no problem. >> >> That's because migration always holds lock_page on the file page. >> So the page couldn't remove from radix. > > It may be "ok" in that it won't cause a race, but it still leaves an > unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will > cause /proc_lock_stat to be disabled. So I think it still needs to be > fixed by adding rcu_read_lock()/rcu_read_unlock() to > migrate_page_move_mapping(). > > - Ted > Yes. if it is really "ok" about race, we will add rcu_read_lock with below comment to prevent false positive. "suppress RCU lockdep false positives". But I am not sure it's good although rcu_read_lock is little cost. Whenever we find false positive, should we add rcu_read_lock to suppress although it's no problem in real product? Couldn't we provide following function? (or we might have already it but I missed it. ) /* * Suppress RCU lockdep false positive. */ #ifdef CONFIG_PROVE_RCU #define rcu_read_lock_suppress rcu_read_lock #else #define rcu_read_lock_suppress #endif -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage 2010-11-22 0:38 ` Minchan Kim @ 2010-11-22 3:31 ` Milton Miller 2010-11-22 6:16 ` Paul E. McKenney 0 siblings, 1 reply; 21+ messages in thread From: Milton Miller @ 2010-11-22 3:31 UTC (permalink / raw) To: Minchan Kim Cc: linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Paul E. McKenney On 2010-11-22 at around 0:38:49, Minchan Kim wrote: > On Mon, Nov 22, 2010 at 2:37 AM, Ted Ts'o <tytso@mit.edu> wrote: > > On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote: > > > > > > I think it's no problem. > > > > > > That's because migration always holds lock_page on the file page. > > > So the page couldn't remove from radix. > > > > It may be "ok" in that it won't cause a race, but it still leaves an > > unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will > > cause /proc_lock_stat to be disabled. So I think it still needs to be > > fixed by adding rcu_read_lock()/rcu_read_unlock() to > > migrate_page_move_mapping(). > > > > - Ted > > > > Yes. if it is really "ok" about race, we will add rcu_read_lock with > below comment to prevent false positive. > "suppress RCU lockdep false positives". > But I am not sure it's good although rcu_read_lock is little cost. > Whenever we find false positive, should we add rcu_read_lock to > suppress although it's no problem in real product? > Couldn't we provide following function? (or we might have already it > but I missed it. ) > > /* > * Suppress RCU lockdep false positive. > */ > #ifdef CONFIG_PROVE_RCU > #define rcu_read_lock_suppress rcu_read_lock > #else > #define rcu_read_lock_suppress > #endif No, you don't need anything like this, as rcu_dereference_check already takes a test for alternate locking. However, looking more closely at the code, it appears this is the "the tree is write locked" case as described in radix-tree.h Looking at rcupdate.h, perhaps we need a version of radix_tree_deref_slot that uses rcu_dereference_protected? Copying Paul McKenney for rcu ... milton ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage 2010-11-22 3:31 ` Milton Miller @ 2010-11-22 6:16 ` Paul E. McKenney 2010-12-07 19:01 ` [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! Gerald Schaefer 0 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2010-11-22 6:16 UTC (permalink / raw) To: Milton Miller Cc: Minchan Kim, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu On Sun, Nov 21, 2010 at 09:31:14PM -0600, Milton Miller wrote: > On 2010-11-22 at around 0:38:49, Minchan Kim wrote: > > On Mon, Nov 22, 2010 at 2:37 AM, Ted Ts'o <tytso@mit.edu> wrote: > > > On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote: > > > > > > > > I think it's no problem. > > > > > > > > That's because migration always holds lock_page on the file page. > > > > So the page couldn't remove from radix. > > > > > > It may be "ok" in that it won't cause a race, but it still leaves an > > > unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will > > > cause /proc_lock_stat to be disabled. So I think it still needs to be > > > fixed by adding rcu_read_lock()/rcu_read_unlock() to > > > migrate_page_move_mapping(). > > > > > > - Ted > > > > > > > Yes. if it is really "ok" about race, we will add rcu_read_lock with > > below comment to prevent false positive. > > "suppress RCU lockdep false positives". > > But I am not sure it's good although rcu_read_lock is little cost. > > Whenever we find false positive, should we add rcu_read_lock to > > suppress although it's no problem in real product? > > Couldn't we provide following function? (or we might have already it > > but I missed it. ) > > > > /* > > * Suppress RCU lockdep false positive. > > */ > > #ifdef CONFIG_PROVE_RCU > > #define rcu_read_lock_suppress rcu_read_lock > > #else > > #define rcu_read_lock_suppress > > #endif > > No, you don't need anything like this, as rcu_dereference_check already > takes a test for alternate locking. > > However, looking more closely at the code, it appears this is the > "the tree is write locked" case as described in radix-tree.h > > Looking at rcupdate.h, perhaps we need a version of radix_tree_deref_slot > that uses rcu_dereference_protected? > > Copying Paul McKenney for rcu ... This approach could work. One way of doing it would be to add a second argument: static inline void *radix_tree_deref_slot_check(void **pslot, int ldc) { void *ret = rcu_dereference_check(*pslot, ldc); if (unlikely(radix_tree_is_indirect_ptr(ret))) ret = RADIX_TREE_RETRY; return ret; } static inline void *radix_tree_deref_slot(void **pslot) { return radix_tree_deref_slot_check(pslot, rcu_read_lock_held()); } Another alternative would have radix_tree_deref_slot() pass "1" into the "ldc" argument, which reduces splats but at the expense of failing to detect problems. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
* [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-11-22 6:16 ` Paul E. McKenney @ 2010-12-07 19:01 ` Gerald Schaefer 2010-12-08 1:19 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 21+ messages in thread From: Gerald Schaefer @ 2010-12-07 19:01 UTC (permalink / raw) To: paulmck, KAMEZAWA Hiroyuki Cc: Milton Miller, Minchan Kim, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman, Andrew Morton, Heiko Carstens, Martin Schwidefsky I got the same warning as reported by Arun Bhanu in "[BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage" during memory hotplug test on 2.6.37-rc5, see below. migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. This should be ok, as the comment suggests, so the warning is probably a false positive. Eventually, migrate_page_move_mapping() will call radix_tree_deref_slot() with tree_lock held, but w/o rcu_read_lock() for non-anonymous pages. As suggested by Milton before, a new version of radix_tree_deref_slot that uses rcu_dereference_protected could help, if this is a false positive, or maybe the rcu_read_lock() should be called for all pages, not just anonymous. Any thoughts? =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 6 locks held by sh/960: #0: (&buffer->mutex){+.+.+.}, at: [<00000000002afa7a>] sysfs_write_file+0x4a/0x1a8 #1: (s_active#52){.+.+.+}, at: [<00000000002afb02>] sysfs_write_file+0xd2/0x1a8 #2: (&mem->state_mutex){+.+.+.}, at: [<0000000000404d78>] memory_block_change_state+0x40/0x1a0 #3: (mem_hotplug_mutex){+.+.+.}, at: [<00000000002242c4>] lock_memory_hotplug+0x2c/0x44 #4: (pm_mutex){+.+.+.}, at: [<000000000022483c>] remove_memory+0x104/0x758 #5: (&(&inode->i_data.tree_lock)->rlock){..-...}, at: [<000000000022624a>] migrate_page_move_mapping+0x4a/0x2d8 stack backtrace: CPU: 2 Not tainted 2.6.37-rc5 #2 Process sh (pid: 960, task: 0000000018e38640, ksp: 000000000f037818) 000000000f037ad8 000000000f037a58 0000000000000002 0000000000000000 000000000f037af8 000000000f037a70 000000000f037a70 0000000000567c42 0000000000000000 0000000017bd7e78 0000000000000002 0000000017bd2e30 000003e00000000d 000003e00000000c 000000000f037ac0 0000000000000000 0000000000000000 0000000000100bfa 000000000f037a58 000000000f037a98 Call Trace: ([<0000000000100b02>] show_trace+0xee/0x144) [<00000000002263ea>] migrate_page_move_mapping+0x1ea/0x2d8 [<0000000000226b1c>] migrate_page+0x38/0x68 [<0000000000226c36>] move_to_new_page+0xea/0x2bc [<00000000002276f6>] migrate_pages+0x496/0x568 [<0000000000224be6>] remove_memory+0x4ae/0x758 [<0000000000404e20>] memory_block_change_state+0xe8/0x1a0 [<0000000000404fda>] store_mem_state+0x102/0x138 [<00000000002afb26>] sysfs_write_file+0xf6/0x1a8 [<000000000023357c>] vfs_write+0xac/0x18c [<0000000000233758>] SyS_write+0x58/0xa8 [<0000000000113976>] sysc_noemu+0x16/0x1c [<0000020000162edc>] 0x20000162edc INFO: lockdep is turned off. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-07 19:01 ` [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! Gerald Schaefer @ 2010-12-08 1:19 ` KAMEZAWA Hiroyuki 2010-12-16 13:50 ` Gerald Schaefer 0 siblings, 1 reply; 21+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-12-08 1:19 UTC (permalink / raw) To: gerald.schaefer Cc: paulmck, Milton Miller, Minchan Kim, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman, Andrew Morton, Heiko Carstens, Martin Schwidefsky On Tue, 07 Dec 2010 20:01:49 +0100 Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote: > I got the same warning as reported by Arun Bhanu in "[BUG?] [Ext4] INFO: > suspicious rcu_dereference_check() usage" during memory hotplug test on > 2.6.37-rc5, see below. > > migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous > pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. > This should be ok, as the comment suggests, so the warning is probably a > false positive. Eventually, migrate_page_move_mapping() will call > radix_tree_deref_slot() with tree_lock held, but w/o rcu_read_lock() for > non-anonymous pages. > > As suggested by Milton before, a new version of radix_tree_deref_slot that > uses rcu_dereference_protected could help, if this is a false positive, or > maybe the rcu_read_lock() should be called for all pages, not just anonymous. > Any thoughts? > Hmm, in radix_tree_deref_slot() comment, 140 * For use with radix_tree_lookup_slot(). Caller must hold tree at least read 141 * locked across slot lookup and dereference. More likely, will be used with 142 * radix_tree_replace_slot(), as well, so caller will hold tree write locked. racy update must not happen. rcu_dereference_protected() sounds nice. Thanks, -Kame ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-08 1:19 ` KAMEZAWA Hiroyuki @ 2010-12-16 13:50 ` Gerald Schaefer 2010-12-17 0:04 ` Minchan Kim 0 siblings, 1 reply; 21+ messages in thread From: Gerald Schaefer @ 2010-12-16 13:50 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: paulmck, Milton Miller, Minchan Kim, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman, Andrew Morton, Heiko Carstens, Martin Schwidefsky I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see below. Both cases are easily reproducible: memory unplug with big page cache, or adding large pages during run-time. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by bash/761: #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8 stack backtrace: CPU: 1 Not tainted 2.6.37-rc6 #4 Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8) 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8 Call Trace: ([<0000000000100b02>] show_trace+0xee/0x144) [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8 [<0000000000226c80>] migrate_page+0x38/0x68 [<0000000000226d9a>] move_to_new_page+0xea/0x2bc [<000000000022785a>] migrate_pages+0x496/0x568 [<000000000021e24e>] compact_zone+0x432/0x7d8 [<000000000021e772>] compact_zone_order+0x9e/0xbc [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64 [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c [<00000000002a65be>] proc_sys_write+0x26/0x34 [<00000000002336e0>] vfs_write+0xac/0x18c [<00000000002338bc>] SyS_write+0x58/0xa8 [<0000000000113976>] sysc_noemu+0x16/0x1c [<0000020000162edc>] 0x20000162edc INFO: lockdep is turned off. I honestly do not understand 100% why this is a false positive, seeing that e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but the &mapping->tree_lock instead. So I'm not quite sure how to fix this properly, but simply adding rcu_read_lock/unlock() to the affected code paths, even if it is not necessary for synchronization, would get rid of the warning, like in the following patch. Any ideas? --- fs/hugetlbfs/inode.c | 2 ++ mm/migrate.c | 4 ++++ 2 files changed, 6 insertions(+) --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct { int rc; + rcu_read_lock(); rc = migrate_huge_page_move_mapping(mapping, newpage, page); + rcu_read_unlock(); if (rc) return rc; migrate_page_copy(newpage, page); --- a/mm/migrate.c +++ b/mm/migrate.c @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m BUG_ON(PageWriteback(page)); /* Writeback must be complete */ + rcu_read_lock(); rc = migrate_page_move_mapping(mapping, newpage, page); + rcu_read_unlock(); if (rc) return rc; @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s head = page_buffers(page); + rcu_read_lock(); rc = migrate_page_move_mapping(mapping, newpage, page); + rcu_read_unlock(); if (rc) return rc; ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-16 13:50 ` Gerald Schaefer @ 2010-12-17 0:04 ` Minchan Kim 2010-12-17 5:47 ` Paul E. McKenney 2010-12-17 8:39 ` Mel Gorman 0 siblings, 2 replies; 21+ messages in thread From: Minchan Kim @ 2010-12-17 0:04 UTC (permalink / raw) To: gerald.schaefer Cc: KAMEZAWA Hiroyuki, paulmck, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman, Andrew Morton, Heiko Carstens, Martin Schwidefsky On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote: > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see > below. Both cases are easily reproducible: memory unplug with big page cache, > or adding large pages during run-time. > > =================================================== > [ INFO: suspicious rcu_dereference_check() usage. ] > --------------------------------------------------- > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! > > other info that might help us debug this: > > > rcu_scheduler_active = 1, debug_locks = 0 > 1 lock held by bash/761: > #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8 > > stack backtrace: > CPU: 1 Not tainted 2.6.37-rc6 #4 > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8) > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000 > 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa > 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30 > 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000 > 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8 > Call Trace: > ([<0000000000100b02>] show_trace+0xee/0x144) > [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8 > [<0000000000226c80>] migrate_page+0x38/0x68 > [<0000000000226d9a>] move_to_new_page+0xea/0x2bc > [<000000000022785a>] migrate_pages+0x496/0x568 > [<000000000021e24e>] compact_zone+0x432/0x7d8 > [<000000000021e772>] compact_zone_order+0x9e/0xbc > [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c > [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64 > [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c > [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac > [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc > [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c > [<00000000002a65be>] proc_sys_write+0x26/0x34 > [<00000000002336e0>] vfs_write+0xac/0x18c > [<00000000002338bc>] SyS_write+0x58/0xa8 > [<0000000000113976>] sysc_noemu+0x16/0x1c > [<0000020000162edc>] 0x20000162edc > INFO: lockdep is turned off. > > I honestly do not understand 100% why this is a false positive, seeing that > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but > the &mapping->tree_lock instead. So I'm not quite sure how to fix this > properly, but simply adding rcu_read_lock/unlock() to the affected code paths, > even if it is not necessary for synchronization, would get rid of the warning, > like in the following patch. Any ideas? In case of anon page, we hold rcu_read_lock in unmap_and_move. The problem is file-backed page. In case of that, we hold lock_page and mapping->tree_lock as update-side lock. So we don't need rcu_read_lock. > > --- > fs/hugetlbfs/inode.c | 2 ++ > mm/migrate.c | 4 ++++ > 2 files changed, 6 insertions(+) > > --- a/fs/hugetlbfs/inode.c > +++ b/fs/hugetlbfs/inode.c > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct > { > int rc; > > + rcu_read_lock(); > rc = migrate_huge_page_move_mapping(mapping, newpage, page); > + rcu_read_unlock(); > if (rc) > return rc; > migrate_page_copy(newpage, page); > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m > > BUG_ON(PageWriteback(page)); /* Writeback must be complete */ > > + rcu_read_lock(); > rc = migrate_page_move_mapping(mapping, newpage, page); > + rcu_read_unlock(); > > if (rc) > return rc; > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s > > head = page_buffers(page); > > + rcu_read_lock(); > rc = migrate_page_move_mapping(mapping, newpage, page); > + rcu_read_unlock(); > > if (rc) > return rc; > > > How about this? Maybe Paul have better idea. (It's apparently be word-wrapped.) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index ab2baa5..135af1e 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot) } /** + * radix_tree_deref_slot_nocheck - dereference a slot without RCU check + * @pslot: pointer to slot, returned by radix_tree_lookup_slot + * Returns: item that was stored in that slot with any direct pointer flag + * removed. + * + * This functions works like radix_tree_deref_slot except it doesn't check + * RCU rule. Normally this funcion is used with update-side lock. + * You should use this function very carefully. + */ +static inline void *radix_tree_deref_slot_nocheck(void **pslot) +{ + return rcu_dereference_protected(*pslot, 1); +} +/** * radix_tree_deref_retry - check radix_tree_deref_slot * @arg: pointer returned by radix_tree_deref_slot * Returns: 0 if retry is not required, otherwise retry is required diff --git a/mm/migrate.c b/mm/migrate.c index 2eb2243..5be2841 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct address_space *mappin expected_count = 2 + page_has_private(page); if (page_count(page) != expected_count || - (struct page *)radix_tree_deref_slot(pslot) != page) { + (struct page *)radix_tree_deref_slot_nocheck(pslot) + != page) { spin_unlock_irq(&mapping->tree_lock); return -EAGAIN; } @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct address_space *mapping, expected_count = 2 + page_has_private(page); if (page_count(page) != expected_count || - (struct page *)radix_tree_deref_slot(pslot) != page) { + (struct page *)radix_tree_deref_slot_nocheck(pslot) + != page) { spin_unlock_irq(&mapping->tree_lock); return -EAGAIN; } -- Kind regards, Minchan Kim -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-17 0:04 ` Minchan Kim @ 2010-12-17 5:47 ` Paul E. McKenney 2010-12-17 5:59 ` Eric Dumazet 2010-12-17 15:08 ` Minchan Kim 2010-12-17 8:39 ` Mel Gorman 1 sibling, 2 replies; 21+ messages in thread From: Paul E. McKenney @ 2010-12-17 5:47 UTC (permalink / raw) To: Minchan Kim Cc: gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman, Andrew Morton, Heiko Carstens, Martin Schwidefsky On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote: > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer > <gerald.schaefer@de.ibm.com> wrote: > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see > > below. Both cases are easily reproducible: memory unplug with big page cache, > > or adding large pages during run-time. > > > > =================================================== > > [ INFO: suspicious rcu_dereference_check() usage. ] > > --------------------------------------------------- > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! > > > > other info that might help us debug this: > > > > > > rcu_scheduler_active = 1, debug_locks = 0 > > 1 lock held by bash/761: > > #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8 > > > > stack backtrace: > > CPU: 1 Not tainted 2.6.37-rc6 #4 > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8) > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000 > > 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa > > 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30 > > 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000 > > 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8 > > Call Trace: > > ([<0000000000100b02>] show_trace+0xee/0x144) > > [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8 > > [<0000000000226c80>] migrate_page+0x38/0x68 > > [<0000000000226d9a>] move_to_new_page+0xea/0x2bc > > [<000000000022785a>] migrate_pages+0x496/0x568 > > [<000000000021e24e>] compact_zone+0x432/0x7d8 > > [<000000000021e772>] compact_zone_order+0x9e/0xbc > > [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c > > [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64 > > [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c > > [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac > > [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc > > [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c > > [<00000000002a65be>] proc_sys_write+0x26/0x34 > > [<00000000002336e0>] vfs_write+0xac/0x18c > > [<00000000002338bc>] SyS_write+0x58/0xa8 > > [<0000000000113976>] sysc_noemu+0x16/0x1c > > [<0000020000162edc>] 0x20000162edc > > INFO: lockdep is turned off. > > > > I honestly do not understand 100% why this is a false positive, seeing that > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths, > > even if it is not necessary for synchronization, would get rid of the warning, > > like in the following patch. Any ideas? > > In case of anon page, we hold rcu_read_lock in unmap_and_move. > The problem is file-backed page. In case of that, we hold lock_page > and mapping->tree_lock as update-side lock. > So we don't need rcu_read_lock. > > > > > --- > > fs/hugetlbfs/inode.c | 2 ++ > > mm/migrate.c | 4 ++++ > > 2 files changed, 6 insertions(+) > > > > --- a/fs/hugetlbfs/inode.c > > +++ b/fs/hugetlbfs/inode.c > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct > > { > > int rc; > > > > + rcu_read_lock(); > > rc = migrate_huge_page_move_mapping(mapping, newpage, page); > > + rcu_read_unlock(); > > if (rc) > > return rc; > > migrate_page_copy(newpage, page); > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m > > > > BUG_ON(PageWriteback(page)); /* Writeback must be complete */ > > > > + rcu_read_lock(); > > rc = migrate_page_move_mapping(mapping, newpage, page); > > + rcu_read_unlock(); > > > > if (rc) > > return rc; > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s > > > > head = page_buffers(page); > > > > + rcu_read_lock(); > > rc = migrate_page_move_mapping(mapping, newpage, page); > > + rcu_read_unlock(); > > > > if (rc) > > return rc; > > > > > > > > > How about this? > Maybe Paul have better idea. > (It's apparently be word-wrapped.) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index ab2baa5..135af1e 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot) > } > > /** > + * radix_tree_deref_slot_nocheck - dereference a slot without RCU check > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > + * Returns: item that was stored in that slot with any direct pointer flag > + * removed. > + * > + * This functions works like radix_tree_deref_slot except it doesn't check > + * RCU rule. Normally this funcion is used with update-side lock. > + * You should use this function very carefully. > + */ > +static inline void *radix_tree_deref_slot_nocheck(void **pslot) > +{ > + return rcu_dereference_protected(*pslot, 1); I suggest replacing the "1" with lockdep expressions for the locks that you say might be held: return rcu_dereference_check(*pslot, lockdep_is_held(&mapping->tree_lock)); This assumes that when you said "and" you meant both lock_page() and mapping->tree_lock. Also you need to pass in the mapping, which should not be a problem given likely inlining. If you meant that either mapping->tree_lock or page_lock() might be held, I suppose that the page_lock() state could be passed in, but perhaps better to take a general lockdep expression. So, either or both? ;-) Thanx, Paul > +} > +/** > * radix_tree_deref_retry - check radix_tree_deref_slot > * @arg: pointer returned by radix_tree_deref_slot > * Returns: 0 if retry is not required, otherwise retry is required > diff --git a/mm/migrate.c b/mm/migrate.c > index 2eb2243..5be2841 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct > address_space *mappin > > expected_count = 2 + page_has_private(page); > if (page_count(page) != expected_count || > - (struct page *)radix_tree_deref_slot(pslot) != page) { > + (struct page *)radix_tree_deref_slot_nocheck(pslot) > + != page) { > spin_unlock_irq(&mapping->tree_lock); > return -EAGAIN; > } > @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct > address_space *mapping, > > expected_count = 2 + page_has_private(page); > if (page_count(page) != expected_count || > - (struct page *)radix_tree_deref_slot(pslot) != page) { > + (struct page *)radix_tree_deref_slot_nocheck(pslot) > + != page) { > spin_unlock_irq(&mapping->tree_lock); > return -EAGAIN; > } > > > > > -- > Kind regards, > Minchan Kim > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-17 5:47 ` Paul E. McKenney @ 2010-12-17 5:59 ` Eric Dumazet 2010-12-17 15:08 ` Minchan Kim 1 sibling, 0 replies; 21+ messages in thread From: Eric Dumazet @ 2010-12-17 5:59 UTC (permalink / raw) To: paulmck Cc: Minchan Kim, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman, Andrew Morton, Heiko Carstens, Martin Schwidefsky Le jeudi 16 décembre 2010 à 21:47 -0800, Paul E. McKenney a écrit : > On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote: > > How about this? > > Maybe Paul have better idea. > > (It's apparently be word-wrapped.) > > > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > > index ab2baa5..135af1e 100644 > > --- a/include/linux/radix-tree.h > > +++ b/include/linux/radix-tree.h > > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot) > > } > > > > /** > > + * radix_tree_deref_slot_nocheck - dereference a slot without RCU check > > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > > + * Returns: item that was stored in that slot with any direct pointer flag > > + * removed. > > + * > > + * This functions works like radix_tree_deref_slot except it doesn't check > > + * RCU rule. Normally this funcion is used with update-side lock. > > + * You should use this function very carefully. > > + */ > > +static inline void *radix_tree_deref_slot_nocheck(void **pslot) > > +{ > > + return rcu_dereference_protected(*pslot, 1); > > I suggest replacing the "1" with lockdep expressions for the locks > that you say might be held: > > return rcu_dereference_check(*pslot, > lockdep_is_held(&mapping->tree_lock)); > Yes, but point was also to use rcu_dereference_protected() here, not rcu_dereference_check(). > This assumes that when you said "and" you meant both lock_page() and > mapping->tree_lock. Also you need to pass in the mapping, which > should not be a problem given likely inlining. > > If you meant that either mapping->tree_lock or page_lock() might be > held, I suppose that the page_lock() state could be passed in, but > perhaps better to take a general lockdep expression. > > So, either or both? ;-) > Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-17 5:47 ` Paul E. McKenney 2010-12-17 5:59 ` Eric Dumazet @ 2010-12-17 15:08 ` Minchan Kim 2010-12-17 16:03 ` Paul E. McKenney 1 sibling, 1 reply; 21+ messages in thread From: Minchan Kim @ 2010-12-17 15:08 UTC (permalink / raw) To: Paul E. McKenney Cc: gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman, Andrew Morton, Heiko Carstens, Martin Schwidefsky On Thu, Dec 16, 2010 at 09:47:22PM -0800, Paul E. McKenney wrote: > On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote: > > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer > > <gerald.schaefer@de.ibm.com> wrote: > > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see > > > below. Both cases are easily reproducible: memory unplug with big page cache, > > > or adding large pages during run-time. > > > > > > =================================================== > > > [ INFO: suspicious rcu_dereference_check() usage. ] > > > --------------------------------------------------- > > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! > > > > > > other info that might help us debug this: > > > > > > > > > rcu_scheduler_active = 1, debug_locks = 0 > > > 1 lock held by bash/761: > > > #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8 > > > > > > stack backtrace: > > > CPU: 1 Not tainted 2.6.37-rc6 #4 > > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8) > > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000 > > > 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa > > > 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30 > > > 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000 > > > 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8 > > > Call Trace: > > > ([<0000000000100b02>] show_trace+0xee/0x144) > > > [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8 > > > [<0000000000226c80>] migrate_page+0x38/0x68 > > > [<0000000000226d9a>] move_to_new_page+0xea/0x2bc > > > [<000000000022785a>] migrate_pages+0x496/0x568 > > > [<000000000021e24e>] compact_zone+0x432/0x7d8 > > > [<000000000021e772>] compact_zone_order+0x9e/0xbc > > > [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c > > > [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64 > > > [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c > > > [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac > > > [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc > > > [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c > > > [<00000000002a65be>] proc_sys_write+0x26/0x34 > > > [<00000000002336e0>] vfs_write+0xac/0x18c > > > [<00000000002338bc>] SyS_write+0x58/0xa8 > > > [<0000000000113976>] sysc_noemu+0x16/0x1c > > > [<0000020000162edc>] 0x20000162edc > > > INFO: lockdep is turned off. > > > > > > I honestly do not understand 100% why this is a false positive, seeing that > > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the > > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but > > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this > > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths, > > > even if it is not necessary for synchronization, would get rid of the warning, > > > like in the following patch. Any ideas? > > > > In case of anon page, we hold rcu_read_lock in unmap_and_move. > > The problem is file-backed page. In case of that, we hold lock_page > > and mapping->tree_lock as update-side lock. > > So we don't need rcu_read_lock. > > > > > > > > --- > > > fs/hugetlbfs/inode.c | 2 ++ > > > mm/migrate.c | 4 ++++ > > > 2 files changed, 6 insertions(+) > > > > > > --- a/fs/hugetlbfs/inode.c > > > +++ b/fs/hugetlbfs/inode.c > > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct > > > { > > > int rc; > > > > > > + rcu_read_lock(); > > > rc = migrate_huge_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > if (rc) > > > return rc; > > > migrate_page_copy(newpage, page); > > > --- a/mm/migrate.c > > > +++ b/mm/migrate.c > > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m > > > > > > BUG_ON(PageWriteback(page)); /* Writeback must be complete */ > > > > > > + rcu_read_lock(); > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > > > > if (rc) > > > return rc; > > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s > > > > > > head = page_buffers(page); > > > > > > + rcu_read_lock(); > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > > > > if (rc) > > > return rc; > > > > > > > > > > > > > > > How about this? > > Maybe Paul have better idea. > > (It's apparently be word-wrapped.) > > > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > > index ab2baa5..135af1e 100644 > > --- a/include/linux/radix-tree.h > > +++ b/include/linux/radix-tree.h > > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot) > > } > > > > /** > > + * radix_tree_deref_slot_nocheck - dereference a slot without RCU check > > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > > + * Returns: item that was stored in that slot with any direct pointer flag > > + * removed. > > + * > > + * This functions works like radix_tree_deref_slot except it doesn't check > > + * RCU rule. Normally this funcion is used with update-side lock. > > + * You should use this function very carefully. > > + */ > > +static inline void *radix_tree_deref_slot_nocheck(void **pslot) > > +{ > > + return rcu_dereference_protected(*pslot, 1); > > I suggest replacing the "1" with lockdep expressions for the locks > that you say might be held: It's not hard. > > return rcu_dereference_check(*pslot, > lockdep_is_held(&mapping->tree_lock)); rcu_dereference_check still pass rcu_read_lock_held check we don't want. I think rcu_dereference_protected is proper. Why I don't add lockdep expressions is radix_tree_deref_slot is general API. It might be used anywhere where it doesn't related to mapping->tree_lock. If we add argument 'mapping', it has a very strong dependency with address_space. so I decided making the function general and then caller must use it very carefully. But I am not strong in this point. > > This assumes that when you said "and" you meant both lock_page() and > mapping->tree_lock. Also you need to pass in the mapping, which > should not be a problem given likely inlining. > > If you meant that either mapping->tree_lock or page_lock() might be > held, I suppose that the page_lock() state could be passed in, but > perhaps better to take a general lockdep expression. > > So, either or both? ;-) > > Thanx, Paul I think either is okay. That's because remove_from_page_cache/__remove_from_page_cache needs both locks so we can't prevent update if we get a either lock. In code context, I think mapping->tree_lock is more readable since it is used near by. Thanks for the comment, Paul. -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-17 15:08 ` Minchan Kim @ 2010-12-17 16:03 ` Paul E. McKenney 0 siblings, 0 replies; 21+ messages in thread From: Paul E. McKenney @ 2010-12-17 16:03 UTC (permalink / raw) To: Minchan Kim Cc: gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman, Andrew Morton, Heiko Carstens, Martin Schwidefsky On Sat, Dec 18, 2010 at 12:08:27AM +0900, Minchan Kim wrote: > On Thu, Dec 16, 2010 at 09:47:22PM -0800, Paul E. McKenney wrote: > > On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote: > > > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer > > > <gerald.schaefer@de.ibm.com> wrote: > > > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see > > > > below. Both cases are easily reproducible: memory unplug with big page cache, > > > > or adding large pages during run-time. > > > > > > > > =================================================== > > > > [ INFO: suspicious rcu_dereference_check() usage. ] > > > > --------------------------------------------------- > > > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > rcu_scheduler_active = 1, debug_locks = 0 > > > > 1 lock held by bash/761: > > > > #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8 > > > > > > > > stack backtrace: > > > > CPU: 1 Not tainted 2.6.37-rc6 #4 > > > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8) > > > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000 > > > > 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa > > > > 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30 > > > > 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000 > > > > 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8 > > > > Call Trace: > > > > ([<0000000000100b02>] show_trace+0xee/0x144) > > > > [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8 > > > > [<0000000000226c80>] migrate_page+0x38/0x68 > > > > [<0000000000226d9a>] move_to_new_page+0xea/0x2bc > > > > [<000000000022785a>] migrate_pages+0x496/0x568 > > > > [<000000000021e24e>] compact_zone+0x432/0x7d8 > > > > [<000000000021e772>] compact_zone_order+0x9e/0xbc > > > > [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c > > > > [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64 > > > > [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c > > > > [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac > > > > [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc > > > > [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c > > > > [<00000000002a65be>] proc_sys_write+0x26/0x34 > > > > [<00000000002336e0>] vfs_write+0xac/0x18c > > > > [<00000000002338bc>] SyS_write+0x58/0xa8 > > > > [<0000000000113976>] sysc_noemu+0x16/0x1c > > > > [<0000020000162edc>] 0x20000162edc > > > > INFO: lockdep is turned off. > > > > > > > > I honestly do not understand 100% why this is a false positive, seeing that > > > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the > > > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but > > > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this > > > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths, > > > > even if it is not necessary for synchronization, would get rid of the warning, > > > > like in the following patch. Any ideas? > > > > > > In case of anon page, we hold rcu_read_lock in unmap_and_move. > > > The problem is file-backed page. In case of that, we hold lock_page > > > and mapping->tree_lock as update-side lock. > > > So we don't need rcu_read_lock. > > > > > > > > > > > --- > > > > fs/hugetlbfs/inode.c | 2 ++ > > > > mm/migrate.c | 4 ++++ > > > > 2 files changed, 6 insertions(+) > > > > > > > > --- a/fs/hugetlbfs/inode.c > > > > +++ b/fs/hugetlbfs/inode.c > > > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct > > > > { > > > > int rc; > > > > > > > > + rcu_read_lock(); > > > > rc = migrate_huge_page_move_mapping(mapping, newpage, page); > > > > + rcu_read_unlock(); > > > > if (rc) > > > > return rc; > > > > migrate_page_copy(newpage, page); > > > > --- a/mm/migrate.c > > > > +++ b/mm/migrate.c > > > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m > > > > > > > > BUG_ON(PageWriteback(page)); /* Writeback must be complete */ > > > > > > > > + rcu_read_lock(); > > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > > + rcu_read_unlock(); > > > > > > > > if (rc) > > > > return rc; > > > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s > > > > > > > > head = page_buffers(page); > > > > > > > > + rcu_read_lock(); > > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > > + rcu_read_unlock(); > > > > > > > > if (rc) > > > > return rc; > > > > > > > > > > > > > > > > > > > > > How about this? > > > Maybe Paul have better idea. > > > (It's apparently be word-wrapped.) > > > > > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > > > index ab2baa5..135af1e 100644 > > > --- a/include/linux/radix-tree.h > > > +++ b/include/linux/radix-tree.h > > > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot) > > > } > > > > > > /** > > > + * radix_tree_deref_slot_nocheck - dereference a slot without RCU check > > > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > > > + * Returns: item that was stored in that slot with any direct pointer flag > > > + * removed. > > > + * > > > + * This functions works like radix_tree_deref_slot except it doesn't check > > > + * RCU rule. Normally this funcion is used with update-side lock. > > > + * You should use this function very carefully. > > > + */ > > > +static inline void *radix_tree_deref_slot_nocheck(void **pslot) > > > +{ > > > + return rcu_dereference_protected(*pslot, 1); > > > > I suggest replacing the "1" with lockdep expressions for the locks > > that you say might be held: > > It's not hard. > > > > > return rcu_dereference_check(*pslot, > > lockdep_is_held(&mapping->tree_lock)); > > rcu_dereference_check still pass rcu_read_lock_held check we don't want. > I think rcu_dereference_protected is proper. You are exactly right. The only reason that I used rcu_dereference_check() instead of rcu_dereference_protected() is because I didn't realize that RCU readers never called this function. > Why I don't add lockdep expressions is radix_tree_deref_slot is general API. > It might be used anywhere where it doesn't related to mapping->tree_lock. > If we add argument 'mapping', it has a very strong dependency with address_space. > so I decided making the function general and then caller must use it very carefully. > But I am not strong in this point. I believe that this would be a good thing. > > This assumes that when you said "and" you meant both lock_page() and > > mapping->tree_lock. Also you need to pass in the mapping, which > > should not be a problem given likely inlining. > > > > If you meant that either mapping->tree_lock or page_lock() might be > > held, I suppose that the page_lock() state could be passed in, but > > perhaps better to take a general lockdep expression. > > > > So, either or both? ;-) > > > > Thanx, Paul > > I think either is okay. That's because remove_from_page_cache/__remove_from_page_cache > needs both locks so we can't prevent update if we get a either lock. > In code context, I think mapping->tree_lock is more readable since it is used near by. Good!!! So we only really need to check for one or the other. > Thanks for the comment, Paul. No problem! Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-17 0:04 ` Minchan Kim 2010-12-17 5:47 ` Paul E. McKenney @ 2010-12-17 8:39 ` Mel Gorman 2010-12-17 9:28 ` Mel Gorman ` (2 more replies) 1 sibling, 3 replies; 21+ messages in thread From: Mel Gorman @ 2010-12-17 8:39 UTC (permalink / raw) To: Minchan Kim Cc: gerald.schaefer, KAMEZAWA Hiroyuki, paulmck, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Andrew Morton, Heiko Carstens, Martin Schwidefsky On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote: > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer > <gerald.schaefer@de.ibm.com> wrote: > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see > > below. Both cases are easily reproducible: memory unplug with big page cache, > > or adding large pages during run-time. > > > > =================================================== > > [ INFO: suspicious rcu_dereference_check() usage. ] > > --------------------------------------------------- > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! > > > > other info that might help us debug this: > > > > > > rcu_scheduler_active = 1, debug_locks = 0 > > 1 lock held by bash/761: > > #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8 > > > > stack backtrace: > > CPU: 1 Not tainted 2.6.37-rc6 #4 > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8) > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000 > > 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa > > 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30 > > 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000 > > 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8 > > Call Trace: > > ([<0000000000100b02>] show_trace+0xee/0x144) > > [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8 > > [<0000000000226c80>] migrate_page+0x38/0x68 > > [<0000000000226d9a>] move_to_new_page+0xea/0x2bc > > [<000000000022785a>] migrate_pages+0x496/0x568 > > [<000000000021e24e>] compact_zone+0x432/0x7d8 > > [<000000000021e772>] compact_zone_order+0x9e/0xbc > > [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c > > [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64 > > [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c > > [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac > > [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc > > [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c > > [<00000000002a65be>] proc_sys_write+0x26/0x34 > > [<00000000002336e0>] vfs_write+0xac/0x18c > > [<00000000002338bc>] SyS_write+0x58/0xa8 > > [<0000000000113976>] sysc_noemu+0x16/0x1c > > [<0000020000162edc>] 0x20000162edc > > INFO: lockdep is turned off. > > > > I honestly do not understand 100% why this is a false positive, seeing that > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths, > > even if it is not necessary for synchronization, would get rid of the warning, > > like in the following patch. Any ideas? > > In case of anon page, we hold rcu_read_lock in unmap_and_move. > The problem is file-backed page. In case of that, we hold lock_page > and mapping->tree_lock as update-side lock. > So we don't need rcu_read_lock. > > > > > --- > > fs/hugetlbfs/inode.c | 2 ++ > > mm/migrate.c | 4 ++++ > > 2 files changed, 6 insertions(+) > > > > --- a/fs/hugetlbfs/inode.c > > +++ b/fs/hugetlbfs/inode.c > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct > > { > > int rc; > > > > + rcu_read_lock(); > > rc = migrate_huge_page_move_mapping(mapping, newpage, page); > > + rcu_read_unlock(); > > if (rc) > > return rc; > > migrate_page_copy(newpage, page); > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m > > > > BUG_ON(PageWriteback(page)); /* Writeback must be complete */ > > > > + rcu_read_lock(); > > rc = migrate_page_move_mapping(mapping, newpage, page); > > + rcu_read_unlock(); > > > > if (rc) > > return rc; > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s > > > > head = page_buffers(page); > > > > + rcu_read_lock(); > > rc = migrate_page_move_mapping(mapping, newpage, page); > > + rcu_read_unlock(); > > > > if (rc) > > return rc; > > > > > > > > > How about this? > Maybe Paul have better idea. > (It's apparently be word-wrapped.) > heh, I wrote a patch almost identical to this and ran it overnight for testing (test was a memory consumer running while a parallel process grew and shrunk the hugepage pool). It passes but that is hardly a surprise. We differed slightly in a number of respects though. > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index ab2baa5..135af1e 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot) > } > > /** > + * radix_tree_deref_slot_nocheck - dereference a slot without RCU check > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > + * Returns: item that was stored in that slot with any direct pointer flag > + * removed. > + * > + * This functions works like radix_tree_deref_slot except it doesn't check > + * RCU rule. Normally this funcion is used with update-side lock. > + * You should use this function very carefully. > + */ > +static inline void *radix_tree_deref_slot_nocheck(void **pslot) > +{ > + return rcu_dereference_protected(*pslot, 1); > +} For this, I had diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index ab2baa5..252d21c 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot) } /** + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held + * @pslot: pointer to slot, returned by radix_tree_lookup_slot + * Returns: item that was stored in that slot with any direct pointer flag + * removed. + * + * For use with radix_tree_lookup_slot(). Caller must hold tree read + * locked across slot lookup and dereference. Not required if write lock is + * held (ie. items cannot be concurrently inserted). + * + * radix_tree_deref_retry must be used to confirm validity of the pointer if + * only the read lock is held. + */ +static inline void *radix_tree_deref_slot_protected(void **pslot, + spinlock_t *treelock) +{ + return rcu_dereference_protected(*pslot, lockdep_is_held(treelock)); +} + +/** * radix_tree_deref_retry - check radix_tree_deref_slot * @arg: pointer returned by radix_tree_deref_slot * Returns: 0 if retry is not required, otherwise retry is required In the documentation, I noted that the check might be without RCU but with the knowledge that it's protected by the tree lock. I'm not a RCU expert but this is only safe when you know there isn't a parallel updater and the treelock should be preventing that, right? Even so, other users of rcu_dereference_protected() check a lock condition which I used tree lock for. I intended to read through the rest of documentation properly this morning to determine if this was indeed the right approach. I used the name _protected instead of _nocheck because the dereference is still protected (by the tree lock) just not by RCU. Again, have to check the documentation to ensure this is correct. > +/** > * radix_tree_deref_retry - check radix_tree_deref_slot > * @arg: pointer returned by radix_tree_deref_slot > * Returns: 0 if retry is not required, otherwise retry is required > diff --git a/mm/migrate.c b/mm/migrate.c > index 2eb2243..5be2841 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct > address_space *mappin > > expected_count = 2 + page_has_private(page); > if (page_count(page) != expected_count || > - (struct page *)radix_tree_deref_slot(pslot) != page) { > + (struct page *)radix_tree_deref_slot_nocheck(pslot) > + != page) { > spin_unlock_irq(&mapping->tree_lock); > return -EAGAIN; > } We only differed here by my passing in the &mapping->tree_lock > @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct > address_space *mapping, > > expected_count = 2 + page_has_private(page); > if (page_count(page) != expected_count || > - (struct page *)radix_tree_deref_slot(pslot) != page) { > + (struct page *)radix_tree_deref_slot_nocheck(pslot) > + != page) { > spin_unlock_irq(&mapping->tree_lock); > return -EAGAIN; > } -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-17 8:39 ` Mel Gorman @ 2010-12-17 9:28 ` Mel Gorman 2010-12-17 15:22 ` Minchan Kim 2010-12-17 15:13 ` Minchan Kim 2010-12-17 16:01 ` Paul E. McKenney 2 siblings, 1 reply; 21+ messages in thread From: Mel Gorman @ 2010-12-17 9:28 UTC (permalink / raw) To: Minchan Kim Cc: gerald.schaefer, KAMEZAWA Hiroyuki, paulmck, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Andrew Morton, Heiko Carstens, Martin Schwidefsky On Fri, Dec 17, 2010 at 08:39:12AM +0000, Mel Gorman wrote: > On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote: > > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer > > <gerald.schaefer@de.ibm.com> wrote: > > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see > > > below. Both cases are easily reproducible: memory unplug with big page cache, > > > or adding large pages during run-time. > > > > > > =================================================== > > > [ INFO: suspicious rcu_dereference_check() usage. ] > > > --------------------------------------------------- > > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! > > > > > > other info that might help us debug this: > > > > > > > > > rcu_scheduler_active = 1, debug_locks = 0 > > > 1 lock held by bash/761: > > > #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8 > > > > > > stack backtrace: > > > CPU: 1 Not tainted 2.6.37-rc6 #4 > > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8) > > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000 > > > 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa > > > 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30 > > > 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000 > > > 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8 > > > Call Trace: > > > ([<0000000000100b02>] show_trace+0xee/0x144) > > > [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8 > > > [<0000000000226c80>] migrate_page+0x38/0x68 > > > [<0000000000226d9a>] move_to_new_page+0xea/0x2bc > > > [<000000000022785a>] migrate_pages+0x496/0x568 > > > [<000000000021e24e>] compact_zone+0x432/0x7d8 > > > [<000000000021e772>] compact_zone_order+0x9e/0xbc > > > [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c > > > [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64 > > > [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c > > > [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac > > > [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc > > > [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c > > > [<00000000002a65be>] proc_sys_write+0x26/0x34 > > > [<00000000002336e0>] vfs_write+0xac/0x18c > > > [<00000000002338bc>] SyS_write+0x58/0xa8 > > > [<0000000000113976>] sysc_noemu+0x16/0x1c > > > [<0000020000162edc>] 0x20000162edc > > > INFO: lockdep is turned off. > > > > > > I honestly do not understand 100% why this is a false positive, seeing that > > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the > > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but > > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this > > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths, > > > even if it is not necessary for synchronization, would get rid of the warning, > > > like in the following patch. Any ideas? > > > > In case of anon page, we hold rcu_read_lock in unmap_and_move. > > The problem is file-backed page. In case of that, we hold lock_page > > and mapping->tree_lock as update-side lock. > > So we don't need rcu_read_lock. > > > > > > > > --- > > > fs/hugetlbfs/inode.c | 2 ++ > > > mm/migrate.c | 4 ++++ > > > 2 files changed, 6 insertions(+) > > > > > > --- a/fs/hugetlbfs/inode.c > > > +++ b/fs/hugetlbfs/inode.c > > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct > > > { > > > int rc; > > > > > > + rcu_read_lock(); > > > rc = migrate_huge_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > if (rc) > > > return rc; > > > migrate_page_copy(newpage, page); > > > --- a/mm/migrate.c > > > +++ b/mm/migrate.c > > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m > > > > > > BUG_ON(PageWriteback(page)); /* Writeback must be complete */ > > > > > > + rcu_read_lock(); > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > > > > if (rc) > > > return rc; > > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s > > > > > > head = page_buffers(page); > > > > > > + rcu_read_lock(); > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > > > > if (rc) > > > return rc; > > > > > > > > > > > > > > > How about this? > > Maybe Paul have better idea. > > (It's apparently be word-wrapped.) > > > > heh, I wrote a patch almost identical to this and ran it overnight for testing > (test was a memory consumer running while a parallel process grew and shrunk > the hugepage pool). It passes but that is hardly a surprise. We differed > slightly in a number of respects though. > For completeness, this is what I tested last night. There are two "confirms" in the changelog that I intended to work out today but maybe someone can confirm faster. ==== CUT HERE ==== mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. The point of the RCU protection there is part of getting a stable reference to anon_vma and is only held for anon pages as file pages are locked which is sufficient protection against freeing. However, while a file page's mapping is being migrated, the radix tree is double checked to ensure it is the expected page. This uses radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held triggering the following warning. [ 173.674290] =================================================== [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] [ 173.676016] --------------------------------------------------- [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! [ 173.676016] [ 173.676016] other info that might help us debug this: [ 173.676016] [ 173.676016] [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 [ 173.676016] 1 lock held by hugeadm/2899: [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab [ 173.676016] [ 173.676016] stack backtrace: [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild [ 173.676016] Call Trace: [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa This patch introduces radix_tree_deref_slot_protected() which calls rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock that is protecting this dereference. Based on the locking hierarchy described in mm/filemap.c, holding the tree lock is protecting the radix tree from concurrent updaters in all cases (Confirm that no case has been missed). According to Documentation/RCU/lockdep.txt, if there is a guarantee that no parallel updaters exist, use of rcu_dereference_protected() is allowed (Confirm this is accurate?). Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- include/linux/radix-tree.h | 19 +++++++++++++++++++ mm/migrate.c | 4 ++-- 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index ab2baa5..252d21c 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot) } /** + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held + * @pslot: pointer to slot, returned by radix_tree_lookup_slot + * Returns: item that was stored in that slot with any direct pointer flag + * removed. + * + * For use with radix_tree_lookup_slot(). Caller must hold tree read + * locked across slot lookup and dereference. Not required if write lock is + * held (ie. items cannot be concurrently inserted). + * + * radix_tree_deref_retry must be used to confirm validity of the pointer if + * only the read lock is held. + */ +static inline void *radix_tree_deref_slot_protected(void **pslot, + spinlock_t *treelock) +{ + return rcu_dereference_protected(*pslot, lockdep_is_held(treelock)); +} + +/** * radix_tree_deref_retry - check radix_tree_deref_slot * @arg: pointer returned by radix_tree_deref_slot * Returns: 0 if retry is not required, otherwise retry is required diff --git a/mm/migrate.c b/mm/migrate.c index fe5a3c6..7d4686a 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -244,7 +244,7 @@ static int migrate_page_move_mapping(struct address_space *mapping, expected_count = 2 + page_has_private(page); if (page_count(page) != expected_count || - (struct page *)radix_tree_deref_slot(pslot) != page) { + (struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) { spin_unlock_irq(&mapping->tree_lock); return -EAGAIN; } @@ -316,7 +316,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping, expected_count = 2 + page_has_private(page); if (page_count(page) != expected_count || - (struct page *)radix_tree_deref_slot(pslot) != page) { + (struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) { spin_unlock_irq(&mapping->tree_lock); return -EAGAIN; } ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-17 9:28 ` Mel Gorman @ 2010-12-17 15:22 ` Minchan Kim 0 siblings, 0 replies; 21+ messages in thread From: Minchan Kim @ 2010-12-17 15:22 UTC (permalink / raw) To: Mel Gorman Cc: gerald.schaefer, KAMEZAWA Hiroyuki, paulmck, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Andrew Morton, Heiko Carstens, Martin Schwidefsky On Fri, Dec 17, 2010 at 09:28:28AM +0000, Mel Gorman wrote: > On Fri, Dec 17, 2010 at 08:39:12AM +0000, Mel Gorman wrote: > > On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote: > > > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer > > > <gerald.schaefer@de.ibm.com> wrote: > > > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see > > > > below. Both cases are easily reproducible: memory unplug with big page cache, > > > > or adding large pages during run-time. > > > > > > > > =================================================== > > > > [ INFO: suspicious rcu_dereference_check() usage. ] > > > > --------------------------------------------------- > > > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > rcu_scheduler_active = 1, debug_locks = 0 > > > > 1 lock held by bash/761: > > > > #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8 > > > > > > > > stack backtrace: > > > > CPU: 1 Not tainted 2.6.37-rc6 #4 > > > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8) > > > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000 > > > > 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa > > > > 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30 > > > > 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000 > > > > 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8 > > > > Call Trace: > > > > ([<0000000000100b02>] show_trace+0xee/0x144) > > > > [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8 > > > > [<0000000000226c80>] migrate_page+0x38/0x68 > > > > [<0000000000226d9a>] move_to_new_page+0xea/0x2bc > > > > [<000000000022785a>] migrate_pages+0x496/0x568 > > > > [<000000000021e24e>] compact_zone+0x432/0x7d8 > > > > [<000000000021e772>] compact_zone_order+0x9e/0xbc > > > > [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c > > > > [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64 > > > > [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c > > > > [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac > > > > [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc > > > > [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c > > > > [<00000000002a65be>] proc_sys_write+0x26/0x34 > > > > [<00000000002336e0>] vfs_write+0xac/0x18c > > > > [<00000000002338bc>] SyS_write+0x58/0xa8 > > > > [<0000000000113976>] sysc_noemu+0x16/0x1c > > > > [<0000020000162edc>] 0x20000162edc > > > > INFO: lockdep is turned off. > > > > > > > > I honestly do not understand 100% why this is a false positive, seeing that > > > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the > > > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but > > > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this > > > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths, > > > > even if it is not necessary for synchronization, would get rid of the warning, > > > > like in the following patch. Any ideas? > > > > > > In case of anon page, we hold rcu_read_lock in unmap_and_move. > > > The problem is file-backed page. In case of that, we hold lock_page > > > and mapping->tree_lock as update-side lock. > > > So we don't need rcu_read_lock. > > > > > > > > > > > --- > > > > fs/hugetlbfs/inode.c | 2 ++ > > > > mm/migrate.c | 4 ++++ > > > > 2 files changed, 6 insertions(+) > > > > > > > > --- a/fs/hugetlbfs/inode.c > > > > +++ b/fs/hugetlbfs/inode.c > > > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct > > > > { > > > > int rc; > > > > > > > > + rcu_read_lock(); > > > > rc = migrate_huge_page_move_mapping(mapping, newpage, page); > > > > + rcu_read_unlock(); > > > > if (rc) > > > > return rc; > > > > migrate_page_copy(newpage, page); > > > > --- a/mm/migrate.c > > > > +++ b/mm/migrate.c > > > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m > > > > > > > > BUG_ON(PageWriteback(page)); /* Writeback must be complete */ > > > > > > > > + rcu_read_lock(); > > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > > + rcu_read_unlock(); > > > > > > > > if (rc) > > > > return rc; > > > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s > > > > > > > > head = page_buffers(page); > > > > > > > > + rcu_read_lock(); > > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > > + rcu_read_unlock(); > > > > > > > > if (rc) > > > > return rc; > > > > > > > > > > > > > > > > > > > > > How about this? > > > Maybe Paul have better idea. > > > (It's apparently be word-wrapped.) > > > > > > > heh, I wrote a patch almost identical to this and ran it overnight for testing > > (test was a memory consumer running while a parallel process grew and shrunk > > the hugepage pool). It passes but that is hardly a surprise. We differed > > slightly in a number of respects though. > > > > For completeness, this is what I tested last night. There are two "confirms" > in the changelog that I intended to work out today but maybe someone can > confirm faster. > > ==== CUT HERE ==== > mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration > > migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous > pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d. > The point of the RCU protection there is part of getting a stable reference > to anon_vma and is only held for anon pages as file pages are locked > which is sufficient protection against freeing. > > However, while a file page's mapping is being migrated, the radix tree > is double checked to ensure it is the expected page. This uses > radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held > triggering the following warning. > > [ 173.674290] =================================================== > [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ] > [ 173.676016] --------------------------------------------------- > [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! > [ 173.676016] > [ 173.676016] other info that might help us debug this: > [ 173.676016] > [ 173.676016] > [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0 > [ 173.676016] 1 lock held by hugeadm/2899: > [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab > [ 173.676016] > [ 173.676016] stack backtrace: > [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild > [ 173.676016] Call Trace: > [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b > [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86 > [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab > [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39 > [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107 > [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107 > [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae > [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa > > This patch introduces radix_tree_deref_slot_protected() which calls > rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock > that is protecting this dereference. Based on the locking hierarchy described > in mm/filemap.c, holding the tree lock is protecting the radix tree from > concurrent updaters in all cases (Confirm that no case has been missed). > According to Documentation/RCU/lockdep.txt, if there is a guarantee that > no parallel updaters exist, use of rcu_dereference_protected() is allowed > (Confirm this is accurate?). > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > --- > include/linux/radix-tree.h | 19 +++++++++++++++++++ > mm/migrate.c | 4 ++-- > 2 files changed, 21 insertions(+), 2 deletions(-) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index ab2baa5..252d21c 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot) > } > > /** > + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > + * Returns: item that was stored in that slot with any direct pointer flag > + * removed. > + * > + * For use with radix_tree_lookup_slot(). Caller must hold tree read > + * locked across slot lookup and dereference. Not required if write lock is > + * held (ie. items cannot be concurrently inserted). > + * > + * radix_tree_deref_retry must be used to confirm validity of the pointer if > + * only the read lock is held. > + */ > +static inline void *radix_tree_deref_slot_protected(void **pslot, > + spinlock_t *treelock) > +{ > + return rcu_dereference_protected(*pslot, lockdep_is_held(treelock)); > +} It seems to be good than mine. Just a nitpick. Can't we get the mutex lock as update-side lock in future? -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-17 8:39 ` Mel Gorman 2010-12-17 9:28 ` Mel Gorman @ 2010-12-17 15:13 ` Minchan Kim 2010-12-17 16:01 ` Paul E. McKenney 2 siblings, 0 replies; 21+ messages in thread From: Minchan Kim @ 2010-12-17 15:13 UTC (permalink / raw) To: Mel Gorman Cc: gerald.schaefer, KAMEZAWA Hiroyuki, paulmck, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Andrew Morton, Heiko Carstens, Martin Schwidefsky On Fri, Dec 17, 2010 at 08:39:12AM +0000, Mel Gorman wrote: > On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote: > > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer > > <gerald.schaefer@de.ibm.com> wrote: > > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see > > > below. Both cases are easily reproducible: memory unplug with big page cache, > > > or adding large pages during run-time. > > > > > > =================================================== > > > [ INFO: suspicious rcu_dereference_check() usage. ] > > > --------------------------------------------------- > > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! > > > > > > other info that might help us debug this: > > > > > > > > > rcu_scheduler_active = 1, debug_locks = 0 > > > 1 lock held by bash/761: > > > #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8 > > > > > > stack backtrace: > > > CPU: 1 Not tainted 2.6.37-rc6 #4 > > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8) > > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000 > > > 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa > > > 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30 > > > 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000 > > > 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8 > > > Call Trace: > > > ([<0000000000100b02>] show_trace+0xee/0x144) > > > [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8 > > > [<0000000000226c80>] migrate_page+0x38/0x68 > > > [<0000000000226d9a>] move_to_new_page+0xea/0x2bc > > > [<000000000022785a>] migrate_pages+0x496/0x568 > > > [<000000000021e24e>] compact_zone+0x432/0x7d8 > > > [<000000000021e772>] compact_zone_order+0x9e/0xbc > > > [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c > > > [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64 > > > [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c > > > [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac > > > [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc > > > [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c > > > [<00000000002a65be>] proc_sys_write+0x26/0x34 > > > [<00000000002336e0>] vfs_write+0xac/0x18c > > > [<00000000002338bc>] SyS_write+0x58/0xa8 > > > [<0000000000113976>] sysc_noemu+0x16/0x1c > > > [<0000020000162edc>] 0x20000162edc > > > INFO: lockdep is turned off. > > > > > > I honestly do not understand 100% why this is a false positive, seeing that > > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the > > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but > > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this > > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths, > > > even if it is not necessary for synchronization, would get rid of the warning, > > > like in the following patch. Any ideas? > > > > In case of anon page, we hold rcu_read_lock in unmap_and_move. > > The problem is file-backed page. In case of that, we hold lock_page > > and mapping->tree_lock as update-side lock. > > So we don't need rcu_read_lock. > > > > > > > > --- > > > fs/hugetlbfs/inode.c | 2 ++ > > > mm/migrate.c | 4 ++++ > > > 2 files changed, 6 insertions(+) > > > > > > --- a/fs/hugetlbfs/inode.c > > > +++ b/fs/hugetlbfs/inode.c > > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct > > > { > > > int rc; > > > > > > + rcu_read_lock(); > > > rc = migrate_huge_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > if (rc) > > > return rc; > > > migrate_page_copy(newpage, page); > > > --- a/mm/migrate.c > > > +++ b/mm/migrate.c > > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m > > > > > > BUG_ON(PageWriteback(page)); /* Writeback must be complete */ > > > > > > + rcu_read_lock(); > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > > > > if (rc) > > > return rc; > > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s > > > > > > head = page_buffers(page); > > > > > > + rcu_read_lock(); > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > > > > if (rc) > > > return rc; > > > > > > > > > > > > > > > How about this? > > Maybe Paul have better idea. > > (It's apparently be word-wrapped.) > > > > heh, I wrote a patch almost identical to this and ran it overnight for testing > (test was a memory consumer running while a parallel process grew and shrunk > the hugepage pool). It passes but that is hardly a surprise. We differed Good. > slightly in a number of respects though. > > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > > index ab2baa5..135af1e 100644 > > --- a/include/linux/radix-tree.h > > +++ b/include/linux/radix-tree.h > > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot) > > } > > > > /** > > + * radix_tree_deref_slot_nocheck - dereference a slot without RCU check > > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > > + * Returns: item that was stored in that slot with any direct pointer flag > > + * removed. > > + * > > + * This functions works like radix_tree_deref_slot except it doesn't check > > + * RCU rule. Normally this funcion is used with update-side lock. > > + * You should use this function very carefully. > > + */ > > +static inline void *radix_tree_deref_slot_nocheck(void **pslot) > > +{ > > + return rcu_dereference_protected(*pslot, 1); > > +} > > For this, I had > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index ab2baa5..252d21c 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot) > } > > /** > + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > + * Returns: item that was stored in that slot with any direct pointer flag > + * removed. > + * > + * For use with radix_tree_lookup_slot(). Caller must hold tree read > + * locked across slot lookup and dereference. Not required if write lock is > + * held (ie. items cannot be concurrently inserted). > + * > + * radix_tree_deref_retry must be used to confirm validity of the pointer if > + * only the read lock is held. > + */ > +static inline void *radix_tree_deref_slot_protected(void **pslot, > + spinlock_t *treelock) > +{ > + return rcu_dereference_protected(*pslot, lockdep_is_held(treelock)); > +} > + > +/** > * radix_tree_deref_retry - check radix_tree_deref_slot > * @arg: pointer returned by radix_tree_deref_slot > * Returns: 0 if retry is not required, otherwise retry is required > > In the documentation, I noted that the check might be without RCU but with > the knowledge that it's protected by the tree lock. I'm not a RCU expert > but this is only safe when you know there isn't a parallel updater and the > treelock should be preventing that, right? I think so. > > Even so, other users of rcu_dereference_protected() check a lock condition > which I used tree lock for. I intended to read through the rest of > documentation properly this morning to determine if this was indeed the > right approach. > > I used the name _protected instead of _nocheck because the dereference > is still protected (by the tree lock) just not by RCU. Again, have to > check the documentation to ensure this is correct. I agree _proected naming is good. > > > +/** > > * radix_tree_deref_retry - check radix_tree_deref_slot > > * @arg: pointer returned by radix_tree_deref_slot > > * Returns: 0 if retry is not required, otherwise retry is required > > diff --git a/mm/migrate.c b/mm/migrate.c > > index 2eb2243..5be2841 100644 > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct > > address_space *mappin > > > > expected_count = 2 + page_has_private(page); > > if (page_count(page) != expected_count || > > - (struct page *)radix_tree_deref_slot(pslot) != page) { > > + (struct page *)radix_tree_deref_slot_nocheck(pslot) > > + != page) { > > spin_unlock_irq(&mapping->tree_lock); > > return -EAGAIN; > > } > > We only differed here by my passing in the &mapping->tree_lock I will add the comment in your formal patch. > > > > > @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct > > address_space *mapping, > > > > expected_count = 2 + page_has_private(page); > > if (page_count(page) != expected_count || > > - (struct page *)radix_tree_deref_slot(pslot) != page) { > > + (struct page *)radix_tree_deref_slot_nocheck(pslot) > > + != page) { > > spin_unlock_irq(&mapping->tree_lock); > > return -EAGAIN; > > } > > -- > Mel Gorman > Part-time Phd Student Linux Technology Center > University of Limerick IBM Dublin Software Lab -- Kind regards, Minchan Kim -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! 2010-12-17 8:39 ` Mel Gorman 2010-12-17 9:28 ` Mel Gorman 2010-12-17 15:13 ` Minchan Kim @ 2010-12-17 16:01 ` Paul E. McKenney 2 siblings, 0 replies; 21+ messages in thread From: Paul E. McKenney @ 2010-12-17 16:01 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Andrew Morton, Heiko Carstens, Martin Schwidefsky On Fri, Dec 17, 2010 at 08:39:12AM +0000, Mel Gorman wrote: > On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote: > > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer > > <gerald.schaefer@de.ibm.com> wrote: > > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see > > > below. Both cases are easily reproducible: memory unplug with big page cache, > > > or adding large pages during run-time. > > > > > > =================================================== > > > [ INFO: suspicious rcu_dereference_check() usage. ] > > > --------------------------------------------------- > > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! > > > > > > other info that might help us debug this: > > > > > > > > > rcu_scheduler_active = 1, debug_locks = 0 > > > 1 lock held by bash/761: > > > #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8 > > > > > > stack backtrace: > > > CPU: 1 Not tainted 2.6.37-rc6 #4 > > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8) > > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000 > > > 00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa > > > 0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30 > > > 000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000 > > > 0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8 > > > Call Trace: > > > ([<0000000000100b02>] show_trace+0xee/0x144) > > > [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8 > > > [<0000000000226c80>] migrate_page+0x38/0x68 > > > [<0000000000226d9a>] move_to_new_page+0xea/0x2bc > > > [<000000000022785a>] migrate_pages+0x496/0x568 > > > [<000000000021e24e>] compact_zone+0x432/0x7d8 > > > [<000000000021e772>] compact_zone_order+0x9e/0xbc > > > [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c > > > [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64 > > > [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c > > > [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac > > > [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc > > > [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c > > > [<00000000002a65be>] proc_sys_write+0x26/0x34 > > > [<00000000002336e0>] vfs_write+0xac/0x18c > > > [<00000000002338bc>] SyS_write+0x58/0xa8 > > > [<0000000000113976>] sysc_noemu+0x16/0x1c > > > [<0000020000162edc>] 0x20000162edc > > > INFO: lockdep is turned off. > > > > > > I honestly do not understand 100% why this is a false positive, seeing that > > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the > > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but > > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this > > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths, > > > even if it is not necessary for synchronization, would get rid of the warning, > > > like in the following patch. Any ideas? > > > > In case of anon page, we hold rcu_read_lock in unmap_and_move. > > The problem is file-backed page. In case of that, we hold lock_page > > and mapping->tree_lock as update-side lock. > > So we don't need rcu_read_lock. > > > > > > > > --- > > > fs/hugetlbfs/inode.c | 2 ++ > > > mm/migrate.c | 4 ++++ > > > 2 files changed, 6 insertions(+) > > > > > > --- a/fs/hugetlbfs/inode.c > > > +++ b/fs/hugetlbfs/inode.c > > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct > > > { > > > int rc; > > > > > > + rcu_read_lock(); > > > rc = migrate_huge_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > if (rc) > > > return rc; > > > migrate_page_copy(newpage, page); > > > --- a/mm/migrate.c > > > +++ b/mm/migrate.c > > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m > > > > > > BUG_ON(PageWriteback(page)); /* Writeback must be complete */ > > > > > > + rcu_read_lock(); > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > > > > if (rc) > > > return rc; > > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s > > > > > > head = page_buffers(page); > > > > > > + rcu_read_lock(); > > > rc = migrate_page_move_mapping(mapping, newpage, page); > > > + rcu_read_unlock(); > > > > > > if (rc) > > > return rc; > > > > > > > > > > > > > > > How about this? > > Maybe Paul have better idea. > > (It's apparently be word-wrapped.) > > > > heh, I wrote a patch almost identical to this and ran it overnight for testing > (test was a memory consumer running while a parallel process grew and shrunk > the hugepage pool). It passes but that is hardly a surprise. We differed > slightly in a number of respects though. > > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > > index ab2baa5..135af1e 100644 > > --- a/include/linux/radix-tree.h > > +++ b/include/linux/radix-tree.h > > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot) > > } > > > > /** > > + * radix_tree_deref_slot_nocheck - dereference a slot without RCU check > > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > > + * Returns: item that was stored in that slot with any direct pointer flag > > + * removed. > > + * > > + * This functions works like radix_tree_deref_slot except it doesn't check > > + * RCU rule. Normally this funcion is used with update-side lock. > > + * You should use this function very carefully. > > + */ > > +static inline void *radix_tree_deref_slot_nocheck(void **pslot) > > +{ > > + return rcu_dereference_protected(*pslot, 1); > > +} > > For this, I had > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index ab2baa5..252d21c 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot) > } > > /** > + * radix_tree_deref_slot_protected - dereference a slot without RCU lock but with tree lock held > + * @pslot: pointer to slot, returned by radix_tree_lookup_slot > + * Returns: item that was stored in that slot with any direct pointer flag > + * removed. > + * > + * For use with radix_tree_lookup_slot(). Caller must hold tree read > + * locked across slot lookup and dereference. Not required if write lock is > + * held (ie. items cannot be concurrently inserted). > + * > + * radix_tree_deref_retry must be used to confirm validity of the pointer if > + * only the read lock is held. > + */ > +static inline void *radix_tree_deref_slot_protected(void **pslot, > + spinlock_t *treelock) > +{ > + return rcu_dereference_protected(*pslot, lockdep_is_held(treelock)); > +} > + > +/** > * radix_tree_deref_retry - check radix_tree_deref_slot > * @arg: pointer returned by radix_tree_deref_slot > * Returns: 0 if retry is not required, otherwise retry is required > > In the documentation, I noted that the check might be without RCU but with > the knowledge that it's protected by the tree lock. I'm not a RCU expert > but this is only safe when you know there isn't a parallel updater and the > treelock should be preventing that, right? Yes, if you have prevented updates from happening, then it is OK to use rcu_dereference_protected() instead of rcu_dereference_check(). However, if RCU readers can invoke this code, things will break. By the way, this means that something like: rcu_dereference_protected(p, rcu_read_lock_held()) /* BUGGY!!! */ is just plain wrong. > Even so, other users of rcu_dereference_protected() check a lock condition > which I used tree lock for. I intended to read through the rest of > documentation properly this morning to determine if this was indeed the > right approach. > > I used the name _protected instead of _nocheck because the dereference > is still protected (by the tree lock) just not by RCU. Again, have to > check the documentation to ensure this is correct. Yep, this is indeed the case that rcu_dereference_protected() is for. > > +/** > > * radix_tree_deref_retry - check radix_tree_deref_slot > > * @arg: pointer returned by radix_tree_deref_slot > > * Returns: 0 if retry is not required, otherwise retry is required > > diff --git a/mm/migrate.c b/mm/migrate.c > > index 2eb2243..5be2841 100644 > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct > > address_space *mappin > > > > expected_count = 2 + page_has_private(page); > > if (page_count(page) != expected_count || > > - (struct page *)radix_tree_deref_slot(pslot) != page) { > > + (struct page *)radix_tree_deref_slot_nocheck(pslot) > > + != page) { > > spin_unlock_irq(&mapping->tree_lock); > > return -EAGAIN; > > } > > We only differed here by my passing in the &mapping->tree_lock Which should be optimized away during inlining, so no performance penalty. ;-) Thanx, Paul > > @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct > > address_space *mapping, > > > > expected_count = 2 + page_has_private(page); > > if (page_count(page) != expected_count || > > - (struct page *)radix_tree_deref_slot(pslot) != page) { > > + (struct page *)radix_tree_deref_slot_nocheck(pslot) > > + != page) { > > spin_unlock_irq(&mapping->tree_lock); > > return -EAGAIN; > > } > > -- > Mel Gorman > Part-time Phd Student Linux Technology Center > University of Limerick IBM Dublin Software Lab -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage 2010-11-21 17:37 ` Ted Ts'o 2010-11-22 0:38 ` Minchan Kim @ 2010-11-23 7:16 ` KOSAKI Motohiro 1 sibling, 0 replies; 21+ messages in thread From: KOSAKI Motohiro @ 2010-11-23 7:16 UTC (permalink / raw) To: Ted Ts'o, Minchan Kim, linux-ext4, linux-mm, linux-kernel, Andreas Dilger <adi Cc: kosaki.motohiro > On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote: > > > > I think it's no problem. > > > > That's because migration always holds lock_page on the file page. > > So the page couldn't remove from radix. > > It may be "ok" in that it won't cause a race, but it still leaves an > unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will > cause /proc_lock_stat to be disabled. So I think it still needs to be > fixed by adding rcu_read_lock()/rcu_read_unlock() to > migrate_page_move_mapping(). Hi Ted, Current mmotm has following patch and I think it should be fixed your issue. Thanks. From: Zeng Zhaoming <zengzm.kernel@gmail.com> find_task_by_vpid() should be protected by rcu_read_lock(), to prevent free_pid() reclaiming pid. Signed-off-by: Zeng Zhaoming <zengzm.kernel@gmail.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/mempolicy.c | 3 +++ 1 file changed, 3 insertions(+) diff -puN mm/mempolicy.c~mm-mempolicyc-add-rcu-read-lock-to-protect-pid-structure mm/mempolicy.c --- a/mm/mempolicy.c~mm-mempolicyc-add-rcu-read-lock-to-protect-pid-structure +++ a/mm/mempolicy.c @@ -1307,15 +1307,18 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pi goto out; /* Find the mm_struct */ + rcu_read_lock(); read_lock(&tasklist_lock); task = pid ? find_task_by_vpid(pid) : current; if (!task) { read_unlock(&tasklist_lock); + rcu_read_unlock(); err = -ESRCH; goto out; } mm = get_task_mm(task); read_unlock(&tasklist_lock); + rcu_read_unlock(); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2010-12-17 16:04 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-21 11:26 [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage Arun Bhanu 2010-11-21 13:30 ` Ted Ts'o 2010-11-21 15:39 ` Minchan Kim 2010-11-21 17:37 ` Ted Ts'o 2010-11-22 0:38 ` Minchan Kim 2010-11-22 3:31 ` Milton Miller 2010-11-22 6:16 ` Paul E. McKenney 2010-12-07 19:01 ` [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! Gerald Schaefer 2010-12-08 1:19 ` KAMEZAWA Hiroyuki 2010-12-16 13:50 ` Gerald Schaefer 2010-12-17 0:04 ` Minchan Kim 2010-12-17 5:47 ` Paul E. McKenney 2010-12-17 5:59 ` Eric Dumazet 2010-12-17 15:08 ` Minchan Kim 2010-12-17 16:03 ` Paul E. McKenney 2010-12-17 8:39 ` Mel Gorman 2010-12-17 9:28 ` Mel Gorman 2010-12-17 15:22 ` Minchan Kim 2010-12-17 15:13 ` Minchan Kim 2010-12-17 16:01 ` Paul E. McKenney 2010-11-23 7:16 ` [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage KOSAKI Motohiro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).