From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mel Gorman
Subject: Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
Date: Thu, 5 Jan 2012 22:21:16 +0000
Message-ID: <20120105222116.GF27881@csn.ul.ie>
References: <1325499859-2262-1-git-send-email-gilad@benyossef.com>
 <1325499859-2262-8-git-send-email-gilad@benyossef.com>
 <4F033EC9.4050909@gmail.com>
 <20120105142017.GA27881@csn.ul.ie>
 <20120105144011.GU11810@n2100.arm.linux.org.uk>
 <20120105161739.GD27881@csn.ul.ie>
 <20120105163529.GA11810@n2100.arm.linux.org.uk>
 <20120105183504.GF2393@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Cc: Russell King - ARM Linux, KOSAKI Motohiro, Gilad Ben-Yossef,
 linux-kernel@vger.kernel.org, Chris Metcalf, Peter Zijlstra,
 Frederic Weisbecker, linux-mm@kvack.org, Pekka Enberg, Matt Mackall,
 Sasha Levin, Rik van Riel, Andi Kleen, Andrew Morton, Alexander Viro,
 Greg KH, linux-fsdevel@vger.kernel.org, Avi Kivity
To: "Paul E. McKenney"
Return-path:
Content-Disposition: inline
In-Reply-To: <20120105183504.GF2393@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

(Adding Greg to cc to see if he recalls seeing issues with sysfs dentry
suffering from recursive locking recently)

On Thu, Jan 05, 2012 at 10:35:04AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 05, 2012 at 04:35:29PM +0000, Russell King - ARM Linux wrote:
> > On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
> > > Link please?
> > 
> > Forwarded, as its still in my mailbox.
> > 
> > > I'm including a patch below under development that is
> > > intended to only cope with the page allocator case under heavy memory
> > > pressure. Currently it does not pass testing because eventually RCU
> > > gets stalled with the following trace
> > > 
> > > [ 1817.176001] [] arch_trigger_all_cpu_backtrace+0x87/0xa0
> > > [ 1817.176001] [] __rcu_pending+0x149/0x260
> > > [ 1817.176001] [] rcu_check_callbacks+0x5f/0x110
> > > [ 1817.176001] [] update_process_times+0x3f/0x80
> > > [ 1817.176001] [] tick_sched_timer+0x5b/0xc0
> > > [ 1817.176001] [] __run_hrtimer+0xbe/0x1a0
> > > [ 1817.176001] [] hrtimer_interrupt+0xc1/0x1e0
> > > [ 1817.176001] [] smp_apic_timer_interrupt+0x63/0xa0
> > > [ 1817.176001] [] apic_timer_interrupt+0x13/0x20
> > > [ 1817.176001] [] vfsmount_lock_local_lock+0x25/0x30
> > > [ 1817.176001] [] path_init+0x2d5/0x370
> > > [ 1817.176001] [] path_lookupat+0x2d/0x620
> > > [ 1817.176001] [] do_path_lookup+0x2f/0xd0
> > > [ 1817.176001] [] user_path_at_empty+0x9f/0xd0
> > > [ 1817.176001] [] vfs_fstatat+0x4b/0x90
> > > [ 1817.176001] [] sys_newlstat+0x1f/0x50
> > > [ 1817.176001] [] system_call_fastpath+0x16/0x1b
> > > 
> > > It might be a separate bug, don't know for sure.
> 

I rebased the patch on top of 3.2 and tested again with a bunch of
debugging options set (PROVE_RCU, PROVE_LOCKING etc). Same results. CPU
hotplug is a lot more reliable and less likely to hang but eventually
gets into trouble. Taking a closer look though, I don't think this is
an RCU problem. It's just the messenger.

> Do you get multiple RCU CPU stall-warning messages?

Yes, one roughly every 50000 jiffies or so (HZ=250).
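(Doing the arithmetic on the warnings below: the t= values advance by
48780 jiffies per report, and 48780 / 250 is roughly 195 seconds, which
matches the ~195s gaps between the dmesg timestamps.)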

[ 878.315029] INFO: rcu_sched detected stall on CPU 3 (t=16250 jiffies)
[ 878.315032] INFO: rcu_sched detected stall on CPU 6 (t=16250 jiffies)
[ 1072.878669] INFO: rcu_sched detected stall on CPU 3 (t=65030 jiffies)
[ 1072.878672] INFO: rcu_sched detected stall on CPU 6 (t=65030 jiffies)
[ 1267.442308] INFO: rcu_sched detected stall on CPU 3 (t=113810 jiffies)
[ 1267.442312] INFO: rcu_sched detected stall on CPU 6 (t=113810 jiffies)
[ 1462.005948] INFO: rcu_sched detected stall on CPU 3 (t=162590 jiffies)
[ 1462.005952] INFO: rcu_sched detected stall on CPU 6 (t=162590 jiffies)
[ 1656.569588] INFO: rcu_sched detected stall on CPU 3 (t=211370 jiffies)
[ 1656.569592] INFO: rcu_sched detected stall on CPU 6 (t=211370 jiffies)
[ 1851.133229] INFO: rcu_sched detected stall on CPU 6 (t=260150 jiffies)
[ 1851.133233] INFO: rcu_sched detected stall on CPU 3 (t=260150 jiffies)
[ 2045.696868] INFO: rcu_sched detected stall on CPU 3 (t=308930 jiffies)
[ 2045.696872] INFO: rcu_sched detected stall on CPU 6 (t=308930 jiffies)
[ 2240.260508] INFO: rcu_sched detected stall on CPU 6 (t=357710 jiffies)
[ 2240.260511] INFO: rcu_sched detected stall on CPU 3 (t=357710 jiffies)

> If so, it can
> be helpful to look at how the stack frame changes over time. These
> stalls are normally caused by a loop in the kernel with preemption
> disabled, though other scenarios can also cause them.
> 

The stacks are not changing much over time and start with this;

[ 878.315029] INFO: rcu_sched detected stall on CPU 3 (t=16250 jiffies)
[ 878.315032] INFO: rcu_sched detected stall on CPU 6 (t=16250 jiffies)
[ 878.315036] Pid: 4422, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
[ 878.315037] Call Trace:
[ 878.315038] [] __rcu_pending+0x8e/0x36c
[ 878.315052] [] ? tick_nohz_handler+0xdc/0xdc
[ 878.315054] [] rcu_check_callbacks+0x106/0x172
[ 878.315056] [] update_process_times+0x3f/0x76
[ 878.315058] [] tick_sched_timer+0x70/0x9a
[ 878.315060] [] __run_hrtimer+0xc7/0x157
[ 878.315062] [] hrtimer_interrupt+0xba/0x18a
[ 878.315065] [] smp_apic_timer_interrupt+0x86/0x99
[ 878.315067] [] apic_timer_interrupt+0x73/0x80
[ 878.315068] [] ? retint_restore_args+0x13/0x13
[ 878.315072] [] ? __shrink_dcache_sb+0x7d/0x19f
[ 878.315075] [] ? native_read_tsc+0x1/0x16
[ 878.315077] [] ? delay_tsc+0x3a/0x82
[ 878.315079] [] __delay+0xf/0x11
[ 878.315081] [] do_raw_spin_lock+0xb5/0xf9
[ 878.315083] [] _raw_spin_lock+0x39/0x3d
[ 878.315085] [] ? shrink_dcache_parent+0x77/0x28c
[ 878.315087] [] shrink_dcache_parent+0x77/0x28c
[ 878.315089] [] ? have_submounts+0x13e/0x1bd
[ 878.315092] [] sysfs_dentry_revalidate+0xaa/0xbe
[ 878.315093] [] do_lookup+0x263/0x2fc
[ 878.315096] [] ? security_inode_permission+0x1e/0x20
[ 878.315098] [] link_path_walk+0x1e2/0x763
[ 878.315099] [] path_lookupat+0x5c/0x61a
[ 878.315102] [] ? might_fault+0x89/0x8d
[ 878.315104] [] ? might_fault+0x40/0x8d
[ 878.315105] [] do_path_lookup+0x2a/0xa8
[ 878.315107] [] user_path_at_empty+0x5d/0x97
[ 878.315109] [] ? trace_hardirqs_off+0xd/0xf
[ 878.315111] [] ? _raw_spin_unlock_irqrestore+0x44/0x5a
[ 878.315112] [] user_path_at+0x11/0x13
[ 878.315115] [] vfs_fstatat+0x44/0x71
[ 878.315117] [] vfs_lstat+0x1e/0x20
[ 878.315118] [] sys_newlstat+0x1f/0x40
[ 878.315120] [] ? trace_hardirqs_on_caller+0x12d/0x164
[ 878.315122] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 878.315124] [] ? trace_hardirqs_off+0xd/0xf
[ 878.315126] [] system_call_fastpath+0x16/0x1b
[ 878.557790] Pid: 5704, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
[ 878.564226] Call Trace:
[ 878.566677] [] __rcu_pending+0x8e/0x36c
[ 878.572783] [] ? tick_nohz_handler+0xdc/0xdc
[ 878.578702] [] rcu_check_callbacks+0x106/0x172
[ 878.584794] [] update_process_times+0x3f/0x76
[ 878.590798] [] tick_sched_timer+0x70/0x9a
[ 878.596459] [] __run_hrtimer+0xc7/0x157
[ 878.601944] [] hrtimer_interrupt+0xba/0x18a
[ 878.607778] [] smp_apic_timer_interrupt+0x86/0x99
[ 878.614129] [] apic_timer_interrupt+0x73/0x80
[ 878.620134] [] ? run_timer_softirq+0x49/0x32a
[ 878.626759] [] ? __shrink_dcache_sb+0x7d/0x19f
[ 878.632851] [] ? delay_tsc+0x8/0x82
[ 878.637988] [] __delay+0xf/0x11
[ 878.642778] [] do_raw_spin_lock+0xb5/0xf9
[ 878.648437] [] _raw_spin_lock+0x39/0x3d
[ 878.653920] [] ? shrink_dcache_parent+0x77/0x28c
[ 878.660186] [] shrink_dcache_parent+0x77/0x28c
[ 878.666277] [] ? have_submounts+0x13e/0x1bd
[ 878.672107] [] sysfs_dentry_revalidate+0xaa/0xbe
[ 878.678372] [] do_lookup+0x263/0x2fc
[ 878.683596] [] ? security_inode_permission+0x1e/0x20
[ 878.690207] [] link_path_walk+0x1e2/0x763
[ 878.695866] [] path_lookupat+0x5c/0x61a
[ 878.701350] [] ? might_fault+0x89/0x8d
[ 878.706747] [] ? might_fault+0x40/0x8d
[ 878.712145] [] do_path_lookup+0x2a/0xa8
[ 878.717630] [] user_path_at_empty+0x5d/0x97
[ 878.723463] [] ? trace_hardirqs_off+0xd/0xf
[ 878.729295] [] ? _raw_spin_unlock_irqrestore+0x44/0x5a
[ 878.736080] [] user_path_at+0x11/0x13
[ 878.741391] [] vfs_fstatat+0x44/0x71
[ 878.746616] [] vfs_lstat+0x1e/0x20
[ 878.751668] [] sys_newlstat+0x1f/0x40
[ 878.756981] [] ? trace_hardirqs_on_caller+0x12d/0x164
[ 878.763678] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 878.770116] [] ? trace_hardirqs_off+0xd/0xf
[ 878.775949] [] system_call_fastpath+0x16/0x1b
[ 908.769486] BUG: spinlock lockup on CPU#6, udevd/4422
[ 908.774547] lock: ffff8803b4c701c8, .magic: dead4ead, .owner: udevd/5709, .owner_cpu: 4

Seeing that the owner was CPU 4, I found this earlier in the log

[ 815.244051] BUG: spinlock lockup on CPU#4, udevd/5709
[ 815.249103] lock: ffff8803b4c701c8, .magic: dead4ead, .owner: udevd/5709, .owner_cpu: 4
[ 815.258430] Pid: 5709, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
[ 815.264866] Call Trace:
[ 815.267329] [] spin_dump+0x88/0x8d
[ 815.272388] [] do_raw_spin_lock+0xd6/0xf9
[ 815.278062] [] ? _raw_spin_lock+0x39/0x3d
[ 815.283720] [] ? shrink_dcache_parent+0x77/0x28c
[ 815.289986] [] ? shrink_dcache_parent+0x77/0x28c
[ 815.296249] [] ? have_submounts+0x13e/0x1bd
[ 815.302080] [] ? sysfs_dentry_revalidate+0xaa/0xbe
[ 815.308515] [] ? do_lookup+0x263/0x2fc
[ 815.313915] [] ? security_inode_permission+0x1e/0x20
[ 815.320524] [] ? link_path_walk+0x1e2/0x763
[ 815.326357] [] ? path_lookupat+0x5c/0x61a
[ 815.332014] [] ? might_fault+0x89/0x8d
[ 815.337410] [] ? might_fault+0x40/0x8d
[ 815.342807] [] ? do_path_lookup+0x2a/0xa8
[ 815.348465] [] ? user_path_at_empty+0x5d/0x97
[ 815.354474] [] ? trace_hardirqs_off+0xd/0xf
[ 815.360303] [] ? _raw_spin_unlock_irqrestore+0x44/0x5a
[ 815.367085] [] ? user_path_at+0x11/0x13
[ 815.372569] [] ? vfs_fstatat+0x44/0x71
[ 815.377965] [] ? vfs_lstat+0x1e/0x20
[ 815.383192] [] ? sys_newlstat+0x1f/0x40
[ 815.388676] [] ? trace_hardirqs_on_caller+0x12d/0x164
[ 815.395373] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 815.401811] [] ? trace_hardirqs_off+0xd/0xf
[ 815.407642] [] ? system_call_fastpath+0x16/0x1b

The trace itself is not particularly useful, but it does look like the
lock was taken recursively even though the message does not say so: the
lock's owner is udevd/5709 on CPU 4, and the task reporting the lockup
on CPU 4 is that same udevd/5709, i.e. it is spinning on a lock it
already holds. If the shrink_dcache_parent() entry is accurate, that
corresponds to select_parent(), which shrink_dcache_parent() calls in a
loop:

static int select_parent(struct dentry * parent)
{
        struct dentry *this_parent;
        struct list_head *next;
        unsigned seq;
        int found = 0;
        int locked = 0;

        seq = read_seqbegin(&rename_lock);
again:
        this_parent = parent;
        spin_lock(&this_parent->d_lock);        <----- HERE

I'm not overly clear on how VFS locking is meant to work, but it almost
looks as if the last reference to an inode is being dropped during a
sysfs path lookup. Is that meant to happen? Judging by
sysfs_dentry_revalidate() - possibly not. It looks like we must have
reached out_bad: and called shrink_dcache_parent() on a dentry that was
already locked by the running process. Not sure how this could have
happened - Greg, does this look familiar?

-- 
Mel Gorman
SUSE Labs
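
P.S. For reference, the out_bad: tail of sysfs_dentry_revalidate() I am
thinking of looks roughly like this in a 3.2-era fs/sysfs/dir.c. I am
paraphrasing it from memory rather than quoting the tree, so treat it
as a sketch, but it is consistent with the have_submounts() and
shrink_dcache_parent() frames in the traces above:

out_bad:
        /* sd == dentry->d_fsdata; the staleness checks (sysfs_dirent
         * removed, moved or renamed) were done under sysfs_mutex */
        is_dir = (sysfs_type(sd) == SYSFS_DIR);
        mutex_unlock(&sysfs_mutex);
        if (is_dir) {
                /* do not prune a directory something is mounted on */
                if (have_submounts(dentry))
                        goto out_valid;
                shrink_dcache_parent(dentry);
        }
        d_drop(dentry);
        return 0;

So the shrink_dcache_parent() in the traces is the one above, called on
a stale sysfs dentry from ->d_revalidate() during the lookup; the open
question is how that dentry's lock can already be held by the same task
at that point.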