* XFS / writeback invoking soft lockup. @ 2013-12-13 7:14 Dave Jones 2013-12-13 10:48 ` Dave Chinner 0 siblings, 1 reply; 5+ messages in thread From: Dave Jones @ 2013-12-13 7:14 UTC (permalink / raw) To: Linux Kernel; +Cc: xfs I can hit this pretty reliably on one of my slower test machines. (8gb ram, 1 slow sata disk) the machine is pretty responsive, and recovers after a while. anything we can do to shut it up ? BUG: soft lockup - CPU#2 stuck for 22s! [kworker/u8:2:8479] Modules linked in: rfcomm l2tp_ppp l2tp_netlink l2tp_core scsi_transport_iscsi can_bcm nfc caif_socket caif af_802154 phonet af_rxrpc bluetooth can_raw can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key rose x25 atm netrom appletalk ipx p8023 p8022 psnap llc ax25 cfg80211 rfkill nouveau video backlight mxm_wmi wmi i2c_algo_bit ttm drm_kms_helper tg3 drm kvm_intel kvm ptp i2c_core pps_core libphy lpc_ich mfd_core serio_raw pcspkr microcode parport_pc parport shpchp floppy rtc_cmos xfs libcrc32c raid0 irq event stamp: 510442980 hardirqs last enabled at (510442979): [<c15c47d9>] _raw_spin_unlock_irq+0x27/0x46 hardirqs last disabled at (510442980): [<c15c5667>] apic_timer_interrupt+0x2f/0x3c softirqs last enabled at (510441282): [<c1045d01>] __do_softirq+0x14b/0x38d softirqs last disabled at (510441277): [<c100418b>] do_softirq_own_stack+0x2e/0x34 CPU: 2 PID: 8479 Comm: kworker/u8:2 Not tainted 3.13.0-rc3+ #1 Hardware name: Dell Inc. Precision WorkStation 490 /0DT031, BIOS A08 04/25/2008 Workqueue: writeback bdi_writeback_workfn (flush-9:127) task: e9e458c0 ti: e4214000 task.ti: e4214000 EIP: 0060:[<c112f797>] EFLAGS: 00000286 CPU: 2 EIP is at lru_add_drain_cpu+0x19/0xc0 EAX: c1a988c0 EBX: 00000002 ECX: 00000001 EDX: 2ff6c000 ESI: 00000000 EDI: 00000001 EBP: e4215c54 ESP: e4215c48 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 00000000 CR3: 29ed5000 CR4: 000007f0 Stack: e4215cb8 00000000 00000001 e4215c5c c112f8f8 e4215c68 c112f934 00000001 e4215d04 c112baba 00000000 0000000e 00000000 00000000 00000001 00000000 c1a9c980 f1a08980 c112ad83 00000000 e4215da4 00000067 00000003 00000001 Call Trace: [<c112f8f8>] lru_add_drain+0x1c/0x39 [<c112f934>] __pagevec_release+0x10/0x26 [<c112baba>] write_cache_pages+0x2f9/0x486 [<c112ad83>] ? global_dirtyable_memory+0x73/0x73 [<c108c940>] ? get_lock_stats+0x1b/0x43 [<c108cf5c>] ? put_lock_stats.isra.30+0xd/0x22 [<c108d3a2>] ? lock_release_holdtime.part.31+0xa0/0xd5 [<c112bc7e>] generic_writepages+0x37/0x51 [<f8616158>] xfs_vm_writepages+0x3a/0x42 [xfs] [<c112d1d9>] do_writepages+0x1d/0x2c [<c11a86ff>] __writeback_single_inode+0x3f/0x4d7 [<c11a8f55>] ? writeback_sb_inodes+0x178/0x3d9 [<c11a8fb2>] writeback_sb_inodes+0x1d5/0x3d9 [<c11a9235>] __writeback_inodes_wb+0x7f/0xae [<c11a94ad>] wb_writeback+0x249/0x4c5 [<c11a9bed>] bdi_writeback_workfn+0x321/0x594 [<c105c9e7>] process_one_work+0x1ad/0x5ff [<c105c9af>] ? process_one_work+0x175/0x5ff [<c105cf2f>] worker_thread+0xf6/0x31b [<c105ce39>] ? process_one_work+0x5ff/0x5ff [<c106398e>] kthread+0xb5/0xc8 [<c108f8aa>] ? trace_hardirqs_on+0xb/0xd [<c15cbe37>] ret_from_kernel_thread+0x1b/0x28 [<c10638d9>] ? insert_kthread_work+0x63/0x63 Code: e8 41 50 49 00 83 c4 04 5b 5e 5f 5d c3 e8 25 ad 48 00 55 89 e5 57 56 53 66 66 66 66 90 89 c3 b8 c0 88 a9 c1 8b 14 9d 80 33 8b c1 <01> d0 8b 08 85 c9 75 74 b8 80 88 a9 c1 8d 34 02 8b 3e 85 ff 75 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: XFS / writeback invoking soft lockup. 2013-12-13 7:14 XFS / writeback invoking soft lockup Dave Jones @ 2013-12-13 10:48 ` Dave Chinner 2013-12-13 11:16 ` Christoph Hellwig 0 siblings, 1 reply; 5+ messages in thread From: Dave Chinner @ 2013-12-13 10:48 UTC (permalink / raw) To: Dave Jones, Linux Kernel, xfs On Fri, Dec 13, 2013 at 02:14:07AM -0500, Dave Jones wrote: > I can hit this pretty reliably on one of my slower test machines. > (8gb ram, 1 slow sata disk) > > the machine is pretty responsive, and recovers after a while. > anything we can do to shut it up ? Actually, I think this indicates a problem. > BUG: soft lockup - CPU#2 stuck for 22s! [kworker/u8:2:8479] ... > Call Trace: > [<c112f8f8>] lru_add_drain+0x1c/0x39 > [<c112f934>] __pagevec_release+0x10/0x26 > [<c112baba>] write_cache_pages+0x2f9/0x486 That code in write_cache_pages(): 1907 while (!done && (index <= end)) { 1908 int i; 1909 1910 nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, 1911 min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1) 1912 if (nr_pages == 0) 1913 break; 1914 1915 for (i = 0; i < nr_pages; i++) { 1916 struct page *page = pvec.pages[i]; .... .... 2001 } 2002 pagevec_release(&pvec); 2003 cond_resched(); 2004 } So after all the pages in a pagevec are processed, we release the CPU before we grab the next pagevec. This softlockup implies we have been processing this pagevec for 22s. That tells me the code is actually stuck spinning on something, not that this is a false positive. i.e. it should not take 22s to process 14 pages. [ Yes, I know XFS can process more than that ->writepage, but it's still only a millisecond of work if it doesn't block on anything. And it can't be blocking, otherwise we wouldn't be firing the softlockup warning. ] The page cache LRU code is a maze of twisty per-cpu passages that go deep into the mm subsystem and memcg code - I'm not really sure what all that code is doing, so you'll probably have to ask someone who knows about that code. All I can say is that there doesn't look to be any obvious signs that this is a XFS or writeback problem fom the stack trace, and without more information or a reproducable test case I'm not going to be able to understand the cause. Is the problem reproducable, or is it just a one-off? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: XFS / writeback invoking soft lockup. 2013-12-13 10:48 ` Dave Chinner @ 2013-12-13 11:16 ` Christoph Hellwig 2013-12-13 16:22 ` Dave Jones 0 siblings, 1 reply; 5+ messages in thread From: Christoph Hellwig @ 2013-12-13 11:16 UTC (permalink / raw) To: Dave Chinner; +Cc: Dave Jones, Linux Kernel, xfs On Fri, Dec 13, 2013 at 09:48:53PM +1100, Dave Chinner wrote: > All I can say is that there doesn't look to be any obvious signs > that this is a XFS or writeback problem fom the stack trace, and > without more information or a reproducable test case I'm not going > to be able to understand the cause. > > Is the problem reproducable, or is it just a one-off? If it's reproducable it would be good to enable the xfs_writepage tracepoint and get a trace-cmd log of it. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: XFS / writeback invoking soft lockup. 2013-12-13 11:16 ` Christoph Hellwig @ 2013-12-13 16:22 ` Dave Jones 2013-12-14 8:20 ` Dave Chinner 0 siblings, 1 reply; 5+ messages in thread From: Dave Jones @ 2013-12-13 16:22 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Dave Chinner, Linux Kernel, xfs On Fri, Dec 13, 2013 at 03:16:39AM -0800, Christoph Hellwig wrote: > On Fri, Dec 13, 2013 at 09:48:53PM +1100, Dave Chinner wrote: > > All I can say is that there doesn't look to be any obvious signs > > that this is a XFS or writeback problem fom the stack trace, and > > without more information or a reproducable test case I'm not going > > to be able to understand the cause. > > > > Is the problem reproducable, or is it just a one-off? > > If it's reproducable it would be good to enable the xfs_writepage > tracepoint and get a trace-cmd log of it. I hit it pretty reliably last night. I'll make a note to look into it on monday with that enabled. Dave ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: XFS / writeback invoking soft lockup. 2013-12-13 16:22 ` Dave Jones @ 2013-12-14 8:20 ` Dave Chinner 0 siblings, 0 replies; 5+ messages in thread From: Dave Chinner @ 2013-12-14 8:20 UTC (permalink / raw) To: Dave Jones, Christoph Hellwig, Linux Kernel, xfs On Fri, Dec 13, 2013 at 11:22:28AM -0500, Dave Jones wrote: > On Fri, Dec 13, 2013 at 03:16:39AM -0800, Christoph Hellwig wrote: > > On Fri, Dec 13, 2013 at 09:48:53PM +1100, Dave Chinner wrote: > > > All I can say is that there doesn't look to be any obvious signs > > > that this is a XFS or writeback problem fom the stack trace, and > > > without more information or a reproducable test case I'm not going > > > to be able to understand the cause. > > > > > > Is the problem reproducable, or is it just a one-off? > > > > If it's reproducable it would be good to enable the xfs_writepage > > tracepoint and get a trace-cmd log of it. > > I hit it pretty reliably last night. I'll make a note to look into it > on monday with that enabled. You might want to grab the writeback tracepoints as well. Note that you'll probably have to filter by device so you don't get noise from all the other block devices, too. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-12-14 8:21 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-12-13 7:14 XFS / writeback invoking soft lockup Dave Jones 2013-12-13 10:48 ` Dave Chinner 2013-12-13 11:16 ` Christoph Hellwig 2013-12-13 16:22 ` Dave Jones 2013-12-14 8:20 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox