public inbox for linux-kernel@vger.kernel.org
* XFS / writeback invoking soft lockup.
@ 2013-12-13  7:14 Dave Jones
  2013-12-13 10:48 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Jones @ 2013-12-13  7:14 UTC (permalink / raw)
  To: Linux Kernel; +Cc: xfs

I can hit this pretty reliably on one of my slower test machines.
(8gb ram, 1 slow sata disk)

The machine is pretty responsive, and recovers after a while.
Anything we can do to shut it up?

BUG: soft lockup - CPU#2 stuck for 22s! [kworker/u8:2:8479]
Modules linked in: rfcomm l2tp_ppp l2tp_netlink l2tp_core scsi_transport_iscsi can_bcm nfc caif_socket caif af_802154 phonet af_rxrpc bluetooth can_raw can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key rose x25 atm netrom appletalk ipx p8023 p8022 psnap llc ax25 cfg80211 rfkill nouveau video backlight mxm_wmi wmi i2c_algo_bit ttm drm_kms_helper tg3 drm kvm_intel kvm ptp i2c_core pps_core libphy lpc_ich mfd_core serio_raw pcspkr microcode parport_pc parport shpchp floppy rtc_cmos xfs libcrc32c raid0
irq event stamp: 510442980
hardirqs last  enabled at (510442979): [<c15c47d9>] _raw_spin_unlock_irq+0x27/0x46
hardirqs last disabled at (510442980): [<c15c5667>] apic_timer_interrupt+0x2f/0x3c
softirqs last  enabled at (510441282): [<c1045d01>] __do_softirq+0x14b/0x38d
softirqs last disabled at (510441277): [<c100418b>] do_softirq_own_stack+0x2e/0x34
CPU: 2 PID: 8479 Comm: kworker/u8:2 Not tainted 3.13.0-rc3+ #1 
Hardware name: Dell Inc.                 Precision WorkStation 490    /0DT031, BIOS A08 04/25/2008
Workqueue: writeback bdi_writeback_workfn (flush-9:127)
task: e9e458c0 ti: e4214000 task.ti: e4214000
EIP: 0060:[<c112f797>] EFLAGS: 00000286 CPU: 2
EIP is at lru_add_drain_cpu+0x19/0xc0
EAX: c1a988c0 EBX: 00000002 ECX: 00000001 EDX: 2ff6c000
ESI: 00000000 EDI: 00000001 EBP: e4215c54 ESP: e4215c48
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000000 CR3: 29ed5000 CR4: 000007f0
Stack:
 e4215cb8 00000000 00000001 e4215c5c c112f8f8 e4215c68 c112f934 00000001
 e4215d04 c112baba 00000000 0000000e 00000000 00000000 00000001 00000000
 c1a9c980 f1a08980 c112ad83 00000000 e4215da4 00000067 00000003 00000001
Call Trace:
 [<c112f8f8>] lru_add_drain+0x1c/0x39
 [<c112f934>] __pagevec_release+0x10/0x26
 [<c112baba>] write_cache_pages+0x2f9/0x486
 [<c112ad83>] ? global_dirtyable_memory+0x73/0x73
 [<c108c940>] ? get_lock_stats+0x1b/0x43
 [<c108cf5c>] ? put_lock_stats.isra.30+0xd/0x22
 [<c108d3a2>] ? lock_release_holdtime.part.31+0xa0/0xd5
 [<c112bc7e>] generic_writepages+0x37/0x51
 [<f8616158>] xfs_vm_writepages+0x3a/0x42 [xfs]
 [<c112d1d9>] do_writepages+0x1d/0x2c
 [<c11a86ff>] __writeback_single_inode+0x3f/0x4d7
 [<c11a8f55>] ? writeback_sb_inodes+0x178/0x3d9
 [<c11a8fb2>] writeback_sb_inodes+0x1d5/0x3d9
 [<c11a9235>] __writeback_inodes_wb+0x7f/0xae
 [<c11a94ad>] wb_writeback+0x249/0x4c5
 [<c11a9bed>] bdi_writeback_workfn+0x321/0x594
 [<c105c9e7>] process_one_work+0x1ad/0x5ff
 [<c105c9af>] ? process_one_work+0x175/0x5ff
 [<c105cf2f>] worker_thread+0xf6/0x31b
 [<c105ce39>] ? process_one_work+0x5ff/0x5ff
 [<c106398e>] kthread+0xb5/0xc8
 [<c108f8aa>] ? trace_hardirqs_on+0xb/0xd
 [<c15cbe37>] ret_from_kernel_thread+0x1b/0x28
 [<c10638d9>] ? insert_kthread_work+0x63/0x63
Code: e8 41 50 49 00 83 c4 04 5b 5e 5f 5d c3 e8 25 ad 48 00 55 89 e5 57 56 53 66 66 66 66 90 89 c3 b8 c0 88 a9 c1 8b 14 9d 80 33 8b c1 <01> d0 8b 08 85 c9 75 74 b8 80 88 a9 c1 8d 34 02 8b 3e 85 ff 75


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: XFS / writeback invoking soft lockup.
  2013-12-13  7:14 XFS / writeback invoking soft lockup Dave Jones
@ 2013-12-13 10:48 ` Dave Chinner
  2013-12-13 11:16   ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2013-12-13 10:48 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, xfs

On Fri, Dec 13, 2013 at 02:14:07AM -0500, Dave Jones wrote:
> I can hit this pretty reliably on one of my slower test machines.
> (8gb ram, 1 slow sata disk)
> 
> The machine is pretty responsive, and recovers after a while.
> Anything we can do to shut it up?

Actually, I think this indicates a problem.

> BUG: soft lockup - CPU#2 stuck for 22s! [kworker/u8:2:8479]
...
> Call Trace:
>  [<c112f8f8>] lru_add_drain+0x1c/0x39
>  [<c112f934>] __pagevec_release+0x10/0x26
>  [<c112baba>] write_cache_pages+0x2f9/0x486

That code in write_cache_pages():

1907         while (!done && (index <= end)) {
1908                 int i;
1909
1910                 nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
1911                               min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
1912                 if (nr_pages == 0)
1913                         break;
1914
1915                 for (i = 0; i < nr_pages; i++) {
1916                         struct page *page = pvec.pages[i];
....
....
2001                 }
2002                 pagevec_release(&pvec);
2003                 cond_resched();
2004         }

So after all the pages in a pagevec are processed, we release the
CPU before we grab the next pagevec. This softlockup implies we
have been processing this pagevec for 22s. That tells me the code
is actually stuck spinning on something, not that this is a false
positive. i.e. it should not take 22s to process 14 pages. 

[ Yes, I know XFS can do more work than that in ->writepage, but it's
still only a millisecond of work if it doesn't block on anything.
And it can't be blocking, otherwise we wouldn't be firing the
softlockup warning. ]

The page cache LRU code is a maze of twisty per-cpu passages that go
deep into the mm subsystem and memcg code - I'm not really sure what
all that code is doing, so you'll probably have to ask someone who
knows about that code.

All I can say is that there doesn't look to be any obvious signs
that this is an XFS or writeback problem from the stack trace, and
without more information or a reproducible test case I'm not going
to be able to understand the cause.

Is the problem reproducible, or is it just a one-off?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS / writeback invoking soft lockup.
  2013-12-13 10:48 ` Dave Chinner
@ 2013-12-13 11:16   ` Christoph Hellwig
  2013-12-13 16:22     ` Dave Jones
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2013-12-13 11:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Dave Jones, Linux Kernel, xfs

On Fri, Dec 13, 2013 at 09:48:53PM +1100, Dave Chinner wrote:
> All I can say is that there doesn't look to be any obvious signs
> that this is an XFS or writeback problem from the stack trace, and
> without more information or a reproducible test case I'm not going
> to be able to understand the cause.
> 
> Is the problem reproducible, or is it just a one-off?

If it's reproducible, it would be good to enable the xfs_writepage
tracepoint and get a trace-cmd log of it.
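
For reference, one way to do that (a sketch: it assumes trace-cmd is
installed and run as root, and that the event is exposed as
xfs:xfs_writepage):

```shell
# Record only the xfs_writepage tracepoint while reproducing the
# lockup; stop trace-cmd with ^C once the softlockup fires.
trace-cmd record -e xfs:xfs_writepage

# Then dump the recorded events as text for posting.
trace-cmd report > xfs_writepage.txt
```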



* Re: XFS / writeback invoking soft lockup.
  2013-12-13 11:16   ` Christoph Hellwig
@ 2013-12-13 16:22     ` Dave Jones
  2013-12-14  8:20       ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Jones @ 2013-12-13 16:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dave Chinner, Linux Kernel, xfs

On Fri, Dec 13, 2013 at 03:16:39AM -0800, Christoph Hellwig wrote:
 > On Fri, Dec 13, 2013 at 09:48:53PM +1100, Dave Chinner wrote:
 > > All I can say is that there doesn't look to be any obvious signs
 > > that this is an XFS or writeback problem from the stack trace, and
 > > without more information or a reproducible test case I'm not going
 > > to be able to understand the cause.
 > > 
 > > Is the problem reproducible, or is it just a one-off?
 > 
 > If it's reproducible, it would be good to enable the xfs_writepage
 > tracepoint and get a trace-cmd log of it.

I hit it pretty reliably last night.  I'll make a note to look into it
on Monday with that enabled.

	Dave



* Re: XFS / writeback invoking soft lockup.
  2013-12-13 16:22     ` Dave Jones
@ 2013-12-14  8:20       ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2013-12-14  8:20 UTC (permalink / raw)
  To: Dave Jones, Christoph Hellwig, Linux Kernel, xfs

On Fri, Dec 13, 2013 at 11:22:28AM -0500, Dave Jones wrote:
> On Fri, Dec 13, 2013 at 03:16:39AM -0800, Christoph Hellwig wrote:
>  > On Fri, Dec 13, 2013 at 09:48:53PM +1100, Dave Chinner wrote:
>  > > All I can say is that there doesn't look to be any obvious signs
>  > > that this is an XFS or writeback problem from the stack trace, and
>  > > without more information or a reproducible test case I'm not going
>  > > to be able to understand the cause.
>  > > 
>  > > Is the problem reproducible, or is it just a one-off?
>  > 
>  > If it's reproducible, it would be good to enable the xfs_writepage
>  > tracepoint and get a trace-cmd log of it.
> 
> I hit it pretty reliably last night.  I'll make a note to look into it
> on Monday with that enabled.

You might want to grab the writeback tracepoints as well. Note that
you'll probably have to filter by device so you don't get noise from
all the other block devices, too.
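
Something along these lines, perhaps (a sketch: the `name` filter field
is an assumption based on the writeback tracepoint format, and 9:127 is
the flusher device from the original report):

```shell
# Record the writeback tracepoints filtered to the 9:127 bdi, plus the
# xfs_writepage events.  Needs root; stop with ^C after the lockup
# fires, then read the log back with 'trace-cmd report'.
trace-cmd record -e 'writeback:*' -f 'name == "9:127"' \
	-e xfs:xfs_writepage
```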

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

