* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
  [not found] <Pine.LNX.4.64.0807210936410.7212@filesrv1.baby-dragons.com>
@ 2008-07-21 23:12 ` Neil Brown
  2008-07-21 23:43   ` Mr. James W. Laferriere
  2008-07-22  1:25   ` Timothy Shimmin
  0 siblings, 2 replies; 4+ messages in thread

From: Neil Brown @ 2008-07-21 23:12 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: linux-raid maillist, xfs

On Monday July 21, babydr@baby-dragons.com wrote:
> INFO: task pdflush:393 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> pdflush D c8209f80 4748 393 2
> f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
> f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
> 00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
> Call Trace:
> [<c014fc57>] ? mark_held_locks+0x67/0x80
> [<c0142233>] ? add_wait_queue+0x33/0x50
> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
> [<c0127a60>] ? default_wake_function+0x0/0x10
> [<c0127a60>] ? default_wake_function+0x0/0x10
> [<c03a84fb>] xfs_buf_iorequest+0x4b/0x80
> [<c03adeee>] xfs_bdstrat_cb+0x3e/0x50
> [<c03a495c>] xfs_bwrite+0x5c/0xe0
> [<c039e941>] xfs_syncsub+0x121/0x2b0
> [<c018a43b>] ? lock_super+0x1b/0x20
> [<c018a43b>] ? lock_super+0x1b/0x20
> [<c039e1d8>] xfs_sync+0x48/0x70
> [<c03af833>] xfs_fs_write_super+0x23/0x30
> [<c018a80f>] sync_supers+0xaf/0xc0

Looks a lot like an XFS problem to me.
Or at least, XFS people would be able to interpret this stack the best.

> [<c0169259>] wb_kupdate+0x29/0x100
> [<c016a0cc>] ? __pdflush+0xcc/0x1a0
> [<c016a0d2>] __pdflush+0xd2/0x1a0
> [<c016a1a0>] ? pdflush+0x0/0x40
> [<c016a1d1>] pdflush+0x31/0x40
> [<c0169230>] ? wb_kupdate+0x0/0x100
> [<c016a1a0>] ? pdflush+0x0/0x40
> [<c0141e2c>] kthread+0x5c/0xa0
> [<c0141dd0>] ? kthread+0x0/0xa0
> [<c0103d67>] kernel_thread_helper+0x7/0x10
> =======================
> 2 locks held by pdflush/393:
> #0: (&type->s_umount_key#17){----}, at: [<c018a7b2>] sync_supers+0x52/0xc0
> #1: (&type->s_lock_key#7){--..}, at: [<c018a43b>] lock_super+0x1b/0x20
>
> ...snip... Repeats of the above message ad infinitum.

Hmm... I guess I clipped a bit too much for our XFS friends to know
the context.  bonnie is being run on an XFS filesystem on md/raid6,
and it gets this warning a lot and essentially hangs.

NeilBrown

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
  2008-07-21 23:12 ` INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd) Neil Brown
@ 2008-07-21 23:43   ` Mr. James W. Laferriere
  2008-07-22  2:20     ` Dave Chinner
  2008-07-22  1:25   ` Timothy Shimmin
  1 sibling, 1 reply; 4+ messages in thread

From: Mr. James W. Laferriere @ 2008-07-21 23:43 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid maillist, xfs

Hello Neil,

On Tue, 22 Jul 2008, Neil Brown wrote:
> On Monday July 21, babydr@baby-dragons.com wrote:
>> INFO: task pdflush:393 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Call Trace:
>> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
>> [...snip: stack trace quoted in full above...]
>> [<c018a80f>] sync_supers+0xaf/0xc0
>
> Looks a lot like an XFS problem to me.
> Or at least, XFS people would be able to interpret this stack the best.

Hmm, OK, I'll post there.  I can provide a -complete- boot -> reboot
log of the actions, but it ain't small, ~649K.  So I'll post that on
the back of my website, i.e.:

	http://www.baby-dragons.com/bonnie++1.03c-2.6.26-rc9.console.trace.log

This includes the Sysrq-w & Sysrq-b outputs at the bottom, if you're
interested in that kind of thing ;-) .

> Hmm... I guess I clipped a bit too much for our XFS friends to know
> the context.  bonnie is being run on an XFS filesystem on md/raid6,
> and it gets this warning a lot and essentially hangs.
> NeilBrown

Nah, wasn't you who did the clipping, 'twas me, in order to get the
size of the original message below ~80K.

Tnx, JimL
--
+------------------------------------------------------------------+
| James W. Laferriere     | System Techniques    | Give me VMS     |
| Network&System Engineer | 2133 McCullam Ave    | Give me Linux   |
| babydr@baby-dragons.com | Fairbanks, AK. 99701 | only on AXP     |
+------------------------------------------------------------------+

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
  2008-07-21 23:43 ` Mr. James W. Laferriere
@ 2008-07-22  2:20   ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread

From: Dave Chinner @ 2008-07-22 2:20 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: Neil Brown, linux-raid maillist, xfs

On Mon, Jul 21, 2008 at 03:43:03PM -0800, Mr. James W. Laferriere wrote:
> Hello Neil,
>
> On Tue, 22 Jul 2008, Neil Brown wrote:
>> On Monday July 21, babydr@baby-dragons.com wrote:
>>> INFO: task pdflush:393 blocked for more than 120 seconds.
>>> Call Trace:
>>> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
>>> [...snip: stack trace quoted in full above...]
>>> [<c018a80f>] sync_supers+0xaf/0xc0
>>
>> Looks a lot like an XFS problem to me.
>> Or at least, XFS people would be able to interpret this stack the best.
>
> Hmm, OK, I'll post there.  I can provide a -complete- boot -> reboot
> log of the actions, but it ain't small, ~649K.  So I'll post that on
> the back of my website, i.e.:
>
> http://www.baby-dragons.com/bonnie++1.03c-2.6.26-rc9.console.trace.log

Given that it's a log hang on 2.6.26-rc9, I'd first say add this commit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=49641f1acfdfd437ed9b0a70b86bf36626c02afe

to your build (it went in after -rc9 but before 2.6.26 was released)
and see if that solves the problem.

In more detail, this stack trace implies that log I/O has not completed
after the log force was triggered in xfs_buf_wait_unpin().  The above
patch fixes a bug in log I/O dispatch where a non-atomic compare and
decrement could result in log I/O not being dispatched.  So you've got
a hang waiting for log I/O to complete on a kernel that has a known
problem with log I/O dispatch; it's likely that's what you've hit.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
  2008-07-21 23:12 ` INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd) Neil Brown
  2008-07-21 23:43   ` Mr. James W. Laferriere
@ 2008-07-22  1:25   ` Timothy Shimmin
  1 sibling, 0 replies; 4+ messages in thread

From: Timothy Shimmin @ 2008-07-22 1:25 UTC (permalink / raw)
To: Neil Brown; +Cc: Mr. James W. Laferriere, linux-raid maillist, xfs

Neil Brown wrote:
> On Monday July 21, babydr@baby-dragons.com wrote:
>> INFO: task pdflush:393 blocked for more than 120 seconds.
>> Call Trace:
>> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
>> [...snip: stack trace quoted in full above...]
>> [<c018a80f>] sync_supers+0xaf/0xc0
>
> Looks a lot like an XFS problem to me.
> Or at least, XFS people would be able to interpret this stack the best.

I presume that if it is waiting in xfs_buf_wait_unpin() for a long time
(> 2 min) then maybe a journal-log I/O completion hasn't come back to
say that the matching buffer item has made it to the on-disk log.
I.e. the buffer hasn't been unpinned yet (pin count > 0), which is
supposed to happen when its data hits the on-disk log.

>> ...snip... Repeats of the above message ad infinitum.
>
> Hmm... I guess I clipped a bit too much for our XFS friends to know
> the context.  bonnie is being run on an XFS filesystem on md/raid6,
> and it gets this warning a lot and essentially hangs.

Just for the record, in -rc9 we hadn't removed the QUEUE_ORDERED tag
check yet, so I presume that for md/raid6 barriers will be disabled,
and barrier writes on the log won't be being issued.  I don't see that
as anything to do with the problem here - that is more of an issue on
replay if we have the cache on and no barrier support - I just thought
I'd mention it.

--Tim

^ permalink raw reply	[flat|nested] 4+ messages in thread
Thread overview: 4+ messages
[not found] <Pine.LNX.4.64.0807210936410.7212@filesrv1.baby-dragons.com>
2008-07-21 23:12 ` INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd) Neil Brown
2008-07-21 23:43 ` Mr. James W. Laferriere
2008-07-22 2:20 ` Dave Chinner
2008-07-22 1:25 ` Timothy Shimmin