* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
[not found] <Pine.LNX.4.64.0807210936410.7212@filesrv1.baby-dragons.com>
@ 2008-07-21 23:12 ` Neil Brown
2008-07-21 23:43 ` Mr. James W. Laferriere
2008-07-22 1:25 ` Timothy Shimmin
0 siblings, 2 replies; 4+ messages in thread
From: Neil Brown @ 2008-07-21 23:12 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: linux-raid maillist, xfs
On Monday July 21, babydr@baby-dragons.com wrote:
> INFO: task pdflush:393 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> pdflush D c8209f80 4748 393 2
> f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
> f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
> 00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
> Call Trace:
> [<c014fc57>] ? mark_held_locks+0x67/0x80
> [<c0142233>] ? add_wait_queue+0x33/0x50
> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
> [<c0127a60>] ? default_wake_function+0x0/0x10
> [<c0127a60>] ? default_wake_function+0x0/0x10
> [<c03a84fb>] xfs_buf_iorequest+0x4b/0x80
> [<c03adeee>] xfs_bdstrat_cb+0x3e/0x50
> [<c03a495c>] xfs_bwrite+0x5c/0xe0
> [<c039e941>] xfs_syncsub+0x121/0x2b0
> [<c018a43b>] ? lock_super+0x1b/0x20
> [<c018a43b>] ? lock_super+0x1b/0x20
> [<c039e1d8>] xfs_sync+0x48/0x70
> [<c03af833>] xfs_fs_write_super+0x23/0x30
> [<c018a80f>] sync_supers+0xaf/0xc0
Looks a lot like an XFS problem to me.
Or at least, XFS people would be able to interpret this stack the
best.
> [<c0169259>] wb_kupdate+0x29/0x100
> [<c016a0cc>] ? __pdflush+0xcc/0x1a0
> [<c016a0d2>] __pdflush+0xd2/0x1a0
> [<c016a1a0>] ? pdflush+0x0/0x40
> [<c016a1d1>] pdflush+0x31/0x40
> [<c0169230>] ? wb_kupdate+0x0/0x100
> [<c016a1a0>] ? pdflush+0x0/0x40
> [<c0141e2c>] kthread+0x5c/0xa0
> [<c0141dd0>] ? kthread+0x0/0xa0
> [<c0103d67>] kernel_thread_helper+0x7/0x10
> =======================
> 2 locks held by pdflush/393:
> #0: (&type->s_umount_key#17){----}, at: [<c018a7b2>] sync_supers+0x52/0xc0
> #1: (&type->s_lock_key#7){--..}, at: [<c018a43b>] lock_super+0x1b/0x20
>
> ...snip... Repeats of above message ad infinitum.
Hmm... I guess I clipped a bit too much for our XFS friends to know
the context.
bonnie is being run on an XFS filesystem on md/raid6, and it gets
this warning a lot and essentially hangs.
NeilBrown
* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
2008-07-21 23:12 ` INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd) Neil Brown
@ 2008-07-21 23:43 ` Mr. James W. Laferriere
2008-07-22 2:20 ` Dave Chinner
2008-07-22 1:25 ` Timothy Shimmin
1 sibling, 1 reply; 4+ messages in thread
From: Mr. James W. Laferriere @ 2008-07-21 23:43 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid maillist, xfs
Hello Neil ,
On Tue, 22 Jul 2008, Neil Brown wrote:
> On Monday July 21, babydr@baby-dragons.com wrote:
>> INFO: task pdflush:393 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> pdflush D c8209f80 4748 393 2
>> f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
>> f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
>> 00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
>> Call Trace:
>> [<c014fc57>] ? mark_held_locks+0x67/0x80
>> [<c0142233>] ? add_wait_queue+0x33/0x50
>> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
>> [<c0127a60>] ? default_wake_function+0x0/0x10
>> [<c0127a60>] ? default_wake_function+0x0/0x10
>> [<c03a84fb>] xfs_buf_iorequest+0x4b/0x80
>> [<c03adeee>] xfs_bdstrat_cb+0x3e/0x50
>> [<c03a495c>] xfs_bwrite+0x5c/0xe0
>> [<c039e941>] xfs_syncsub+0x121/0x2b0
>> [<c018a43b>] ? lock_super+0x1b/0x20
>> [<c018a43b>] ? lock_super+0x1b/0x20
>> [<c039e1d8>] xfs_sync+0x48/0x70
>> [<c03af833>] xfs_fs_write_super+0x23/0x30
>> [<c018a80f>] sync_supers+0xaf/0xc0
>
> Looks a lot like an XFS problem to me.
> Or at least, XFS people would be able to interpret this stack the
> best.
Hmm, Ok, I'll post there. I can provide a -complete- boot ->
reboot log of the actions, but it ain't small, ~649K. So I'll post that on
the back of my website, ie:
http://www.baby-dragons.com/bonnie++1.03c-2.6.26-rc9.console.trace.log
This includes the Sysrq-w & Sysrq-b outputs at the bottom,
if you're interested in that kind of thing ;-) .
>> [<c0169259>] wb_kupdate+0x29/0x100
>> [<c016a0cc>] ? __pdflush+0xcc/0x1a0
>> [<c016a0d2>] __pdflush+0xd2/0x1a0
>> [<c016a1a0>] ? pdflush+0x0/0x40
>> [<c016a1d1>] pdflush+0x31/0x40
>> [<c0169230>] ? wb_kupdate+0x0/0x100
>> [<c016a1a0>] ? pdflush+0x0/0x40
>> [<c0141e2c>] kthread+0x5c/0xa0
>> [<c0141dd0>] ? kthread+0x0/0xa0
>> [<c0103d67>] kernel_thread_helper+0x7/0x10
>> =======================
>> 2 locks held by pdflush/393:
>> #0: (&type->s_umount_key#17){----}, at: [<c018a7b2>] sync_supers+0x52/0xc0
>> #1: (&type->s_lock_key#7){--..}, at: [<c018a43b>] lock_super+0x1b/0x20
>>
>> ...snip... Repeats of above message ad infinitum.
>
>
> Hmm... I guess I clipped a bit too much for our XFS friends to know
> the context.
> bonnie is being run on an XFS filesystem on md/raid6, and it gets
> this warning a lot and essentially hangs.
> NeilBrown
Nah, wasn't you who did the clipping, 'twas me, in order to get the
size of the original message below ~80K.
Tnx , JimL
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network&System Engineer | 2133 McCullam Ave | Give me Linux |
| babydr@baby-dragons.com | Fairbanks, AK. 99701 | only on AXP |
+------------------------------------------------------------------+
* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
2008-07-21 23:12 ` INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd) Neil Brown
2008-07-21 23:43 ` Mr. James W. Laferriere
@ 2008-07-22 1:25 ` Timothy Shimmin
1 sibling, 0 replies; 4+ messages in thread
From: Timothy Shimmin @ 2008-07-22 1:25 UTC (permalink / raw)
To: Neil Brown; +Cc: Mr. James W. Laferriere, linux-raid maillist, xfs
Neil Brown wrote:
> On Monday July 21, babydr@baby-dragons.com wrote:
>> INFO: task pdflush:393 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> pdflush D c8209f80 4748 393 2
>> f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
>> f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
>> 00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
>> Call Trace:
>> [<c014fc57>] ? mark_held_locks+0x67/0x80
>> [<c0142233>] ? add_wait_queue+0x33/0x50
>> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
>> [<c0127a60>] ? default_wake_function+0x0/0x10
>> [<c0127a60>] ? default_wake_function+0x0/0x10
>> [<c03a84fb>] xfs_buf_iorequest+0x4b/0x80
>> [<c03adeee>] xfs_bdstrat_cb+0x3e/0x50
>> [<c03a495c>] xfs_bwrite+0x5c/0xe0
>> [<c039e941>] xfs_syncsub+0x121/0x2b0
>> [<c018a43b>] ? lock_super+0x1b/0x20
>> [<c018a43b>] ? lock_super+0x1b/0x20
>> [<c039e1d8>] xfs_sync+0x48/0x70
>> [<c03af833>] xfs_fs_write_super+0x23/0x30
>> [<c018a80f>] sync_supers+0xaf/0xc0
>
> Looks a lot like an XFS problem to me.
> Or at least, XFS people would be able to interpret this stack the
> best.
>
I presume that if it is waiting in xfs_buf_wait_unpin() for a long
time (>2min), then maybe a journal-log I/O completion hasn't come
back to say that the matching buffer item has made it to the on-disk log.
I.e. the buffer hasn't been unpinned yet (pincount > 0), which is supposed
to happen when its data hits the on-disk log.
>> [<c0169259>] wb_kupdate+0x29/0x100
>> [<c016a0cc>] ? __pdflush+0xcc/0x1a0
>> [<c016a0d2>] __pdflush+0xd2/0x1a0
>> [<c016a1a0>] ? pdflush+0x0/0x40
>> [<c016a1d1>] pdflush+0x31/0x40
>> [<c0169230>] ? wb_kupdate+0x0/0x100
>> [<c016a1a0>] ? pdflush+0x0/0x40
>> [<c0141e2c>] kthread+0x5c/0xa0
>> [<c0141dd0>] ? kthread+0x0/0xa0
>> [<c0103d67>] kernel_thread_helper+0x7/0x10
>> =======================
>> 2 locks held by pdflush/393:
>> #0: (&type->s_umount_key#17){----}, at: [<c018a7b2>] sync_supers+0x52/0xc0
>> #1: (&type->s_lock_key#7){--..}, at: [<c018a43b>] lock_super+0x1b/0x20
>>
>> ...snip... Repeats of above message ad infinitum.
>
>
> Hmm... I guess I clipped a bit too much for our XFS friends to know
> the context.
> bonnie is being run on an XFS filesystem on md/raid6, and it gets
> this warning a lot and essentially hangs.
>
Just for the record,
in -rc9 we hadn't removed the QUEUE_ORDERED tag check yet, so
I presume that for md/raid6, barriers will be disabled
and barrier writes for the log won't be issued.
I don't see that as having anything to do with the problem here -
it is more of an issue on replay if we have the cache on
and no barrier support - I just thought I'd mention it.
--Tim
* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
2008-07-21 23:43 ` Mr. James W. Laferriere
@ 2008-07-22 2:20 ` Dave Chinner
0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2008-07-22 2:20 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: Neil Brown, linux-raid maillist, xfs
On Mon, Jul 21, 2008 at 03:43:03PM -0800, Mr. James W. Laferriere wrote:
> Hello Neil ,
>
> On Tue, 22 Jul 2008, Neil Brown wrote:
>> On Monday July 21, babydr@baby-dragons.com wrote:
>>> INFO: task pdflush:393 blocked for more than 120 seconds.
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> pdflush D c8209f80 4748 393 2
>>> f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
>>> f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
>>> 00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
>>> Call Trace:
>>> [<c014fc57>] ? mark_held_locks+0x67/0x80
>>> [<c0142233>] ? add_wait_queue+0x33/0x50
>>> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
>>> [<c0127a60>] ? default_wake_function+0x0/0x10
>>> [<c0127a60>] ? default_wake_function+0x0/0x10
>>> [<c03a84fb>] xfs_buf_iorequest+0x4b/0x80
>>> [<c03adeee>] xfs_bdstrat_cb+0x3e/0x50
>>> [<c03a495c>] xfs_bwrite+0x5c/0xe0
>>> [<c039e941>] xfs_syncsub+0x121/0x2b0
>>> [<c018a43b>] ? lock_super+0x1b/0x20
>>> [<c018a43b>] ? lock_super+0x1b/0x20
>>> [<c039e1d8>] xfs_sync+0x48/0x70
>>> [<c03af833>] xfs_fs_write_super+0x23/0x30
>>> [<c018a80f>] sync_supers+0xaf/0xc0
>>
>> Looks a lot like an XFS problem to me.
>> Or at least, XFS people would be able to interpret this stack the
>> best.
> Hmm, Ok, I'll post there. I can provide a -complete- boot ->
> reboot log of the actions, but it ain't small, ~649K. So I'll post
> that on the back of my website, ie:
>
> http://www.baby-dragons.com/bonnie++1.03c-2.6.26-rc9.console.trace.log
Given that it's a log hang on 2.6.26-rc9, I'd first say add this commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=49641f1acfdfd437ed9b0a70b86bf36626c02afe
to your build (went in after -rc9 but before 2.6.26 was released)
and see if that solves the problem.
In more detail, this stack trace implies log I/O has not completed
after the log force was triggered in xfs_buf_wait_unpin(). The above
patch fixes a bug in log I/O dispatch where a non-atomic compare
and decrement would result in log I/O not being dispatched.
So, you've got a hang waiting for log I/O to complete on a kernel
that has a known problem with log I/O dispatch, so it's likely
that's what you've hit.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com