* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
[not found] <Pine.LNX.4.64.0807210936410.7212@filesrv1.baby-dragons.com>
@ 2008-07-21 23:12 ` Neil Brown
2008-07-21 23:43 ` Mr. James W. Laferriere
2008-07-22 1:25 ` Timothy Shimmin
0 siblings, 2 replies; 4+ messages in thread
From: Neil Brown @ 2008-07-21 23:12 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: linux-raid maillist, xfs
On Monday July 21, babydr@baby-dragons.com wrote:
> INFO: task pdflush:393 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> pdflush D c8209f80 4748 393 2
> f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
> f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
> 00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
> Call Trace:
> [<c014fc57>] ? mark_held_locks+0x67/0x80
> [<c0142233>] ? add_wait_queue+0x33/0x50
> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
> [<c0127a60>] ? default_wake_function+0x0/0x10
> [<c0127a60>] ? default_wake_function+0x0/0x10
> [<c03a84fb>] xfs_buf_iorequest+0x4b/0x80
> [<c03adeee>] xfs_bdstrat_cb+0x3e/0x50
> [<c03a495c>] xfs_bwrite+0x5c/0xe0
> [<c039e941>] xfs_syncsub+0x121/0x2b0
> [<c018a43b>] ? lock_super+0x1b/0x20
> [<c018a43b>] ? lock_super+0x1b/0x20
> [<c039e1d8>] xfs_sync+0x48/0x70
> [<c03af833>] xfs_fs_write_super+0x23/0x30
> [<c018a80f>] sync_supers+0xaf/0xc0
Looks a lot like an XFS problem to me.
Or at least, XFS people would be able to interpret this stack the
best.
> [<c0169259>] wb_kupdate+0x29/0x100
> [<c016a0cc>] ? __pdflush+0xcc/0x1a0
> [<c016a0d2>] __pdflush+0xd2/0x1a0
> [<c016a1a0>] ? pdflush+0x0/0x40
> [<c016a1d1>] pdflush+0x31/0x40
> [<c0169230>] ? wb_kupdate+0x0/0x100
> [<c016a1a0>] ? pdflush+0x0/0x40
> [<c0141e2c>] kthread+0x5c/0xa0
> [<c0141dd0>] ? kthread+0x0/0xa0
> [<c0103d67>] kernel_thread_helper+0x7/0x10
> =======================
> 2 locks held by pdflush/393:
> #0: (&type->s_umount_key#17){----}, at: [<c018a7b2>] sync_supers+0x52/0xc0
> #1: (&type->s_lock_key#7){--..}, at: [<c018a43b>] lock_super+0x1b/0x20
>
> ...snip... Repeats of above message ad infinitum.
Hmm... I guess I clipped a bit too much for our XFS friends to know
the context.
bonnie is being run on an XFS filesystem on md/raid6, and it gets
this warning a lot and essentially hangs.
NeilBrown
* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
2008-07-21 23:12 ` INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd) Neil Brown
@ 2008-07-21 23:43 ` Mr. James W. Laferriere
2008-07-22 2:20 ` Dave Chinner
2008-07-22 1:25 ` Timothy Shimmin
1 sibling, 1 reply; 4+ messages in thread
From: Mr. James W. Laferriere @ 2008-07-21 23:43 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid maillist, xfs
Hello Neil ,
On Tue, 22 Jul 2008, Neil Brown wrote:
> On Monday July 21, babydr@baby-dragons.com wrote:
>> INFO: task pdflush:393 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> pdflush D c8209f80 4748 393 2
>> f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
>> f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
>> 00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
>> Call Trace:
>> [<c014fc57>] ? mark_held_locks+0x67/0x80
>> [<c0142233>] ? add_wait_queue+0x33/0x50
>> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
>> [<c0127a60>] ? default_wake_function+0x0/0x10
>> [<c0127a60>] ? default_wake_function+0x0/0x10
>> [<c03a84fb>] xfs_buf_iorequest+0x4b/0x80
>> [<c03adeee>] xfs_bdstrat_cb+0x3e/0x50
>> [<c03a495c>] xfs_bwrite+0x5c/0xe0
>> [<c039e941>] xfs_syncsub+0x121/0x2b0
>> [<c018a43b>] ? lock_super+0x1b/0x20
>> [<c018a43b>] ? lock_super+0x1b/0x20
>> [<c039e1d8>] xfs_sync+0x48/0x70
>> [<c03af833>] xfs_fs_write_super+0x23/0x30
>> [<c018a80f>] sync_supers+0xaf/0xc0
>
> Looks a lot like an XFS problem to me.
> Or at least, XFS people would be able to interpret this stack the
> best.
Hmm, Ok, I'll post there. I can provide a -complete- boot ->
reboot log of the actions, but it ain't small, ~649K. So I'll post that on
the back of my website, ie:
http://www.baby-dragons.com/bonnie++1.03c-2.6.26-rc9.console.trace.log
This includes the Sysrq-w & Sysrq-b outputs at the bottom,
if you're interested in that kind of thing ;-) .
>> [<c0169259>] wb_kupdate+0x29/0x100
>> [<c016a0cc>] ? __pdflush+0xcc/0x1a0
>> [<c016a0d2>] __pdflush+0xd2/0x1a0
>> [<c016a1a0>] ? pdflush+0x0/0x40
>> [<c016a1d1>] pdflush+0x31/0x40
>> [<c0169230>] ? wb_kupdate+0x0/0x100
>> [<c016a1a0>] ? pdflush+0x0/0x40
>> [<c0141e2c>] kthread+0x5c/0xa0
>> [<c0141dd0>] ? kthread+0x0/0xa0
>> [<c0103d67>] kernel_thread_helper+0x7/0x10
>> =======================
>> 2 locks held by pdflush/393:
>> #0: (&type->s_umount_key#17){----}, at: [<c018a7b2>] sync_supers+0x52/0xc0
>> #1: (&type->s_lock_key#7){--..}, at: [<c018a43b>] lock_super+0x1b/0x20
>>
>> ...snip... Repeats of above message ad infinitum.
>
>
> Hmm... I guess I clipped a bit too much for our XFS friends to know
> the context.
> bonnie is being run on an XFS filesystem on md/raid6, and it gets
> this warning a lot and essentially hangs.
> NeilBrown
Nah, wasn't you who did the clipping, 'twas me, in order to get the
size of the original message below ~80K.
Tnx , JimL
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network&System Engineer | 2133 McCullam Ave | Give me Linux |
| babydr@baby-dragons.com | Fairbanks, AK. 99701 | only on AXP |
+------------------------------------------------------------------+
* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
2008-07-21 23:12 ` INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd) Neil Brown
2008-07-21 23:43 ` Mr. James W. Laferriere
@ 2008-07-22 1:25 ` Timothy Shimmin
1 sibling, 0 replies; 4+ messages in thread
From: Timothy Shimmin @ 2008-07-22 1:25 UTC (permalink / raw)
To: Neil Brown; +Cc: Mr. James W. Laferriere, linux-raid maillist, xfs
Neil Brown wrote:
> On Monday July 21, babydr@baby-dragons.com wrote:
>> INFO: task pdflush:393 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> pdflush D c8209f80 4748 393 2
>> f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
>> f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
>> 00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
>> Call Trace:
>> [<c014fc57>] ? mark_held_locks+0x67/0x80
>> [<c0142233>] ? add_wait_queue+0x33/0x50
>> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
>> [<c0127a60>] ? default_wake_function+0x0/0x10
>> [<c0127a60>] ? default_wake_function+0x0/0x10
>> [<c03a84fb>] xfs_buf_iorequest+0x4b/0x80
>> [<c03adeee>] xfs_bdstrat_cb+0x3e/0x50
>> [<c03a495c>] xfs_bwrite+0x5c/0xe0
>> [<c039e941>] xfs_syncsub+0x121/0x2b0
>> [<c018a43b>] ? lock_super+0x1b/0x20
>> [<c018a43b>] ? lock_super+0x1b/0x20
>> [<c039e1d8>] xfs_sync+0x48/0x70
>> [<c03af833>] xfs_fs_write_super+0x23/0x30
>> [<c018a80f>] sync_supers+0xaf/0xc0
>
> Looks a lot like an XFS problem to me.
> Or at least, XFS people would be able to interpret this stack the
> best.
>
I presume that if it is waiting in xfs_buf_wait_unpin() for a long
time (>2min), then maybe a journal-log I/O completion hasn't come
back to say that the matching buffer item has made it to the on-disk log.
I.e. the buffer hasn't been unpinned yet (pincount > 0), which is supposed
to happen when its data hits the on-disk log.
>> [<c0169259>] wb_kupdate+0x29/0x100
>> [<c016a0cc>] ? __pdflush+0xcc/0x1a0
>> [<c016a0d2>] __pdflush+0xd2/0x1a0
>> [<c016a1a0>] ? pdflush+0x0/0x40
>> [<c016a1d1>] pdflush+0x31/0x40
>> [<c0169230>] ? wb_kupdate+0x0/0x100
>> [<c016a1a0>] ? pdflush+0x0/0x40
>> [<c0141e2c>] kthread+0x5c/0xa0
>> [<c0141dd0>] ? kthread+0x0/0xa0
>> [<c0103d67>] kernel_thread_helper+0x7/0x10
>> =======================
>> 2 locks held by pdflush/393:
>> #0: (&type->s_umount_key#17){----}, at: [<c018a7b2>] sync_supers+0x52/0xc0
>> #1: (&type->s_lock_key#7){--..}, at: [<c018a43b>] lock_super+0x1b/0x20
>>
>> ...snip... Repeats of above message ad infinitum.
>
>
> Hmm... I guess I clipped a bit too much for our XFS friends to know
> the context.
> bonnie is being run on an XFS filesystem on md/raid6, and it gets
> this warning a lot and essentially hangs.
>
Just for the record,
in -rc9 we hadn't removed the QUEUE_ORDERED tag check yet, so
I presume that for md/raid6, barriers will be disabled
and barrier writes for the log won't be issued.
I don't see that as having anything to do with the problem here -
it is more of an issue on replay if we have the cache on
and no barrier support - I just thought I'd mention it.
--Tim
* Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
2008-07-21 23:43 ` Mr. James W. Laferriere
@ 2008-07-22 2:20 ` Dave Chinner
0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2008-07-22 2:20 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: Neil Brown, linux-raid maillist, xfs
On Mon, Jul 21, 2008 at 03:43:03PM -0800, Mr. James W. Laferriere wrote:
> Hello Neil ,
>
> On Tue, 22 Jul 2008, Neil Brown wrote:
>> On Monday July 21, babydr@baby-dragons.com wrote:
>>> INFO: task pdflush:393 blocked for more than 120 seconds.
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> pdflush D c8209f80 4748 393 2
>>> f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
>>> f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
>>> 00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
>>> Call Trace:
>>> [<c014fc57>] ? mark_held_locks+0x67/0x80
>>> [<c0142233>] ? add_wait_queue+0x33/0x50
>>> [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
>>> [<c0127a60>] ? default_wake_function+0x0/0x10
>>> [<c0127a60>] ? default_wake_function+0x0/0x10
>>> [<c03a84fb>] xfs_buf_iorequest+0x4b/0x80
>>> [<c03adeee>] xfs_bdstrat_cb+0x3e/0x50
>>> [<c03a495c>] xfs_bwrite+0x5c/0xe0
>>> [<c039e941>] xfs_syncsub+0x121/0x2b0
>>> [<c018a43b>] ? lock_super+0x1b/0x20
>>> [<c018a43b>] ? lock_super+0x1b/0x20
>>> [<c039e1d8>] xfs_sync+0x48/0x70
>>> [<c03af833>] xfs_fs_write_super+0x23/0x30
>>> [<c018a80f>] sync_supers+0xaf/0xc0
>>
>> Looks a lot like an XFS problem to me.
>> Or at least, XFS people would be able to interpret this stack the
>> best.
> Hmm, Ok, I'll post there. I can provide a -complete- boot ->
> reboot log of the actions, but it ain't small, ~649K. So I'll post
> that on the back of my website, ie:
>
> http://www.baby-dragons.com/bonnie++1.03c-2.6.26-rc9.console.trace.log
Given that it's a log hang on 2.6.26-rc9, I'd first say add this commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=49641f1acfdfd437ed9b0a70b86bf36626c02afe
to your build (went in after -rc9 but before 2.6.26 was released)
and see if that solves the problem.
In more detail, this stack trace implies log I/O has not completed
after the log force was triggered in xfs_buf_wait_unpin(). The above
patch fixes a bug in log I/O dispatch where a non-atomic compare
and decrement would result in log I/O not being dispatched.
So, you've got a hang waiting for log I/O to complete on a kernel
that has a known problem with log I/O dispatch, so it's likely
that's what you've hit.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com