Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)

From: Timothy Shimmin <tes@sgi.com>
To: Neil Brown <neilb@suse.de>
Cc: "Mr. James W. Laferriere" <babydr@baby-dragons.com>,
	linux-raid maillist <linux-raid@vger.kernel.org>,
	xfs@oss.sgi.com
Subject: Re: INFO: task pdflush:393 blocked for more than 120 seconds. & Call traces ... (fwd)
Date: Tue, 22 Jul 2008 11:25:35 +1000
Message-ID: <4885370F.9000301@sgi.com>
In-Reply-To: <18565.6095.988483.628391@notabene.brown>

Neil Brown wrote:
> On Monday July 21, babydr@baby-dragons.com wrote:
>> INFO: task pdflush:393 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> pdflush       D c8209f80  4748   393      2
>>          f75e5e58 00000046 f7f7ad50 c8209f80 f7f7a8a0 f75e5e24 c014fc57 00000000
>>          f7f7a8a0 e5d0dd00 c8209f80 f75e4000 c0819e00 c8209f80 f7f7aaf4 f75e5e44
>>          00000286 f75e5e80 f510de30 f75e5e58 c0142233 f510de00 f75e5e80 f510de30
>> Call Trace:
>>    [<c014fc57>] ? mark_held_locks+0x67/0x80
>>    [<c0142233>] ? add_wait_queue+0x33/0x50
>>    [<c03a7f85>] xfs_buf_wait_unpin+0xb5/0xe0
>>    [<c0127a60>] ? default_wake_function+0x0/0x10
>>    [<c0127a60>] ? default_wake_function+0x0/0x10
>>    [<c03a84fb>] xfs_buf_iorequest+0x4b/0x80
>>    [<c03adeee>] xfs_bdstrat_cb+0x3e/0x50
>>    [<c03a495c>] xfs_bwrite+0x5c/0xe0
>>    [<c039e941>] xfs_syncsub+0x121/0x2b0
>>    [<c018a43b>] ? lock_super+0x1b/0x20
>>    [<c018a43b>] ? lock_super+0x1b/0x20
>>    [<c039e1d8>] xfs_sync+0x48/0x70
>>    [<c03af833>] xfs_fs_write_super+0x23/0x30
>>    [<c018a80f>] sync_supers+0xaf/0xc0
> 
> Looks a lot like an XFS problem to me.
> Or at least, XFS people would be able to interpret this stack the
> best.
> 
I presume that if it has been waiting in xfs_buf_wait_unpin() for a long
time (>2 min), then a journal log I/O completion may not have come
back to say that the matching buffer item has made it to the on-disk log,
i.e. the buffer hasn't been unpinned yet (pincount > 0), which is supposed
to happen when its data hits the on-disk log.
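
As a rough illustration of that pin/unpin handshake, here is a toy
user-space model in Python. It is not the XFS code: the names
PinnedBuffer and write_back are made up for this sketch, and the
timeout only stands in for the hung-task watchdog. The writer waits
for the pin count to reach zero, and the unpin is assumed to come from
the log I/O completion. If that completion never arrives, the writer
stays blocked, which is what the trace above looks like.

```python
import threading


class PinnedBuffer:
    """Toy model of a buffer that must not be written while pinned."""

    def __init__(self):
        self._cond = threading.Condition()
        self.pin_count = 0

    def pin(self):
        # A change to this buffer is not yet in the on-disk log.
        with self._cond:
            self.pin_count += 1

    def unpin(self):
        # Assumed to be called from the (simulated) log I/O completion.
        with self._cond:
            if self.pin_count == 0:
                raise RuntimeError("unpin without matching pin")
            self.pin_count -= 1
            if self.pin_count == 0:
                self._cond.notify_all()

    def wait_unpin(self, timeout=None):
        # Block until pin_count drops to zero; return False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self.pin_count == 0, timeout)


def write_back(buf, timeout):
    """Simulated writeback: wait for the unpin, then 'issue' the write."""
    if not buf.wait_unpin(timeout):
        return "blocked"  # the completion never arrived in time
    return "written"
```

Calling write_back() on a buffer that has been pinned and never
unpinned returns "blocked" once the timeout expires. If another thread
calls unpin() in the meantime, the waiter wakes up and the call returns
"written".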
 
>>    [<c0169259>] wb_kupdate+0x29/0x100
>>    [<c016a0cc>] ? __pdflush+0xcc/0x1a0
>>    [<c016a0d2>] __pdflush+0xd2/0x1a0
>>    [<c016a1a0>] ? pdflush+0x0/0x40
>>    [<c016a1d1>] pdflush+0x31/0x40
>>    [<c0169230>] ? wb_kupdate+0x0/0x100
>>    [<c016a1a0>] ? pdflush+0x0/0x40
>>    [<c0141e2c>] kthread+0x5c/0xa0
>>    [<c0141dd0>] ? kthread+0x0/0xa0
>>    [<c0103d67>] kernel_thread_helper+0x7/0x10
>>    =======================
>> 2 locks held by pdflush/393:
>>    #0:  (&type->s_umount_key#17){----}, at: [<c018a7b2>] sync_supers+0x52/0xc0
>>    #1:  (&type->s_lock_key#7){--..}, at: [<c018a43b>] lock_super+0x1b/0x20
>>
>>     ...snip... Repeats of above message ad-infintum .
> 
> 
> Hmm... I guess I clipped a bit too much for our XFS friends to know
> the context.
> bonnie is being run on an XFS filesystem on md/raid6. and it gets
> this warning a lot and essentially hangs.
> 
Just for the record,
in rc-9 we hadn't yet removed the QUEUE_ORDERED tag check, so
I presume barriers will be disabled for md/raid6
and barrier writes on the log won't be issued.
I don't think that has anything to do with the problem here;
it is more of an issue on replay if we have the cache on
and no barrier support. I just thought I'd mention it.

--Tim
