From: Kris Kersey <kkersey@steelbox.com>
To: David Chinner <dgc@sgi.com>
Cc: xfs@oss.sgi.com, Bill Vaughan <billv@steelbox.com>
Subject: Re: pdflush hang on xlog_grant_log_space()
Date: Mon, 10 Mar 2008 07:48:00 -0400
Message-ID: <47D51FF0.2080000@steelbox.com>
In-Reply-To: <20080307223510.GM155407@sgi.com>
Thank you for your help. Two questions:
1) How large is a "much larger number"? I know you recently increased
this constant from 10 to 1000; should I raise it to 10,000? 100,000?
2) Is this a fix or a work-around? If this is a work-around, is there a
fix in the works? Can you explain the issue a bit, or if it's been
covered, can you point me to the explanation? I'd just like to
understand what's going on.
Thanks,
Kris Kersey
David Chinner wrote:
> On Thu, Mar 06, 2008 at 04:31:27PM -0500, Kris Kersey wrote:
>> Hello,
>>
>> I'm working on a NAS product and we're currently seeing lock-ups that
>> appear to hang in XFS code. The NAS runs 1024 NFSD threads accessing
>> three RAID mounts, all of which use XFS file systems. Lately we've had
>> random lock-ups on these boxes, so I am now running a kernel with KDB
>> built in.
>>
>> The lock-up takes the form of all NFSD threads in D state, plus one of
>> the three pdflush threads in D state. Presumably all the NFSD threads
>> are waiting on that one pdflush thread to complete. Twice now, when a
>> NAS has gotten into this state, I have dropped into KDB and run a stack
>> trace on the pdflush thread. Both times the thread was stuck at
>> xlog_grant_log_space+0xdb.
>
> Try bumping XFS_TRANS_PUSH_AIL_RESTARTS to a much larger number and
> see if the problem goes away....
>
> Alternatively, that restart hack is backed by a "watchdog" timeout
> in 2.6.25-rc1, so if that is the cause of the problem perhaps the
> latest -rcX kernel will prevent the hang?
>
> BTW, you can get all the traces of D state threads through the sysrq
> interface, so you don't need to drop into kdb to get this.....
>
> Cheers,
>
> Dave.
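A minimal sketch of Dave's sysrq suggestion, assuming a kernel built with CONFIG_MAGIC_SYSRQ, root privileges, and a kernel that provides the 'w' command (dump tasks in uninterruptible, i.e. D, state). Writing a single command character to /proc/sysrq-trigger makes the kernel print the stack traces to the kernel log, where dmesg can read them without dropping into KDB. The helper name `sysrq` is hypothetical; the `path` parameter exists only so the function can be exercised against an ordinary file.

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def sysrq(command: str, path: str = SYSRQ_TRIGGER) -> None:
    """Write one SysRq command character to the trigger file (hypothetical helper).

    Assumption: on the target kernel, 'w' dumps blocked (D state) tasks and
    't' dumps every task. Writing to /proc/sysrq-trigger requires root.
    """
    if len(command) != 1:
        # SysRq commands are single characters; reject anything else early.
        raise ValueError("SysRq commands are single characters")
    with open(path, "w") as f:
        f.write(command)
```

Run as root, `sysrq("w")` followed by `dmesg` should show the stacks of every D-state thread, including the pdflush and NFSD threads. `sysrq("t")` dumps all tasks instead if more context is needed.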