public inbox for linux-xfs@vger.kernel.org
From: Roger Heflin <rheflin@atipa.com>
To: Charles Weber <chaweber@gmail.com>
Cc: linux-xfs@oss.sgi.com
Subject: Re: xfs partial dismount issue
Date: Mon, 05 Mar 2007 15:07:22 -0600
Message-ID: <45EC868A.4060607@atipa.com>
In-Reply-To: <loom.20070305T190248-708@post.gmane.org>

Charles Weber wrote:
> Eric Sandeen <sandeen <at> sandeen.net> writes:
> 
>> Chuck Weber wrote:
>>> Hi everyone, I have a long running problem perhaps you can help with. I
>>> will include as much detail as I can. I can set up a spare server-disk
>>> set for testing if you have any bright ideas.
>>>
>>> We use XFS for samba and nfs on x86_64 Fedora Proliant DL585/385
>>> servers. Our busiest server has disk partitions go away. 
>> What do you mean by this, exactly?  The partitions themselves go away,
>> or are you talking about the problem described below where processes
>> start hanging?
>>
> Here is an example partition (1 of 6 or more xfs storage only).
> /share/store3 with samba shares on /share/store3/lls, lds, lxs and so on.
> I will get a call saying my group's share (lxs) is no longer accessible. I ssh
> into the server and can ls /share/store3, but ls will hang when I ls
> /share/store3/lxs. Shortly thereafter ls will hang for the root or any
> directory on the partition. Other partitions will be fine and other samba shares
> will be fine until the queued-up process load bogs the server down.
> 

Charles,

I have seen what may be a similar issue on SLES9 SP2. We had one XFS
partition, and under certain conditions it would stop responding; all
non-XFS partitions were OK, and everything was fine after a reboot.

From the sysrq-t task dumps, it appeared to me that two separate
processes calling fsync were deadlocking each other (and locking all
other processes out of modifying the XFS partition). I was not able to
determine exactly what the underlying bug was, but the hung processes
were waiting on locks in several widely different parts of the XFS and
kernel code. Changing the application so that it no longer calls fsync
has apparently kept the deadlock from occurring. In this case there
were multiple (2-4) instances of the application calling fsync,
apparently sometimes at nearly the same time. With this application,
the failure was almost certain to occur on at least one machine (out
of 100) running it overnight.
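If you want to test whether your server hits the same pattern, one approach is to run several processes that fsync files on the same XFS filesystem at nearly the same time and watch for workers that never finish. The Python sketch below does that under stated assumptions: it targets a Linux/POSIX host (it uses the `fork` start method), and the function names `writer` and `run_stress`, the per-worker file names, and the worker/round counts are hypothetical choices, not taken from the application described above. It is not a confirmed reproducer for this deadlock.

```python
import multiprocessing as mp
import os
import time


def writer(directory: str, worker_id: int, rounds: int) -> None:
    """Hypothetical worker: append a line to its own file and fsync after every write."""
    path = os.path.join(directory, f"worker-{worker_id}.dat")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for i in range(rounds):
            os.write(fd, f"{worker_id}:{i}\n".encode())
            os.fsync(fd)  # the call suspected of triggering the deadlock
    finally:
        os.close(fd)


def run_stress(directory: str, workers: int = 4, rounds: int = 1000,
               timeout: float = 60.0) -> list[int]:
    """Run `workers` concurrent fsync-heavy processes in `directory`.

    Returns the IDs of workers still running when `timeout` seconds have
    elapsed (a possible hang). Hung workers are reported, not killed: a
    process stuck in uninterruptible sleep (state D) ignores signals anyway.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX host where fork is available
    procs = [ctx.Process(target=writer, args=(directory, w, rounds))
             for w in range(workers)]
    for proc in procs:
        proc.start()

    # One shared deadline, so the total wait is bounded by `timeout`
    # rather than by `workers * timeout`.
    deadline = time.monotonic() + timeout
    hung = []
    for worker_id, proc in enumerate(procs):
        proc.join(max(0.0, deadline - time.monotonic()))
        if proc.is_alive():
            hung.append(worker_id)
    return hung
```

To try it, call `run_stress()` with a scratch directory on the affected XFS mount. If it reports hung workers, writing `t` to `/proc/sysrq-trigger` as root (or pressing SysRq-T on the console) dumps every task's state and stack trace to the kernel log, where `dmesg` will show it. That is the sysrq-t output used above to spot the competing fsync callers.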

                            Roger
