From: Roger Heflin <rheflin@atipa.com>
To: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: nfs@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject: Re: Apparent Deadlock with nfsd/jfs on 2.6.21.1 under bonnie.
Date: Tue, 15 May 2007 17:06:38 -0500
Message-ID: <464A2EEE.7070509@atipa.com>
In-Reply-To: <pan.2007.05.15.22.03.07.359081@linux.vnet.ibm.com>

Dave Kleikamp wrote:
> Sorry if I'm missing anyone on the reply, but my mail feed is messed up
> and I'm replying from the gmane archive.
> 
> On Tue, 15 May 2007 09:08:25 -0500, Roger Heflin wrote:
> 
>> Hello,
>>
>> Running 2.6.21.1 (FC6 Dist), with a RHEL client (client
>> appears to not be having issues) I am getting what I believe
>> is a deadlock on the server end.    This is with JFS and
>> NFSD, I have not tested yet with a non-JFS filesystem,
>> though our customer indicated that they have duplicated it with
>> the ext3 filesystem.
> 
> I don't have an answer to an ext3 deadlock, but this looks like a jfs
> problem that was recently fixed in linux-2.6.22-rc1.  I had intended to
> send it to the stable kernel after it was picked up in mainline, but
> hadn't gotten to it yet.
> 
> The patch is here:
> http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=05ec9e26be1f668ccba4ca54d9a4966c6208c611
> 

Ok.

My customer reported that he thought he had an ext3 hang; so far I have
not been able to duplicate it.

If ext3 survives until tomorrow, I will retest unpatched jfs, and then
patch it and test again.


>> The basic setup is:
>> fiber channel array -> qlogic fiber card -> /dev/sdx -> LVM stripe ->
>> jfs -> nfs.
>>
>> Running bonnie on an NFS share has apparently produced a deadlock.   I
>> had run bonnie several times without any issues. I don't believe
>> this is a HW issue: we have a couple of other machines configured with
>> slightly different HW and are also able to duplicate this problem on
>> those machines.  There are no abnormal messages in dmesg or in the
>> messages file.
>>
>> After having the apparent deadlock I started a dd of a on the deadlocked
>> filesystem and according to vmstat 1 that was actually working, I then
>> did a "mkdir junk" on the deadlocked filesystem and that apparently put
>> the cat into a permanent "D" state.   I will include the sysrq -t from
>> before the cat/mkdir and after the cat/mkdir.
>>
>> I believe I can duplicate this again, and other than the processes going
>> into the "D" state everything else seems to work.   Other filesystems
>> appear to be functional, and I can still log in to the machine.
>>
>> Right now the machine is in the deadlocked state, and I will wait for
>> any suggestions of more data to collect or other tests to try.
> 
> I haven't tried it on a locked-up system, but you may try waking up the
> [jfsIO] kernel thread with a signal.  I'm not sure what signals may get
> through, since the thread doesn't specifically act on a signal.
> 

I will try on the next lockup.

                    Roger
