From: Orion Poplawski <orion@cora.nwra.com>
To: linux-nfs@vger.kernel.org
Subject: nfs4 mount hanging suddenly
Date: Wed, 29 Feb 2012 15:29:36 -0700
Message-ID: <4F4EA6D0.30606@cora.nwra.com>

Just starting today, one of our users' NFS-mounted home directories has
started locking up.  The client is Fedora 16 32-bit; the server is CentOS 5.7
32-bit.  I have not seen this particular problem elsewhere (yet).

I captured this trace on the server after the hang:

http://sw.cora.nwra.com/tmp/marie-nfs-home-lwang-hang.pcap

   1   0.000000  10.10.20.15 -> 10.10.10.1   NFS V4 COMP Call <EMPTY> PUTFH;GETATTR GETATTR
   2   0.000133   10.10.10.1 -> 10.10.20.15  NFS V4 COMP Reply (Call In 1) <EMPTY> PUTFH;GETATTR GETATTR
   3   0.000421  10.10.20.15 -> 10.10.10.1   TCP 879 > nfs [ACK] Seq=137 Ack=225 Win=17738 Len=0 TSV=3584653 TSER=2438333196
   4   0.000519  10.10.20.15 -> 10.10.10.1   NFS V4 COMP Call <EMPTY> PUTFH;ACCESS ACCESS;GETATTR GETATTR
   5   0.000587   10.10.10.1 -> 10.10.20.15  NFS V4 COMP Reply (Call In 4) <EMPTY> PUTFH;ACCESS ACCESS;GETATTR GETATTR[Unreassembled Packet [incorrect TCP checksum]]
   6   0.040522  10.10.20.15 -> 10.10.10.1   TCP 879 > nfs [ACK] Seq=289 Ack=465 Win=17738 Len=0 TSV=3584694 TSER=2438333196
   7   0.451636  10.10.20.15 -> 10.10.10.1   NFS V4 COMP Call <EMPTY> PUTFH;SAVEFH SAVEFH;OPEN OPEN;DELEGRETURN DELEGRETURN;Unknown
   8   0.451892   10.10.10.1 -> 10.10.20.15  NFS V4 COMP Reply (Call In 7) <EMPTY> PUTFH;SAVEFH SAVEFH;OPEN OPEN(10008)
   9   0.452164  10.10.20.15 -> 10.10.10.1   TCP 879 > nfs [ACK] Seq=529 Ack=529 Win=17738 Len=0 TSV=3585105 TSER=2438333648
.....
120  53.161949  10.10.20.15 -> 10.10.10.1   NFS V4 COMP Call <EMPTY> PUTFH;GETATTR GETATTR
121  53.162281   10.10.10.1 -> 10.10.20.15  NFS V4 COMP Reply (Call In 120) <EMPTY> PUTFH;GETATTR GETATTR
122  53.162596  10.10.20.15 -> 10.10.10.1   TCP 879 > nfs [ACK] Seq=8205 Ack=10341 Win=17738 Len=0 TSV=3637816 TSER=2438386366
123  53.162680  10.10.20.15 -> 10.10.10.1   NFS V4 COMP Call <EMPTY> PUTFH;GETATTR GETATTR
124  53.162748   10.10.10.1 -> 10.10.20.15  NFS V4 COMP Reply (Call In 123) <EMPTY> PUTFH;GETATTR GETATTR[Unreassembled Packet [incorrect TCP checksum]]
125  53.163245  10.10.20.15 -> 10.10.10.1   NFS V4 COMP Call <EMPTY> PUTFH;GETATTR GETATTR
126  53.163418   10.10.10.1 -> 10.10.20.15  NFS V4 COMP Reply (Call In 125) <EMPTY> PUTFH;GETATTR GETATTR
127  53.203530  10.10.20.15 -> 10.10.10.1   TCP 879 > nfs [ACK] Seq=8493 Ack=10685 Win=17738 Len=0 TSV=3637857 TSER=2438386368
128  53.450308  10.10.20.15 -> 10.10.10.1   NFS V4 COMP Call <EMPTY> PUTFH;ACCESS ACCESS;GETATTR GETATTR
129  53.450457   10.10.10.1 -> 10.10.20.15  NFS V4 COMP Reply (Call In 128) <EMPTY> PUTFH;ACCESS ACCESS;GETATTR GETATTR[Unreassembled Packet [incorrect TCP checksum]]
130  53.450671  10.10.20.15 -> 10.10.10.1   TCP 879 > nfs [ACK] Seq=8645 Ack=10925 Win=17738 Len=0 TSV=3638104 TSER=2438386655
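
For anyone scanning a summary like the one above for failed operations, here
is a minimal Python sketch.  It assumes that a non-zero NFSv4 status shows up
as a number in parentheses right after the operation name, as in
"OPEN(10008)" in frame 8.  The function name failed_ops and the one-entry
status table are illustrative only; 10008 is NFS4ERR_DELAY in RFC 3530.

```python
import re

# Partial status table (RFC 3530); extend as needed.
NFS4_STATUS = {10008: "NFS4ERR_DELAY"}

# An operation name immediately followed by a numeric status, e.g. "OPEN(10008)".
STATUS_RE = re.compile(r"\b([A-Z_]+)\((\d+)\)")
# Frame number and relative timestamp at the start of a summary record.
FRAME_RE = re.compile(r"^\s*(\d+)\s+(\d+\.\d+)\s")


def failed_ops(text):
    """Return (frame, time, op, status, name) for each op with a non-zero status."""
    results = []
    frame = time = None
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            # Remember the current frame so wrapped continuation lines
            # are attributed to the record they belong to.
            frame, time = int(m.group(1)), float(m.group(2))
        for op, code in STATUS_RE.findall(line):
            code = int(code)
            if code != 0:
                name = NFS4_STATUS.get(code, "unknown")
                results.append((frame, time, op, code, name))
    return results
```

Feeding it the saved text of the summary returns one tuple per failed
operation; it works whether or not the mail client has wrapped the lines.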


I was not able to find any error messages anywhere.  The server has been up
28 days.  The client was up for 14 days before the first hang, followed by 2
more hangs today.  Home directories are automounted, and I was able to access
a different home directory that is served off the same server and filesystem.

client kernels: 3.2.3-2.fc16.i68, 3.2.7-1.fc16.i68
server kernel: 2.6.18-274.17.1.el5

earth:/export/home/lwang on /home/lwang type nfs4 
(rw,noatime,vers=4,rsize=32768,wsize=32768,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.20.15,minorversion=0,local_lock=none,addr=10.10.10.1)
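
To compare these options against a mount that is not hanging, a small Python
sketch like the following could turn the parenthesized option string into a
dict.  The helper name parse_mount_options is hypothetical, and the sketch
assumes no option value contains a comma.

```python
def parse_mount_options(opts):
    """Split a '(a,b=c,...)' mount option string into a dict.

    Bare flags such as 'hard' map to True; key=value pairs keep the
    value as a string.
    """
    parsed = {}
    for item in opts.strip().strip("()").split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed
```

Per nfs(5), timeo is in tenths of a second, so timeo=600 here is 60 seconds,
and a hard mount keeps retrying indefinitely rather than returning an error.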

There is a newer nfs-utils:
Jan 24 03:34:43 Updated: 1:nfs-utils-1.2.5-4.fc16.i686

I may try backing that out, but it doesn't seem like a big change:

* Mon Jan 16 2012 Steve Dickson <steved@redhat.com> 1.2.5-4
- Reworked how the nfsd service requires the rpcbind service (bz 768550)

and it seems to only affect nfs-server.

Anything else to check?

TIA,

  Orion

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder Office                  FAX: 303-415-9702
3380 Mitchell Lane                  orion@cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com
