From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Edward Hibbert"
Subject: Stale NFS mounts
Date: Tue, 1 Nov 2005 10:01:45 -0000
Message-ID: <9E621559095AA746B22B21CD4FAF99A54E3126@EDINMAIL1.ad.datcon.co.uk>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C5DECB.448B3629"
Sender: nfs-admin@lists.sourceforge.net
Errors-To: nfs-admin@lists.sourceforge.net
List-Id: Discussion of NFS under Linux development, interoperability, and testing.

This is a multi-part message in MIME format.

------_=_NextPart_001_01C5DECB.448B3629
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

I'm having a problem with stale mount points.

What we see is that under load all the clients get stale mount points
pretty much simultaneously.  We can rule out various things:

* There's no indication that the NFS server has rebooted (uptime
  indicates it hasn't).
* There's nothing in /var/log/messages to indicate a problem.
* The nfsd processes have a start time that matches uptime, so they
  haven't restarted.
* We know that the directories we're mounting are not getting
  renamed/deleted.

We can cure this by a client reboot, but obviously that's massively
disruptive.

We're wondering whether this is related to nested mounts - the clients
mount a top-level directory, and then subdirectories within it.  There's
circumstantial but not conclusive evidence that we've only seen stale
handles on nested mounts.

So:

* Are nested mounts safe to use?
* Are there any common causes of stale mounts which I might not be
  aware of?
* Any suggestions for how to investigate this further?

The NFS server is a Fedora system with kernel 2.6.9-1.667smp.  The NFS
clients are RedHat AS 4.0 2.6.9-11.ELsmp.  Do ask for more details if
needed.

Regards,

Edward.

------_=_NextPart_001_01C5DECB.448B3629--

-------------------------------------------------------
This SF.Net email is sponsored by the JBoss Inc.
Get Certified Today * Register for a JBoss Training Course
Free Certification Exam for All Training Attendees Through End of 2005
Visit http://www.jboss.com/services/certification for more information
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neil Brown
Subject: Re: Stale NFS mounts
Date: Tue, 1 Nov 2005 21:11:47 +1100
Message-ID: <17255.16227.712240.839790@cse.unsw.edu.au>
References: <9E621559095AA746B22B21CD4FAF99A54E3126@EDINMAIL1.ad.datcon.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: "Edward Hibbert"
In-Reply-To: message from Edward Hibbert on Tuesday November 1

On Tuesday November 1, Edward.Hibbert@dataconnection.com wrote:
>
> So:
> * Are nested mounts safe to use?

Should be.

> * Are there any common causes of stale mounts which I might not be
>   aware of?

Not sure...

> * Any suggestions for how to investigate this further?

See below.

>
> The NFS server is a Fedora system with kernel 2.6.9-1.667smp.  The NFS
> clients are RedHat AS 4.0 2.6.9-11.ELsmp.  Do ask for more details if
> needed.

On the server:
  cat /etc/exports
  cat /proc/fs/nfsd/exports
  grep . /proc/net/rpc/*/content

Do this while it is working, and then again when it isn't working.

NeilBrown

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Edward Hibbert"
Subject: RE: Stale NFS mounts
Date: Tue, 1 Nov 2005 10:58:28 -0000
Message-ID: <9E621559095AA746B22B21CD4FAF99A54E3140@EDINMAIL1.ad.datcon.co.uk>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C5DED3.30E80C21"

This is a multi-part message in MIME format.

------_=_NextPart_001_01C5DED3.30E80C21
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

>
> The NFS server is a Fedora system with kernel 2.6.9-1.667smp.  The NFS
> clients are RedHat AS 4.0 2.6.9-11.ELsmp.  Do ask for more details if
> needed.
On the server:
  cat /etc/exports
  cat /proc/fs/nfsd/exports
  grep . /proc/net/rpc/*/content

Ok, I've done this now while it's working - but what should I be looking
for here?  Maybe the last couple of lines changing?

root[vitorbelfort]:/var/log> cat /etc/exports
/opt/dcl/data/disk1 *(rw,sync,no_root_squash)
/opt/dcl/data/disk2 *(rw,sync,no_root_squash)
root[vitorbelfort]:/var/log> cd /proc/fs/nfsd/exports
-bash: cd: /proc/fs/nfsd/exports: Not a directory
root[vitorbelfort]:/var/log> cat /proc/fs/nfsd/exports
# Version 1.1
# Path Client(Flags) # IPs
/opt/dcl/data/disk1     *(rw,no_root_squash,sync,wdelay)
/opt/dcl/data/disk2     *(rw,no_root_squash,sync,wdelay)
root[vitorbelfort]:/var/log> grep . /proc/net/rpc/*/content
/proc/net/rpc/auth.unix.ip/content:#class IP domain
/proc/net/rpc/auth.unix.ip/content:nfsd 172.19.15.102 *
/proc/net/rpc/auth.unix.ip/content:nfsd 172.19.15.101 *
/proc/net/rpc/auth.unix.ip/content:nfsd 172.19.15.97 *
/proc/net/rpc/auth.unix.ip/content:nfsd 172.19.15.91 *
/proc/net/rpc/auth.unix.ip/content:nfsd 172.19.15.92 *
/proc/net/rpc/auth.unix.ip/content:nfsd 172.19.15.89 *
/proc/net/rpc/nfs4.idtoname/content:#domain type id [name]
/proc/net/rpc/nfs4.nametoid/content:#domain type name [id]
/proc/net/rpc/nfsd.export/content:#path domain(flags)
/proc/net/rpc/nfsd.export/content:/opt/dcl/data/disk1   *(rw,no_root_squash,sync,wdelay)
/proc/net/rpc/nfsd.export/content:/opt/dcl/data/disk2   *(rw,no_root_squash,sync,wdelay)
/proc/net/rpc/nfsd.fh/content:#domain fsidtype fsid [path]
/proc/net/rpc/nfsd.fh/content:* 0 0x0300080000000080 /opt/dcl/data/disk1
/proc/net/rpc/nfsd.fh/content:* 0 0x1100080000000080 /opt/dcl/data/disk2

Regards,

Edward.
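Neil's request to run the three commands once while the mounts work and
again once they are stale amounts to a before/after comparison, so it may
help to save each run to its own directory and diff the two.  A minimal
sketch, assuming a POSIX shell on the server; the function name, the
output file names and the optional ROOT prefix are illustrative and do
not come from the thread:

```shell
#!/bin/sh
# snapshot_nfsd_state OUTDIR [ROOT]
# Saves the output of the three commands from the thread into OUTDIR.
# ROOT is a hypothetical path prefix, only useful for dry runs against a
# copy of the files; leave it empty on a real server.
snapshot_nfsd_state() {
    outdir=$1
    root=${2:-}
    mkdir -p "$outdir" || return 1
    # Missing files are tolerated so that a partial snapshot is still saved.
    cat "$root/etc/exports" > "$outdir/etc_exports" 2>/dev/null || :
    cat "$root/proc/fs/nfsd/exports" > "$outdir/nfsd_exports" 2>/dev/null || :
    # grep . prints every non-empty line, prefixed with its file name
    # when more than one file matches the glob.
    grep . "$root"/proc/net/rpc/*/content > "$outdir/rpc_content" 2>/dev/null || :
}

# Usage (as root on the server):
#   snapshot_nfsd_state /tmp/nfs-working   # while the mounts work
#   snapshot_nfsd_state /tmp/nfs-stale     # once clients report stale handles
#   diff -ru /tmp/nfs-working /tmp/nfs-stale
```

Any line that appears in only one of the two snapshots is a candidate for
closer inspection; the thread itself does not say which entries are
expected to change.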

------_=_NextPart_001_01C5DED3.30E80C21--

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Edward Hibbert"
Subject: RE: Stale NFS mounts
Date: Tue, 1 Nov 2005 15:29:21 -0000
Message-ID: <9E621559095AA746B22B21CD4FAF99A54E3197@EDINMAIL1.ad.datcon.co.uk>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C5DEF9.085B485D"

This is a multi-part message in MIME format.

------_=_NextPart_001_01C5DEF9.085B485D
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Bit more on this.

If we export a directory X, then mount subdirectories within it (so from
a client, mount X/Y) then we see the problem.
If we export X/Y and mount X/Y then we don't.

Can anyone suggest any reason for this?

Regards,

Edward.
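The two setups compared above differ only in whether a client mounts the
exported directory itself or a subdirectory below it.  To see which case
a given client mount falls into, the remote path can be checked against
the export list.  A minimal sketch, assuming a POSIX shell and an export
list in the /etc/exports layout shown earlier in the thread; the function
name and the sample subdirectory "projects" are hypothetical:

```shell
#!/bin/sh
# classify_mount EXPORTS_FILE REMOTE_PATH
# Prints "export-root" if REMOTE_PATH is itself an exported directory,
# "subdirectory of <export>" if it lies below one, else "not-exported".
classify_mount() {
    exports_file=$1
    remote_path=$2
    result="not-exported"
    while read -r export_path _rest; do
        case $export_path in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        case $remote_path in
            "$export_path")
                result="export-root"
                break ;;
            "$export_path"/*)
                result="subdirectory of $export_path" ;;
        esac
    done < "$exports_file"
    printf '%s\n' "$result"
}

# Usage, with a hypothetical subdirectory name:
#   classify_mount /etc/exports /opt/dcl/data/disk1/projects
```

Mounts reported as a subdirectory of an export correspond to the
"export X, mount X/Y" case in which the problem was seen.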
------_=_NextPart_001_01C5DEF9.085B485D--

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Jolk
Subject: Re: Stale NFS mounts
Date: Wed, 02 Nov 2005 10:07:04 +0100
Message-ID: <436881B8.5040403@buf.com>
References: <9E621559095AA746B22B21CD4FAF99A54E3126@EDINMAIL1.ad.datcon.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
To: Edward Hibbert
Cc: nfs@lists.sourceforge.net
In-Reply-To: <9E621559095AA746B22B21CD4FAF99A54E3126@EDINMAIL1.ad.datcon.co.uk>

Edward Hibbert wrote:
> I'm having a problem with stale mount points.

Might this be related to the problem described in
http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-09/7161.html?
You could try the export option `no_subtree_check' to see whether that
helps.

Alex

-- 
Alexander Jolk  *  BUF Compagnie  *  alexj@buf.com
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29
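Trying the `no_subtree_check' option suggested above means adding it to
the option list of each export and then re-exporting.  A minimal sketch
of the edit, assuming a POSIX shell and sed; the function name is
hypothetical, and whether the option cures the stale handles seen in this
thread is not established here:

```shell
#!/bin/sh
# add_no_subtree_check
# Reads export lines on stdin and appends no_subtree_check to every
# non-empty option list. Comment lines, and lines that already name a
# subtree option (subtree_check or no_subtree_check), pass through as-is.
add_no_subtree_check() {
    sed -e '/^[[:space:]]*#/b' \
        -e '/subtree_check/!s/(\([^)][^)]*\))/(\1,no_subtree_check)/g'
}

# Usage (as root on the server), reviewing the result before installing it:
#   add_no_subtree_check < /etc/exports > /tmp/exports.new
#   diff -u /etc/exports /tmp/exports.new
#   cp /tmp/exports.new /etc/exports
#   exportfs -ra        # re-export everything listed in /etc/exports
```

For the first export in the thread this would turn
`/opt/dcl/data/disk1 *(rw,sync,no_root_squash)' into
`/opt/dcl/data/disk1 *(rw,sync,no_root_squash,no_subtree_check)'.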