All of lore.kernel.org
 help / color / mirror / Atom feed
From: Neil Horman <nhorman@redhat.com>
To: Garrick Staples <garrick@usc.edu>
Cc: nfs@lists.sourceforge.net
Subject: Re: possible client stale filehandle bug?
Date: Wed, 26 Jan 2005 08:11:50 -0500	[thread overview]
Message-ID: <41F79716.6020303@redhat.com> (raw)
In-Reply-To: <20050126063547.GS12269@polop.usc.edu>

[-- Attachment #1: Type: text/plain, Size: 2949 bytes --]

Garrick Staples wrote:
> On Tue, Jan 25, 2005 at 10:06:27PM -0800, Trond Myklebust alleged:
> 
>>ty den 25.01.2005 Klokka 09:39 (-0800) skreiv Garrick Staples:
>>
>>>Hi all,
>>>   I have lots of storage in a large Solaris samfs environment that is NFS
>>>shared to a large number of Solaris and RHEL3 clients.  Under some conditions,
>>>linux apps have been getting stale filehandles during the normal course of
>>>their activity.  Various file handling syscalls like read() or open() might
>>>error.  Lots of renames and setattrs calls seem to trigger the problem.  
>>>'ci' and 'cvs commit' are particularly good at this.
>>
>>ESTALE is usually a sign that someone is deleting a file on the server
>>that is in use by the client. It is a sign that you are doing something
>>that violates the caching rules of NFS.
> 
> 
> Nothing of the kind is happening here.  I've tested this a thousand times over
> the last few days trying to find a solution.  In this case, Sun's samfs
> filesystem is definitely at fault and doing the wrong thing.  Backline
> engineers at Sun confirm this and are working on a fix.  
> 
> The reason for _this_ email isn't because of the ESTALEs, it's regarding the
> handling of the ESTALEs.  Right now I need the Solaris client behaviour to
> deal with this particular buggy server.
> 
> Incidentally, 2.6.10 never has a problem.  It's behaviour never creates ESTALEs in
> the first place.
> 
>  
> 
>>>It seems that the Solaris clients never report any such errors, only the Linux
>>>clients.  However, watching 'snoop' on the Solaris NFS server, I see that it IS
>>>returning stale file handles to both OSes, but Solaris clients seem to retry
>>>the request several times; and the Linux clients immediately pass the error up
>>>to the application.
>>>
>>>Is there some condition that the 2.4 kernel is handling incorrectly?
>>
>>I do not believe that Solaris redrives ESTALE on read, but they may do
>>it on open(). Linux does not redrive either case. See the many
>>discussions in the NFS list archives for why.
> 
> 
> Did you look at the 'snoop' bits in the previous email?  During that time, the
> process on the Solaris client is hanging in a write() call.
> 
> I'd be very happy to see any patches lieing around that might do this
> behaviour.  It would get me through the short term until Sun fixes this bug in
> samfs.
> 

The easiest, cleanest thing to do is retry the operation in the sys_* 
function.  Red Hat's latest kernel does this to prevent ESTALE.  I've 
attached the patch for your reference.  Mind you, due to the reasons 
you'll find in the list archives that Trond referenced, you'll not see 
this get into upstream kernels, so caveat emptor.

HTH
Neil


-- 
/***************************************************
  *Neil Horman
  *Software Engineer
  *Red Hat, Inc.
  *nhorman@redhat.com
  *gpg keyid: 1024D / 0x92A74FA1
  *http://pgp.mit.edu
  ***************************************************/

[-- Attachment #2: linux-2.4.21-nfs-estale.patch --]
[-- Type: text/plain, Size: 5410 bytes --]

diff -urNp linux-6090/fs/nfs/inode.c linux-6091/fs/nfs/inode.c
--- linux-6090/fs/nfs/inode.c
+++ linux-6091/fs/nfs/inode.c
@@ -1022,7 +1022,15 @@ int
 nfs_revalidate(struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
-	return nfs_revalidate_inode(NFS_SERVER(inode), inode);
+	int error;
+
+	error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+	if (error == -ESTALE) {
+		nfs_zap_caches(dentry->d_parent->d_inode);
+		d_drop(dentry);
+	}
+
+	return error;
 }
 
 /*
diff -urNp linux-6090/fs/nfs/xattr.c linux-6091/fs/nfs/xattr.c
--- linux-6090/fs/nfs/xattr.c
+++ linux-6091/fs/nfs/xattr.c
@@ -53,8 +53,13 @@ nfs_getxattr(struct dentry *dentry, cons
 	acl = ERR_PTR(-EOPNOTSUPP);
 	if (NFS_PROTO(inode)->version == 3 && NFS_PROTO(inode)->getacl)
 		acl = NFS_PROTO(inode)->getacl(inode, type);
-	if (IS_ERR(acl))
+	if (IS_ERR(acl)) {
+		if (PTR_ERR(acl) == -ESTALE) {
+			nfs_zap_caches(dentry->d_parent->d_inode);
+			d_drop(dentry);
+		}
 		return PTR_ERR(acl);
+	}
 	else if (acl) {
 		if (type == ACL_TYPE_ACCESS && acl->a_count == 0)
 			error = -ENODATA;
diff -urNp linux-6090/fs/open.c linux-6091/fs/open.c
--- linux-6090/fs/open.c
+++ linux-6091/fs/open.c
@@ -807,6 +807,30 @@ asmlinkage long sys_open(const char * fi
 		if (fd >= 0) {
 			struct file *f = filp_open(tmp, flags, mode);
 			error = PTR_ERR(f);
+
+			/*
+			 * ESTALE errors can be a pain.  On some
+			 * filesystems (e.g. NFS), ESTALE can often
+			 * be resolved by retry, as the ESTALE resulted
+			 * in a cache invalidation.  We perform this
+			 * retry here, once for every directory element
+			 * in the path to avoid the case where the removal
+			 * of the nth parent directory of the file we're
+			 * trying to open results in n ESTALE errors.
+			 */
+			if (error == -ESTALE) {
+				int nretries = 1;
+				char *cp;
+
+				for (cp = tmp; *cp; cp++) {
+					if (*cp == '/')
+						nretries++;
+				}
+				do {
+					f = filp_open(tmp, flags, mode);
+					error = PTR_ERR(f);
+				} while (error == -ESTALE && --nretries > 0);
+			}
 			if (IS_ERR(f))
 				goto out_error;
 			fd_install(fd, f);
diff -urNp linux-6090/fs/stat.c linux-6091/fs/stat.c
--- linux-6090/fs/stat.c
+++ linux-6091/fs/stat.c
@@ -143,8 +143,9 @@ static int cp_new_stat(struct inode * in
 asmlinkage long sys_stat(char * filename, struct __old_kernel_stat * statbuf)
 {
 	struct nameidata nd;
-	int error;
+	int error, errcnt = 0;
 
+again:
 	error = user_path_walk(filename, &nd);
 	if (!error) {
 		error = do_revalidate(nd.dentry);
@@ -152,6 +153,10 @@ asmlinkage long sys_stat(char * filename
 			error = cp_old_stat(nd.dentry->d_inode, statbuf);
 		path_release(&nd);
 	}
+	if (error == -ESTALE && !errcnt) {
+		errcnt++;
+		goto again;
+	}
 	return error;
 }
 #endif
@@ -159,8 +164,9 @@ asmlinkage long sys_stat(char * filename
 asmlinkage long sys_newstat(char * filename, struct stat * statbuf)
 {
 	struct nameidata nd;
-	int error;
+	int error, errcnt = 0;
 
+again:
 	error = user_path_walk(filename, &nd);
 	if (!error) {
 		error = do_revalidate(nd.dentry);
@@ -168,6 +174,11 @@ asmlinkage long sys_newstat(char * filen
 			error = cp_new_stat(nd.dentry->d_inode, statbuf);
 		path_release(&nd);
 	}
+	if (error == -ESTALE && !errcnt) {
+		errcnt++;
+		goto again;
+	}
+
 	return error;
 }
 
@@ -180,8 +191,9 @@ asmlinkage long sys_newstat(char * filen
 asmlinkage long sys_lstat(char * filename, struct __old_kernel_stat * statbuf)
 {
 	struct nameidata nd;
-	int error;
+	int error, errcnt = 0;
 
+again:
 	error = user_path_walk_link(filename, &nd);
 	if (!error) {
 		error = do_revalidate(nd.dentry);
@@ -189,6 +201,11 @@ asmlinkage long sys_lstat(char * filenam
 			error = cp_old_stat(nd.dentry->d_inode, statbuf);
 		path_release(&nd);
 	}
+	if (error == -ESTALE && !errcnt) {
+		errcnt++;
+		goto again;
+	}
+
 	return error;
 }
 
@@ -197,8 +214,9 @@ asmlinkage long sys_lstat(char * filenam
 asmlinkage long sys_newlstat(char * filename, struct stat * statbuf)
 {
 	struct nameidata nd;
-	int error;
+	int error, errcnt = 0;
 
+again:
 	error = user_path_walk_link(filename, &nd);
 	if (!error) {
 		error = do_revalidate(nd.dentry);
@@ -206,6 +224,12 @@ asmlinkage long sys_newlstat(char * file
 			error = cp_new_stat(nd.dentry->d_inode, statbuf);
 		path_release(&nd);
 	}
+
+	if (error == -ESTALE && !errcnt) {
+		errcnt++;
+		goto again;
+	}
+
 	return error;
 }
 
@@ -340,8 +364,9 @@ static long cp_new_stat64(struct inode *
 asmlinkage long sys_stat64(char * filename, struct stat64 * statbuf, long flags)
 {
 	struct nameidata nd;
-	int error;
+	int error, errcnt = 0;
 
+again:
 	error = user_path_walk(filename, &nd);
 	if (!error) {
 		error = do_revalidate(nd.dentry);
@@ -349,14 +374,20 @@ asmlinkage long sys_stat64(char * filena
 			error = cp_new_stat64(nd.dentry->d_inode, statbuf);
 		path_release(&nd);
 	}
+	if (error == -ESTALE && !errcnt) {
+		errcnt++;
+		goto again;
+	}
+
 	return error;
 }
 
 asmlinkage long sys_lstat64(char * filename, struct stat64 * statbuf, long flags)
 {
 	struct nameidata nd;
-	int error;
+	int error, errcnt = 0;
 
+again:
 	error = user_path_walk_link(filename, &nd);
 	if (!error) {
 		error = do_revalidate(nd.dentry);
@@ -364,6 +395,11 @@ asmlinkage long sys_lstat64(char * filen
 			error = cp_new_stat64(nd.dentry->d_inode, statbuf);
 		path_release(&nd);
 	}
+	if (error == -ESTALE && !errcnt) {
+		errcnt++;
+		goto again;
+	}
+
 	return error;
 }
 

  reply	other threads:[~2005-01-26 13:11 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-01-25 17:39 possible client stale filehandle bug? Garrick Staples
2005-01-26  6:06 ` Trond Myklebust
2005-01-26  6:35   ` Garrick Staples
2005-01-26 13:11     ` Neil Horman [this message]
2005-01-26 14:31     ` raven
2005-01-26 17:49       ` Garrick Staples
2005-01-28  0:49         ` Ian Kent
2005-01-26 13:07   ` Neil Horman
  -- strict thread matches above, loose matches on Subject: below --
2005-02-16 21:18 Lever, Charles
2005-02-16 21:23 ` Neil Horman
2005-02-24 19:33   ` Trond Myklebust
2005-02-24 20:43     ` nhorman
2005-02-16 21:42 Lever, Charles
2005-02-16 21:47 ` Neil Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41F79716.6020303@redhat.com \
    --to=nhorman@redhat.com \
    --cc=garrick@usc.edu \
    --cc=nfs@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.