* Rename dir on server can cause client to get ESTALE
From: NeilBrown @ 2011-11-14 2:19 UTC
To: Trond Myklebust, NFS, Alexander Viro

hi,
 I've run into another issue that seems to be related to FS_REVAL_DOT.

The script below makes the details precise, but the essence is that if I 'cd'
into a directory on the client, then rename it on the server, then it is
possible that the client will start getting ESTALE when accessing '.' - even
though the directory still exists.

The ESTALE is generated because nfs_lookup_revalidate fails on the dentry, so
complete_walk (in fs/namei.c) gets a failure from d_revalidate() and so sets
the status to -ESTALE.

nfs_lookup_revalidate fails because when it repeats the lookup it sees a
different directory (as you will see, the script creates a new directory with
the old name).

I think it only makes sense to do a ->lookup revalidate of the dentry at the
end of the path when there was a real non '.' or '..' name leading to the
dentry.  If we were just looking up '.', we want to revalidate the inode, but
not the dentry.

Unfortunately I cannot see how that distinction could be introduced into the
current path-walk code.

Any ideas?

Thanks,
NeilBrown


SERVER=eli  # name of server.  ssh access required.
DIR=/home   # directory on server to mount
MPOINT=/mnt # location on client to mount it.
TMP=/neilb/tmp # path to scratch area in $DIR

sudo umount $MPOINT
sudo mount -o vers=3 $SERVER:$DIR $MPOINT

cd /
ssh $SERVER "rm -r $DIR$TMP/*dir*"
ssh $SERVER "mkdir $DIR$TMP/adir"
while [ ! -d $MPOINT$TMP/adir ];
do echo -n . ; sleep 2;
done
cd $MPOINT$TMP/adir || exit
echo "Entered directory"
ls -la > /dev/null
ssh $SERVER "cd $DIR$TMP; mv adir adir.moved"
echo "Moved directory on server"
ls -la > /dev/null
echo -n "Waiting for move to be visible on client"
while ls -la $MPOINT$TMP/adir >/dev/null 2>&1
do echo -n .
  sleep 3
  (cd / ; ssh $SERVER "cd $DIR$TMP; mkdir bdir ; rmdir bdir" )
done
echo
echo "Make replacement directory on server"
(cd / ; ssh $SERVER "cd $DIR$TMP; mkdir adir")
ls -la $MPOINT$TMP/adir
ls -la
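
The symptom described in this message, an error when accessing '.' while the directory still exists, can be checked directly from the client shell. What follows is only a minimal sketch, assuming a POSIX shell: `probe_dot` is a hypothetical helper name that does not appear in the original script, and the exact error text printed by `ls` depends on the C library and locale.

```shell
#!/bin/sh
# probe_dot (hypothetical helper, not part of the original script):
# report whether the current working directory can still be listed as '.'.
# Prints "ok" on success, or "failed: <error text from ls>" otherwise.
probe_dot() {
    # 2>&1 points stderr at the capture first, then stdout is discarded,
    # so $err holds only the error message (if any).
    if err=$(ls -la . 2>&1 >/dev/null); then
        echo "ok"
    else
        echo "failed: $err"
    fi
}

probe_dot
```

Called in place of the final `ls -la` of the reproducer, this would be expected to print the "failed:" line on an affected client; on an ordinary local directory it prints "ok".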
* Re: Rename dir on server can cause client to get ESTALE - this time with PATCH
From: NeilBrown @ 2011-12-01 1:49 UTC
To: Trond Myklebust, Alexander Viro; +Cc: NFS

On Mon, 14 Nov 2011 13:19:29 +1100 NeilBrown <neilb@suse.de> wrote:

>
> hi,
>  I've run into another issue that seems to be related to FS_REVAL_DOT.
>
> The script below makes the details precise, but the essence is that if I 'cd'
> into a directory on the client, then rename it on the server, then it is
> possible that the client will start getting ESTALE when accessing '.' - even
> though the directory still exists.
>
> The ESTALE is generated because nfs_lookup_revalidate fails on the dentry, so
> complete_walk (in fs/namei.c) gets failure from d_revalidate() and so sets the
> status to -ESTALE.
>
> nfs_lookup_revalidate fails because when it repeats the lookup it sees a
> different directory (as you will see the script creates a new directory with
> the old name).
>
> I think it only makes sense to do a ->lookup revalidate of the dentry at the
> end of the path when there was a real non '.' or '..' name leading to the
> dentry.  If we were just looking up '.', we want to revalidate the inode, but
> not the dentry.
>
> Unfortunately I cannot see how that distinction could be introduced into the
> current path-walk code.
>
> Any ideas?
>
> Thanks,
> NeilBrown
>
>
> SERVER=eli  # name of server.  ssh access required.
> DIR=/home   # directory on server to mount
> MPOINT=/mnt # location on client to mount it.
> TMP=/neilb/tmp # path to scratch area in $DIR
>
> sudo umount $MPOINT
> sudo mount -o vers=3 $SERVER:$DIR $MPOINT
>
> cd /
> ssh $SERVER "rm -r $DIR$TMP/*dir*"
> ssh $SERVER "mkdir $DIR$TMP/adir"
> while [ ! -d $MPOINT$TMP/adir ];
> do echo -n . ; sleep 2;
> done
> cd $MPOINT$TMP/adir || exit
> echo "Entered directory"
> ls -la > /dev/null
> ssh $SERVER "cd $DIR$TMP; mv adir adir.moved"
> echo "Moved directory on server"
> ls -la > /dev/null
> echo -n "Waiting for move to be visible on client"
> while ls -la $MPOINT$TMP/adir >/dev/null 2>&1
> do echo -n .
>   sleep 3
>   (cd / ; ssh $SERVER "cd $DIR$TMP; mkdir bdir ; rmdir bdir" )
> done
> echo
> echo "Make replacement directory on server"
> (cd / ; ssh $SERVER "cd $DIR$TMP; mkdir adir")
> ls -la $MPOINT$TMP/adir
> ls -la
>

.. but answer came there none....

I've looked some more at the code and now would like to propose a patch.

This fixes it for me and feels right.

Opinions?

Thanks,
NeilBrown

From 7abb2d77b4c8d8ca340e372447467d8a47241f83 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 30 Nov 2011 18:35:13 +1100
Subject: [PATCH] nfs - handle d_revalidate of 'dot' correctly.

When d_revalidate is called on a dentry because FS_REVAL_DOT is set
it isn't really appropriate to revalidate the name.

If the path was simply ".", then the current-working-directory could
have been renamed on the server and should still be accessible as "."
even if it has a new name.

If the path was "/some/long/path/.", then the final component ("path" in
this case) has already been revalidated and there is no particular
need to do it again.

If we change nd->last_type to refer to "the last component looked at"
rather than just "the last component", then these cases can be
detected by "nd->last_type != LAST_NORM".

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c   |    2 +-
 fs/nfs/dir.c |    9 +++++++++
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5008f01..6a720f7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1434,6 +1434,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			}
 		}

+		nd->last_type = type;
 		/* remove trailing slashes? */
 		if (!c)
 			goto last_component;
@@ -1458,7 +1459,6 @@ static int link_path_walk(const char *name, struct nameidata *nd)

 last_component:
 		nd->last = this;
-		nd->last_type = type;
 		return 0;
 	}
 	terminate_walk(nd);
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ac28990..f62827a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1137,6 +1137,15 @@ static int nfs_lookup_revalidate(struct dentry *dentry, struct nameidata *nd)
 	if (NFS_STALE(inode))
 		goto out_bad;

+	if (nd->last_type != LAST_NORM) {
+		/* name not relevant, just inode */
+		error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+		if (error)
+			goto out_bad;
+		else
+			goto out_valid;
+	}
+
 	error = -ENOMEM;
 	fhandle = nfs_alloc_fhandle();
 	fattr = nfs_alloc_fattr();
--
1.7.7.3
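
The commit message proposes that, once nd->last_type means "the last component looked at", a path ending in '.' or '..' can be told apart from one ending in a normal name. The following is a toy user-space model of that decision in shell, purely illustrative and not kernel code. The names `classify_last` and `revalidate_mode` are hypothetical, the LAST_* strings merely mirror the kernel constants, and everything real path walking has to handle (symlinks, mountpoints, empty paths) is ignored.

```shell
#!/bin/sh
# classify_last (hypothetical): classify the last component of a path the
# way the patched nd->last_type would, i.e. "the last component looked at".
classify_last() {
    path=$1
    # Trailing slashes do not form a component of their own; strip them.
    while [ "${path%/}" != "$path" ]; do
        path=${path%/}
    done
    # Nothing left means the path consisted only of slashes.
    if [ -z "$path" ]; then
        echo LAST_ROOT
        return
    fi
    case "${path##*/}" in
        .)  echo LAST_DOT ;;
        ..) echo LAST_DOTDOT ;;
        *)  echo LAST_NORM ;;
    esac
}

# revalidate_mode (hypothetical): mirror the test added to
# nfs_lookup_revalidate in the patch, "nd->last_type != LAST_NORM".
revalidate_mode() {
    if [ "$(classify_last "$1")" != LAST_NORM ]; then
        echo "inode-only"       # name not relevant, just revalidate the inode
    else
        echo "name-and-inode"   # repeat the lookup by name as before
    fi
}

for p in . /some/long/path/. /mnt/neilb/tmp/adir; do
    printf '%s -> %s\n' "$p" "$(revalidate_mode "$p")"
done
```

In this model both "." and "/some/long/path/." take the inode-only branch, while a path ending in a real name such as "adir" still gets the full lookup by name.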
* Re: Rename dir on server can cause client to get ESTALE - this time with PATCH
From: Al Viro @ 2011-12-01 2:12 UTC
To: NeilBrown; +Cc: Trond Myklebust, NFS

On Thu, Dec 01, 2011 at 12:49:22PM +1100, NeilBrown wrote:

> If the path was "/some/long/path/.", then the final component ("path" in
> this case) has already been revalidated and there is no particular
> need to do it again.
>
> If we change nd->last_type to refer to "the last component looked at"
> rather than just "the last component", then these cases can be
> detected by "nd->last_type != LAST_NORM".

This is just plain wrong.  Let's *not* bring more dependencies on
nameidata into ->d_revalidate().  The goal is to get rid of it there...

FWIW, if you want a really nasty bug in that area, consider this:

mkdir /tmp/a
mkdir /tmp/b
echo "local file" >/tmp/x
mount -t nfs4 $SOMETHING /tmp/a
mount -t nfs4 $SOMETHING /tmp/b
echo "NFS file" >/tmp/a/x
mount --bind /tmp/x /tmp/a/x

now try opening /tmp/b/x.  And watch the NFS traffic; there won't be OPEN
request for x on server.  Why?  Because NFS sees that x is a mountpoint in
*some* instance of that filesystem.  And decides that opening it would be
wrong.  And so it would, if we were asked to open /tmp/a/x.  Alas, in this
case, while dentry is the same, it does *not* have anything mounted on it.
What we get is ->d_revalidate() returning without issuing OPEN and ->open()
being called - again, without issuing OPEN, since it assumes that ->lookup()
or ->d_revalidate() had done it for us.

Plain IO on resulting descriptor will work and work correctly (you'll get
"NFS file\n" read from it), but try to do F_SETLK on it and it'll fail
since that requires the server to have seen an OPEN.

As far as I can tell, the idea of open done in ->d_revalidate() is
unsalvageable.  It's simply the wrong place for that.  Note that NFS
is the only filesystem trying to do atomic open stuff in its ->d_revalidate()
and it's not succeeding.
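
In the setup above only /tmp/a/x has something mounted on it; /tmp/b/x does not, even though it resolves to the same dentry. One way to confirm that from user space before watching the traffic is sketched below. It assumes Linux with /proc/self/mountinfo available; `is_mounted_here` is a hypothetical helper name, and paths containing spaces or backslashes are not handled, since mountinfo escapes them.

```shell
#!/bin/sh
# is_mounted_here (hypothetical): succeed if the given absolute path is
# listed as a mount point (field 5 of mountinfo) in this mount namespace.
is_mounted_here() {
    awk -v p="$1" '$5 == p { found = 1 } END { exit !found }' \
        /proc/self/mountinfo 2>/dev/null
}

for p in /tmp/a/x /tmp/b/x; do
    if is_mounted_here "$p"; then
        echo "$p: mountpoint"
    else
        echo "$p: not a mountpoint"
    fi
done
```

With the mounts from the example in place, /tmp/a/x would be expected to show up as a mountpoint and /tmp/b/x not, which is the per-mount distinction that the ->d_revalidate() check misses.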
* Re: Rename dir on server can cause client to get ESTALE - this time with PATCH
From: Trond Myklebust @ 2011-12-01 2:24 UTC
To: Al Viro; +Cc: NeilBrown, NFS

On Thu, 2011-12-01 at 02:12 +0000, Al Viro wrote:
> On Thu, Dec 01, 2011 at 12:49:22PM +1100, NeilBrown wrote:
>
> > If the path was "/some/long/path/.", then the final component ("path" in
> > this case) has already been revalidated and there is no particular
> > need to do it again.
> >
> > If we change nd->last_type to refer to "the last component looked at"
> > rather than just "the last component", then these cases can be
> > detected by "nd->last_type != LAST_NORM".
>
> This is just plain wrong.  Let's *not* bring more dependencies on
> nameidata into ->d_revalidate().  The goal is to get rid of it there...
>
> FWIW, if you want a really nasty bug in that area, consider this:
>
> mkdir /tmp/a
> mkdir /tmp/b
> echo "local file" >/tmp/x
> mount -t nfs4 $SOMETHING /tmp/a
> mount -t nfs4 $SOMETHING /tmp/b
> echo "NFS file" >/tmp/a/x
> mount --bind /tmp/x /tmp/a/x
>
> now try opening /tmp/b/x.  And watch the NFS traffic; there won't be OPEN
> request for x on server.  Why?  Because NFS sees that x is a mountpoint in
> *some* instance of that filesystem.  And decides that opening it would be
> wrong.  And so it would, if we were asked to open /tmp/a/x.  Alas, in this
> case, while dentry is the same, it does *not* have anything mounted on it.
> What we get is ->d_revalidate() returning without issuing OPEN and ->open()
> being called - again, without issuing OPEN, since it assumes that ->lookup()
> or ->d_revalidate() had done it for us.
>
> Plain IO on resulting descriptor will work and work correctly (you'll get
> "NFS file\n" read from it), but try to do F_SETLK on it and it'll fail
> since that requires the server to have seen an OPEN.

We can possibly fix this for the NFSv4.1 case since that adds support
for open-by-filehandle. However, I agree that NFSv4.0 is unfixable: all
OPENs are required to do the equivalent of a lookup, which isn't
possible in the bind mount case.

> As far as I can tell, the idea of open done in ->d_revalidate() is
> unsalvageable.  It's simply the wrong place for that.  Note that NFS
> is the only filesystem trying to do atomic open stuff in its ->d_revalidate()
> and it's not succeeding.

Not doing an open there is prohibitively expensive, though: you are
likely to see your cached inode flushed down the toilet if you just drop
the dentry...

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
* Re: Rename dir on server can cause client to get ESTALE - this time with PATCH
From: Al Viro @ 2011-12-01 2:47 UTC
To: Trond Myklebust; +Cc: NeilBrown, NFS

On Wed, Nov 30, 2011 at 09:24:18PM -0500, Trond Myklebust wrote:

> > As far as I can tell, the idea of open done in ->d_revalidate() is
> > unsalvageable.  It's simply the wrong place for that.  Note that NFS
> > is the only filesystem trying to do atomic open stuff in its ->d_revalidate()
> > and it's not succeeding.
>
> Not doing an open there is prohibitively expensive, though: you are
> likely to see your cached inode flushed down the toilet if you just drop
> the dentry...

Wrong.  All you really need is to have that attempt to issue OPEN shifted
into ->open() itself.  The only interesting part is that we might need to
drop the original dentry and use a new one for ->f_path.dentry.

Don't drop that dentry; after the case in ->d_revalidate() that would have
attempted that OPEN you would either cross into covering vfsmount (in which
case dentry should be left alone as you are doing now) or issue ->open().
If it's really not valid (i.e. if OPEN yields a different inode), we can
deal with that in ->open() just fine.

The *only* subtle part is how to deal with "it's a symlink, go away" from
the server.  Which will require changes in do_last().  I have that stuff;
it'll need debugging and serious review once posted.  Which I'm going to do
over the weekend.