* Rename dir on server can cause client to get ESTALE
@ 2011-11-14 2:19 NeilBrown
2011-12-01 1:49 ` Rename dir on server can cause client to get ESTALE - this time with PATCH NeilBrown
0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2011-11-14 2:19 UTC (permalink / raw)
To: Trond Myklebust, NFS, Alexander Viro
hi,
I've run into another issue that seems to be related to FS_REVAL_DOT.
The script below makes the details precise, but the essence is that if I 'cd'
into a directory on the client, then rename it on the server, then it is
possible that the client will start getting ESTALE when accessing '.' - even
though the directory still exists.
The ESTALE is generated because nfs_lookup_revalidate fails on the dentry, so
complete_walk (in fs/namei.c) gets failure from d_revalidate() and so sets the
status to -ESTALE.
nfs_lookup_revalidate fails because when it repeats the lookup it sees a
different directory (as you will see the script creates a new directory with
the old name).
I think it only makes sense to do a ->lookup revalidate of the dentry at the
end of the path when there was a real non '.' or '..' name leading to the
dentry. If we were just looking up '.', we want to revalidate the inode, but
not the dentry.
Unfortunately I cannot see how that distinction could be introduced into the
current path-walk code.
Any ideas?
Thanks,
NeilBrown
SERVER=eli # name of server. ssh access required.
DIR=/home # directory on server to mount
MPOINT=/mnt # location on client to mount it.
TMP=/neilb/tmp # path to scratch area in $DIR
sudo umount $MPOINT
sudo mount -o vers=3 $SERVER:$DIR $MPOINT
cd /
ssh $SERVER "rm -r $DIR$TMP/*dir*"
ssh $SERVER "mkdir $DIR$TMP/adir"
while [ ! -d $MPOINT$TMP/adir ];
do echo -n . ; sleep 2;
done
cd $MPOINT$TMP/adir || exit
echo "Entered directory"
ls -la > /dev/null
ssh $SERVER "cd $DIR$TMP; mv adir adir.moved"
echo "Moved directory on server"
ls -la > /dev/null
echo -n "Waiting for move to be visible on client"
while ls -la $MPOINT$TMP/adir >/dev/null 2>&1
do echo -n .
sleep 3
(cd / ; ssh $SERVER "cd $DIR$TMP; mkdir bdir ; rmdir bdir" )
done
echo
echo "Make replacement directory on server"
(cd / ; ssh $SERVER "cd $DIR$TMP; mkdir adir")
ls -la $MPOINT$TMP/adir
ls -la
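
The polling loops in the script above have no upper bound, so the script hangs if the server-side change never becomes visible on the client. A minimal sketch of a bounded variant follows; `wait_until` is a hypothetical helper name, not part of the original script, and the timeouts in the usage comment are arbitrary.

```shell
# wait_until TIMEOUT INTERVAL CMD [ARGS...]
# Hypothetical helper: re-run CMD every INTERVAL seconds until it succeeds
# or roughly TIMEOUT seconds have passed.
# Returns 0 if CMD succeeded, 1 on timeout.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        printf '.'
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Possible use in place of the first unbounded loop (assumed 60 s limit):
#   wait_until 60 2 test -d "$MPOINT$TMP/adir" || exit 1
```

The elapsed time only counts the sleeps, not the run time of the command itself, so the limit is approximate when the command is slow (as an `ls` on a struggling NFS mount may be).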
* Re: Rename dir on server can cause client to get ESTALE - this time with PATCH
From: NeilBrown @ 2011-12-01 1:49 UTC (permalink / raw)
To: Trond Myklebust, Alexander Viro; +Cc: NFS
On Mon, 14 Nov 2011 13:19:29 +1100 NeilBrown <neilb@suse.de> wrote:
>
> hi,
> I've run into another issue that seems to be related to FS_REVAL_DOT.
>
> The script below makes the details precise, but the essence is that if I 'cd'
> into a directory on the client, then rename it on the server, then it is
> possible that the client will start getting ESTALE when accessing '.' - even
> though the directory still exists.
>
> The ESTALE is generated because nfs_lookup_revalidate fails on the dentry, so
> complete_walk (in fs/namei.c) gets failure from d_revalidate() and so sets the
> status to -ESTALE.
>
> nfs_lookup_revalidate fails because when it repeats the lookup it sees a
> different directory (as you will see the script creates a new directory with
> the old name).
>
> I think it only makes sense to do a ->lookup revalidate of the dentry at the
> end of the path when there was a real non '.' or '..' name leading to the
> dentry. If we were just looking up '.', we want to revalidate the inode, but
> not the dentry.
>
> Unfortunately I cannot see how that distinction could be introduced into the
> current path-walk code.
>
> Any ideas?
>
> Thanks,
> NeilBrown
>
>
> SERVER=eli # name of server. ssh access required.
> DIR=/home # directory on server to mount
> MPOINT=/mnt # location on client to mount it.
> TMP=/neilb/tmp # path to scratch area in $DIR
>
> sudo umount $MPOINT
> sudo mount -o vers=3 $SERVER:$DIR $MPOINT
>
> cd /
> ssh $SERVER "rm -r $DIR$TMP/*dir*"
> ssh $SERVER "mkdir $DIR$TMP/adir"
> while [ ! -d $MPOINT$TMP/adir ];
> do echo -n . ; sleep 2;
> done
> cd $MPOINT$TMP/adir || exit
> echo "Entered directory"
> ls -la > /dev/null
> ssh $SERVER "cd $DIR$TMP; mv adir adir.moved"
> echo "Moved directory on server"
> ls -la > /dev/null
> echo -n "Waiting for move to be visible on client"
> while ls -la $MPOINT$TMP/adir >/dev/null 2>&1
> do echo -n .
> sleep 3
> (cd / ; ssh $SERVER "cd $DIR$TMP; mkdir bdir ; rmdir bdir" )
> done
> echo
> echo "Make replacement directory on server"
> (cd / ; ssh $SERVER "cd $DIR$TMP; mkdir adir")
> ls -la $MPOINT$TMP/adir
> ls -la
>
.. but answer came there none....
I've looked some more at the code and now would like to propose a patch.
This fixes it for me and feels right.
Opinions?
Thanks,
NeilBrown
From 7abb2d77b4c8d8ca340e372447467d8a47241f83 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 30 Nov 2011 18:35:13 +1100
Subject: [PATCH] nfs - handle d_revalidate of 'dot' correctly.
When d_revalidate is called on a dentry because FS_REVAL_DOT is set
it isn't really appropriate to revalidate the name.
If the path was simply ".", then the current-working-directory could
have been renamed on the server and should still be accessible as "."
even if it has a new name.
If the path was "/some/long/path/.", then the final component ("path" in
this case) has already been revalidated and there is no particular
need to do it again.
If we change nd->last_type to refer to "the last component looked at"
rather than just "the last component", then these cases can be
detected by "nd->last_type != LAST_NORM".
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/namei.c | 2 +-
fs/nfs/dir.c | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 5008f01..6a720f7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1434,6 +1434,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
}
}
+ nd->last_type = type;
/* remove trailing slashes? */
if (!c)
goto last_component;
@@ -1458,7 +1459,6 @@ static int link_path_walk(const char *name, struct nameidata *nd)
last_component:
nd->last = this;
- nd->last_type = type;
return 0;
}
terminate_walk(nd);
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ac28990..f62827a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1137,6 +1137,15 @@ static int nfs_lookup_revalidate(struct dentry *dentry, struct nameidata *nd)
if (NFS_STALE(inode))
goto out_bad;
+ if (nd->last_type != LAST_NORM) {
+ /* name not relevant, just inode */
+ error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ if (error)
+ goto out_bad;
+ else
+ goto out_valid;
+ }
+
error = -ENOMEM;
fhandle = nfs_alloc_fhandle();
fattr = nfs_alloc_fattr();
--
1.7.7.3
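
The distinction the commit message draws can be sketched in user space. This is only an illustration on path strings, not the kernel code: the function names `last_type` and `revalidate_scope` are made up here, and the LAST_* labels merely mirror the names used in fs/namei.c. It classifies the final component of a non-empty path and reports which kind of check the proposed patch would apply.

```shell
# last_type PATH
# Print the LAST_* label for the final component of a non-empty PATH.
last_type() {
    path=$1
    # Drop trailing slashes, but leave a bare "/" alone.
    while [ "$path" != "/" ] && [ "${path%/}" != "$path" ]; do
        path=${path%/}
    done
    case ${path##*/} in
        "") echo LAST_ROOT ;;
        .)  echo LAST_DOT ;;
        ..) echo LAST_DOTDOT ;;
        *)  echo LAST_NORM ;;
    esac
}

# revalidate_scope PATH
# Under the proposed patch, anything other than LAST_NORM skips the name
# check and revalidates only the inode.
revalidate_scope() {
    if [ "$(last_type "$1")" != LAST_NORM ]; then
        echo inode-only
    else
        echo name-and-inode
    fi
}
```

With this sketch, both `.` and `/some/long/path/.` classify as LAST_DOT and so fall into the inode-only case, while `/some/long/path` stays LAST_NORM and keeps the full lookup revalidation.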
* Re: Rename dir on server can cause client to get ESTALE - this time with PATCH
From: Al Viro @ 2011-12-01 2:12 UTC (permalink / raw)
To: NeilBrown; +Cc: Trond Myklebust, NFS
On Thu, Dec 01, 2011 at 12:49:22PM +1100, NeilBrown wrote:
> If the path was "/some/long/path/.", then the final component ("path" in
> this case) has already been revalidated and there is no particular
> need to do it again.
>
> If we change nd->last_type to refer to "the last component looked at"
> rather than just "the last component", then these cases can be
> detected by "nd->last_type != LAST_NORM".
This is just plain wrong. Let's *not* bring more dependencies on
nameidata into ->d_revalidate(). The goal is to get rid of it there...
FWIW, if you want a really nasty bug in that area, consider this:
mkdir /tmp/a
mkdir /tmp/b
echo "local file" >/tmp/x
mount -t nfs4 $SOMETHING /tmp/a
mount -t nfs4 $SOMETHING /tmp/b
echo "NFS file" >/tmp/a/x
mount --bind /tmp/x /tmp/a/x
now try opening /tmp/b/x. And watch the NFS traffic; there won't be OPEN
request for x on server. Why? Because NFS sees that x is a mountpoint in
*some* instance of that filesystem. And decides that opening it would be
wrong. And so it would, if we were asked to open /tmp/a/x. Alas, in this
case, while dentry is the same, it does *not* have anything mounted on it.
What we get is ->d_revalidate() returning without issuing OPEN and ->open()
being called - again, without issuing OPEN, since it assumes that ->lookup()
or ->d_revalidate() had done it for us.
Plain IO on resulting descriptor will work and work correctly (you'll get
"NFS file\n" read from it), but try to do F_SETLK on it and it'll fail
since that requires the server to have seen an OPEN.
As far as I can tell, the idea of open done in ->d_revalidate() is
unsalvageable. It's simply the wrong place for that. Note that NFS
is the only filesystem trying to do atomic open stuff in its ->d_revalidate()
and it's not succeeding.
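
When reproducing the setup above, it can help to confirm that the bind mount really exists at /tmp/a/x and not at /tmp/b/x in the current mount namespace. A minimal sketch, assuming the standard /proc/self/mountinfo layout in which the fifth field is the mount point; `has_mount` is a hypothetical helper name, and mount points containing spaces or other escaped characters are not handled.

```shell
# has_mount PATH [MOUNTINFO]
# Hypothetical helper: succeed if PATH appears as a mount point (field 5)
# in MOUNTINFO, which defaults to /proc/self/mountinfo.
has_mount() {
    awk -v p="$1" '$5 == p { found = 1 } END { exit !found }' \
        "${2:-/proc/self/mountinfo}"
}

# Possible use after the mount commands in the recipe:
#   has_mount /tmp/a/x && echo "/tmp/a/x is covered by a mount"
#   has_mount /tmp/b/x || echo "/tmp/b/x has nothing mounted on it"
```

The optional second argument exists only so the check can be tried against a saved copy of the mount table.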
* Re: Rename dir on server can cause client to get ESTALE - this time with PATCH
From: Trond Myklebust @ 2011-12-01 2:24 UTC (permalink / raw)
To: Al Viro; +Cc: NeilBrown, NFS
On Thu, 2011-12-01 at 02:12 +0000, Al Viro wrote:
> On Thu, Dec 01, 2011 at 12:49:22PM +1100, NeilBrown wrote:
>
> > If the path was "/some/long/path/.", then the final component ("path" in
> > this case) has already been revalidated and there is no particular
> > need to do it again.
> >
> > If we change nd->last_type to refer to "the last component looked at"
> > rather than just "the last component", then these cases can be
> > detected by "nd->last_type != LAST_NORM".
>
> This is just plain wrong. Let's *not* bring more dependencies on
> nameidata into ->d_revalidate(). The goal is to get rid of it there...
>
> FWIW, if you want a really nasty bug in that area, consider this:
>
> mkdir /tmp/a
> mkdir /tmp/b
> echo "local file" >/tmp/x
> mount -t nfs4 $SOMETHING /tmp/a
> mount -t nfs4 $SOMETHING /tmp/b
> echo "NFS file" >/tmp/a/x
> mount --bind /tmp/x /tmp/a/x
>
> now try opening /tmp/b/x. And watch the NFS traffic; there won't be OPEN
> request for x on server. Why? Because NFS sees that x is a mountpoint in
> *some* instance of that filesystem. And decides that opening it would be
> wrong. And so it would, if we were asked to open /tmp/a/x. Alas, in this
> case, while dentry is the same, it does *not* have anything mounted on it.
> What we get is ->d_revalidate() returning without issuing OPEN and ->open()
> being called - again, without issuing OPEN, since it assumes that ->lookup()
> or ->d_revalidate() had done it for us.
>
> Plain IO on resulting descriptor will work and work correctly (you'll get
> "NFS file\n" read from it), but try to do F_SETLK on it and it'll fail
> since that requires the server to have seen an OPEN.
We can possibly fix this for the NFSv4.1 case since that adds support
for open-by-filehandle. However, I agree that NFSv4.0 is unfixable: all
OPENs are required to do the equivalent of a lookup, which isn't
possible in the bind mount case.
> As far as I can tell, the idea of open done in ->d_revalidate() is
> unsalvageable. It's simply the wrong place for that. Note that NFS
> is the only filesystem trying to do atomic open stuff in its ->d_revalidate()
> and it's not succeeding.
Not doing an open there is prohibitively expensive, though: you are
likely to see your cached inode flushed down the toilet if you just drop
the dentry...
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
* Re: Rename dir on server can cause client to get ESTALE - this time with PATCH
From: Al Viro @ 2011-12-01 2:47 UTC (permalink / raw)
To: Trond Myklebust; +Cc: NeilBrown, NFS
On Wed, Nov 30, 2011 at 09:24:18PM -0500, Trond Myklebust wrote:
> > As far as I can tell, the idea of open done in ->d_revalidate() is
> > unsalvageable. It's simply the wrong place for that. Note that NFS
> > is the only filesystem trying to do atomic open stuff in its ->d_revalidate()
> > and it's not succeeding.
>
> Not doing an open there is prohibitively expensive, though: you are
> likely to see your cached inode flushed down the toilet if you just drop
> the dentry...
Wrong. All you really need is to have that attempt to issue OPEN shifted
into ->open() itself. The only interesting part is that we might need
to drop the original dentry and use a new one for ->f_path.dentry.
Don't drop that dentry; after the case in ->d_revalidate() that would have
attempted that OPEN you would either cross into covering vfsmount (in which
case dentry should be left alone as you are doing now) or issue ->open().
If it's really not valid (i.e. if OPEN yields a different inode), we can
deal with that in ->open() just fine. The *only* subtle part is how to
deal with "it's a symlink, go away" from the server. Which will require
changes in do_last(). I have that stuff; it'll need debugging and serious
review once posted, which I'm going to do over the weekend.