From: NeilBrown <neilb@suse.de>
To: Jeff Layton <jlayton@redhat.com>
Cc: viro@ZenIV.linux.org.uk, trond.myklebust@netapp.com,
linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v1] vfs: kill FS_REVAL_DOT by adding a d_reval_jumped dentry op
Date: Thu, 21 Feb 2013 09:32:25 +1100
Message-ID: <20130221093225.5390bb77@notabene.brown>
In-Reply-To: <1361377145-28094-1-git-send-email-jlayton@redhat.com>
On Wed, 20 Feb 2013 11:19:05 -0500 Jeff Layton <jlayton@redhat.com> wrote:
> The following set of operations on an NFS client and server will cause
> an ESTALE error:
>
> server# mkdir a
> client# cd a
> server# mv a a.bak
> client# sleep 30 # (or whatever the dir attrcache timeout is)
> client# stat .
> stat: cannot stat ‘.’: Stale NFS file handle
>
> Obviously, we should not be getting an ESTALE error back there since the
> inode still exists on the server. The problem is that the lookup code
> will call d_revalidate on the dentry that "." refers to, because NFS has
> FS_REVAL_DOT set.
>
> nfs_lookup_revalidate will see that the parent directory has changed and
> will try to reverify the dentry by redoing a LOOKUP. That of course
> fails, so the lookup code returns ESTALE.
>
> The problem here is that d_revalidate is really a bad fit for this case.
> What we really want to know at this point is whether the inode is still
> good or not, but we don't really care what name it goes by or whether
> the dcache is still valid.
>
> Add a new d_op->d_reval_jumped operation and have complete_walk call
> that instead of d_revalidate. The intent there is to allow for a
> "weaker" d_revalidate that just checks to see whether the inode is still
> good. This also gives us an opportunity to kill off the FS_REVAL_DOT
> special casing.
>
> In a perfect world, this would be a new inode operation instead, but
> I don't see a way to cleanly handle that for 9p, which needs a
> dentry in order to get a fid.

The earlier i_op->revalidate inode operation took a 'dentry', not an inode.
If you look at struct inode_operations, you will see that 8 of them take a
dentry as their first argument.

Nevertheless, I would leave it in dentry_operations. It makes it easier to
use the DCACHE_OP_ optimisation.
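
To illustrate the optimisation I mean, here is a minimal userspace sketch
(not kernel code, and not part of the patch). The idea is that a flag bit is
recorded once when the ops table is attached, so the hot path tests a bit in
the dentry instead of chasing the ops pointer. All the sk_* names and the
SK_OP_* values are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, modelled on the kernel's DCACHE_OP_* idea. */
#define SK_OP_REVALIDATE   0x1u
#define SK_OP_REVAL_JUMPED 0x2u

struct sk_dentry;

struct sk_dentry_ops {
	int (*d_revalidate)(struct sk_dentry *, unsigned int);
	int (*d_reval_jumped)(struct sk_dentry *, unsigned int);
};

struct sk_dentry {
	unsigned int flags;
	const struct sk_dentry_ops *ops;
};

/* Record once, at setup time, which operations the filesystem supplies. */
static void sk_set_ops(struct sk_dentry *d, const struct sk_dentry_ops *ops)
{
	d->ops = ops;
	d->flags = 0;
	if (!ops)
		return;
	if (ops->d_revalidate)
		d->flags |= SK_OP_REVALIDATE;
	if (ops->d_reval_jumped)
		d->flags |= SK_OP_REVAL_JUMPED;
}

/* Hot path: one flag test decides whether to touch the ops table at all. */
static int sk_reval_jumped(struct sk_dentry *d, unsigned int lookup_flags)
{
	if (!(d->flags & SK_OP_REVAL_JUMPED))
		return 1;	/* no op supplied: treat the dentry as valid */
	assert(d->ops && d->ops->d_reval_jumped);
	return d->ops->d_reval_jumped(d, lookup_flags);
}

/* Example op that always reports the inode as stale. */
static int sk_always_stale(struct sk_dentry *d, unsigned int lookup_flags)
{
	(void)d;
	(void)lookup_flags;
	return 0;
}
```

A dentry set up with no ops never dereferences the table; one whose ops
include sk_always_stale gets SK_OP_REVAL_JUMPED set and has the op called.
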
>
> Cc: NeilBrown <neilb@suse.de>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
> Documentation/filesystems/Locking | 2 ++
> Documentation/filesystems/vfs.txt | 32 ++++++++++++++++++++++++++--
> fs/9p/vfs_dentry.c | 1 +
> fs/9p/vfs_super.c | 2 +-
> fs/dcache.c | 3 +++
> fs/namei.c | 8 ++-----
> fs/nfs/dir.c | 45 +++++++++++++++++++++++++++++++++++++++
> fs/nfs/nfs4super.c | 6 +++---
> fs/nfs/super.c | 6 +++---
> include/linux/dcache.h | 3 +++
> include/linux/fs.h | 1 -
> 11 files changed, 93 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> index f48e0c6..9718b667 100644
> --- a/Documentation/filesystems/Locking
> +++ b/Documentation/filesystems/Locking
> @@ -10,6 +10,7 @@ be able to use diff(1).
> --------------------------- dentry_operations --------------------------
> prototypes:
> 	int (*d_revalidate)(struct dentry *, unsigned int);
> +	int (*d_reval_jumped)(struct dentry *, unsigned int);
> 	int (*d_hash)(const struct dentry *, const struct inode *,
> 			struct qstr *);
> 	int (*d_compare)(const struct dentry *, const struct inode *,
I cannot get excited about the name "d_reval_jumped" .... though once you
read the explanation in the doco (thanks for that) it makes sense. I guess
I'll get used to it.
> /*
> + * A weaker form of d_revalidate for revalidating just the dentry->d_inode
> + * when we don't really care about the dentry name. This is called when a
> + * pathwalk ends on a dentry that was not found via a normal lookup in the
> + * parent dir (e.g.: ".", "..", procfs symlinks or mountpoint traversals).
> + *
> + * In this situation, we just want to verify that the inode itself is OK
> + * since the dentry might have changed on the server.
> + */
> +static int nfs_reval_jumped(struct dentry *dentry, unsigned int flags)
> +{
> +	int error;
> +	struct inode *inode = dentry->d_inode;
> +
> +	if (flags & LOOKUP_RCU)
> +		return -ECHILD;
> +
> +	/*
> +	 * I believe we can only get a negative dentry here in the case of a
> +	 * procfs-style symlink. Just assume it's correct for now, but we may
> +	 * eventually need to do something more here.
> +	 */
> +	if (!inode) {
> +		dfprintk(LOOKUPCACHE, "%s: %s/%s has negative inode\n",
> +				__func__, dentry->d_parent->d_name.name,
> +				dentry->d_name.name);
> +		return 1;
> +	}
> +
> +	if (is_bad_inode(inode)) {
> +		dfprintk(LOOKUPCACHE, "%s: %s/%s has dud inode\n",
> +				__func__, dentry->d_parent->d_name.name,
> +				dentry->d_name.name);
> +		return 0;
> +	}
> +
> +	error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> +	dfprintk(LOOKUPCACHE, "NFS: %s: inode %lu is %s\n",
> +			__func__, inode->i_ino, error ? "invalid" : "valid");
> +	if (error)
> +		return 0;
> +	return 1;
> +}

I wonder if we can delay the "-ECHILD" return a bit.
Leaving it to after the first two tests should be safe, but doesn't gain us
anything.

Open-coding the nfs_revalidate_inode as:

	if (!(NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATTR)
	    && !nfs_attribute_cache_expired(inode))
		return NFS_STALE(inode) ? 0 : 1;
	error = __nfs_revalidate_inode(server, inode);

and then inserting the -ECHILD code in before the __nfs_revalidate_inode
should be safe, and means we still benefit from the RCU path in the common
case.

Of course, for that to be really useful, nfs_lookup_revalidate would need to
be changed to only return -ECHILD if it really needed to block, and maybe
that is too hard, or at least is a job for another day.
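
A rough userspace model of the ordering suggested above, in case it helps.
This is only a sketch: struct sk_inode and its boolean fields stand in for
the real NFS state and helpers (NFS_INO_INVALID_ATTR,
nfs_attribute_cache_expired(), NFS_STALE(), __nfs_revalidate_inode()), and
SK_LOOKUP_RCU stands in for LOOKUP_RCU. None of these names exist in the
kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_LOOKUP_RCU 0x1u	/* hypothetical stand-in for LOOKUP_RCU */

/* Hypothetical model of the cached state the NFS helpers would consult. */
struct sk_inode {
	bool attrs_invalid;	/* models NFS_INO_INVALID_ATTR being set */
	bool cache_expired;	/* models nfs_attribute_cache_expired() */
	bool stale;		/* models NFS_STALE() */
	bool server_says_ok;	/* result a blocking revalidation would give */
	int  server_calls;	/* counts simulated round trips */
};

/* Models __nfs_revalidate_inode(): may block; returns 0 or a -errno. */
static int sk_blocking_revalidate(struct sk_inode *inode)
{
	inode->server_calls++;
	inode->attrs_invalid = false;
	inode->cache_expired = false;
	return inode->server_says_ok ? 0 : -ESTALE;
}

/* Returns 1 (valid), 0 (invalid), or -ECHILD (retry outside RCU-walk). */
static int sk_reval_jumped(struct sk_inode *inode, unsigned int flags)
{
	assert(inode != NULL);

	/* Fast path: cached attributes are fresh, nothing can block. */
	if (!inode->attrs_invalid && !inode->cache_expired)
		return inode->stale ? 0 : 1;

	/* Only now do we need to block, so only now bail out of RCU-walk. */
	if (flags & SK_LOOKUP_RCU)
		return -ECHILD;

	return sk_blocking_revalidate(inode) ? 0 : 1;
}
```

With fresh attributes the RCU-mode call completes without a round trip;
only an expired or invalidated cache forces the -ECHILD retry.
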

Otherwise, looks good - thanks.

Reviewed-by: NeilBrown <neilb@suse.de>

NeilBrown