public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Neil Brown <neilb@suse.de>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
	linux-nfs@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH] Should we expect close-to-open consistency on directories?
Date: Mon, 10 May 2010 13:01:27 +1000	[thread overview]
Message-ID: <20100510130127.7edf6e9c@notabene.brown> (raw)
In-Reply-To: <4BE76F87.4000809@oracle.com>

On Sun, 09 May 2010 22:29:27 -0400
Chuck Lever <chuck.lever@oracle.com> wrote:

> On 05/ 8/10 06:08 PM, Neil Brown wrote:
> > On Sat, 08 May 2010 09:05:04 -0400
> > Chuck Lever<chuck.lever@oracle.com>  wrote:
> >
> >> On 05/07/2010 06:34 PM, Neil Brown wrote:
> >>> On Thu, 06 May 2010 09:58:31 -0400 Trond Myklebust<trond.myklebust@fys.uio.no>   wrote:
> >>>>>
> >>>>>
> >>>>> diff --git a/fs/namei.c b/fs/namei.c
> >>>>> index a7dce91..256ae13 100644
> >>>>> --- a/fs/namei.c
> >>>>> +++ b/fs/namei.c
> >>>>> @@ -719,7 +719,11 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
> >>>>>    done:
> >>>>>    	path->mnt = mnt;
> >>>>>    	path->dentry = dentry;
> >>>>> -	__follow_mount(path);
> >>>>> +	if (__follow_mount(path)&&
> >>>>> +	    (path->mnt->mnt_sb->s_type->fs_flags&   FS_REVAL_DOT)) {
> >>>>> +		if (!path->dentry->d_op->d_revalidate(path->dentry, nd))
> >>>>> +			return -ESTALE;
> >>>>
> >>>> Won't this prevent you from ever being able to unmount the stale
> >>>> filesystem?
> >>>>
> >>>
> >>> Good point - I think you are right.
> >>>
> >>> It seems to me that ->d_revalidate is being used for two distinct, though
> >>> related, tasks.
> >>> One is to revalidate the dentry - make sure the name still refers to the same
> >>> inode.  The other is to revalidate the inode - make sure the cached
> >>> attributes are still valid.  In NFS (v3 and v4 at least) these are both
> >>> performed by one call so it makes some sense to combine them.  But from the
> >>> VFS perspective I would have thought they were quite separate.
> >>> nfs_lookup_revalidate sometimes does a GETATTR, and sometimes does a LOOKUP,
> >>> depending to some extent on the 'intent' in the nameidata.  I find this makes
> >>> it a bit hard to follow what is really happening, or how d_revalidate should
> >>> really be used.
> >>>
> >>> Maybe we should just ignore the return value above.  Or maybe d_revalidate
> >>> should never do a GETATTR - that should be done by ->open ??
> >>>
> >>> Confused :-(
> >>
> >> The reason for this arrangement is that by the time ->open is called,
> >> it's too late to return ESTALE.  During the path walk, there is still an
> >> opportunity to renew the mapping between dentry and inode, should the
> >> cached value of that mapping be stale.
> >>
> >
> > Is it?
> > ->open seems to be able to return an error code, and that error code seems to
> > be propagated up.  What am I missing?
> 
> "too late to return ESTALE" ... to the VFS layer.  When d_revalidate 
> returns ESTALE while a pathname is being resolved, our VFS layer 
> re-walks the pathname from the root, passing in a flag that says "please 
> don't use our cache, but do each LOOKUP again."

I see. path_walk does that, the flag being "LOOKUP_REVAL".
But this is just saying that 'open' is too late to validate a name, and I
agree with that.  I want 'open' to revalidate the cached attributes -
particularly in the case where there is no name to revalidated.
i.e. a mount point or '.'.  In those cases we would want to return ESTALE,
not try to redo the lookup (because there is no lookup to do).

> 
> That way, we can rely on the dentry cache most of the time, and have 
> some way to recover when file system objects are renamed on the server.
> 
> > I appreciate that a previous lookup might have already done the equivalent of
> > 'open' (e.g. for O_EXCL) but it seems wrong to assume that the last lookup
> > will always do the GETATTR required for open (because there isn't always a
> > 'last lookup').
> 
> open(".") has always been funky.
> 
> I had an argument years ago with someone who pointed out that the POSIX 
> standards are written in such a way that it is entirely optional to 
> revalidate ".", and so we don't.  I don't recall the specifics.

The rationale I fall back on is that to preserve "close-to-open-coherency"
you need to get the change attribute (aka mtime) when you open something - a
directory in this case.
This isn't so much "revalidating" as "refreshing".
So prima-facie 'open' should always issue a GETATTR and check if the file has
changed.  In practice that isn't necessary in most cases as a d_revalidate
has issued a LOOKUP or a GETATTR and the cached attributes are current.
But in the cases where there is no dentry to revalidate (mountpoint and "."),
something more is needed.

NeilBrown


> 
> > I would rather see the lookup (or d_revalidate) remember how much of an open
> > it has done, and then ->open does the rest.  i.e. it does a GETATTR if that
> > hasn't already been done.
> > I don't think there is currently an easy way for 'open' to know how much work
> > 'd_revalidate' or 'lookup' has done, though I suspect that could be fixed.
> > Maybe pass (as pointer to) the whole intent structure into open rather than
> > just the intent.open.file ??
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


      reply	other threads:[~2010-05-10  3:01 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-20  7:22 [PATCH] Should we expect close-to-open consistency on directories? Neil Brown
2010-04-20 13:02 ` Trond Myklebust
2010-04-21  7:03   ` Neil Brown
2010-05-06  4:13     ` Neil Brown
     [not found]       ` <20100506141347.06451f56-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-05-06 13:58         ` Trond Myklebust
2010-05-07 22:34           ` Neil Brown
2010-05-08 13:05             ` Chuck Lever
2010-05-08 22:08               ` Neil Brown
2010-05-10  2:29                 ` Chuck Lever
2010-05-10  3:01                   ` Neil Brown [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100510130127.7edf6e9c@notabene.brown \
    --to=neilb@suse.de \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trond.myklebust@fys.uio.no \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox