* Re: [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) [not found] ` <Pine.LNX.4.44.0205161105520.5254-100000@alumno.inacap.cl> @ 2002-10-17 20:38 ` Jan Harkes 2002-10-17 21:48 ` Trond Myklebust 0 siblings, 1 reply; 11+ messages in thread From: Jan Harkes @ 2002-10-17 20:38 UTC (permalink / raw) To: linux-fsdevel; +Cc: Marcelo Tosatti, alan, Alexander Viro, Trond Myklebust Hi, I'm pulling up this old thread again, because there are some serious issues with a patch that was merged in 2.4.19-pre3. Basically it boils down to this, NFS had problems with '.' and '..' lookups as they were bypassing the dcache revalidation/lookup. However, the patch that got merged in 2.4.19-pre3 to fix this is causing some significant problems, it changed the semantics of d_revalidate. Originally, returning a error from d_revalidate in cached_lookup would force a subsequent real_lookup. After the patch, returning an error from d_revalidate propagates (in some cases) ESTALE back up to userspace. NFS seems to do it's own replacement for real_lookup in dentry_revalidate, but most other filesystems (like Samba) 'fixed' the problem by simply revalidating the cached attributes of the object with the server. This ofcourse breaks when an object is simply renamed, as the revalidation will reinstantiate the old and incorrect lookup path for the object in the dcache. Now either every filesystem will have to follow NFS's lead and implement some form of real_lookup inside of the dentry_revalidate function which is not the prettiest solution. Or the VFS patch should be reverted/fixed to actually walk the tree for '.' and '..' lookups. btw. the patch also leaks dentries when a 'stale dentry' happens to have children because it doesn't properly check and handle the d_invalidate returncode. Jan For reference here is the thread from last May when Joerg Prante tracked down his problems in supermount to the NFS patch. http://marc.theaimsgroup.com/?l=linux-kernel&m=102158542315351&w=2 Here is the 'controversial patch', cut-n-pasted out from linux.bkbits.net. --- 1.19/fs/namei.c Thu Feb 28 05:57:29 2002 +++ 1.20/fs/namei.c Tue Mar 12 07:35:02 2002 @@ -457,7 +457,7 @@ while (*name=='/') name++; if (!*name) - goto return_base; + goto return_reval; inode = nd->dentry->d_inode; if (current->link_count) @@ -576,7 +576,7 @@ inode = nd->dentry->d_inode; /* fallthrough */ case 1: - goto return_base; + goto return_reval; } if (nd->dentry->d_op && nd->dentry->d_op->d_hash) { err = nd->dentry->d_op->d_hash(nd->dentry, &this); @@ -627,6 +627,19 @@ nd->last_type = LAST_DOT; else if (this.len == 2 && this.name[1] == '.') nd->last_type = LAST_DOTDOT; +return_reval: + /* + * We bypassed the ordinary revalidation routines. + * Check the cached dentry for staleness. + */ + dentry = nd->dentry; + if (dentry && dentry->d_op && dentry->d_op->d_revalidate) { + err = -ESTALE; + if (!dentry->d_op->d_revalidate(dentry, 0)) { + d_invalidate(dentry); + break; + } + } return_base: return 0; out_dput: ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) 2002-10-17 20:38 ` [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) Jan Harkes @ 2002-10-17 21:48 ` Trond Myklebust 2002-10-17 22:16 ` Jan Harkes 0 siblings, 1 reply; 11+ messages in thread From: Trond Myklebust @ 2002-10-17 21:48 UTC (permalink / raw) To: Jan Harkes Cc: linux-fsdevel, Marcelo Tosatti, alan, Alexander Viro, Trond Myklebust >>>>> " " == Jan Harkes <jaharkes@cs.cmu.edu> writes: > Originally, returning a error from d_revalidate in cached_lookup would > force a subsequent real_lookup. After the patch, returning an error from > d_revalidate propagates (in some cases) ESTALE back up to > userspace. Which is the whole point of the patch. If you are trying to read or modify a directory that is invalid, you need to be notified of that. > Now either every filesystem will have to follow NFS's lead and ^^^^^ The vast majority of filesystems are coping fine as they are. > implement some form of real_lookup inside of the > dentry_revalidate function which is not the prettiest > solution. Or the VFS patch should be reverted/fixed to actually > walk the tree for '.' and '..' lookups. The whole problem was that we did *not* walk the tree for '.' and '.. 'lookups. Reverting the VFS patch simply results in us stupidly reusing current->fs->pwd, dentry->d_parent and friends without checking if it is safe to do so. > btw. the patch also leaks dentries when a 'stale dentry' > happens to have children because it doesn't properly check and > handle the d_invalidate returncode. Why? d_invalidate() doesn't dget() anything, and if you are doing 'open(".")' then it will return -EBUSY anyway. What are you going to do about it? Cheers, Trond ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) 2002-10-17 21:48 ` Trond Myklebust @ 2002-10-17 22:16 ` Jan Harkes 2002-10-17 23:57 ` Trond Myklebust 0 siblings, 1 reply; 11+ messages in thread From: Jan Harkes @ 2002-10-17 22:16 UTC (permalink / raw) To: Trond Myklebust Cc: Jan Harkes, linux-fsdevel, Marcelo Tosatti, alan, Alexander Viro On Thu, Oct 17, 2002 at 11:48:27PM +0200, Trond Myklebust wrote: > >>>>> " " == Jan Harkes <jaharkes@cs.cmu.edu> writes: > > Originally, returning a error from d_revalidate in cached_lookup would > > force a subsequent real_lookup. After the patch, returning an error from > > d_revalidate propagates (in some cases) ESTALE back up to > > userspace. > > Which is the whole point of the patch. If you are trying to read or > modify a directory that is invalid, you need to be notified of that. Yes, by failing in real_lookup, not randomly crapping out with ESTALE. > > Now either every filesystem will have to follow NFS's lead and > ^^^^^ > The vast majority of filesystems are coping fine as they are. Right, well only network filesystems could possibly be affected as on-disk filesystems don't have any reason why the cached path would suddenly become invalid, so let's look at network filesytems, - Coda definitely doesn't cope just fine. - smbfs, calls into revalidate_inode instead of retrying the lookup. Broken. - ncpfs, dentry->d_parent->d_inode. Hmm, probably breaks when revalidating the root of the mounted fs? - intermezzo, revalidates data and attributes of the object, same broken behaviour as smbfs. - NFS works! Were there any other network filesystems in the kernel? So we've already got 4 out of 5 broken. Now let's look at out of kernel network and virtual filesystems, - OpenAFS, guess they're broken. - Arla, guess what :) - Supermount, well that is the guy who traced where the strange ESTALE problems came from, he worked on a patch that probably ended up working because I guess nothing can suddenly get renamed from underneath him. - CDFS, same deal. > > implement some form of real_lookup inside of the > > dentry_revalidate function which is not the prettiest > > solution. Or the VFS patch should be reverted/fixed to actually > > walk the tree for '.' and '..' lookups. > > The whole problem was that we did *not* walk the tree for '.' and > '.. 'lookups. Reverting the VFS patch simply results in us stupidly > reusing current->fs->pwd, dentry->d_parent and friends without > checking if it is safe to do so. Sorry it should have been 'reverted, or fixed to actually revalidate all entries on the cached tree leading up to '.' and '..'. > > btw. the patch also leaks dentries when a 'stale dentry' > > happens to have children because it doesn't properly check and > > handle the d_invalidate returncode. > > Why? d_invalidate() doesn't dget() anything, and if you are doing > 'open(".")' then it will return -EBUSY anyway. What are you going to > do about it? /** * d_invalidate - invalidate a dentry * @dentry: dentry to invalidate * * Try to invalidate the dentry if it turns out to be * possible. If there are other dentries that can be * reached through this one we can't delete it and we * return -EBUSY. On success we return 0. * * no dcache lock. */ It's the missing 'd_put' that is the problem. The code should probably look like all other places where d_invalidate is called, i.e. if (!dentry->d_op->d_revalidate(dentry, flags)) { - d_invalidate(dentry); + if (!d_invalidate(dentry)) { + dput(dentry); + dentry = NULL; + } break; } But why would I submit a patch to fix something that I believe is fundamentally broken in the first place. I agree about the problem, I just disagree about the fix that was applied. And I actually thought it had already been reverted last May. Jan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) 2002-10-17 22:16 ` Jan Harkes @ 2002-10-17 23:57 ` Trond Myklebust 2002-10-18 16:49 ` Jan Harkes 0 siblings, 1 reply; 11+ messages in thread From: Trond Myklebust @ 2002-10-17 23:57 UTC (permalink / raw) To: Jan Harkes Cc: Trond Myklebust, linux-fsdevel, Marcelo Tosatti, alan, Alexander Viro >>>>> " " == Jan Harkes <jaharkes@cs.cmu.edu> writes: >> Which is the whole point of the patch. If you are trying to >> read or modify a directory that is invalid, you need to be >> notified of that. > Yes, by failing in real_lookup, not randomly crapping out with > ESTALE. You are completely missing the point: open("."); never did call 'real_lookup', and POSIX compatibility say that it shouldn't (see example below). > Sorry it should have been 'reverted, or fixed to actually > revalidate all entries on the cached tree leading up to '.' and > '..'. No. Client Server ------ -------- cd foo mv foo bar open(".") You are basically saying that you believe that the above scenario must always fail and that the VFS should enforce a violation of POSIX rules. The current code has the possibility to recover from the above sort of thing: this will not be the case if you have to look up 'foo' on the server in order to do open(".") > /** > * d_invalidate - invalidate a dentry > * @dentry: dentry to invalidate > * > * Try to invalidate the dentry if it turns out to be > * possible. If there are other dentries that can be > * reached through this one we can't delete it and we > * return -EBUSY. On success we return 0. > * > * no dcache lock. > */ > It's the missing 'd_put' that is the problem. The code should > probably look like all other places where d_invalidate is > called, i.e. > if (!dentry->d_op->d_revalidate(dentry, flags)) { > - d_invalidate(dentry); > + if (!d_invalidate(dentry)) { > + dput(dentry); > + dentry = NULL; > + } > break; > } So? That's just because those lines usually lie just after a cached_lookup(), which bumps the count. Read the code in question: it is not leaking dentries. Cheers, Trond ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) 2002-10-17 23:57 ` Trond Myklebust @ 2002-10-18 16:49 ` Jan Harkes 2002-10-18 17:03 ` Trond Myklebust 0 siblings, 1 reply; 11+ messages in thread From: Jan Harkes @ 2002-10-18 16:49 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-fsdevel On Fri, Oct 18, 2002 at 01:57:17AM +0200, Trond Myklebust wrote: > Client Server > ------ -------- > cd foo > mv foo bar > open(".") > > You are basically saying that you believe that the above scenario must > always fail and that the VFS should enforce a violation of POSIX > rules. The current code has the possibility to recover from the above > sort of thing: this will not be the case if you have to look up 'foo' > on the server in order to do open(".") I must be blind, because nfs_lookup_revalidate does, dir = dentry->d_parent->d_inode; ... error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, &fhandle, &fattr); if (error) goto out_bad; So how is this recovering any better from the scenario you just gave? > So? That's just because those lines usually lie just after a > cached_lookup(), which bumps the count. Read the code in question: it > is not leaking dentries. Ah, got confused by the dput in the out_dput path, must have missed the return 0; right above it. Jan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) 2002-10-18 16:49 ` Jan Harkes @ 2002-10-18 17:03 ` Trond Myklebust 2002-10-18 17:12 ` Jan Harkes 0 siblings, 1 reply; 11+ messages in thread From: Trond Myklebust @ 2002-10-18 17:03 UTC (permalink / raw) To: Jan Harkes; +Cc: linux-fsdevel >>>>> " " == Jan Harkes <jaharkes@cs.cmu.edu> writes: > I must be blind, because nfs_lookup_revalidate does, > dir = dentry->d_parent->d_inode; ... error = > NFS_PROTO(dir)->lookup(dir, &dentry->d_name, &fhandle, > &fattr); if (error) > goto out_bad; > So how is this recovering any better from the scenario you just > gave? I agree that is a bug (and thanks for pointing it out). I do not, however, see it as an argument for turning off revalidation altogether. Perhaps the right thing to do then is to pass down a flag (LOOKUP_DOT?) that states that we are in fact doing a revalidation of '.' and/or '..' ? Cheers, Trond ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) 2002-10-18 17:03 ` Trond Myklebust @ 2002-10-18 17:12 ` Jan Harkes 2002-10-18 17:41 ` Trond Myklebust 0 siblings, 1 reply; 11+ messages in thread From: Jan Harkes @ 2002-10-18 17:12 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-fsdevel On Fri, Oct 18, 2002 at 07:03:27PM +0200, Trond Myklebust wrote: > I agree that is a bug (and thanks for pointing it out). I do not, > however, see it as an argument for turning off revalidation > altogether. > Perhaps the right thing to do then is to pass down a flag > (LOOKUP_DOT?) that states that we are in fact doing a revalidation of > '.' and/or '..' ? That would help me a lot as I can then reliably recognize when d_revalidate is called from the new 'context' and try to appropriately handle this case. Let me get the POSIX stuff right, In the case of open('.') in your example, I guess d_revalidate would check the inode to see if the object was removed. And possibly retry the lookup, but if that fails only unhash the dentry, not return a 'validation failure'. So, we're not really revalidating the dcache entry at all. Maybe the code really wants to revalidate the inode. Jan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) 2002-10-18 17:12 ` Jan Harkes @ 2002-10-18 17:41 ` Trond Myklebust 2002-10-18 18:23 ` Jan Harkes 0 siblings, 1 reply; 11+ messages in thread From: Trond Myklebust @ 2002-10-18 17:41 UTC (permalink / raw) To: Jan Harkes; +Cc: Trond Myklebust, linux-fsdevel >>>>> " " == Jan Harkes <jaharkes@cs.cmu.edu> writes: > So, we're not really revalidating the dcache entry at > all. Maybe the code really wants to revalidate the inode. That is indeed the correct thing to do here according to POSIX, and is really all I want to do for NFS too. For that reason, I originally proposed to use i_op->revalidate(), but that was vetoed as you may recall. Cheers, Trond ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) 2002-10-18 17:41 ` Trond Myklebust @ 2002-10-18 18:23 ` Jan Harkes 2002-10-18 19:23 ` Trond Myklebust 0 siblings, 1 reply; 11+ messages in thread From: Jan Harkes @ 2002-10-18 18:23 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-fsdevel On Fri, Oct 18, 2002 at 07:41:40PM +0200, Trond Myklebust wrote: > >>>>> " " == Jan Harkes <jaharkes@cs.cmu.edu> writes: > > > So, we're not really revalidating the dcache entry at > > all. Maybe the code really wants to revalidate the inode. > > That is indeed the correct thing to do here according to POSIX, and is > really all I want to do for NFS too. For that reason, I originally > proposed to use i_op->revalidate(), but that was vetoed as you may > recall. It was? I really hope I wasn't one of the veto-ers because looking at this problem it does seem like the correct thing to do as we're not trying to see whether the path is correct, but only the object. Ok, it will cost Coda a whole upcall/context switch, but as it is '.' we're talking about it should already be locally cached. btw. you can easily drop the lookup in nfs_lookup_revalidate. It is a bug for this case, and not necessary in all other places d_revalidate is called as the VFS already uses real_lookup in those cases. Jan Haven't tried to compile it yet, but I guess it would look something like this? diff -urN linux-2.4.19/fs/namei.c linux-2.4.19-revalidate/fs/namei.c --- linux-2.4.19/fs/namei.c 2002-08-28 01:07:35.000000000 -0400 +++ linux-2.4.19-revalidate/fs/namei.c 2002-10-18 14:21:13.000000000 -0400 @@ -633,9 +633,12 @@ * Check the cached dentry for staleness. */ dentry = nd->dentry; - if (dentry && dentry->d_op && dentry->d_op->d_revalidate) { + if (dentry) { err = -ESTALE; - if (!dentry->d_op->d_revalidate(dentry, 0)) { + if (!dentry->d_inode || is_bad_inode(dentry->d_inode) || + (dentry->d_inode->i_op && + dentry->d_inode->i_op->revalidate && + !dentry->d_inode->i_op->revalidate(dentry))) { d_invalidate(dentry); break; } ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) 2002-10-18 18:23 ` Jan Harkes @ 2002-10-18 19:23 ` Trond Myklebust 2002-10-21 17:07 ` Jan Harkes 0 siblings, 1 reply; 11+ messages in thread From: Trond Myklebust @ 2002-10-18 19:23 UTC (permalink / raw) To: Jan Harkes; +Cc: Trond Myklebust, linux-fsdevel >>>>> " " == Jan Harkes <jaharkes@cs.cmu.edu> writes: > It was? I really hope I wasn't one of the veto-ers because > looking at this problem it does seem like the correct thing to > do as we're not trying to see whether the path is correct, but > only the object. Al was the main vetoer. His objection to i_op->revalidate() was that it is interpreted differently for NFS, smbfs, coda,... For some it just works like getattr(), for others it actually does an inode revalidation... > Ok, it will cost Coda a whole upcall/context switch, but as it > is '.' we're talking about it should already be locally > cached. But if all you want to do is read the cache, you'd be better off with d_revalidate(dentry,LOOKUP_DOT) / d_revalidate(dentry, LOOKUP_DOTDOT). Wouldn't that allow you to drop the upcall+context switch too? > btw. you can easily drop the lookup in nfs_lookup_revalidate. > It is a bug for this case, and not necessary in all other > places d_revalidate is called as the VFS already uses > real_lookup in those cases. True, but that would be at the expense of increased dentry aliasing. Normally that is not a problem (and so we do tolerate it if operations on the server are the cause), but under NFS we have to play tricks to get around the problem of somebody doing 'unlink("file")' while we are keeping the same file open for read()/write(). Basically we rename the file to '.nfsxxxxx', and keep it until the last process to call close() can do the actual unlink() rpc call to the server. The resulting 'sillyrename()' feature (which is a recommended implementation for all NFS clients) is intolerant of dentry/filename aliasing. Cheers, Trond ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) 2002-10-18 19:23 ` Trond Myklebust @ 2002-10-21 17:07 ` Jan Harkes 0 siblings, 0 replies; 11+ messages in thread From: Jan Harkes @ 2002-10-21 17:07 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-fsdevel On Fri, Oct 18, 2002 at 09:23:59PM +0200, Trond Myklebust wrote: > > Ok, it will cost Coda a whole upcall/context switch, but as it > > is '.' we're talking about it should already be locally > > cached. > > But if all you want to do is read the cache, you'd be better off with > d_revalidate(dentry,LOOKUP_DOT) / d_revalidate(dentry, LOOKUP_DOTDOT). > Wouldn't that allow you to drop the upcall+context switch too? Actually, as long as Coda hasn't received a callback from userspace indicating that the object has changed, it doesn't even make the getattr upcall. In fact I would probably end up with coda_dentry_revalidate calling coda_inode_revalidate in either of these cases. So why not just use iops->revalidate in the first place. As far as I can see, d_revalidate should just be used to check whether a cached directory path might have been changed (i.e. forces a real_lookup). While iops->revalidate checks the validity of the object. It also looks like 2.5 doesn't have this patch, so I'm assuming a change like this will be made to that tree at some point. And I'd rather see a variant that makes sense and doesn't change existing semantics, even when it might be less efficient for some filesystems. As long as it remains definitely correct. Especially the changed semantics of d_revalidate is what troubles me, - If I do the test on the object, I might 'reinstantiate' a possibly bad lookup path in the dcache because we revalidate the 'path' based on the object's state. - When I test the path, we return ESTALE whenever the path has changed due to rename, or on 'volatile objects' (Coda specific, mountpoints, objects under repair, etc.). So neither test ends up being correct. But if we use dentry_revalidate and inode_revalidate in the correct places it is immediately clear _what_ we're testing, the path leading up to the object or the object itself. Jan ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2002-10-21 17:07 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200205162142.AWF00051@netmail.netcologne.de>
[not found] ` <E178TUb-0005Bh-00@the-village.bc.nu>
[not found] ` <20020517034357.GA18449@ravel.coda.cs.cmu.edu>
[not found] ` <Pine.LNX.4.44.0205161105520.5254-100000@alumno.inacap.cl>
2002-10-17 20:38 ` [PATCH 2.4.19pre8][RFC] remove-NFS-close-to-open from VFS (was Re: [PATCHSET] 2.4.19-pre8-jp12) Jan Harkes
2002-10-17 21:48 ` Trond Myklebust
2002-10-17 22:16 ` Jan Harkes
2002-10-17 23:57 ` Trond Myklebust
2002-10-18 16:49 ` Jan Harkes
2002-10-18 17:03 ` Trond Myklebust
2002-10-18 17:12 ` Jan Harkes
2002-10-18 17:41 ` Trond Myklebust
2002-10-18 18:23 ` Jan Harkes
2002-10-18 19:23 ` Trond Myklebust
2002-10-21 17:07 ` Jan Harkes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).