* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate [not found] <20130725055209.GA15621@sh-el5.eng.rdu2.redhat.com> @ 2013-07-25 14:42 ` Ric Wheeler 2013-07-30 16:16 ` Miklos Szeredi 0 siblings, 1 reply; 12+ messages in thread From: Ric Wheeler @ 2013-07-25 14:42 UTC (permalink / raw) To: Anand Avati Cc: miklos, bfoster, Linux FS Devel, fuse-devel, Alexander Viro, David Howells, Eric Paris On 07/25/2013 01:52 AM, Anand Avati wrote: > Consider the following sequence of operations: > > // mount same backend at two places > > # mount -t fuse <some_src> /mnt1 > # mount -t fuse <same_src> /mnt2 > > // create a directory chain from 1 > > $ mkdir -p /mnt1/a/b > > // load it in 2's cache > > $ stat /mnt2/a/b # load it in cache > > // recreate same names from 1 > > $ rm -rf /mnt1/a > $ mkdir -p /mnt1/a/b > > // sleep long enough for entry_timeout to expire > > $ sleep 5 > > // access /mnt2/a/b from two threads in parallel > > $ stat /mnt2/a/b & stat /mnt2/a/b > > Depending on the race, none/either/both of the commands > executed in the last step can fail. > > This is because both the stat command threads execute the > resolver in parallel. > > - The resolver function lookup_fast() will acquire the dentry > (of /mnt2/a) reference with __d_lookup() > > - Call to d_revalidate() on the just acquired dentry will fail, > (i.e return 0) as FUSE gets a new nodeid from the server. > > - In the mean time another resolver thread enters lookup_fast() > and acquires the dentry of /mnt2/a with __d_lookup(), effectively > making dentry->d_count > 1 [+ child refs] > > - Now when first thread calls d_invalidate() because of the failed > d_revalidate(), d_invalidate() will find that even after calling > shrink_dcache_parent() we are left with d_count > 1, and fails > d_invalidate() with EBUSY. > > - The failed d_invalidate() makes the resolver use this "stale" dentry > as the result of this walk_component() call -- even though it just > witnessed d_revalidate() fail on it, only because d_invalidate() > could not succeed because of an innocent concurrent resolver in > progress. > > - Using the stale dentry (and inode), the call progress and stubles > with an error as the FUSE server is presented with a dead inode. > > - The other thread would fail in d_revalidate() too, and depending > on the progress relaitvely made between the two, the second > thread's d_invalidate() might get an EBUSY too, and stuble in the > same way as the first thread. > > If the same stat commands were issued serially, both would succeed. > > NFS is faced with a similar situation as FUSE (and in many other ways > in general too) and it checks for a submounts and conditionally calls > d_drop(). The call to d_drop() within ->d_revalidate() guarantees the > success of d_invalidate(), and a fresh lookup would be issued there on. > > Signed-off-by: Anand Avati <avati@redhat.com> > --- Adding in Al and others to the thread & correcting the fsdevel address :) Ric > > Background: > > The previous submission of this patch (on fuse-devel) had review comments > to investigate doing a d_drop() on the entire subtree rather than just > on the entry. That approach seems to be very complex. So reposting the > same patch to kick in the discussion again. This patch follows the NFS > approach to the problem. > > fs/fuse/dir.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index a1d9047..83c217e 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -226,6 +226,10 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) > if (!err) { > struct fuse_inode *fi = get_fuse_inode(inode); > if (outarg.nodeid != get_node_id(inode)) { > + if (!have_submounts(entry)) { > + shrink_dcache_parent(entry); > + d_drop(entry); > + } > fuse_queue_forget(fc, forget, outarg.nodeid, 1); > return 0; > } ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate 2013-07-25 14:42 ` [PATCH] [REPOST] fuse: drop dentry on failed revalidate Ric Wheeler @ 2013-07-30 16:16 ` Miklos Szeredi 2013-07-30 19:30 ` Ric Wheeler 2013-08-02 12:17 ` Jeff Layton 0 siblings, 2 replies; 12+ messages in thread From: Miklos Szeredi @ 2013-07-30 16:16 UTC (permalink / raw) To: Ric Wheeler Cc: Anand Avati, Brian Foster, Linux FS Devel, fuse-devel, Alexander Viro, David Howells, Eric Paris >> fs/fuse/dir.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c >> index a1d9047..83c217e 100644 >> --- a/fs/fuse/dir.c >> +++ b/fs/fuse/dir.c >> @@ -226,6 +226,10 @@ static int fuse_dentry_revalidate(struct dentry >> *entry, unsigned int flags) >> if (!err) { >> struct fuse_inode *fi = get_fuse_inode(inode); >> if (outarg.nodeid != get_node_id(inode)) { >> + if (!have_submounts(entry)) { >> + shrink_dcache_parent(entry); >> + d_drop(entry); >> + } Doing d_drop() on a subtree really has problems. Take this example: cd /mnt/foo/bar mkdir my-mount-point [->d_revalidate() drops "/mnt/foo"] mount whatever on ./my-mount-point cd / At this point that mount is not accessible in any way. The only way to umount it is to lazy umount the parent mount. If the parent mount was root, then that's not a practical thing to do. AFAICS nothing prevents this from happening on NFS and root privs are not required (e.g. mounting a fuse fs on an NFS home dir). The other problem is that, unlike NFS, fuse doesn't currently reconnect these subtrees when it finds them at a different point in the tree. d_drop on it just makes things worse because at that point that subtree will not be accessible anymore (while the fuse fs is mounted, that is). This could be fixed pretty easily by using the d_materialise_*() helpers. But issues with mounting over an unhashed subtree *should* be addressed. Recursively d_drop()-ing might still be the simplest way. Need to make sure the subtree doesn't change in the meantime. Holding rename_lock might do the trick. But then there's still an unlikely race with mount()... Thanks, Miklos ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate 2013-07-30 16:16 ` Miklos Szeredi @ 2013-07-30 19:30 ` Ric Wheeler 2013-08-01 16:39 ` Miklos Szeredi 2013-08-02 12:17 ` Jeff Layton 1 sibling, 1 reply; 12+ messages in thread From: Ric Wheeler @ 2013-07-30 19:30 UTC (permalink / raw) To: Miklos Szeredi Cc: Anand Avati, Brian Foster, Linux FS Devel, fuse-devel, Alexander Viro, David Howells, Eric Paris, Alexander Viro On 07/30/2013 12:16 PM, Miklos Szeredi wrote: >>> fs/fuse/dir.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c >>> index a1d9047..83c217e 100644 >>> --- a/fs/fuse/dir.c >>> +++ b/fs/fuse/dir.c >>> @@ -226,6 +226,10 @@ static int fuse_dentry_revalidate(struct dentry >>> *entry, unsigned int flags) >>> if (!err) { >>> struct fuse_inode *fi = get_fuse_inode(inode); >>> if (outarg.nodeid != get_node_id(inode)) { >>> + if (!have_submounts(entry)) { >>> + shrink_dcache_parent(entry); >>> + d_drop(entry); >>> + } > Doing d_drop() on a subtree really has problems. Take this example: > > cd /mnt/foo/bar > mkdir my-mount-point > [->d_revalidate() drops "/mnt/foo"] > mount whatever on ./my-mount-point > cd / > > At this point that mount is not accessible in any way. The only way > to umount it is to lazy umount the parent mount. If the parent mount > was root, then that's not a practical thing to do. > > AFAICS nothing prevents this from happening on NFS and root privs are > not required (e.g. mounting a fuse fs on an NFS home dir). > > The other problem is that, unlike NFS, fuse doesn't currently > reconnect these subtrees when it finds them at a different point in > the tree. d_drop on it just makes things worse because at that point > that subtree will not be accessible anymore (while the fuse fs is > mounted, that is). This could be fixed pretty easily by using the > d_materialise_*() helpers. > > But issues with mounting over an unhashed subtree *should* be addressed. > > Recursively d_drop()-ing might still be the simplest way. Need to > make sure the subtree doesn't change in the meantime. Holding > rename_lock might do the trick. But then there's still an unlikely > race with mount()... > > Thanks, > Miklos Hi Miklos, Do you have a respin of the proposed patch that we can put through testing? We see this with hadoop on top of gluster and have lots of testers ready and waiting :) Al, do you have any thoughts on this? Thanks! ric ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate 2013-07-30 19:30 ` Ric Wheeler @ 2013-08-01 16:39 ` Miklos Szeredi [not found] ` <20130801163940.GA1356-nYI/l+Q8b4r16c5iV7KQqR1Qg9XOENNVk/YoNI2nt5o@public.gmane.org> 0 siblings, 1 reply; 12+ messages in thread From: Miklos Szeredi @ 2013-08-01 16:39 UTC (permalink / raw) To: Ric Wheeler Cc: Anand Avati, Brian Foster, Linux FS Devel, Alexander Viro, David Howells, Eric Paris On Tue, Jul 30, 2013 at 03:30:45PM -0400, Ric Wheeler wrote: > On 07/30/2013 12:16 PM, Miklos Szeredi wrote: > >>> fs/fuse/dir.c | 4 ++++ > >>> 1 file changed, 4 insertions(+) > >>> > >>>diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > >>>index a1d9047..83c217e 100644 > >>>--- a/fs/fuse/dir.c > >>>+++ b/fs/fuse/dir.c > >>>@@ -226,6 +226,10 @@ static int fuse_dentry_revalidate(struct dentry > >>>*entry, unsigned int flags) > >>> if (!err) { > >>> struct fuse_inode *fi = get_fuse_inode(inode); > >>> if (outarg.nodeid != get_node_id(inode)) { > >>>+ if (!have_submounts(entry)) { > >>>+ shrink_dcache_parent(entry); > >>>+ d_drop(entry); > >>>+ } > >Doing d_drop() on a subtree really has problems. Take this example: > > > >cd /mnt/foo/bar > >mkdir my-mount-point > >[->d_revalidate() drops "/mnt/foo"] > >mount whatever on ./my-mount-point > >cd / > > > >At this point that mount is not accessible in any way. The only way > >to umount it is to lazy umount the parent mount. If the parent mount > >was root, then that's not a practical thing to do. > > > >AFAICS nothing prevents this from happening on NFS and root privs are > >not required (e.g. mounting a fuse fs on an NFS home dir). > > Here's a patch that tries to address the mount issue in NFS (and in FUSE in the future). I haven't yet tested it, just interested in whether the concept looks OK or not. Not sure if the ENOENT is the best error. Also I have a little fear that this will cause a regression in some weird setup. Possibly we should use a new dentry flag for unhashing the directory dentry specifically to invalidate it. Thanks, Miklos diff --git a/fs/dcache.c b/fs/dcache.c index 87bdb53..5c51217 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1103,6 +1103,34 @@ rename_retry: } EXPORT_SYMBOL(have_submounts); +static bool __has_unlinked_ancestor(struct dentry *dentry) +{ + struct dentry *this; + + for (this = dentry; !IS_ROOT(this); this = this->d_parent) { + if (d_unhashed(this)) + return true; + } + return false; +} + +/* + * Called by mount code to check if the mountpoint is reachable (e.g. NFS can + * unhash a directory dentry and then the complete subtree can become + * unreachable). + */ +bool has_unlinked_ancestor(struct dentry *dentry) +{ + bool found; + + /* Need exclusion wrt. have_submounts() */ + write_seqlock(&rename_lock); + found = __has_unlinked_ancestor(dentry); + write_sequnlock(&rename_lock); + + return found; +} + /* * Search the dentry child list of the specified parent, * and move any unused dentries to the end of the unused diff --git a/fs/internal.h b/fs/internal.h index 7c5f01c..d232355 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -126,6 +126,7 @@ extern int invalidate_inodes(struct super_block *, bool); * dcache.c */ extern struct dentry *__d_alloc(struct super_block *, const struct qstr *); +extern bool has_unlinked_ancestor(struct dentry *dentry); /* * read_write.c diff --git a/fs/namespace.c b/fs/namespace.c index 7b1ca9b..bb92a9c 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -634,6 +634,15 @@ static struct mountpoint *new_mountpoint(struct dentry *dentry) } dentry->d_flags |= DCACHE_MOUNTED; spin_unlock(&dentry->d_lock); + + if (has_unlinked_ancestor(dentry)) { + spin_lock(&dentry->d_lock); + dentry->d_flags &= ~DCACHE_MOUNTED; + spin_unlock(&dentry->d_lock); + kfree(mp); + return ERR_PTR(-ENOENT); + } + mp->m_dentry = dentry; mp->m_count = 1; list_add(&mp->m_hash, chain); ^ permalink raw reply related [flat|nested] 12+ messages in thread
[parent not found: <20130801163940.GA1356-nYI/l+Q8b4r16c5iV7KQqR1Qg9XOENNVk/YoNI2nt5o@public.gmane.org>]
* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate [not found] ` <20130801163940.GA1356-nYI/l+Q8b4r16c5iV7KQqR1Qg9XOENNVk/YoNI2nt5o@public.gmane.org> @ 2013-08-01 18:45 ` Ric Wheeler [not found] ` <51FAACD9.8020205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 12+ messages in thread From: Ric Wheeler @ 2013-08-01 18:45 UTC (permalink / raw) To: Miklos Szeredi, Trond Myklebust Cc: Anand Avati, Brian Foster, Linux FS Devel, Alexander Viro, David Howells, Eric Paris, Linux NFS list On 08/01/2013 12:39 PM, Miklos Szeredi wrote: > On Tue, Jul 30, 2013 at 03:30:45PM -0400, Ric Wheeler wrote: >> On 07/30/2013 12:16 PM, Miklos Szeredi wrote: >>>>> fs/fuse/dir.c | 4 ++++ >>>>> 1 file changed, 4 insertions(+) >>>>> >>>>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c >>>>> index a1d9047..83c217e 100644 >>>>> --- a/fs/fuse/dir.c >>>>> +++ b/fs/fuse/dir.c >>>>> @@ -226,6 +226,10 @@ static int fuse_dentry_revalidate(struct dentry >>>>> *entry, unsigned int flags) >>>>> if (!err) { >>>>> struct fuse_inode *fi = get_fuse_inode(inode); >>>>> if (outarg.nodeid != get_node_id(inode)) { >>>>> + if (!have_submounts(entry)) { >>>>> + shrink_dcache_parent(entry); >>>>> + d_drop(entry); >>>>> + } >>> Doing d_drop() on a subtree really has problems. Take this example: >>> >>> cd /mnt/foo/bar >>> mkdir my-mount-point >>> [->d_revalidate() drops "/mnt/foo"] >>> mount whatever on ./my-mount-point >>> cd / >>> >>> At this point that mount is not accessible in any way. The only way >>> to umount it is to lazy umount the parent mount. If the parent mount >>> was root, then that's not a practical thing to do. >>> >>> AFAICS nothing prevents this from happening on NFS and root privs are >>> not required (e.g. mounting a fuse fs on an NFS home dir). >>> > Here's a patch that tries to address the mount issue in NFS (and in FUSE in the > future). I haven't yet tested it, just interested in whether the concept looks > OK or not. > > Not sure if the ENOENT is the best error. Also I have a little fear that this > will cause a regression in some weird setup. Possibly we should use a new > dentry flag for unhashing the directory dentry specifically to invalidate it. > > Thanks, > Miklos Adding in the NFS list for broader review. I thought that the issue was primarily in FUSE itself, not seen in practice in NFS? Ric > > > > diff --git a/fs/dcache.c b/fs/dcache.c > index 87bdb53..5c51217 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -1103,6 +1103,34 @@ rename_retry: > } > EXPORT_SYMBOL(have_submounts); > > +static bool __has_unlinked_ancestor(struct dentry *dentry) > +{ > + struct dentry *this; > + > + for (this = dentry; !IS_ROOT(this); this = this->d_parent) { > + if (d_unhashed(this)) > + return true; > + } > + return false; > +} > + > +/* > + * Called by mount code to check if the mountpoint is reachable (e.g. NFS can > + * unhash a directory dentry and then the complete subtree can become > + * unreachable). > + */ > +bool has_unlinked_ancestor(struct dentry *dentry) > +{ > + bool found; > + > + /* Need exclusion wrt. have_submounts() */ > + write_seqlock(&rename_lock); > + found = __has_unlinked_ancestor(dentry); > + write_sequnlock(&rename_lock); > + > + return found; > +} > + > /* > * Search the dentry child list of the specified parent, > * and move any unused dentries to the end of the unused > diff --git a/fs/internal.h b/fs/internal.h > index 7c5f01c..d232355 100644 > --- a/fs/internal.h > +++ b/fs/internal.h > @@ -126,6 +126,7 @@ extern int invalidate_inodes(struct super_block *, bool); > * dcache.c > */ > extern struct dentry *__d_alloc(struct super_block *, const struct qstr *); > +extern bool has_unlinked_ancestor(struct dentry *dentry); > > /* > * read_write.c > diff --git a/fs/namespace.c b/fs/namespace.c > index 7b1ca9b..bb92a9c 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -634,6 +634,15 @@ static struct mountpoint *new_mountpoint(struct dentry *dentry) > } > dentry->d_flags |= DCACHE_MOUNTED; > spin_unlock(&dentry->d_lock); > + > + if (has_unlinked_ancestor(dentry)) { > + spin_lock(&dentry->d_lock); > + dentry->d_flags &= ~DCACHE_MOUNTED; > + spin_unlock(&dentry->d_lock); > + kfree(mp); > + return ERR_PTR(-ENOENT); > + } > + > mp->m_dentry = dentry; > mp->m_count = 1; > list_add(&mp->m_hash, chain); -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <51FAACD9.8020205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate [not found] ` <51FAACD9.8020205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2013-08-02 9:02 ` Miklos Szeredi [not found] ` <CAJfpegu1RXAYOGWyuCeckJ8WHfK=3dFc5bm1zsM=1Qt3zcfbsg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 12+ messages in thread From: Miklos Szeredi @ 2013-08-02 9:02 UTC (permalink / raw) To: Ric Wheeler Cc: Trond Myklebust, Anand Avati, Brian Foster, Linux FS Devel, Alexander Viro, David Howells, Eric Paris, Linux NFS list On Thu, Aug 1, 2013 at 8:45 PM, Ric Wheeler <rwheeler-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > > Adding in the NFS list for broader review. > > I thought that the issue was primarily in FUSE itself, not seen in practice > in NFS? I haven't tested NFS (have but a single notebook here) but AFAICS the issue of not fuse specific but is inherent in the fact that NFS does d_drop() on a non-empty directory dentry. It does check for submounts *before*, but it doesn't do anything about submounts *after* (or during) the d_drop(). It's probably not something people complain about, because they choose mountpoints to be pretty stable points in the tree that don't get removed or moved around. Thanks, Miklos >> >> diff --git a/fs/dcache.c b/fs/dcache.c >> index 87bdb53..5c51217 100644 >> --- a/fs/dcache.c >> +++ b/fs/dcache.c >> @@ -1103,6 +1103,34 @@ rename_retry: >> } >> EXPORT_SYMBOL(have_submounts); >> +static bool __has_unlinked_ancestor(struct dentry *dentry) >> +{ >> + struct dentry *this; >> + >> + for (this = dentry; !IS_ROOT(this); this = this->d_parent) { >> + if (d_unhashed(this)) >> + return true; >> + } >> + return false; >> +} >> + >> +/* >> + * Called by mount code to check if the mountpoint is reachable (e.g. NFS >> can >> + * unhash a directory dentry and then the complete subtree can become >> + * unreachable). >> + */ >> +bool has_unlinked_ancestor(struct dentry *dentry) >> +{ >> + bool found; >> + >> + /* Need exclusion wrt. have_submounts() */ >> + write_seqlock(&rename_lock); >> + found = __has_unlinked_ancestor(dentry); >> + write_sequnlock(&rename_lock); >> + >> + return found; >> +} >> + >> /* >> * Search the dentry child list of the specified parent, >> * and move any unused dentries to the end of the unused >> diff --git a/fs/internal.h b/fs/internal.h >> index 7c5f01c..d232355 100644 >> --- a/fs/internal.h >> +++ b/fs/internal.h >> @@ -126,6 +126,7 @@ extern int invalidate_inodes(struct super_block *, >> bool); >> * dcache.c >> */ >> extern struct dentry *__d_alloc(struct super_block *, const struct qstr >> *); >> +extern bool has_unlinked_ancestor(struct dentry *dentry); >> /* >> * read_write.c >> diff --git a/fs/namespace.c b/fs/namespace.c >> index 7b1ca9b..bb92a9c 100644 >> --- a/fs/namespace.c >> +++ b/fs/namespace.c >> @@ -634,6 +634,15 @@ static struct mountpoint *new_mountpoint(struct >> dentry *dentry) >> } >> dentry->d_flags |= DCACHE_MOUNTED; >> spin_unlock(&dentry->d_lock); >> + >> + if (has_unlinked_ancestor(dentry)) { >> + spin_lock(&dentry->d_lock); >> + dentry->d_flags &= ~DCACHE_MOUNTED; >> + spin_unlock(&dentry->d_lock); >> + kfree(mp); >> + return ERR_PTR(-ENOENT); >> + } >> + >> mp->m_dentry = dentry; >> mp->m_count = 1; >> list_add(&mp->m_hash, chain); > > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <CAJfpegu1RXAYOGWyuCeckJ8WHfK=3dFc5bm1zsM=1Qt3zcfbsg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate [not found] ` <CAJfpegu1RXAYOGWyuCeckJ8WHfK=3dFc5bm1zsM=1Qt3zcfbsg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2013-08-02 11:43 ` Jeff Layton 2013-08-02 14:30 ` Miklos Szeredi 0 siblings, 1 reply; 12+ messages in thread From: Jeff Layton @ 2013-08-02 11:43 UTC (permalink / raw) To: Miklos Szeredi Cc: Ric Wheeler, Trond Myklebust, Anand Avati, Brian Foster, Linux FS Devel, Alexander Viro, David Howells, Eric Paris, Linux NFS list On Fri, 2 Aug 2013 11:02:25 +0200 Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> wrote: > On Thu, Aug 1, 2013 at 8:45 PM, Ric Wheeler <rwheeler-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > > > > Adding in the NFS list for broader review. > > > > I thought that the issue was primarily in FUSE itself, not seen in practice > > in NFS? > > I haven't tested NFS (have but a single notebook here) but AFAICS the > issue of not fuse specific but is inherent in the fact that NFS does > d_drop() on a non-empty directory dentry. It does check for submounts > *before*, but it doesn't do anything about submounts *after* (or > during) the d_drop(). It's probably not something people complain > about, because they choose mountpoints to be pretty stable points in > the tree that don't get removed or moved around. > > Thanks, > Miklos > > >> > >> diff --git a/fs/dcache.c b/fs/dcache.c > >> index 87bdb53..5c51217 100644 > >> --- a/fs/dcache.c > >> +++ b/fs/dcache.c > >> @@ -1103,6 +1103,34 @@ rename_retry: > >> } > >> EXPORT_SYMBOL(have_submounts); > >> +static bool __has_unlinked_ancestor(struct dentry *dentry) > >> +{ > >> + struct dentry *this; > >> + > >> + for (this = dentry; !IS_ROOT(this); this = this->d_parent) { > >> + if (d_unhashed(this)) > >> + return true; > >> + } > >> + return false; > >> +} > >> + > >> +/* > >> + * Called by mount code to check if the mountpoint is reachable (e.g. NFS > >> can > >> + * unhash a directory dentry and then the complete subtree can become > >> + * unreachable). > >> + */ > >> +bool has_unlinked_ancestor(struct dentry *dentry) > >> +{ > >> + bool found; > >> + > >> + /* Need exclusion wrt. have_submounts() */ > >> + write_seqlock(&rename_lock); > >> + found = __has_unlinked_ancestor(dentry); > >> + write_sequnlock(&rename_lock); > >> + > >> + return found; > >> +} > >> + > >> /* > >> * Search the dentry child list of the specified parent, > >> * and move any unused dentries to the end of the unused > >> diff --git a/fs/internal.h b/fs/internal.h > >> index 7c5f01c..d232355 100644 > >> --- a/fs/internal.h > >> +++ b/fs/internal.h > >> @@ -126,6 +126,7 @@ extern int invalidate_inodes(struct super_block *, > >> bool); > >> * dcache.c > >> */ > >> extern struct dentry *__d_alloc(struct super_block *, const struct qstr > >> *); > >> +extern bool has_unlinked_ancestor(struct dentry *dentry); > >> /* > >> * read_write.c > >> diff --git a/fs/namespace.c b/fs/namespace.c > >> index 7b1ca9b..bb92a9c 100644 > >> --- a/fs/namespace.c > >> +++ b/fs/namespace.c > >> @@ -634,6 +634,15 @@ static struct mountpoint *new_mountpoint(struct > >> dentry *dentry) > >> } > >> dentry->d_flags |= DCACHE_MOUNTED; > >> spin_unlock(&dentry->d_lock); > >> + > >> + if (has_unlinked_ancestor(dentry)) { > >> + spin_lock(&dentry->d_lock); > >> + dentry->d_flags &= ~DCACHE_MOUNTED; > >> + spin_unlock(&dentry->d_lock); > >> + kfree(mp); > >> + return ERR_PTR(-ENOENT); > >> + } > >> + > >> mp->m_dentry = dentry; > >> mp->m_count = 1; > >> list_add(&mp->m_hash, chain); > > > > Ok, took me a couple of times to look over the code but I think I understand the problem now... IIUC, then this patch should only ever cause this to return -ENOENT in a situation similar to the one in Anand's reproducer, right? The mountpoint-to-be was unlinked in another tree, and thus we found it to be invalid in the tree that we're mounting in. If so, then the dentry didn't exist at some point during the race window. Returning -ENOENT seems reasonable to me in that situation. It might cause some strange setups to regress, but aren't those already broken anyway? Or does the fact that NFS uses d_materialise_* helpers to hook these dentries back into the tree help paper over that? -- Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate 2013-08-02 11:43 ` Jeff Layton @ 2013-08-02 14:30 ` Miklos Szeredi [not found] ` <CAJfpegthEZJEQhus=4CnvR+yb+vGj5c85kUnn18SrR0S1wbbtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 12+ messages in thread From: Miklos Szeredi @ 2013-08-02 14:30 UTC (permalink / raw) To: Jeff Layton Cc: Ric Wheeler, Trond Myklebust, Anand Avati, Brian Foster, Linux FS Devel, Alexander Viro, David Howells, Eric Paris, Linux NFS list On Fri, Aug 2, 2013 at 1:43 PM, Jeff Layton <jlayton@redhat.com> wrote: > Ok, took me a couple of times to look over the code but I think I > understand the problem now... > > IIUC, then this patch should only ever cause this to return -ENOENT in > a situation similar to the one in Anand's reproducer, right? The > mountpoint-to-be was unlinked in another tree, and thus we found it to > be invalid in the tree that we're mounting in. If so, then the dentry > didn't exist at some point during the race window. Returning -ENOENT > seems reasonable to me in that situation. Yes, that's one part of it and ENOENT fits perfectly. The other part is when the subtree is moved on another host. Yes, NFS can reconnect it, but only if it is accessed through the new location. Until then it will be inaccessible and the new location of the mountpoint not discoverable through /proc/mounts or in any way without outside knowledge. And there was a pre-existing mount under the moved directory we don't allow the d_drop in this "move" case either, and the mount is accessible through the old name. I seem to recall that there was a discussion about this back then and Linus was quite adamant about mountpoints not being allowed to be dissolved or moved without an explicit action on the localhost (i.e. something that happens on remote hosts shouldn't affect the status or location of mounts on the localhost). So what happens in this case: host1: cd /nfs/foo/bar host2: mv /nfs/foo /nfs/old-foo host2: mkdir /nfs/foo host1: ls /nfs/foo [drops "old-foo" and adds a new foo dentry] host1: mkdir bin [cwd is now not accessible from root] host1: mount --bind /bin ./bin [???] Currently that last one succeeds, with my patch it gives ENOENT, but that's not the best error, since the mountpoint does exist. Thanks, Miklos ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <CAJfpegthEZJEQhus=4CnvR+yb+vGj5c85kUnn18SrR0S1wbbtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate [not found] ` <CAJfpegthEZJEQhus=4CnvR+yb+vGj5c85kUnn18SrR0S1wbbtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2013-08-02 16:58 ` Jeff Layton 0 siblings, 0 replies; 12+ messages in thread From: Jeff Layton @ 2013-08-02 16:58 UTC (permalink / raw) To: Miklos Szeredi Cc: Ric Wheeler, Trond Myklebust, Anand Avati, Brian Foster, Linux FS Devel, Alexander Viro, David Howells, Eric Paris, Linux NFS list On Fri, 2 Aug 2013 16:30:23 +0200 Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> wrote: > On Fri, Aug 2, 2013 at 1:43 PM, Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > > Ok, took me a couple of times to look over the code but I think I > > understand the problem now... > > > > IIUC, then this patch should only ever cause this to return -ENOENT in > > a situation similar to the one in Anand's reproducer, right? The > > mountpoint-to-be was unlinked in another tree, and thus we found it to > > be invalid in the tree that we're mounting in. If so, then the dentry > > didn't exist at some point during the race window. Returning -ENOENT > > seems reasonable to me in that situation. > > Yes, that's one part of it and ENOENT fits perfectly. > > The other part is when the subtree is moved on another host. Yes, NFS > can reconnect it, but only if it is accessed through the new location. > Until then it will be inaccessible and the new location of the > mountpoint not discoverable through /proc/mounts or in any way without > outside knowledge. > > And there was a pre-existing mount under the moved directory we don't > allow the d_drop in this "move" case either, and the mount is > accessible through the old name. I seem to recall that there was a > discussion about this back then and Linus was quite adamant about > mountpoints not being allowed to be dissolved or moved without an > explicit action on the localhost (i.e. something that happens on > remote hosts shouldn't affect the status or location of mounts on the > localhost). > > So what happens in this case: > > host1: cd /nfs/foo/bar > host2: mv /nfs/foo /nfs/old-foo > host2: mkdir /nfs/foo > host1: ls /nfs/foo [drops "old-foo" and adds a new foo dentry] > host1: mkdir bin [cwd is now not accessible from root] > host1: mount --bind /bin ./bin [???] > > Currently that last one succeeds, with my patch it gives ENOENT, but > that's not the best error, since the mountpoint does exist. > Ok, good point... That's a tricky situation. We're rejecting the mount there because we can't _currently_ reach the mountpoint from root. It could become reachable later though, at which point you could mount on there just fine... It almost sounds like it could use a new error code (EUNREACH?). Or, maybe you could repurpose ENOLINK? In any case, I'd be inclined not to worry about it and just go with -ENOENT there. If someone complains we could consider a new error for that case later. -- Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate 2013-07-30 16:16 ` Miklos Szeredi 2013-07-30 19:30 ` Ric Wheeler @ 2013-08-02 12:17 ` Jeff Layton 2013-08-02 14:42 ` Miklos Szeredi 1 sibling, 1 reply; 12+ messages in thread From: Jeff Layton @ 2013-08-02 12:17 UTC (permalink / raw) To: Miklos Szeredi Cc: Ric Wheeler, Anand Avati, Brian Foster, Linux FS Devel, fuse-devel, Alexander Viro, David Howells, Eric Paris On Tue, 30 Jul 2013 18:16:55 +0200 Miklos Szeredi <miklos@szeredi.hu> wrote: > >> fs/fuse/dir.c | 4 ++++ > >> 1 file changed, 4 insertions(+) > >> > >> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > >> index a1d9047..83c217e 100644 > >> --- a/fs/fuse/dir.c > >> +++ b/fs/fuse/dir.c > >> @@ -226,6 +226,10 @@ static int fuse_dentry_revalidate(struct dentry > >> *entry, unsigned int flags) > >> if (!err) { > >> struct fuse_inode *fi = get_fuse_inode(inode); > >> if (outarg.nodeid != get_node_id(inode)) { > >> + if (!have_submounts(entry)) { > >> + shrink_dcache_parent(entry); > >> + d_drop(entry); > >> + } > > Doing d_drop() on a subtree really has problems. Take this example: > > cd /mnt/foo/bar > mkdir my-mount-point > [->d_revalidate() drops "/mnt/foo"] > mount whatever on ./my-mount-point > cd / > > At this point that mount is not accessible in any way. The only way > to umount it is to lazy umount the parent mount. If the parent mount > was root, then that's not a practical thing to do. > > AFAICS nothing prevents this from happening on NFS and root privs are > not required (e.g. mounting a fuse fs on an NFS home dir). > > The other problem is that, unlike NFS, fuse doesn't currently > reconnect these subtrees when it finds them at a different point in > the tree. d_drop on it just makes things worse because at that point > that subtree will not be accessible anymore (while the fuse fs is > mounted, that is). This could be fixed pretty easily by using the > d_materialise_*() helpers. > Yes, but... This works on NFS since we have an expectation that we can identify an inode again when we see it. Given some of the strange userland filesystems that FUSE supports, does that expectation hold there? If not then you might still end up with disconnected subtrees. > But issues with mounting over an unhashed subtree *should* be addressed. > > Recursively d_drop()-ing might still be the simplest way. Need to > make sure the subtree doesn't change in the meantime. Holding > rename_lock might do the trick. But then there's still an unlikely > race with mount()... > > Thanks, > Miklos > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate 2013-08-02 12:17 ` Jeff Layton @ 2013-08-02 14:42 ` Miklos Szeredi 2013-08-02 17:32 ` Jeff Layton 0 siblings, 1 reply; 12+ messages in thread From: Miklos Szeredi @ 2013-08-02 14:42 UTC (permalink / raw) To: Jeff Layton Cc: Ric Wheeler, Anand Avati, Brian Foster, Linux FS Devel, fuse-devel, Alexander Viro, David Howells, Eric Paris On Fri, Aug 2, 2013 at 2:17 PM, Jeff Layton <jlayton@redhat.com> wrote: > On Tue, 30 Jul 2013 18:16:55 +0200 Miklos Szeredi <miklos@szeredi.hu> wrote: >> The other problem is that, unlike NFS, fuse doesn't currently >> reconnect these subtrees when it finds them at a different point in >> the tree. d_drop on it just makes things worse because at that point >> that subtree will not be accessible anymore (while the fuse fs is >> mounted, that is). This could be fixed pretty easily by using the >> d_materialise_*() helpers. >> > > Yes, but... > > This works on NFS since we have an expectation that we can identify an > inode again when we see it. Given some of the strange userland > filesystems that FUSE supports, does that expectation hold there? If > not then you might still end up with disconnected subtrees. We'll end up with disconnected subtrees in NFS as well. That state can remain indefinitely. It will either be reconnected when we come accross the inode "by chance" or dissolved when it is no longer referenced and the dentries reclaimed. And as long as there are no mounts under the disconnected subtree, there's no big problem. If some strange filesystem doesn't support identifying a disconnected subtree it will just not be reconnected. But that can happen with NFS as well, if the new location is never accessed, so it's not something new. Thanks, Miklos ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate 2013-08-02 14:42 ` Miklos Szeredi @ 2013-08-02 17:32 ` Jeff Layton 0 siblings, 0 replies; 12+ messages in thread From: Jeff Layton @ 2013-08-02 17:32 UTC (permalink / raw) To: Miklos Szeredi Cc: Ric Wheeler, Anand Avati, Brian Foster, Linux FS Devel, fuse-devel, Alexander Viro, David Howells, Eric Paris On Fri, 2 Aug 2013 16:42:00 +0200 Miklos Szeredi <miklos@szeredi.hu> wrote: > On Fri, Aug 2, 2013 at 2:17 PM, Jeff Layton <jlayton@redhat.com> wrote: > > On Tue, 30 Jul 2013 18:16:55 +0200 Miklos Szeredi <miklos@szeredi.hu> wrote: > > >> The other problem is that, unlike NFS, fuse doesn't currently > >> reconnect these subtrees when it finds them at a different point in > >> the tree. d_drop on it just makes things worse because at that point > >> that subtree will not be accessible anymore (while the fuse fs is > >> mounted, that is). This could be fixed pretty easily by using the > >> d_materialise_*() helpers. > >> > > > > Yes, but... > > > > This works on NFS since we have an expectation that we can identify an > > inode again when we see it. Given some of the strange userland > > filesystems that FUSE supports, does that expectation hold there? If > > not then you might still end up with disconnected subtrees. > > We'll end up with disconnected subtrees in NFS as well. That state > can remain indefinitely. It will either be reconnected when we come > accross the inode "by chance" or dissolved when it is no longer > referenced and the dentries reclaimed. > > And as long as there are no mounts under the disconnected subtree, > there's no big problem. > > If some strange filesystem doesn't support identifying a disconnected > subtree it will just not be reconnected. But that can happen with NFS > as well, if the new location is never accessed, so it's not something > new. > Ok, makes sense. So to summarize...the main issue you had is the case where a mount races in just before you shrink the subtree? If so, then I guess the patch you proposed earlier in this thread should take care of that (even if the error code returned isn't ideal). With that patch in place, would Anand's patch then be reasonable, or do you think other changes are needed there to make that safe? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-08-02 17:32 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20130725055209.GA15621@sh-el5.eng.rdu2.redhat.com> 2013-07-25 14:42 ` [PATCH] [REPOST] fuse: drop dentry on failed revalidate Ric Wheeler 2013-07-30 16:16 ` Miklos Szeredi 2013-07-30 19:30 ` Ric Wheeler 2013-08-01 16:39 ` Miklos Szeredi [not found] ` <20130801163940.GA1356-nYI/l+Q8b4r16c5iV7KQqR1Qg9XOENNVk/YoNI2nt5o@public.gmane.org> 2013-08-01 18:45 ` Ric Wheeler [not found] ` <51FAACD9.8020205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2013-08-02 9:02 ` Miklos Szeredi [not found] ` <CAJfpegu1RXAYOGWyuCeckJ8WHfK=3dFc5bm1zsM=1Qt3zcfbsg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2013-08-02 11:43 ` Jeff Layton 2013-08-02 14:30 ` Miklos Szeredi [not found] ` <CAJfpegthEZJEQhus=4CnvR+yb+vGj5c85kUnn18SrR0S1wbbtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2013-08-02 16:58 ` Jeff Layton 2013-08-02 12:17 ` Jeff Layton 2013-08-02 14:42 ` Miklos Szeredi 2013-08-02 17:32 ` Jeff Layton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).