From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ric Wheeler Subject: Re: [PATCH] [REPOST] fuse: drop dentry on failed revalidate Date: Thu, 25 Jul 2013 10:42:16 -0400 Message-ID: <51F13948.3050100@redhat.com> References: <20130725055209.GA15621@sh-el5.eng.rdu2.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: miklos@szeredi.hu, bfoster@redhat.com, Linux FS Devel , fuse-devel@lists.sourceforge.net, Alexander Viro , David Howells , Eric Paris To: Anand Avati Return-path: Received: from mx1.redhat.com ([209.132.183.28]:48366 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755609Ab3GYOm2 (ORCPT ); Thu, 25 Jul 2013 10:42:28 -0400 In-Reply-To: <20130725055209.GA15621@sh-el5.eng.rdu2.redhat.com> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On 07/25/2013 01:52 AM, Anand Avati wrote: > Consider the following sequence of operations: > > // mount same backend at two places > > # mount -t fuse /mnt1 > # mount -t fuse /mnt2 > > // create a directory chain from 1 > > $ mkdir -p /mnt1/a/b > > // load it in 2's cache > > $ stat /mnt2/a/b # load it in cache > > // recreate same names from 1 > > $ rm -rf /mnt1/a > $ mkdir -p /mnt1/a/b > > // sleep long enough for entry_timeout to expire > > $ sleep 5 > > // access /mnt2/a/b from two threads in parallel > > $ stat /mnt2/a/b & stat /mnt2/a/b > > Depending on the race, none/either/both of the commands > executed in the last step can fail. > > This is because both the stat command threads execute the > resolver in parallel. > > - The resolver function lookup_fast() will acquire the dentry > (of /mnt2/a) reference with __d_lookup() > > - Call to d_revalidate() on the just acquired dentry will fail, > (i.e return 0) as FUSE gets a new nodeid from the server. > > - In the mean time another resolver thread enters lookup_fast() > and acquires the dentry of /mnt2/a with __d_lookup(), effectively > making dentry->d_count > 1 [+ child refs] > > - Now when first thread calls d_invalidate() because of the failed > d_revalidate(), d_invalidate() will find that even after calling > shrink_dcache_parent() we are left with d_count > 1, and fails > d_invalidate() with EBUSY. > > - The failed d_invalidate() makes the resolver use this "stale" dentry > as the result of this walk_component() call -- even though it just > witnessed d_revalidate() fail on it, only because d_invalidate() > could not succeed because of an innocent concurrent resolver in > progress. > > - Using the stale dentry (and inode), the call progress and stubles > with an error as the FUSE server is presented with a dead inode. > > - The other thread would fail in d_revalidate() too, and depending > on the progress relaitvely made between the two, the second > thread's d_invalidate() might get an EBUSY too, and stuble in the > same way as the first thread. > > If the same stat commands were issued serially, both would succeed. > > NFS is faced with a similar situation as FUSE (and in many other ways > in general too) and it checks for a submounts and conditionally calls > d_drop(). The call to d_drop() within ->d_revalidate() guarantees the > success of d_invalidate(), and a fresh lookup would be issued there on. > > Signed-off-by: Anand Avati > --- Adding in Al and others to the thread & correcting the fsdevel address :) Ric > > Background: > > The previous submission of this patch (on fuse-devel) had review comments > to investigate doing a d_drop() on the entire subtree rather than just > on the entry. That approach seems to be very complex. So reposting the > same patch to kick in the discussion again. This patch follows the NFS > approach to the problem. > > fs/fuse/dir.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index a1d9047..83c217e 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -226,6 +226,10 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) > if (!err) { > struct fuse_inode *fi = get_fuse_inode(inode); > if (outarg.nodeid != get_node_id(inode)) { > + if (!have_submounts(entry)) { > + shrink_dcache_parent(entry); > + d_drop(entry); > + } > fuse_queue_forget(fc, forget, outarg.nodeid, 1); > return 0; > }