linux-nfs.vger.kernel.org archive mirror
From: Jeff Layton <jlayton@poochiereds.net>
To: Al Viro <viro@ZenIV.linux.org.uk>, linux-nfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: parallel lookups on NFS
Date: Sun, 24 Apr 2016 08:46:15 -0400
Message-ID: <1461501975.5219.40.camel@poochiereds.net>
In-Reply-To: <20160424023453.GK25498@ZenIV.linux.org.uk>

On Sun, 2016-04-24 at 03:34 +0100, Al Viro wrote:
> 	There's a fun problem - for all the complaints about evil, crude
> VFS exclusion not letting the smart filesystem developers Do It Right(tm),
> NFS has a homegrown kinda-sorta rwsem, with delayed unlinks being readers
> and lookups - writers.
> 
> 	IOW, nfs_block_sillyrename() still yields lookup/lookup exclusion,
> even with ->i_mutex replaced with rwsem and ->lookup() calls happening in
> parallel.  What's more, the thing is very much writer(==lookup)-starving.
> 
> 	What kind of ordering do we really want there?  Existing variant
> is very crude - lookups (along with readdir and atomic_open) are writers,
> delayed unlinks - readers, and there's no fairness whatsoever; if delayed
> unlink comes during lookup, it is put on a list and once lookup is done,
> everything on that list is executed.  Moreover, any unlinks coming during
> the execution of those are executed immediately.  And no lookup (in that
> directory) is allowed until there's no unlinks in progress.
> 
> 	Creating a storm of delayed unlinks isn't hard - open-and-unlink
> a lot, then exit and you've got it...
> 
> 	Suggestions?  Right now my local tree has nfs_lookup() and
> nfs_readdir() run with directory locked shared.  And they are still
> serialized by the damn ->silly_count ;-/
> 

Hmm...well, most of that was added in commit 565277f63c61. Looking at
the bug referenced in that commit log, I think that the main thing we
want to ensure is that rmdir waits until all of the sillydeletes for
files in its directory have finished.
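
For anyone following along, the scheme Al describes above (lookups as
writers, delayed unlinks as readers, unlinks queued while a lookup runs)
can be modelled in userspace roughly like this. It is only a sketch, not
the kernel code: struct dir_model and the helper names are hypothetical,
the counter convention (1 = idle, 0 = lookup running, >1 = unlinks
running) is an assumption made for the model, and the non-blocking "try"
calls stand in for the sleeping the real code does. The deferred counter
is unlocked here, so a threaded version would need a lock around it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one directory's state (not a kernel structure).
 * silly_count == 1: idle
 * silly_count == 0: a lookup holds the directory
 * silly_count  > 1: (silly_count - 1) delayed unlinks are running */
struct dir_model {
	atomic_int silly_count;
	int deferred;	/* unlinks queued while a lookup was running */
};

static void dir_init(struct dir_model *d)
{
	atomic_init(&d->silly_count, 1);
	d->deferred = 0;
}

/* Lookup ("writer"): succeeds only when the directory is idle. */
static bool lookup_try_begin(struct dir_model *d)
{
	int idle = 1;
	return atomic_compare_exchange_strong(&d->silly_count, &idle, 0);
}

/* Delayed unlink ("reader"): runs unless a lookup holds the directory,
 * in which case it is queued and false is returned. */
static bool unlink_try_begin(struct dir_model *d)
{
	int c = atomic_load(&d->silly_count);

	while (c != 0) {
		/* On failure c is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&d->silly_count, &c, c + 1))
			return true;
	}
	d->deferred++;
	return false;
}

static void unlink_end(struct dir_model *d)
{
	atomic_fetch_sub(&d->silly_count, 1);
}

/* End of lookup: everything queued meanwhile starts at once.
 * Returns how many queued unlinks were started. */
static int lookup_end(struct dir_model *d)
{
	int started = d->deferred;

	d->deferred = 0;
	atomic_store(&d->silly_count, 1 + started);
	return started;
}

/* Small walk-through: one unlink queued behind a lookup keeps the
 * next lookup out until that unlink has finished. */
static void dir_model_demo(void)
{
	struct dir_model d;

	dir_init(&d);
	assert(lookup_try_begin(&d));
	assert(!unlink_try_begin(&d));	/* queued behind the lookup */
	assert(lookup_end(&d) == 1);	/* queued unlink now running */
	assert(!lookup_try_begin(&d));	/* lookup shut out meanwhile */
	unlink_end(&d);
	assert(lookup_try_begin(&d));	/* idle again, lookup proceeds */
}
```

Note that in this model a second lookup_try_begin() fails while the first
lookup is active, which is the lookup/lookup exclusion Al is complaining
about, and that any unlink arriving while others are running gets in
ahead of a waiting lookup.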

But...there's also some code to ensure that, if a lookup races in while
we're doing the sillydelete, we transfer all of the dentry info to the
new alias. That's the messy part.

The new infrastructure for parallel lookups might actually make this
simpler. When we go to do the sillydelete, could we add the dentry to
the "lookup in progress" hash that you're adding as part of the
parallel lookup infrastructure? Then the tasks doing lookups could find
it and just wait on the sillydelete to complete. After the sillydelete,
we'd turn the thing into a negative dentry and then wake up the waiters
(if any). That does make the whole dentry teardown a bit more complex
though.
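
To make the idea above a bit more concrete, here is a rough single-threaded
model of the state transitions. Everything in it is hypothetical:
struct entry_model, the enum names and the helpers are made up for
illustration and do not correspond to the in-lookup hash API, and the
"waiters" counter stands in for real tasks sleeping on a waitqueue.

```c
#include <assert.h>

/* Hypothetical states for one name in a directory. */
enum entry_state {
	ENTRY_POSITIVE,		/* file exists, dentry is usable */
	ENTRY_SILLYDELETE,	/* sillydelete in flight, visible to lookups */
	ENTRY_NEGATIVE		/* sillydelete done, name is gone */
};

enum lookup_result { LOOKUP_HIT, LOOKUP_WAIT, LOOKUP_MISS };

struct entry_model {
	enum entry_state state;
	int waiters;		/* lookups parked on this entry */
};

/* Start of the sillydelete: publish the entry as "in progress". */
static void sillydelete_begin(struct entry_model *e)
{
	e->state = ENTRY_SILLYDELETE;
}

/* A lookup that finds an in-progress entry parks itself on it. */
static enum lookup_result lookup(struct entry_model *e)
{
	switch (e->state) {
	case ENTRY_POSITIVE:
		return LOOKUP_HIT;
	case ENTRY_SILLYDELETE:
		e->waiters++;
		return LOOKUP_WAIT;
	default:
		return LOOKUP_MISS;
	}
}

/* End of the sillydelete: go negative, then wake whoever waited.
 * Returns the number of waiters woken. */
static int sillydelete_end(struct entry_model *e)
{
	int woken = e->waiters;

	e->state = ENTRY_NEGATIVE;
	e->waiters = 0;
	return woken;
}

/* Walk-through: a lookup racing with the sillydelete waits, and a
 * retry after completion sees a negative entry. */
static void entry_model_demo(void)
{
	struct entry_model e = { ENTRY_POSITIVE, 0 };

	sillydelete_begin(&e);
	assert(lookup(&e) == LOOKUP_WAIT);
	assert(sillydelete_end(&e) == 1);
	assert(lookup(&e) == LOOKUP_MISS);
}
```

The point of the model is only the ordering: the entry becomes negative
before the waiters are woken, so a woken lookup never sees the
half-deleted state.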

OTOH...Al, I think you mentioned at one time that you thought the whole
sillydelete mechanism was overly complicated, and that it might be
cleaner to do this somehow in f_op->release? I don't recall the details
of what you had in mind at the time, but it might be good to rethink
the whole mess.

> 	Incidentally, why does nfs_complete_unlink() recheck ->d_flags?
> The caller of ->d_iput() is holding the only reference to dentry; who and
> what could possibly clear DCACHE_NFSFS_RENAMED between the checks in
> nfs_dentry_iput() and nfs_complete_unlink()?

Yeah, that looks superfluous. I imagine that can be removed.

-- 
Jeff Layton <jlayton@poochiereds.net>

