linux-nfs.vger.kernel.org archive mirror
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Kinglong Mee <kinglongmee@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
	linux-fsdevel@vger.kernel.org,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	NeilBrown <neilb@suse.de>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Steve Dickson <SteveD@redhat.com>
Subject: Re: [PATCH 5/5] nfsd: allows user un-mounting filesystem where nfsd exports base on
Date: Sat, 6 Jun 2015 03:21:58 +0100
Message-ID: <20150606022158.GZ7232@ZenIV.linux.org.uk>
In-Reply-To: <20150605150213.GV7232@ZenIV.linux.org.uk>

On Fri, Jun 05, 2015 at 04:02:13PM +0100, Al Viro wrote:
> On Sun, May 24, 2015 at 11:10:50PM +0800, Kinglong Mee wrote:
> > --- a/fs/nfsd/export.c
> > +++ b/fs/nfsd/export.c
> > @@ -43,9 +43,9 @@ static void expkey_put(struct kref *ref)
> >  
> >  	if (test_bit(CACHE_VALID, &key->h.flags) &&
> >  	    !test_bit(CACHE_NEGATIVE, &key->h.flags))
> > -		path_put(&key->ek_path);
> > +		path_put_unpin(&key->ek_path, &key->ek_pin);
> >  	auth_domain_put(key->ek_client);
> > -	kfree(key);
> > +	kfree_rcu(key, rcu_head);
> >  }
> 
> That looks wrong.  OK, so you want umount() to proceed; fine, no problem
> with that.  However, what happens if the final mntput() hits while you
> are just approaching that path_put_unpin()?  ->kill() will be triggered,
> and it would bloody better
> 	a) make sure that expkey_put() is called for that key if it hadn't
> already been done and
> 	b) do not return until such expkey_put() completes.  Including the
> ones that might have been already entered by the time we'd got to ->kill().
> 
> Am I missing something subtle here?

Having looked through that code...  It *is* wrong.  Note that the normal
approach is to have pin_remove() called via pin_kill(), directly or triggered
from group_pin_kill() and/or cleanup_mnt() on the mount it's attached to.
pin_remove() should never be called outside of ->kill() callbacks.  It should
be called at the point where you are OK with fs being shut down.

The fundamental reason why it's broken is different, though - you *can't*
grab a reference if all you've got is a pin.  By the time the callback is
called, the mount in question is already irretrievably committed to being
killed.  There's one hell of a wide window between the point of no return
and the point where you are notified of anything, and that's by design -
you might very well have had several mounts doomed by a syscall and they
all get through cleanup_mnt() just before return to userland.  One by one.
So between the point where this puppy is doomed and the call of your callback
there might have been several filesystems going through shutdown, with tons
of IO, waiting for remote servers, etc.

We could add a primitive that would _try_ to grab a reference - that can
be done (lock_mount_hash(), check if it has MNT_DOOMED or MNT_SYNC_UMOUNT,
fail if it does, otherwise mnt_add_count(mnt, 1) and succeed, doing
unlock_mount_hash() on both exit paths).  HOWEVER, you'll need to think
very carefully where to use that primitive - unlike mntget() it _can_
fail and lock_mount_hash() can inflict quite a bit of cacheline pingpong
if used heavily.
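
A minimal userspace sketch of the try-grab primitive described above, for
illustration only.  It is not kernel code: struct model_mount, the MODEL_*
flag values, model_lock_mount_hash()/model_unlock_mount_hash() and
model_try_mntget() are all hypothetical stand-ins, and a C11 atomic_flag
spinlock plays the role of lock_mount_hash().  Only the control flow is
meant to match: take the lock, fail if the mount is already doomed,
otherwise bump the count, and unlock on both exit paths.

```c
/* Userspace model only; every name here is hypothetical, not a kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag bits standing in for MNT_DOOMED and MNT_SYNC_UMOUNT. */
#define MODEL_MNT_DOOMED      0x1u
#define MODEL_MNT_SYNC_UMOUNT 0x2u

struct model_mount {
	unsigned int flags;	/* stands in for the mount's flag word */
	int count;		/* stands in for the mount's reference count */
};

/* One global lock, modelling the single lock behind lock_mount_hash(). */
static atomic_flag model_mount_lock = ATOMIC_FLAG_INIT;

static void model_lock_mount_hash(void)
{
	/* Spin until we own the flag. */
	while (atomic_flag_test_and_set_explicit(&model_mount_lock,
						 memory_order_acquire))
		;
}

static void model_unlock_mount_hash(void)
{
	atomic_flag_clear_explicit(&model_mount_lock, memory_order_release);
}

/*
 * Try to take a reference.  Unlike mntget(), this can fail: it returns
 * false, leaving the count untouched, once the mount is committed to
 * being killed.
 */
static bool model_try_mntget(struct model_mount *mnt)
{
	bool ok = false;

	assert(mnt != NULL);

	model_lock_mount_hash();
	if (!(mnt->flags & (MODEL_MNT_DOOMED | MODEL_MNT_SYNC_UMOUNT))) {
		mnt->count += 1;	/* models mnt_add_count(mnt, 1) */
		ok = true;
	}
	model_unlock_mount_hash();	/* reached on both exit paths */

	return ok;
}
```

Every caller has to handle the failure case, and every call takes the
global lock, which is where the cacheline pingpong mentioned above would
come from under heavy use.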

Could you give details on the lifecycle of those objects, including the stages
at which we might try to grab references?  A combination of such a primitive
with a pin (doing just "NULL the references to vfsmount/dentry, do dput() on
what that dentry used to be and call pin_remove()") might work, if the
lifecycle is good enough.
