From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
NFS <linux-nfs@vger.kernel.org>
Subject: Re: [patch/rfc] allow exported (and *not* exported) filesystems to be unmounted.
Date: Wed, 5 Jun 2013 16:19:34 +1000
Message-ID: <20130605161934.59ab6804@notabene.brown>
In-Reply-To: <20130605034115.GD13110@ZenIV.linux.org.uk>
On Wed, 5 Jun 2013 04:41:15 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:
> On Wed, Jun 05, 2013 at 01:05:41PM +1000, NeilBrown wrote:
> >
> > Hi Bruce,
> > this is a little issue that seems to keep coming up so I thought it might be
> > time to fix it.
> >
> > As you know, a filesystem that is exported cannot be unmounted as the export
> > cache holds a reference to it. Though if it hasn't been accessed for a
> > while then it can.
> >
> > As I hadn't realised before, sometimes *non* exported filesystems can be
> > pinned too. A negative entry in the cache can pin a filesystem just as
> > easily as a positive entry.
> > An amusing, if somewhat contrived, example is that if you export '/' with
> > crossmnt and:
> >
> > mount localhost:/ /mnt
> > ls -l /
> > umount /mnt
> >
> > the umount might fail. This is because the "ls -l" tried to export every
> > filesystem found mounted in '/'. The export of "/mnt" failed of course
> > because you cannot re-export an NFS filesystem. But it is still in the
> > cache.
> > An 'exportfs -f' fixes this, but shouldn't be necessary.
> >
> > So this RFC patch makes it possible to register a notifier which gets
> > called on unmount, and links the export table in to the notifier chain.
> >
> > The "atomic" flavour is used so that notifiers can be registered under a
> > spin_lock. This is needed for "expkey_update" as ->update is called under a
> > lock.
> >
> > As notifier callees cannot unregister themselves, the unregister needs to
> > happen in a workqueue item, and the unmount will wait for that.
> >
> > It seems to work for me (once I figured out all the locking issues and found
> > a way to make it work without deadlocking).
> >
> > If you are OK with it in general I'll make it into a proper patch series and
> > include Al Viro for the VFS bits.
>
> > @@ -1201,6 +1234,11 @@ static int do_umount(struct mount *mnt, int flags)
> > sb->s_op->umount_begin(sb);
> > }
> >
> > + /* Some in-kernel users (nfsd) might need to be asked to release
> > + * the filesystem
> > + */
> > + umount_notifier_call(mnt);
>
> NAK. I'm sorry, but it's a fundamentally wrong approach - there are _tons_
> of places where vfsmount could get evicted (consider shared-subtree umount
> propagation, for starters), not to mention that notifiers tend to be
> the wrong answer to just about any question.
>
> I'd suggest looking at what kernel/acct.c is doing; I'm absolutely serious
> about notifiers being unacceptable BS. If you want something generic,
> consider turning ->mnt_pinned into a list of callbacks, with mntput_no_expire
> calling them one by one; calling acct_auto_close_mnt() would be replaced with
> callbacks, each doing single acct_file_reopen(acct, NULL, NULL).
>
> I'm about to fall asleep right now, so any further details will have to wait
> until tomorrow; sorry...

When tomorrow does come:

 - Can you say why you don't like notifiers?

 - mnt_pinned handling happens *after* the unmount has happened. The unmount
   is effectively always 'lazy' with respect to pinning users. I don't think
   I want that.
   If an NFS request is in progress when the unmount is requested, I think
   the unmount should fail, and with my code it will - the notifier handler
   will expire the cache entries but they will continue to exist until the
   last user goes away.
   For most requests it probably wouldn't matter if they continued on a
   lazy-unmounted filesystem, but the NFSv4 LOOKUPP (lookup parent)
   request might get confused.

 - Your point about shared subtree umount certainly deserves consideration.
   It is probably time I wrapped my mind around what that really means.
   Putting the umount_notifier_call() call in do_refcount_check() might
   almost be the right place, except that it would introduce even more
   locking problems (unless it is OK to call flush_scheduled_work() under
   vfsmount_lock).

Thanks for the feedback. I'm very open to other ideas.

NeilBrown
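
The behaviour described in the second point above (the notifier expires
cache entries before the busy check, but an entry that is in use keeps
its reference until the last user goes away, so the unmount fails) can
be sketched as a small userspace model. This is only an illustration
under stated assumptions: struct model_mount, struct cache_entry and
every function below are hypothetical names invented for the sketch,
not kernel or nfsd APIs, and all locking is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model: none of these names are kernel APIs. */
struct model_mount {
	int refs;	/* 1 for the mount table, +1 per cache entry pinning it */
};

struct cache_entry {
	struct model_mount *mnt;
	bool expired;	/* new lookups must no longer find this entry */
	bool pins_mnt;	/* entry still holds its reference on mnt */
	int users;	/* requests currently using the entry */
};

/* Add an entry (positive or negative) to the cache; it pins the mount. */
static void cache_add(struct cache_entry *e, struct model_mount *mnt)
{
	e->mnt = mnt;
	e->expired = false;
	e->pins_mnt = true;
	e->users = 0;
	mnt->refs++;
}

/* An expired entry drops its reference once the last user has gone away. */
static void entry_release_if_idle(struct cache_entry *e)
{
	if (e->expired && e->users == 0 && e->pins_mnt) {
		e->pins_mnt = false;
		e->mnt->refs--;
	}
}

static void request_begin(struct cache_entry *e)
{
	e->users++;
}

static void request_done(struct cache_entry *e)
{
	assert(e->users > 0);
	e->users--;
	entry_release_if_idle(e);
}

/* Stand-in for the notifier callback: expire every entry on this mount. */
static void umount_notify(struct cache_entry *entries, int n,
			  struct model_mount *mnt)
{
	for (int i = 0; i < n; i++) {
		if (entries[i].mnt == mnt) {
			entries[i].expired = true;
			entry_release_if_idle(&entries[i]);
		}
	}
}

/* Notify first, then do the busy check: 0 on success, -EBUSY if pinned. */
static int model_umount(struct cache_entry *entries, int n,
			struct model_mount *mnt)
{
	umount_notify(entries, n, mnt);
	return mnt->refs > 1 ? -EBUSY : 0;
}
```

In this model an idle entry no longer blocks the unmount, while an entry
with a request in flight makes model_umount() return -EBUSY; once
request_done() runs, the expired entry drops its reference and a second
attempt succeeds. A callback run only after the final reference is put
would not be able to fail the unmount in this way.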