From: NeilBrown <neilb@suse.de>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>, NFS <linux-nfs@vger.kernel.org>
Subject: Re: [patch/rfc] allow exported (and *not* exported) filesystems to be unmounted.
Date: Mon, 8 Jul 2013 17:30:03 +1000 [thread overview]
Message-ID: <20130708173003.131cd901@notabene.brown> (raw)
In-Reply-To: <20130702155059.GD31697@fieldses.org>
[-- Attachment #1: Type: text/plain, Size: 6561 bytes --]
On Tue, 2 Jul 2013 11:50:59 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:
> On Tue, Jul 02, 2013 at 08:24:13AM +1000, NeilBrown wrote:
> > On Mon, 1 Jul 2013 15:12:38 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> > wrote:
> >
> > > On Thu, Jun 06, 2013 at 10:05:12AM +1000, NeilBrown wrote:
> > > > On Wed, 5 Jun 2013 09:36:58 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> > > > wrote:
> > > >
> > > > > On Wed, Jun 05, 2013 at 04:19:34PM +1000, NeilBrown wrote:
> > > > > > On Wed, 5 Jun 2013 04:41:15 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > > > > >
> > > > > > > On Wed, Jun 05, 2013 at 01:05:41PM +1000, NeilBrown wrote:
> > > > > > > >
> > > > > > > > Hi Bruce,
> > > > > > > > this is a little issue that seems to keep coming up so I thought it might be
> > > > > > > > time to fix it.
> > > > > > > >
> > > > > > > > As you know, a filesystem that is exported cannot be unmounted as the export
> > > > > > > > cache holds a reference to it. Though if it hasn't been accessed for a
> > > > > > > > while then it can.
> > > > > > > >
> > > > > > > > As I hadn't realised before sometimes *non* exported filesystems can be
> > > > > > > > pinned to. A negative entry in the cache can pin a filesystem just as
> > > > > > > > easily as a positive entry.
> > > > > > > > An amusing, if somewhat contrived, example is that if you export '/' with
> > > > > > > > crossmnt and:
> > > > > > > >
> > > > > > > > mount localhost:/ /mnt
> > > > > > > > ls -l /
> > > > > > > > umount /mnt
> > > > > > > >
> > > > > > > > the umount might fail. This is because the "ls -l" tried to export every
> > > > > > > > filesystem found mounted in '/'. The export of "/mnt" failed of course
> > > > > > > > because you cannot re-export an NFS filesystem. But it is still in the
> > > > > > > > cache.
> > > > > > > > An 'exportfs -f' fixes this, but shouldn't be necessary.
> > > > >
> > > > > Yeah, ugh. As a less contrived example, can the default v4 root export
> > > > > lead to arbitrary filesystems being pinned just because a client tried
> > > > > to mount the wrong path?
> > > >
> > > > I think it can only pin filesystems that are exported, either explicitly or
> > > > via a parent being exported with 'crossmnt'.
> > >
> > > But see utils/mountd/v4root.c, and:
> > >
> > > [root@server ~]# exportfs -v
> > > /export <world>(rw,...)
> > > [root@server ~]# mount /mnt
> > >
> > > [root@pip4 ~]# mount pip4:/ /mnt/
> > > [root@pip4 ~]# ls -l /mnt/
> > > total 4
> > > drwxrwxrwt 3 root root 4096 Jun 7 10:34 export
> > > [root@pip4 ~]#
> > >
> > > [root@server ~]# umount /mnt/
> > > umount: /mnt: target is busy.
> > > ...
> > > [root@server ~]# grep /mnt /proc/net/rpc/nfsd.export/content
> > > # /mnt *()
> >
> > You make a good point. I didn't think that would happen, and I think I could
> > argue that it is a bug.
>
> Definitely looks like a bug to me.
>
> > I have no idea how easy it would be to "fix" without
> > pouring over the code for a while though. Or whether it is worth "fixing".
>
> As long as clients are mostly just LOOKUPing down to the export they
> care about people may not hit this too much, but no doubt somebody will
> hit this eventually.
>
> > > > > Could the export cache be indexed on a path string instead of a struct
> > > > > path?
> > > >
> > > > Yes. It would mean lots of extra pathname lookups and possibly lots of
> > > > "d_path()" calls.
> > >
> > > Right. Ugh. Still, struct path seems wrong as it's not something gssd
> > > knows about, and there's not really a 1-1 mapping between the two (see
> > > e.g. the recent complaint about the case where the struct path
> > > represents a lazy-unmounted export
> > > http://mid.gmane.org/<20130625191008.GA20277@us.ibm.com> ).
> >
> > I noticed that but haven't really followed it (though I just checked and
> > there isn't much to follow...)
> >
> > What do we *want* to happen in this case? I would argue that when the
> > filesystem is detached the export should immediately become invalid.
> >
> > We could possibly put a check in exp_find_key() to see if ek->ek_path
> > was still attached (not sure how to do that) and if it is: invalidate the ek
> > before calling cache_check(). If the "is path detach" test is cheap, we
> > should probably do this.
> >
> > Or - we could flush relevant entries from the cache whenever there is a
> > change in the mount table. That is certainly the preferred option if the "is
> > path detached" test is at all expensive. But it seems it isn't completely
> > clear how that flushing should be triggered...
>
> There are probably other ways to get this kind of hang too: mounting
> over an export? mount --move?
I could argue that "mount --move" should trigger a flush just like umount
does. Mounting over an export (or mounting over a parent of an export) is
not so easy.
The core problem which causes the hang is that the path requested by
svc_export_request cannot be used to refer to the exp->ex_path.
This is something that svc_export_request could check.
e.g. after "d_path", it could "kern_path(pth, 0, &old_path);" and if that
fails or old_path doesn't match ex_path, then somehow invalidate the cache
entry.... though that is a bit awkward. We really want to catch the problem
at exp_find_key time. Doing it there would be more of a performance burden.
Can d_path tell us that the path isn't reachable?
I think __d_path can, but that isn't exported (yet). Using that could avoid
the "kern_path()" call.
So I'm thinking:
- in svc_export_request() use __d_path (which needs to be exported)
- if that fails, set a new CACHE_STALE flag on the cache_head
- cache_read notices CACHE_STALE and calls cache_fresh_unlocked() (I think)
to remove it from the queue.
- cache_is_valid treats CACHE_STALE like CACHE_NEGATIVE only -ESTALE is
returned
- if exp_find() gets -ESTALE from exp_get_by_name, it sets CACHE_STALE on
'ek'.
It gets a bit complicated doesn't it?
An alternative is that if rpc.mountd gets a request for a path that doesn't
exist, or that when it responds to a request, the request remains in the
channel, then it simply flushes everything.
This is a bit heavy handed, but it is a rare occurrence so maybe that doesn't
matter.
It doesn't solve my desire for umount of exported filesystems to "just work"
though.
NeilBrown
-
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next prev parent reply other threads:[~2013-07-08 7:30 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-05 3:05 [patch/rfc] allow exported (and *not* exported) filesystems to be unmounted NeilBrown
2013-06-05 3:41 ` Al Viro
2013-06-05 6:19 ` NeilBrown
2013-06-05 13:36 ` J. Bruce Fields
2013-06-06 0:05 ` NeilBrown
2013-07-01 19:12 ` J. Bruce Fields
2013-07-01 22:24 ` NeilBrown
2013-07-02 15:50 ` J. Bruce Fields
2013-07-08 7:30 ` NeilBrown [this message]
2013-07-08 20:04 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130708173003.131cd901@notabene.brown \
--to=neilb@suse.de \
--cc=bfields@fieldses.org \
--cc=linux-nfs@vger.kernel.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).