All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: linux-fsdevel@vger.kernel.org
Cc: Eric Biederman <ebiederm@xmission.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC] umount/__detach_mounts() race
Date: Mon, 21 Feb 2022 19:53:04 +0000	[thread overview]
Message-ID: <YhPtoCCjMCPDBzaz@zeniv-ca.linux.org.uk> (raw)
In-Reply-To: <YhMAy1WseafC+uIv@zeniv-ca.linux.org.uk>

	BTW, for the folks not familiar with the area -
refcounting rules for mounts are

	(1) being a child of some mount contributes 1.  That
applies to all mounts.
	(2) being a part of a tree of some namespace contributes
number of children; ditto for a tree that had never been in any
namespace.
	(3) being a part of decaying fragment (something detached
by umount_tree()) does not contribute anything.  (1) still applies,
but (2) does not.
	(4) being a root of some namespace contributes 1.
	(5) being a part of a list passing through ->mnt_umount
contributes 1.

Everything else is due to explicit references stored in some objects
other than mnt_namespace and mount instances, including local variables,
etc.

Decaying fragments are distinguishable by having MNT_UMOUNT on all
mounts in them (namespace_lock() stabilizes that, just as it does to
everything else tree-related).  There should never be a mix of
MNT_UMOUNT and non-MNT_UMOUNT mounts in any tree.

Everything in decaying fragment also has
	NULL ->mnt_ns
	empty ->mnt_list
	NULL ->mnt_master
	empty ->mnt_shared/->mnt_slave/->mnt_slave_list
	empty ->mnt_expire

Decaying fragments can fall apart into smaller subtrees.
If nothing else, that happens when there's no more external
references to the root of fragment - its children are torn away,
moved into a list that passes through their ->mnt_umount
(see mntput_no_expire(), circa line 1210) and later dropped
(cleanup_mnt(), circa line 1138).  It also happens when
mountpoint of something in a fragment gets invalidated
(see __detach_mounts(), MNT_UMOUNT case).

However, that's the only thing that can happen to those fragments -
they can't get anything mounted/unmounted/moved.  Stepping on
something like NFS referral point fails, instead of automounting
anything, etc.

	umount_tree() should never be called on those - it's about
tearing some part of live trees (whether they are in a namespace
or not) into decaying fragments and collecting the roots of those
fragments into a list ('unmounted', passing through ->mnt_umount)
for the subsequent namespace_unlock() to drop.

Another thing: namespace root may go into decaying fragment (which can
happen only if it's explicitly passed as argument to umount_tree())
only when the namespace is about to be freed.  I think we should be
OK there, but I'm not happy with the proof - it's too convoluted and
smells like it might be brittle.  Might be worth making that more
straightforward...

Another fun area is handling of internal mounts (which is what got
me started on that round of code review).  Both in terms of
RCU delays between ->mnt_ns going NULL and in terms of sync
behaviour needed in final mntput() for those.  Final mntput()
also involves rather delicate interactions with legitimizing
references acquired in RCU mode - that's what __legitimize_mnt()
and MNT_DOOMED is about and documentation is... lacking, to
put it mildly.  It's scattered over a bunch of commit messages,
and incomplete even there.

I'll post when I get the documentation of that stuff into more or
less usable shape.  IMO it should go into Documentation/filesystems/*,
but it'll obviously need a review first...

      parent reply	other threads:[~2022-02-21 19:53 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-21  3:02 [RFC] umount/__detach_mounts() race Al Viro
2022-02-21  5:04 ` Al Viro
2022-02-21  5:19   ` Al Viro
2022-02-21 16:46   ` Eric W. Biederman
2022-02-21 18:31     ` Al Viro
2022-02-21 17:04 ` Eric W. Biederman
2022-02-21 19:53 ` Al Viro [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YhPtoCCjMCPDBzaz@zeniv-ca.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=ebiederm@xmission.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.