From: Christian Brauner <brauner@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, ebiederm@xmission.com,
	jack@suse.cz,  torvalds@linux-foundation.org
Subject: Re: [PATCH 11/26] sanitize handling of long-term internal mounts
Date: Wed, 11 Jun 2025 12:56:07 +0200
Message-ID: <20250611-minus-zugfahrt-d6e68d933f0f@brauner>
In-Reply-To: <20250610082148.1127550-11-viro@zeniv.linux.org.uk>

On Tue, Jun 10, 2025 at 09:21:33AM +0100, Al Viro wrote:
> Original rationale for those had been the reduced cost of mntput()
> for the stuff that is mounted somewhere.  Mount refcount increments and
> decrements are frequent; what's worse, they tend to concentrate on the
> same instances and cacheline pingpong is quite noticeable.
> 
> As the result, mount refcounts are per-cpu; that allows a very cheap
> increment.  Plain decrement would be just as easy, but decrement-and-test
> is anything but (we need to add the components up, with exclusion against
> possible increment-from-zero, etc.).
> 
> Fortunately, there is a very common case where we can tell that decrement
> won't be the final one - if the thing we are dropping is currently
> mounted somewhere.  We have an RCU delay between the removal from mount
> tree and dropping the reference that used to pin it there, so we can
> just take rcu_read_lock() and check if the victim is mounted somewhere.
> If it is, we can go ahead and decrement without any further checks -
> the reference we are dropping is not the last one.  If it isn't, we
> get all the fun with locking, carefully adding up components, etc.,
> but the majority of refcount decrements end up taking the fast path.
> 
> There is a major exception, though - pipes and sockets.  Those live
> on the internal filesystems that are not going to be mounted anywhere.
> They are not going to be _un_mounted, of course, so having to take the
> slow path every time a pipe or socket gets closed is really obnoxious.
> Solution had been to mark them as long-lived ones - essentially faking
> "they are mounted somewhere" indicator.
> 
> With minor modification that works even for ones that do eventually get
> dropped - all it takes is making sure we have an RCU delay between
> clearing the "mounted somewhere" indicator and dropping the reference.
> 
> There are some additional twists (if you want to drop a dozen of such
> internal mounts, you'd be better off with clearing the indicator on
> all of them, doing an RCU delay once, then dropping the references),
> but in the basic form it had been
> 	* use kern_mount() if you want your internal mount to be
> a long-term one.
> 	* use kern_unmount() to undo that.
> 
> Unfortunately, the things did rot a bit during the mount API reshuffling.
> In several cases we have lost the "fake the indicator" part; kern_unmount()
> on the unmount side remained (it doesn't warn if you use it on a mount
> without the indicator), but all benefits regarding mntput() cost had been
> lost.
> 
> To get rid of that bitrot, let's add a new helper that would work
> with fs_context-based API: fc_mount_longterm().  It's a counterpart
> of fc_mount() that does, on success, mark its result as long-term.
> It must be paired with kern_unmount() or equivalents.
> 
> Converted:
> 	1) mqueue (it used to use kern_mount_data() and the umount side
> is still as it used to be)
> 	2) hugetlbfs (used to use kern_mount_data(), internal mount is
> never unmounted in this one)
> 	3) i915 gemfs (used to be kern_mount() + manual remount to set
> options, still uses kern_unmount() on umount side)
> 	4) v3d gemfs (copied from i915)
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>
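
For readers following the commit message quoted above, the fast-path/slow-path
split it describes can be sketched in userspace C roughly as follows. This is a
single-threaded toy model, not kernel code: struct model_mount, model_get(),
model_put(), model_slow_puts() and the pinned_hint field are all hypothetical
names made up for illustration, and the RCU and locking requirements are reduced
to comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical userspace model; this is NOT the kernel's struct mount. */
struct model_mount {
	long count[MODEL_NR_CPUS]; /* per-CPU components; only their sum is meaningful */
	bool pinned_hint;          /* "mounted somewhere", real or faked for long-term mounts */
	unsigned long slow_puts;   /* number of drops that had to take the slow path */
};

/* Cheap increment: touch only the component belonging to the given CPU. */
static void model_get(struct model_mount *m, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	m->count[cpu]++;
}

/* Slow-path helper: add up all components (the kernel needs exclusion here). */
static long model_sum(const struct model_mount *m)
{
	long sum = 0;

	for (size_t i = 0; i < MODEL_NR_CPUS; i++)
		sum += m->count[i];
	return sum;
}

/* Drop a reference; returns true if it was the last one. */
static bool model_put(struct model_mount *m, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (m->pinned_hint) {
		/*
		 * Fast path: something else still pins the mount, so this
		 * drop cannot be the final one.  In the kernel this check
		 * is only safe under rcu_read_lock().
		 */
		m->count[cpu]--;
		return false;
	}
	/* Slow path: decrement, then add the components up. */
	m->slow_puts++;
	m->count[cpu]--;
	return model_sum(m) == 0;
}

/* Model nr open/close pairs of a pipe-like object on an internal mount. */
static unsigned long model_slow_puts(bool longterm, int nr)
{
	struct model_mount m = { .pinned_hint = longterm };

	model_get(&m, 0);	/* long-lived reference held by the filesystem */
	for (int i = 0; i < nr; i++) {
		model_get(&m, i % MODEL_NR_CPUS);	/* open on one CPU */
		model_put(&m, (i + 1) % MODEL_NR_CPUS);	/* close on another */
	}
	return m.slow_puts;
}
```

In this model, calling model_slow_puts(false, n) sends every close through the
slow path, while model_slow_puts(true, n) sends none. That is the cost
difference the commit message attributes to marking an internal mount as
long-term. Individual components may go negative when a reference is taken on
one CPU and dropped on another, which is why only the sum can be tested
against zero.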
