[RFC] separate the internal mount flags from the rest

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC] separate the internal mount flags from the rest
@ 2025-06-03 20:47 Al Viro
  2025-06-03 23:17 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Al Viro @ 2025-06-03 20:47 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Eric W. Biederman, Christian Brauner, Linus Torvalds

	Currently we use ->mnt_flags for all kinds of things, including
the ones only fs/{namespace,pnode}.c care about.

	That wouldn't be a problem if not for the locking.  ->mnt_flags is
protected by mount_lock.  All writers MUST grab that.  Having lockless
readers is unsurprising - after all, for something like noexec we want
the current state of mount, whatever it happens to be.  If userland
remounts something noexec and that races with execve(2), there's nothing
to be done - it is a race, but not the kernel one.

	However, for a bunch of flags we rely upon the fact that all
changes in one of those are always done under namespace_sem (exclusive).
It's not a lockless read - we do depend upon namespace_sem (shared) being
sufficient to stabilize those particular bits.	MNT_SHARED is one such.
Note that this flag is a part of propagation graph representation and
namespace_sem is what protects the entire thing, so setting or clearing
it under mount_lock alone would be a very bad idea.

Since MNT_SHARED sits in ->mnt_flags, we have to take mount_lock to set or
clear it, leading to places like this:
        if (IS_MNT_SHARED(from)) {
		to->mnt_group_id = from->mnt_group_id;
		list_add(&to->mnt_share, &from->mnt_share);
		lock_mount_hash();
		set_mnt_shared(to);
		unlock_mount_hash();
	}
Non-empty ->mnt_share on a mount without MNT_SHARED would be a hard bug;
these changes belong together.  Incidentally, lock_mount_hash() is an
overkill here - read_seqlock_excl(&mount_lock) would be better...

The rules would be easier to follow if we took MNT_SHARED, MNT_UNBINDABLE,
MNT_MARKED and possibly MNT_LOCKED and MNT_LOCK_* to separate field, protected
by namespace_sem alone (MNT_LOCKED - depending upon how the path_parent() mess
settles).

Yes, it's 4 bytes added into struct mount.  However, this
        int mnt_id;                     /* mount identifier, reused */
	u64 mnt_id_unique;              /* mount ID unique until reboot */
	int mnt_group_id;               /* peer group identifier */
	int mnt_expiry_mark;            /* true if marked for expiry */
is preceded and followed by a pointer, so we already have a gap there,
_and_ there are other pending changes that kill ->mnt_umounting.  So the
size of struct mount won't grow even on 32bit and would actually go down
on 64bit.

Objections?  The thing I really want is clear locking rules for that
stuff.  Reduced contention on mount_lock and fewer increments of its
seqcount component wouldn't hurt either, but that's secondary...


PS: despite being strictly namespace.c-internal, two flags must stay in
->mnt_flags - MNT_DOOMED and MNT_SYNC_UMOUNT - __legitimize_mnt() needs
them there.  Could be folded together with a bit of massage, though, but
that's a separate story...

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] separate the internal mount flags from the rest
  2025-06-03 20:47 [RFC] separate the internal mount flags from the rest Al Viro
@ 2025-06-03 23:17 ` Linus Torvalds
  2025-06-04  7:36 ` Christian Brauner
  2025-06-07 11:48 ` Eric W. Biederman
  2 siblings, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2025-06-03 23:17 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Eric W. Biederman, Christian Brauner

On Tue, 3 Jun 2025 at 13:47, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Objections?  The thing I really want is clear locking rules for that
> stuff.  Reduced contention on mount_lock and fewer increments of its
> seqcount component wouldn't hurt either, but that's secondary...

No objection at all, as long as you also change the name of the flag.
Which I assume you were planning on doing anyway.

We've occasionally had various flags that had the same "namespace"
prefix but were split across multiple fields, and it's always painful
and very confusing.

            Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] separate the internal mount flags from the rest
  2025-06-03 20:47 [RFC] separate the internal mount flags from the rest Al Viro
  2025-06-03 23:17 ` Linus Torvalds
@ 2025-06-04  7:36 ` Christian Brauner
  2025-06-07 11:48 ` Eric W. Biederman
  2 siblings, 0 replies; 4+ messages in thread
From: Christian Brauner @ 2025-06-04  7:36 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Eric W. Biederman, Linus Torvalds

On Tue, Jun 03, 2025 at 09:47:04PM +0100, Al Viro wrote:
> 	Currently we use ->mnt_flags for all kinds of things, including
> the ones only fs/{namespace,pnode}.c care about.
> 
> 	That wouldn't be a problem if not for the locking.  ->mnt_flags is
> protected by mount_lock.  All writers MUST grab that.  Having lockless
> readers is unsurprising - after all, for something like noexec we want
> the current state of mount, whatever it happens to be.  If userland
> remounts something noexec and that races with execve(2), there's nothing
> to be done - it is a race, but not the kernel one.
> 
> 	However, for a bunch of flags we rely upon the fact that all
> changes in one of those are always done under namespace_sem (exclusive).
> It's not a lockless read - we do depend upon namespace_sem (shared) being
> sufficient to stabilize those particular bits.	MNT_SHARED is one such.
> Note that this flag is a part of propagation graph representation and
> namespace_sem is what protects the entire thing, so setting or clearing
> it under mount_lock alone would be a very bad idea.
> 
> Since MNT_SHARED sits in ->mnt_flags, we have to take mount_lock to set or
> clear it, leading to places like this:
>         if (IS_MNT_SHARED(from)) {
> 		to->mnt_group_id = from->mnt_group_id;
> 		list_add(&to->mnt_share, &from->mnt_share);
> 		lock_mount_hash();
> 		set_mnt_shared(to);
> 		unlock_mount_hash();
> 	}
> Non-empty ->mnt_share on a mount without MNT_SHARED would be a hard bug;
> these changes belong together.  Incidentally, lock_mount_hash() is an
> overkill here - read_seqlock_excl(&mount_lock) would be better...
> 
> The rules would be easier to follow if we took MNT_SHARED, MNT_UNBINDABLE,
> MNT_MARKED and possibly MNT_LOCKED and MNT_LOCK_* to separate field, protected
> by namespace_sem alone (MNT_LOCKED - depending upon how the path_parent() mess
> settles).

Agreed.

> 
> Yes, it's 4 bytes added into struct mount.  However, this
>         int mnt_id;                     /* mount identifier, reused */
> 	u64 mnt_id_unique;              /* mount ID unique until reboot */
> 	int mnt_group_id;               /* peer group identifier */
> 	int mnt_expiry_mark;            /* true if marked for expiry */
> is preceded and followed by a pointer, so we already have a gap there,
> _and_ there are other pending changes that kill ->mnt_umounting.  So the
> size of struct mount won't grow even on 32bit and would actually go down
> on 64bit.

I'm not at all worried about the size in this case.

> 
> Objections?  The thing I really want is clear locking rules for that
> stuff.  Reduced contention on mount_lock and fewer increments of its
> seqcount component wouldn't hurt either, but that's secondary...

Seems good to me!

> PS: despite being strictly namespace.c-internal, two flags must stay in
> ->mnt_flags - MNT_DOOMED and MNT_SYNC_UMOUNT - __legitimize_mnt() needs

Sure.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] separate the internal mount flags from the rest
  2025-06-03 20:47 [RFC] separate the internal mount flags from the rest Al Viro
  2025-06-03 23:17 ` Linus Torvalds
  2025-06-04  7:36 ` Christian Brauner
@ 2025-06-07 11:48 ` Eric W. Biederman
  2 siblings, 0 replies; 4+ messages in thread
From: Eric W. Biederman @ 2025-06-07 11:48 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christian Brauner, Linus Torvalds

Al Viro <viro@zeniv.linux.org.uk> writes:

> 	Currently we use ->mnt_flags for all kinds of things, including
> the ones only fs/{namespace,pnode}.c care about.
>
> 	That wouldn't be a problem if not for the locking.  ->mnt_flags is
> protected by mount_lock.  All writers MUST grab that.  Having lockless
> readers is unsurprising - after all, for something like noexec we want
> the current state of mount, whatever it happens to be.  If userland
> remounts something noexec and that races with execve(2), there's nothing
> to be done - it is a race, but not the kernel one.

In general I don't have a problem.

I see one possible complication I haven't seen mentioned.
Which mnt_flags get exposed to userspace?

I know it was the expectation that MNT_LOCKED would be visible to
userspace, as it's effects are visible to userspace (it isn't just some
accounting thing).

I may just be objecting to the name of the macro MNT_INTERNAL_FLAGS,
and the making of the decision about flags based on the
internal/external state.

Your patch to change make MNT_LOCKED an internal flag in combination
with this proposed change make me wonder about our classification
of flags as internal/external and how much that matters.

Eric

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-06-07 12:24 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-03 20:47 [RFC] separate the internal mount flags from the rest Al Viro
2025-06-03 23:17 ` Linus Torvalds
2025-06-04  7:36 ` Christian Brauner
2025-06-07 11:48 ` Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.