Re: do_change_type(): refuse to operate on unmounted/not ours mounts

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
@ 2025-07-24 20:02 Andrei Vagin
  2025-07-24 23:00 ` Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Andrei Vagin @ 2025-07-24 20:02 UTC (permalink / raw)
  To: Al Viro, Christian Brauner; +Cc: linux-fsdevel, LKML, criu

Hi Al and Christian,

The commit 12f147ddd6de ("do_change_type(): refuse to operate on
unmounted/not ours mounts") introduced an ABI backward compatibility
break. CRIU depends on the previous behavior, and users are now
reporting criu restore failures following the kernel update. This change
has been propagated to stable kernels. Is this check strictly required?
Would it be possible to check only if the current process has
CAP_SYS_ADMIN within the mount user namespace?

Thanks,
Andrei

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-07-24 20:02 do_change_type(): refuse to operate on unmounted/not ours mounts Andrei Vagin
@ 2025-07-24 23:00 ` Al Viro
  2025-07-24 23:38   ` Andrei Vagin
  2025-07-26 17:12   ` Andrei Vagin
  0 siblings, 2 replies; 18+ messages in thread
From: Al Viro @ 2025-07-24 23:00 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: Christian Brauner, linux-fsdevel, LKML, criu

On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> Hi Al and Christian,
> 
> The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> unmounted/not ours mounts") introduced an ABI backward compatibility
> break. CRIU depends on the previous behavior, and users are now
> reporting criu restore failures following the kernel update. This change
> has been propagated to stable kernels. Is this check strictly required?

Yes.

> Would it be possible to check only if the current process has
> CAP_SYS_ADMIN within the mount user namespace?

Not enough, both in terms of permissions *and* in terms of "thou
shalt not bugger the kernel data structures - nobody's priveleged
enough for that".

What the hell is CRIU trying to do there?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-07-24 23:00 ` Al Viro
@ 2025-07-24 23:38   ` Andrei Vagin
  2025-07-26 17:12   ` Andrei Vagin
  1 sibling, 0 replies; 18+ messages in thread
From: Andrei Vagin @ 2025-07-24 23:38 UTC (permalink / raw)
  To: Al Viro; +Cc: Christian Brauner, linux-fsdevel, LKML, criu

On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> > Hi Al and Christian,
> >
> > The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> > unmounted/not ours mounts") introduced an ABI backward compatibility
> > break. CRIU depends on the previous behavior, and users are now
> > reporting criu restore failures following the kernel update. This change
> > has been propagated to stable kernels. Is this check strictly required?
>
> Yes.
>
> > Would it be possible to check only if the current process has
> > CAP_SYS_ADMIN within the mount user namespace?
>
> Not enough, both in terms of permissions *and* in terms of "thou
> shalt not bugger the kernel data structures - nobody's priveleged
> enough for that".
>
> What the hell is CRIU trying to do there?

As usual, CRIU's doing some kind of ritualistic dance to restore a
container's state. In this specific scenario, it's about restoring a
mount tree across multiple mount namespaces. Fixing this
particular issue within CRIU isn't a big deal, the challenge is in
propagating this fix to all affected users. Given that the kernel change
has already been merged into stable branches, CRIU will stop
working for most users.

The criu fix is here:
https://github.com/checkpoint-restore/criu/pull/2695/commits/e91d74a27b723d4dd1f9aceb83601b1b8c2b50a7

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-07-24 23:00 ` Al Viro
  2025-07-24 23:38   ` Andrei Vagin
@ 2025-07-26 17:12   ` Andrei Vagin
  2025-07-26 17:53     ` Al Viro
  1 sibling, 1 reply; 18+ messages in thread
From: Andrei Vagin @ 2025-07-26 17:12 UTC (permalink / raw)
  To: Al Viro; +Cc: Christian Brauner, linux-fsdevel, LKML, criu, Linux API, stable

On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> > Hi Al and Christian,
> >
> > The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> > unmounted/not ours mounts") introduced an ABI backward compatibility
> > break. CRIU depends on the previous behavior, and users are now
> > reporting criu restore failures following the kernel update. This change
> > has been propagated to stable kernels. Is this check strictly required?
>
> Yes.
>
> > Would it be possible to check only if the current process has
> > CAP_SYS_ADMIN within the mount user namespace?
>
> Not enough, both in terms of permissions *and* in terms of "thou
> shalt not bugger the kernel data structures - nobody's priveleged
> enough for that".

Al,

I am still thinking in terms of "Thou shalt not break userspace"...

Seriously though, this original behavior has been in the kernel for 20
years, and it hasn't triggered any corruptions in all that time. I
understand this change might be necessary in its current form, and
that some collateral damage could be unavoidable. But if that's the
case, I'd expect a detailed explanation of why it had to be so and why
userspace breakage is unavoidable.

The original change was merged two decades ago. We need to
consider that some applications might rely on that behavior. I'm not
questioning the security aspect - that must be addressed. But for
anything else, we need to minimize the impact on user applications that
don't violate security.

We can consider a cleaner fix for the upstream kernel, but when we are
talking about stable kernels, the user-space backward compatibility
aspect should be even more critical.

Thanks,
Andrei

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-07-26 17:12   ` Andrei Vagin
@ 2025-07-26 17:53     ` Al Viro
  2025-07-26 21:01       ` Andrei Vagin
  0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2025-07-26 17:53 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Christian Brauner, linux-fsdevel, LKML, criu, Linux API, stable

On Sat, Jul 26, 2025 at 10:12:34AM -0700, Andrei Vagin wrote:
> On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> > > Hi Al and Christian,
> > >
> > > The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> > > unmounted/not ours mounts") introduced an ABI backward compatibility
> > > break. CRIU depends on the previous behavior, and users are now
> > > reporting criu restore failures following the kernel update. This change
> > > has been propagated to stable kernels. Is this check strictly required?
> >
> > Yes.
> >
> > > Would it be possible to check only if the current process has
> > > CAP_SYS_ADMIN within the mount user namespace?
> >
> > Not enough, both in terms of permissions *and* in terms of "thou
> > shalt not bugger the kernel data structures - nobody's priveleged
> > enough for that".
> 
> Al,
> 
> I am still thinking in terms of "Thou shalt not break userspace"...
> 
> Seriously though, this original behavior has been in the kernel for 20
> years, and it hasn't triggered any corruptions in all that time.

For a very mild example of fun to be had there:
	mount("none", "/mnt", "tmpfs", 0, "");
	chdir("/mnt");
	umount2(".", MNT_DETACH);
	mount(NULL, ".", NULL, MS_SHARED, NULL);
Repeat in a loop, watch mount group id leak.  That's a trivial example
of violating the assertion ("a mount that had been through umount_tree()
is out of propagation graph and related data structures for good").

As for the "CAP_SYS_ADMIN within the mount user namespace" - which
userns do you have in mind?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-07-26 17:53     ` Al Viro
@ 2025-07-26 21:01       ` Andrei Vagin
  2025-07-31  2:40         ` Pavel Tikhomirov
  2025-08-13 18:56         ` Al Viro
  0 siblings, 2 replies; 18+ messages in thread
From: Andrei Vagin @ 2025-07-26 21:01 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrei Vagin, Christian Brauner, linux-fsdevel, LKML, criu,
	Linux API, stable

On Sat, Jul 26, 2025 at 10:53 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Jul 26, 2025 at 10:12:34AM -0700, Andrei Vagin wrote:
> > On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> > > > Hi Al and Christian,
> > > >
> > > > The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> > > > unmounted/not ours mounts") introduced an ABI backward compatibility
> > > > break. CRIU depends on the previous behavior, and users are now
> > > > reporting criu restore failures following the kernel update. This change
> > > > has been propagated to stable kernels. Is this check strictly required?
> > >
> > > Yes.
> > >
> > > > Would it be possible to check only if the current process has
> > > > CAP_SYS_ADMIN within the mount user namespace?
> > >
> > > Not enough, both in terms of permissions *and* in terms of "thou
> > > shalt not bugger the kernel data structures - nobody's priveleged
> > > enough for that".
> >
> > Al,
> >
> > I am still thinking in terms of "Thou shalt not break userspace"...
> >
> > Seriously though, this original behavior has been in the kernel for 20
> > years, and it hasn't triggered any corruptions in all that time.
>
> For a very mild example of fun to be had there:
>         mount("none", "/mnt", "tmpfs", 0, "");
>         chdir("/mnt");
>         umount2(".", MNT_DETACH);
>         mount(NULL, ".", NULL, MS_SHARED, NULL);
> Repeat in a loop, watch mount group id leak.  That's a trivial example
> of violating the assertion ("a mount that had been through umount_tree()
> is out of propagation graph and related data structures for good").

I wasn't referring to detached mounts. CRIU modifies mounts from
non-current namespaces.

>
> As for the "CAP_SYS_ADMIN within the mount user namespace" - which
> userns do you have in mind?
>

The user namespace of the target mount:
ns_capable(mnt->mnt_ns->user_ns, CAP_SYS_ADMIN)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-07-26 21:01       ` Andrei Vagin
@ 2025-07-31  2:40         ` Pavel Tikhomirov
  2025-07-31  7:53           ` Christian Brauner
  2025-08-13 18:56         ` Al Viro
  1 sibling, 1 reply; 18+ messages in thread
From: Pavel Tikhomirov @ 2025-07-31  2:40 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Al Viro, Andrei Vagin, Christian Brauner, linux-fsdevel, LKML,
	criu, Linux API, stable

If detached mounts are our only concern, it looks like the check instead of:

if (!check_mnt(mnt)) {
        err = -EINVAL;
        goto out_unlock;
}

could've been a more relaxed one:

if (mnt_detached(mnt)) {
        err = -EINVAL;
        goto out_unlock;
}

bool mnt_detached(struct mount *mnt)
{
        return !mnt->mnt_ns;
}

not to allow propagation change only on detached mounts. (As
umount_tree sets mnt_ns to NULL.)

Also in do_mount_setattr we have a more relaxed check too:

if ((mnt_has_parent(mnt) || !is_anon_ns(mnt->mnt_ns)) && !check_mnt(mnt))
        goto out;

Best Regards, Tikhomirov Pavel.

On Sun, Jul 27, 2025 at 5:01 AM Andrei Vagin <avagin@google.com> wrote:
>
> On Sat, Jul 26, 2025 at 10:53 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Sat, Jul 26, 2025 at 10:12:34AM -0700, Andrei Vagin wrote:
> > > On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> > > > > Hi Al and Christian,
> > > > >
> > > > > The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> > > > > unmounted/not ours mounts") introduced an ABI backward compatibility
> > > > > break. CRIU depends on the previous behavior, and users are now
> > > > > reporting criu restore failures following the kernel update. This change
> > > > > has been propagated to stable kernels. Is this check strictly required?
> > > >
> > > > Yes.
> > > >
> > > > > Would it be possible to check only if the current process has
> > > > > CAP_SYS_ADMIN within the mount user namespace?
> > > >
> > > > Not enough, both in terms of permissions *and* in terms of "thou
> > > > shalt not bugger the kernel data structures - nobody's priveleged
> > > > enough for that".
> > >
> > > Al,
> > >
> > > I am still thinking in terms of "Thou shalt not break userspace"...
> > >
> > > Seriously though, this original behavior has been in the kernel for 20
> > > years, and it hasn't triggered any corruptions in all that time.
> >
> > For a very mild example of fun to be had there:
> >         mount("none", "/mnt", "tmpfs", 0, "");
> >         chdir("/mnt");
> >         umount2(".", MNT_DETACH);
> >         mount(NULL, ".", NULL, MS_SHARED, NULL);
> > Repeat in a loop, watch mount group id leak.  That's a trivial example
> > of violating the assertion ("a mount that had been through umount_tree()
> > is out of propagation graph and related data structures for good").
>
> I wasn't referring to detached mounts. CRIU modifies mounts from
> non-current namespaces.
>
> >
> > As for the "CAP_SYS_ADMIN within the mount user namespace" - which
> > userns do you have in mind?
> >
>
> The user namespace of the target mount:
> ns_capable(mnt->mnt_ns->user_ns, CAP_SYS_ADMIN)
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-07-31  2:40         ` Pavel Tikhomirov
@ 2025-07-31  7:53           ` Christian Brauner
  2025-07-31  8:11             ` Pavel Tikhomirov
  0 siblings, 1 reply; 18+ messages in thread
From: Christian Brauner @ 2025-07-31  7:53 UTC (permalink / raw)
  To: Pavel Tikhomirov
  Cc: Andrei Vagin, Al Viro, Andrei Vagin, linux-fsdevel, LKML, criu,
	Linux API, stable

On Thu, Jul 31, 2025 at 10:40:40AM +0800, Pavel Tikhomirov wrote:
> If detached mounts are our only concern, it looks like the check instead of:
> 
> if (!check_mnt(mnt)) {
>         err = -EINVAL;
>         goto out_unlock;
> }
> 
> could've been a more relaxed one:
> 
> if (mnt_detached(mnt)) {
>         err = -EINVAL;
>         goto out_unlock;
> }
> 
> bool mnt_detached(struct mount *mnt)
> {
>         return !mnt->mnt_ns;
> }
> 
> not to allow propagation change only on detached mounts. (As
> umount_tree sets mnt_ns to NULL.)

Changing propagation settings on detached mounts is fine and shoud work?
Changing propagation settings on unmounted mounts not so much...

> 
> Also in do_mount_setattr we have a more relaxed check too:
> 
> if ((mnt_has_parent(mnt) || !is_anon_ns(mnt->mnt_ns)) && !check_mnt(mnt))
>         goto out;
> 
> Best Regards, Tikhomirov Pavel.
> 
> On Sun, Jul 27, 2025 at 5:01 AM Andrei Vagin <avagin@google.com> wrote:
> >
> > On Sat, Jul 26, 2025 at 10:53 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > On Sat, Jul 26, 2025 at 10:12:34AM -0700, Andrei Vagin wrote:
> > > > On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > > >
> > > > > On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> > > > > > Hi Al and Christian,
> > > > > >
> > > > > > The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> > > > > > unmounted/not ours mounts") introduced an ABI backward compatibility
> > > > > > break. CRIU depends on the previous behavior, and users are now
> > > > > > reporting criu restore failures following the kernel update. This change
> > > > > > has been propagated to stable kernels. Is this check strictly required?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > Would it be possible to check only if the current process has
> > > > > > CAP_SYS_ADMIN within the mount user namespace?
> > > > >
> > > > > Not enough, both in terms of permissions *and* in terms of "thou
> > > > > shalt not bugger the kernel data structures - nobody's priveleged
> > > > > enough for that".
> > > >
> > > > Al,
> > > >
> > > > I am still thinking in terms of "Thou shalt not break userspace"...
> > > >
> > > > Seriously though, this original behavior has been in the kernel for 20
> > > > years, and it hasn't triggered any corruptions in all that time.
> > >
> > > For a very mild example of fun to be had there:
> > >         mount("none", "/mnt", "tmpfs", 0, "");
> > >         chdir("/mnt");
> > >         umount2(".", MNT_DETACH);
> > >         mount(NULL, ".", NULL, MS_SHARED, NULL);
> > > Repeat in a loop, watch mount group id leak.  That's a trivial example
> > > of violating the assertion ("a mount that had been through umount_tree()
> > > is out of propagation graph and related data structures for good").
> >
> > I wasn't referring to detached mounts. CRIU modifies mounts from
> > non-current namespaces.
> >
> > >
> > > As for the "CAP_SYS_ADMIN within the mount user namespace" - which
> > > userns do you have in mind?
> > >
> >
> > The user namespace of the target mount:
> > ns_capable(mnt->mnt_ns->user_ns, CAP_SYS_ADMIN)
> >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-07-31  7:53           ` Christian Brauner
@ 2025-07-31  8:11             ` Pavel Tikhomirov
  0 siblings, 0 replies; 18+ messages in thread
From: Pavel Tikhomirov @ 2025-07-31  8:11 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrei Vagin, Al Viro, Andrei Vagin, linux-fsdevel, LKML, criu,
	Linux API, stable

On Thu, Jul 31, 2025 at 3:53 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Thu, Jul 31, 2025 at 10:40:40AM +0800, Pavel Tikhomirov wrote:
> > If detached mounts are our only concern, it looks like the check instead of:
> >
> > if (!check_mnt(mnt)) {
> >         err = -EINVAL;
> >         goto out_unlock;
> > }
> >
> > could've been a more relaxed one:
> >
> > if (mnt_detached(mnt)) {
> >         err = -EINVAL;
> >         goto out_unlock;
> > }
> >
> > bool mnt_detached(struct mount *mnt)
> > {
> >         return !mnt->mnt_ns;
> > }
> >
> > not to allow propagation change only on detached mounts. (As
> > umount_tree sets mnt_ns to NULL.)
>
> Changing propagation settings on detached mounts is fine and shoud work?
> Changing propagation settings on unmounted mounts not so much...

Sorry, it's my confused terminology, here by "detached" mounts I mean
mounts which were unmounted with umount2(MNT_DETACH), maybe I should
call them "unmounted" (e.g. s/mnt_detached/mnt_unmounted/).

And yes, I understand why we need to allow changing propagation on
mounts in anonymous namespace without being inside it, because one
can't just enter anonymous namespace.

I don't think that we need to change anything, just sharing my
thoughts that it could be more relaxed and will still protect us from
propagation setting on unmounted mounts.

>
> >
> > Also in do_mount_setattr we have a more relaxed check too:
> >
> > if ((mnt_has_parent(mnt) || !is_anon_ns(mnt->mnt_ns)) && !check_mnt(mnt))
> >         goto out;
> >
> > Best Regards, Tikhomirov Pavel.
> >
> > On Sun, Jul 27, 2025 at 5:01 AM Andrei Vagin <avagin@google.com> wrote:
> > >
> > > On Sat, Jul 26, 2025 at 10:53 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > On Sat, Jul 26, 2025 at 10:12:34AM -0700, Andrei Vagin wrote:
> > > > > On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > > > >
> > > > > > On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> > > > > > > Hi Al and Christian,
> > > > > > >
> > > > > > > The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> > > > > > > unmounted/not ours mounts") introduced an ABI backward compatibility
> > > > > > > break. CRIU depends on the previous behavior, and users are now
> > > > > > > reporting criu restore failures following the kernel update. This change
> > > > > > > has been propagated to stable kernels. Is this check strictly required?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > Would it be possible to check only if the current process has
> > > > > > > CAP_SYS_ADMIN within the mount user namespace?
> > > > > >
> > > > > > Not enough, both in terms of permissions *and* in terms of "thou
> > > > > > shalt not bugger the kernel data structures - nobody's priveleged
> > > > > > enough for that".
> > > > >
> > > > > Al,
> > > > >
> > > > > I am still thinking in terms of "Thou shalt not break userspace"...
> > > > >
> > > > > Seriously though, this original behavior has been in the kernel for 20
> > > > > years, and it hasn't triggered any corruptions in all that time.
> > > >
> > > > For a very mild example of fun to be had there:
> > > >         mount("none", "/mnt", "tmpfs", 0, "");
> > > >         chdir("/mnt");
> > > >         umount2(".", MNT_DETACH);
> > > >         mount(NULL, ".", NULL, MS_SHARED, NULL);
> > > > Repeat in a loop, watch mount group id leak.  That's a trivial example
> > > > of violating the assertion ("a mount that had been through umount_tree()
> > > > is out of propagation graph and related data structures for good").
> > >
> > > I wasn't referring to detached mounts. CRIU modifies mounts from
> > > non-current namespaces.
> > >
> > > >
> > > > As for the "CAP_SYS_ADMIN within the mount user namespace" - which
> > > > userns do you have in mind?
> > > >
> > >
> > > The user namespace of the target mount:
> > > ns_capable(mnt->mnt_ns->user_ns, CAP_SYS_ADMIN)
> > >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-07-26 21:01       ` Andrei Vagin
  2025-07-31  2:40         ` Pavel Tikhomirov
@ 2025-08-13 18:56         ` Al Viro
  2025-08-13 19:09           ` Tycho Andersen
  1 sibling, 1 reply; 18+ messages in thread
From: Al Viro @ 2025-08-13 18:56 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Andrei Vagin, Christian Brauner, linux-fsdevel, LKML, criu,
	Linux API, stable

On Sat, Jul 26, 2025 at 02:01:20PM -0700, Andrei Vagin wrote:

> > For a very mild example of fun to be had there:
> >         mount("none", "/mnt", "tmpfs", 0, "");
> >         chdir("/mnt");
> >         umount2(".", MNT_DETACH);
> >         mount(NULL, ".", NULL, MS_SHARED, NULL);
> > Repeat in a loop, watch mount group id leak.  That's a trivial example
> > of violating the assertion ("a mount that had been through umount_tree()
> > is out of propagation graph and related data structures for good").
> 
> I wasn't referring to detached mounts. CRIU modifies mounts from
> non-current namespaces.
> 
> >
> > As for the "CAP_SYS_ADMIN within the mount user namespace" - which
> > userns do you have in mind?
> >
> 
> The user namespace of the target mount:
> ns_capable(mnt->mnt_ns->user_ns, CAP_SYS_ADMIN)

To bring that thread back: how about the following?  If nobody objects,
I'm going to throw it into viro/vfs.git #fixes...

[PATCH] use uniform permission checks for all mount propagation changes

do_change_type() and do_set_group() are operating on different
aspects of the same thing - propagation graph.  The latter
asks for mounts involved to be mounted in namespace(s) the caller
has CAP_SYS_ADMIN for.  The former is a mess - originally it
didn't even check that mount *is* mounted.  That got fixed,
but the resulting check turns out to be too strict for userland -
in effect, we check that mount is in our namespace, having already
checked that we have CAP_SYS_ADMIN there.

What we really need (in both cases) is
	* we only touch mounts that are mounted.  Hard requirement,
data corruption if that's get violated.
	* we don't allow to mess with a namespace unless you already
have enough permissions to do so (i.e. CAP_SYS_ADMIN in its userns).

That's an equivalent of what do_set_group() does; let's extract that
into a helper (may_change_propagation()) and use it in both
do_set_group() and do_change_type().

Fixes: 12f147ddd6de "do_change_type(): refuse to operate on unmounted/not ours mounts"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/namespace.c b/fs/namespace.c
index ddfd4457d338..e7d9b23f1e9e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2862,6 +2862,19 @@ static int graft_tree(struct mount *mnt, struct mount *p, struct mountpoint *mp)
 	return attach_recursive_mnt(mnt, p, mp);
 }
 
+static int may_change_propagation(const struct mount *m)
+{
+        struct mnt_namespace *ns = m->mnt_ns;
+
+	 // it must be mounted in some namespace
+	 if (IS_ERR_OR_NULL(ns))         // is_mounted()
+		 return -EINVAL;
+	 // and the caller must be admin in userns of that namespace
+	 if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		 return -EPERM;
+	 return 0;
+}
+
 /*
  * Sanity check the flags to change_mnt_propagation.
  */
@@ -2898,10 +2911,10 @@ static int do_change_type(struct path *path, int ms_flags)
 		return -EINVAL;
 
 	namespace_lock();
-	if (!check_mnt(mnt)) {
-		err = -EINVAL;
+	err = may_change_propagation(mnt);
+	if (err)
 		goto out_unlock;
-	}
+
 	if (type == MS_SHARED) {
 		err = invent_group_ids(mnt, recurse);
 		if (err)
@@ -3347,18 +3360,11 @@ static int do_set_group(struct path *from_path, struct path *to_path)
 
 	namespace_lock();
 
-	err = -EINVAL;
-	/* To and From must be mounted */
-	if (!is_mounted(&from->mnt))
-		goto out;
-	if (!is_mounted(&to->mnt))
-		goto out;
-
-	err = -EPERM;
-	/* We should be allowed to modify mount namespaces of both mounts */
-	if (!ns_capable(from->mnt_ns->user_ns, CAP_SYS_ADMIN))
+	err = may_change_propagation(from);
+	if (err)
 		goto out;
-	if (!ns_capable(to->mnt_ns->user_ns, CAP_SYS_ADMIN))
+	err = may_change_propagation(from);
+	if (err)
 		goto out;
 
 	err = -EINVAL;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-08-13 18:56         ` Al Viro
@ 2025-08-13 19:09           ` Tycho Andersen
  2025-08-13 19:41             ` Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Tycho Andersen @ 2025-08-13 19:09 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrei Vagin, Andrei Vagin, Christian Brauner, linux-fsdevel,
	LKML, criu, Linux API, stable

On Wed, Aug 13, 2025 at 07:56:01PM +0100, Al Viro wrote:
> @@ -3347,18 +3360,11 @@ static int do_set_group(struct path *from_path, struct path *to_path)
>  
>  	namespace_lock();
>  
> -	err = -EINVAL;
> -	/* To and From must be mounted */
> -	if (!is_mounted(&from->mnt))
> -		goto out;
> -	if (!is_mounted(&to->mnt))
> -		goto out;
> -
> -	err = -EPERM;
> -	/* We should be allowed to modify mount namespaces of both mounts */
> -	if (!ns_capable(from->mnt_ns->user_ns, CAP_SYS_ADMIN))
> +	err = may_change_propagation(from);
> +	if (err)
>  		goto out;
> -	if (!ns_capable(to->mnt_ns->user_ns, CAP_SYS_ADMIN))
> +	err = may_change_propagation(from);

Just driving by, but I guess you mean "to" here.

Tycho

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-08-13 19:09           ` Tycho Andersen
@ 2025-08-13 19:41             ` Al Viro
  2025-08-14  4:08               ` Pavel Tikhomirov
  0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2025-08-13 19:41 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Andrei Vagin, Andrei Vagin, Christian Brauner, linux-fsdevel,
	LKML, criu, Linux API, stable

On Wed, Aug 13, 2025 at 01:09:27PM -0600, Tycho Andersen wrote:
> On Wed, Aug 13, 2025 at 07:56:01PM +0100, Al Viro wrote:
> > @@ -3347,18 +3360,11 @@ static int do_set_group(struct path *from_path, struct path *to_path)
> >  
> >  	namespace_lock();
> >  
> > -	err = -EINVAL;
> > -	/* To and From must be mounted */
> > -	if (!is_mounted(&from->mnt))
> > -		goto out;
> > -	if (!is_mounted(&to->mnt))
> > -		goto out;
> > -
> > -	err = -EPERM;
> > -	/* We should be allowed to modify mount namespaces of both mounts */
> > -	if (!ns_capable(from->mnt_ns->user_ns, CAP_SYS_ADMIN))
> > +	err = may_change_propagation(from);
> > +	if (err)
> >  		goto out;
> > -	if (!ns_capable(to->mnt_ns->user_ns, CAP_SYS_ADMIN))
> > +	err = may_change_propagation(from);
> 
> Just driving by, but I guess you mean "to" here.

D'oh...  Yes, of course.  Fun question: would our selftests have caught
that?
[checks]
move_mount_set_group_test.c doesn't have anything in that area, nothing in
LTP or xfstests either, AFAICS...  And I don't see anything in
https://github.com/checkpoint-restore/criu
either - there are uses of MOVE_MOUNT_SET_GROUP, but they are well-buried
and I don't see anything in their tests that would even try to poke into
that thing...

Before we go and try to cobble something up, does anybody know of a place
where regression tests for MOVE_MOUNT_SET_GROUP could be picked from?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-08-13 19:41             ` Al Viro
@ 2025-08-14  4:08               ` Pavel Tikhomirov
  2025-08-14  4:42                 ` Al Viro
  2025-08-14  7:07                 ` do_change_type(): refuse to operate on unmounted/not ours mounts Pavel Tikhomirov
  0 siblings, 2 replies; 18+ messages in thread
From: Pavel Tikhomirov @ 2025-08-14  4:08 UTC (permalink / raw)
  To: Al Viro
  Cc: Tycho Andersen, Andrei Vagin, Andrei Vagin, Christian Brauner,
	linux-fsdevel, LKML, criu, Linux API, stable

On Thu, Aug 14, 2025 at 3:41 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Aug 13, 2025 at 01:09:27PM -0600, Tycho Andersen wrote:
> > On Wed, Aug 13, 2025 at 07:56:01PM +0100, Al Viro wrote:
> > > @@ -3347,18 +3360,11 @@ static int do_set_group(struct path *from_path, struct path *to_path)
> > >
> > >     namespace_lock();
> > >
> > > -   err = -EINVAL;
> > > -   /* To and From must be mounted */
> > > -   if (!is_mounted(&from->mnt))
> > > -           goto out;
> > > -   if (!is_mounted(&to->mnt))
> > > -           goto out;
> > > -
> > > -   err = -EPERM;
> > > -   /* We should be allowed to modify mount namespaces of both mounts */
> > > -   if (!ns_capable(from->mnt_ns->user_ns, CAP_SYS_ADMIN))
> > > +   err = may_change_propagation(from);
> > > +   if (err)
> > >             goto out;
> > > -   if (!ns_capable(to->mnt_ns->user_ns, CAP_SYS_ADMIN))
> > > +   err = may_change_propagation(from);
> >
> > Just driving by, but I guess you mean "to" here.
>
> D'oh...  Yes, of course.  Fun question: would our selftests have caught
> that?
> [checks]
> move_mount_set_group_test.c doesn't have anything in that area, nothing in
> LTP or xfstests either, AFAICS...

Yes, selftest is very simple and is not covering userns checks.

>  And I don't see anything in
> https://github.com/checkpoint-restore/criu
> either - there are uses of MOVE_MOUNT_SET_GROUP, but they are well-buried
> and I don't see anything in their tests that would even try to poke into
> that thing...
>
> Before we go and try to cobble something up, does anybody know of a place
> where regression tests for MOVE_MOUNT_SET_GROUP could be picked from?
>

Basically each CRIU test that is run by zdtm (if it is in ns/uns
flavor (which are most of them)), tests mounts checkpoint/restore. And
each test which has shared/slave moutns leads to MOVE_MOUNT_SET_GROUP
being used and thus tested. We have a mountinfo comparison in zdtm
which checks that propagation is topologically the same after c/r.

But, yes, we do not cover userns checks, as in CRIU case, CRIU is
expected to run in userns which has all capabilities over restored
container, and should always pass those checks.

JFYI:

The use of MOVE_MOUNT_SET_GROUP in CRIU is well-buried in:

https://github.com/checkpoint-restore/criu/blob/116e56ba46382c05066d33a8bbadcc495dbdb644/criu/mount-v2.c#L896

  +-< move_mount_set_group
    +-< restore_one_sharing
      +-< restore_one_sharing_group
        +-< restore_mount_sharing_options
          +-< prepare_mnt_ns_v2

This stack already has a set of precreated mounts and walks over their
sharing groups saved in CRIU image files and assigns them accordingly.

And we have a bunch of tests with different sharing configurations to
test propagation c/r specifically:

git grep -l "SHARING\|SLAVE" test/zdtm/static
test/zdtm/static/mnt_ext_auto.c
test/zdtm/static/mnt_ext_master.c
test/zdtm/static/mnt_ext_multiple.c
test/zdtm/static/mnt_root_ext.c
test/zdtm/static/mntns_overmount.c
test/zdtm/static/mntns_shared_bind03.c
test/zdtm/static/mount_complex_sharing.c
test/zdtm/static/mountpoints.c
test/zdtm/static/shared_slave_mount_children.c

It should be enough to run a zdtm test-suit to check that change does
not break something for CRIU (will do).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-08-14  4:08               ` Pavel Tikhomirov
@ 2025-08-14  4:42                 ` Al Viro
  2025-08-14  5:51                   ` [PATCH][RFC][CFT] use uniform permission checks for all mount propagation changes Al Viro
  2025-08-14  7:07                 ` do_change_type(): refuse to operate on unmounted/not ours mounts Pavel Tikhomirov
  1 sibling, 1 reply; 18+ messages in thread
From: Al Viro @ 2025-08-14  4:42 UTC (permalink / raw)
  To: Pavel Tikhomirov
  Cc: Tycho Andersen, Andrei Vagin, Andrei Vagin, Christian Brauner,
	linux-fsdevel, LKML, criu, Linux API, stable

On Thu, Aug 14, 2025 at 12:08:49PM +0800, Pavel Tikhomirov wrote:

> Yes, selftest is very simple and is not covering userns checks.

FWIW, see below for what I've got here at the moment for MOVE_MOUNT_SET_GROUP;
no tests for cross-filesystem and not-a-subtree yet.  At least it does catch
that braino when run on a kernel that doesn't have it fixed ;-)
No do_change_type() tests either yet...

// link with -lcap, assumes userns enabled
// can run both as root and as regular user
#define _GNU_SOURCE
#include <sched.h>
#include <sys/capability.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <stdbool.h>

_Bool drop_caps(void)
{
        cap_value_t cap_value[] = { CAP_SYS_ADMIN };
        cap_t cap = cap_get_proc();
        if (!cap) {
		perror("cap_get_proc");
		return false;
	}
	return true;
}

void do_unshare(void)
{
	FILE *f;
	uid_t uid = geteuid();
	gid_t gid = getegid();
	unshare(CLONE_NEWNS|CLONE_NEWUSER);
	f = fopen("/proc/self/uid_map", "w");
	fprintf(f, "0 %d 1", uid);
	fclose(f);
	f = fopen("/proc/self/setgroups", "w");
	fprintf(f, "deny");
	fclose(f);
	f = fopen("/proc/self/gid_map", "w");
	fprintf(f, "0 %d 1", gid);
	fclose(f);
	mount(NULL, "/", NULL, MS_REC|MS_PRIVATE, NULL);
}

void bind(char *p)
{
	mount(p, p, NULL, MS_BIND, NULL);
}

void test_it(int fd1, char *p1, int fd2, char *p2, int expected)
{
	int flags = MOVE_MOUNT_SET_GROUP;
	int n;

	if (!p1) {
		p1 = "";
		flags |= MOVE_MOUNT_F_EMPTY_PATH;
	}
	if (!p2) {
		p2 = "";
		flags |= MOVE_MOUNT_T_EMPTY_PATH;
	}
	n = move_mount(fd1, p1, fd2, p2, flags);
	if (!n)
		errno = 0;
	if (expected != errno)
		printf(" failed: %d != %d\n", expected, errno);
	else
		printf(" OK\n");
}

int main()
{
	int pipe1[2], pipe2[2];
	char path[40];
	pid_t child;
	int root_fd;
	char c;

	if (pipe(pipe1) < 0 || pipe(pipe2) < 0) {
		perror("pipe");
		return -1;
	}
	if (!drop_caps())
		return -1;
	do_unshare();

	root_fd = open("/", O_PATH);

	errno = 0;
	mount("none", "/mnt", "tmpfs", 0, NULL);
	mkdir("/mnt/a", 0777);
	mkdir("/mnt/a/private", 0777);
	mkdir("/mnt/a/private/b", 0777);
	mkdir("/mnt/a/shared", 0777);
	mkdir("/mnt/a/slave", 0777);
	mkdir("/mnt/a/shared-slave", 0777);
	mkdir("/mnt/locked", 0777);
	mkdir("/mnt/no-locked", 0777);
	bind("/mnt/locked");

	child = fork();
	if (child < 0) {
		perror("fork");
		return -1;
	} else if (child == 0) {
		do_unshare();
		mount(NULL, "/mnt/", NULL, MS_SHARED, NULL);
		bind("/mnt/a");
		write(pipe1[1], &c, 1);
		fchdir(root_fd);
		read(pipe2[0], &c, 1);
		printf("from should be someplace we have permissions for");
		test_it(AT_FDCWD, "mnt/a", AT_FDCWD, "/mnt/a/private", EPERM);
		printf("to should be someplace we have permissions for");
		test_it(AT_FDCWD, "/mnt/a", AT_FDCWD, "mnt/a/private", EPERM);
		write(pipe1[1], &c, 1);
		return 0;
	}
	read(pipe1[0], &c, 1);
	sprintf(path, "/proc/%d/root", child);
	chdir(path);

	mount(NULL, "/mnt", NULL, MS_SHARED, NULL);
	bind("/mnt/a/private");
	bind("/mnt/a/shared");
	bind("/mnt/a/slave");
	bind("/mnt/a/slave-shared");
	bind("/mnt/no-locked");
	mount(NULL, "/mnt/a/private", NULL, MS_PRIVATE, NULL);
	mount(NULL, "/mnt/a/slave", NULL, MS_SLAVE, NULL);
	mount(NULL, "/mnt/a/shared-slave", NULL, MS_SLAVE, NULL);
	mount(NULL, "/mnt/a/shared-slave", NULL, MS_SHARED, NULL);
	mount(NULL, "/mnt/no-locked", NULL, MS_PRIVATE, NULL);

	printf("from should be mounted (pipes are not)");
	test_it(pipe1[0], NULL, AT_FDCWD, "/mnt/a/private", EINVAL);

	printf("to should be mounted (pipes are not)");
	test_it(AT_FDCWD, "/mnt", pipe1[0], NULL, EINVAL);

	printf("from should be someplace we have permissions for");
	test_it(AT_FDCWD, "mnt/a", AT_FDCWD, "/mnt/a/private", 0);
	mount(NULL, "/mnt/a/private", NULL, MS_PRIVATE, NULL);

	printf("from should be mountpoint");
	test_it(AT_FDCWD, "/mnt/a", AT_FDCWD, "/mnt/a/private", EINVAL);

	printf("to should be mountpoint");
	test_it(AT_FDCWD, "/mnt/a", AT_FDCWD, "/mnt/a/private/b", EINVAL);

	printf("from should not have anything locked in counterpart of to");
	test_it(AT_FDCWD, "mnt", AT_FDCWD, "/mnt/locked", EINVAL);

	printf("from should not have anything locked in counterpart of to");
	test_it(AT_FDCWD, "mnt", AT_FDCWD, "/mnt/no-locked", 0);
	mount(NULL, "/mnt/no-locked", NULL, MS_PRIVATE, NULL);

	fflush(stdout);
	write(pipe2[1], &c, 1);
	read(pipe1[0], &c, 1);
	return 0;
}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH][RFC][CFT] use uniform permission checks for all mount propagation changes
  2025-08-14  4:42                 ` Al Viro
@ 2025-08-14  5:51                   ` Al Viro
  2025-08-14  5:57                     ` [RFC][CFT] selftest for permission checks in " Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2025-08-14  5:51 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Tycho Andersen, Andrei Vagin, Andrei Vagin, Christian Brauner,
	Pavel Tikhomirov, LKML, criu, Linux API, stable

[this should fix userland regression from do_change_type() fix last cycle;
we have too little self-test coverage in the area, unfortunately.  First
approximation for selftest in the followup to this posting.  Review and
testing of both the patch and test would be very welcome]

do_change_type() and do_set_group() are operating on different
aspects of the same thing - propagation graph.  The latter
asks for mounts involved to be mounted in namespace(s) the caller
has CAP_SYS_ADMIN for.  The former is a mess - originally it
didn't even check that mount *is* mounted.  That got fixed,
but the resulting check turns out to be too strict for userland -
in effect, we check that mount is in our namespace, having already
checked that we have CAP_SYS_ADMIN there.

What we really need (in both cases) is
	* we only touch mounts that are mounted.  Hard requirement,
data corruption if that's get violated.
	* we don't allow to mess with a namespace unless you already
have enough permissions to do so (i.e. CAP_SYS_ADMIN in its userns).

That's an equivalent of what do_set_group() does; let's extract that
into a helper (may_change_propagation()) and use it in both
do_set_group() and do_change_type().

Fixes: 12f147ddd6de "do_change_type(): refuse to operate on unmounted/not ours mounts"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/namespace.c b/fs/namespace.c
index ddfd4457d338..a191c6519e36 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2862,6 +2862,19 @@ static int graft_tree(struct mount *mnt, struct mount *p, struct mountpoint *mp)
 	return attach_recursive_mnt(mnt, p, mp);
 }
 
+static int may_change_propagation(const struct mount *m)
+{
+        struct mnt_namespace *ns = m->mnt_ns;
+
+	 // it must be mounted in some namespace
+	 if (IS_ERR_OR_NULL(ns))         // is_mounted()
+		 return -EINVAL;
+	 // and the caller must be admin in userns of that namespace
+	 if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		 return -EPERM;
+	 return 0;
+}
+
 /*
  * Sanity check the flags to change_mnt_propagation.
  */
@@ -2898,10 +2911,10 @@ static int do_change_type(struct path *path, int ms_flags)
 		return -EINVAL;
 
 	namespace_lock();
-	if (!check_mnt(mnt)) {
-		err = -EINVAL;
+	err = may_change_propagation(mnt);
+	if (err)
 		goto out_unlock;
-	}
+
 	if (type == MS_SHARED) {
 		err = invent_group_ids(mnt, recurse);
 		if (err)
@@ -3347,18 +3360,11 @@ static int do_set_group(struct path *from_path, struct path *to_path)
 
 	namespace_lock();
 
-	err = -EINVAL;
-	/* To and From must be mounted */
-	if (!is_mounted(&from->mnt))
-		goto out;
-	if (!is_mounted(&to->mnt))
-		goto out;
-
-	err = -EPERM;
-	/* We should be allowed to modify mount namespaces of both mounts */
-	if (!ns_capable(from->mnt_ns->user_ns, CAP_SYS_ADMIN))
+	err = may_change_propagation(from);
+	if (err)
 		goto out;
-	if (!ns_capable(to->mnt_ns->user_ns, CAP_SYS_ADMIN))
+	err = may_change_propagation(to);
+	if (err)
 		goto out;
 
 	err = -EINVAL;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC][CFT] selftest for permission checks in mount propagation changes
  2025-08-14  5:51                   ` [PATCH][RFC][CFT] use uniform permission checks for all mount propagation changes Al Viro
@ 2025-08-14  5:57                     ` Al Viro
  2025-08-14  6:37                       ` Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2025-08-14  5:57 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Tycho Andersen, Andrei Vagin, Andrei Vagin, Christian Brauner,
	Pavel Tikhomirov, LKML, criu, Linux API, stable

// link with -lcap, can run both as root and as regular user
#define _GNU_SOURCE
#include <sched.h>
#include <sys/capability.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <stdbool.h>

_Bool drop_caps(void)
{
        cap_value_t cap_value[] = { CAP_SYS_ADMIN };
        cap_t cap = cap_get_proc();
        if (!cap) {
		perror("cap_get_proc");
		return false;
	}
	return true;
}

void do_unshare(void)
{
	FILE *f;
	uid_t uid = geteuid();
	gid_t gid = getegid();
	unshare(CLONE_NEWNS|CLONE_NEWUSER);
	f = fopen("/proc/self/uid_map", "w");
	fprintf(f, "0 %d 1", uid);
	fclose(f);
	f = fopen("/proc/self/setgroups", "w");
	fprintf(f, "deny");
	fclose(f);
	f = fopen("/proc/self/gid_map", "w");
	fprintf(f, "0 %d 1", gid);
	fclose(f);
	mount(NULL, "/", NULL, MS_REC|MS_PRIVATE, NULL);
}

void bind(char *p)
{
	mount(p, p, NULL, MS_BIND, NULL);
}

void change_type(char *p, int type)
{
	errno = 0;
	mount(NULL, p, NULL, type, NULL);
}

void set_group(int fd1, char *p1, int fd2, char *p2)
{
	int flags = MOVE_MOUNT_SET_GROUP;
	int n;

	if (!p1 || !*p1) {
		p1 = "";	// be kind to old kernels
		flags |= MOVE_MOUNT_F_EMPTY_PATH;
	}
	if (!p2 || !*p2) {
		p2 = "";	// be kind to old kernels
		flags |= MOVE_MOUNT_T_EMPTY_PATH;
	}
	errno = 0;
	move_mount(fd1, p1, fd2, p2, flags);
}

_Bool result(int expected)
{
	if (expected != errno) {
		printf(" failed: %d != %d\n", expected, errno);
		return false;
	}
	printf(" OK\n");
	return true;
}

int fd1[2], fd2[2];

void in_child(void)
{
	printf("from should be someplace we have permissions for");
	set_group(AT_FDCWD, "mnt/a", AT_FDCWD, "/mnt/a/private");
	result(EPERM);
	printf("to should be someplace we have permissions for");
	set_group(AT_FDCWD, "/mnt/a", AT_FDCWD, "mnt/a/private");
	result(EPERM);
	printf("change_type: should have permissions for target");
	change_type("mnt/locked", MS_PRIVATE);
	result(EPERM);
}

void in_parent(void)
{
	printf("from should be mounted (pipes are not)");
	set_group(fd1[0], NULL, AT_FDCWD, "/mnt/a/private");
	result(EINVAL);

	printf("to should be mounted (pipes are not)");
	set_group(AT_FDCWD, "/mnt", fd1[0], NULL);
	result(EINVAL);

	printf("from should be someplace we have permissions for");
	set_group(AT_FDCWD, "mnt/a", AT_FDCWD, "/mnt/a/private");
	if (result(0))
		change_type("/mnt/a/private", MS_PRIVATE);

	printf("to should be someplace we have permissions for");
	set_group(AT_FDCWD, "/mnt", AT_FDCWD, "mnt/a/private");
	if (result(0))
		change_type("mnt/a/private", MS_PRIVATE);

	printf("from should be mountpoint");
	set_group(AT_FDCWD, "/mnt/a", AT_FDCWD, "/mnt/a/private");
	result(EINVAL);

	printf("to should be mountpoint");
	set_group(AT_FDCWD, "/mnt/a", AT_FDCWD, "/mnt/a/private/b");
	result(EINVAL);

	printf("to and from should be on the same filesystem");
	mount("none", "/mnt/no-locked", "tmpfs", 0, NULL);
	set_group(AT_FDCWD, "/mnt/a/shared", AT_FDCWD, "/mnt/no-locked");
	result(EINVAL);
	umount("/mnt/no-locked");

	printf("from should contain the counterpart of to");
	set_group(AT_FDCWD, "/mnt/a/shared", AT_FDCWD, "/mnt/no-locked");
	result(EINVAL);

	printf("from should not have anything locked in counterpart of to");
	set_group(AT_FDCWD, "mnt", AT_FDCWD, "/mnt/no-locked");
	if (result(0))
		change_type("/mnt/no-locked", MS_PRIVATE);

	printf("change_type: should have permissions for target");
	change_type("mnt/a/private", MS_PRIVATE);
	result(0);

	printf("change_type: target should be a mountpoint");
	change_type("/mnt/a", MS_PRIVATE);
	result(EINVAL);

	chdir("/mnt/a/private");
	umount2("/mnt/a/private", MNT_DETACH);
	printf("change_type: target should be mounted");
	change_type(".", MS_PRIVATE);
	result(EINVAL);
}

int main()
{
	char path[40];
	pid_t child;
	int root_fd;
	char c;

	if (pipe(fd1) < 0 || pipe(fd2) < 0) {
		perror("pipe");
		return -1;
	}
	if (!drop_caps())
		return -1;
	do_unshare();

	root_fd = open("/", O_PATH);

	errno = 0;
	mount("none", "/mnt", "tmpfs", 0, NULL);
	mkdir("/mnt/a", 0777);
	mkdir("/mnt/a/private", 0777);
	mkdir("/mnt/a/private/b", 0777);
	mkdir("/mnt/a/shared", 0777);
	mkdir("/mnt/a/slave", 0777);
	mkdir("/mnt/a/shared-slave", 0777);
	mkdir("/mnt/locked", 0777);
	mkdir("/mnt/no-locked", 0777);
	bind("/mnt/locked");

	child = fork();
	if (child < 0) {
		perror("fork");
		return -1;
	} else if (child == 0) {
		do_unshare();
		change_type("/mnt/", MS_SHARED);
		bind("/mnt/a");
		bind("/mnt/a/private");
		change_type("/mnt/a/private", MS_PRIVATE);
		write(fd1[1], &c, 1);
		read(fd2[0], &c, 1);

		fchdir(root_fd);
		in_child();

		write(fd1[1], &c, 1);
		return 0;
	}
	read(fd1[0], &c, 1);
	sprintf(path, "/proc/%d/root", child);
	chdir(path);

	change_type("/mnt", MS_SHARED);
	bind("/mnt/a/private");
	bind("/mnt/a/shared");
	bind("/mnt/a/slave");
	bind("/mnt/a/slave-shared");
	bind("/mnt/no-locked");
	change_type("/mnt/a/private", MS_PRIVATE);
	change_type("/mnt/a/slave", MS_SLAVE);
	change_type("/mnt/a/shared-slave", MS_SLAVE);
	change_type("/mnt/a/shared-slave", MS_SHARED);
	change_type("/mnt/no-locked", MS_PRIVATE);

	in_parent();

	fflush(stdout);
	write(fd2[1], &c, 1);
	read(fd1[0], &c, 1);
	return 0;
}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][CFT] selftest for permission checks in mount propagation changes
  2025-08-14  5:57                     ` [RFC][CFT] selftest for permission checks in " Al Viro
@ 2025-08-14  6:37                       ` Al Viro
  0 siblings, 0 replies; 18+ messages in thread
From: Al Viro @ 2025-08-14  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Tycho Andersen, Andrei Vagin, Andrei Vagin, Christian Brauner,
	Pavel Tikhomirov, LKML, criu, Linux API, stable

> void do_unshare(void)
> {
> 	FILE *f;
> 	uid_t uid = geteuid();
> 	gid_t gid = getegid();
> 	unshare(CLONE_NEWNS|CLONE_NEWUSER);
> 	f = fopen("/proc/self/uid_map", "w");
> 	fprintf(f, "0 %d 1", uid);
> 	fclose(f);
> 	f = fopen("/proc/self/setgroups", "w");
> 	fprintf(f, "deny");
> 	fclose(f);
> 	f = fopen("/proc/self/gid_map", "w");
> 	fprintf(f, "0 %d 1", gid);
> 	fclose(f);
> 	mount(NULL, "/", NULL, MS_REC|MS_PRIVATE, NULL);
> }

This obviously needs error checking - in this form it won't do
anything good without userns enabled (coredump on the first
fprintf() in there, since there won't be /proc/self/uid_map);
should probably just report CLONE_NEWUSER failure, warn about
skipped tests, fall back to unshare(CLONE_NEWNS) and skip
everything in in_child()...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: do_change_type(): refuse to operate on unmounted/not ours mounts
  2025-08-14  4:08               ` Pavel Tikhomirov
  2025-08-14  4:42                 ` Al Viro
@ 2025-08-14  7:07                 ` Pavel Tikhomirov
  1 sibling, 0 replies; 18+ messages in thread
From: Pavel Tikhomirov @ 2025-08-14  7:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Tycho Andersen, Andrei Vagin, Andrei Vagin, Christian Brauner,
	linux-fsdevel, LKML, criu, Linux API, stable

> It should be enough to run a zdtm test-suit to check that change does
> not break something for CRIU (will do).

jfyi: checked 0cc53520e68 with patch "[PATCH] use uniform permission
checks for all mount propagation changes" (+ s/from/to/), there is no
problem on criu-zdtm mount related tests. I see some problems on
socket related tests on it, but it looks unrelated.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-08-14  7:08 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-24 20:02 do_change_type(): refuse to operate on unmounted/not ours mounts Andrei Vagin
2025-07-24 23:00 ` Al Viro
2025-07-24 23:38   ` Andrei Vagin
2025-07-26 17:12   ` Andrei Vagin
2025-07-26 17:53     ` Al Viro
2025-07-26 21:01       ` Andrei Vagin
2025-07-31  2:40         ` Pavel Tikhomirov
2025-07-31  7:53           ` Christian Brauner
2025-07-31  8:11             ` Pavel Tikhomirov
2025-08-13 18:56         ` Al Viro
2025-08-13 19:09           ` Tycho Andersen
2025-08-13 19:41             ` Al Viro
2025-08-14  4:08               ` Pavel Tikhomirov
2025-08-14  4:42                 ` Al Viro
2025-08-14  5:51                   ` [PATCH][RFC][CFT] use uniform permission checks for all mount propagation changes Al Viro
2025-08-14  5:57                     ` [RFC][CFT] selftest for permission checks in " Al Viro
2025-08-14  6:37                       ` Al Viro
2025-08-14  7:07                 ` do_change_type(): refuse to operate on unmounted/not ours mounts Pavel Tikhomirov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).