Re: Regression in 5.3 for some FS_USERNS_MOUNT (aka user-namespace-mountable) filesystems

linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Al Viro <viro@zeniv.linux.org.uk>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Christian Brauner <christian@brauner.io>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	David Howells <dhowells@redhat.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: Regression in 5.3 for some FS_USERNS_MOUNT (aka user-namespace-mountable) filesystems
Date: Sat, 27 Jul 2019 13:37:05 +0100	[thread overview]
Message-ID: <20190727123705.GP1131@ZenIV.linux.org.uk> (raw)
In-Reply-To: <87h877pvv1.fsf@xmission.com>

On Sat, Jul 27, 2019 at 06:20:18AM -0500, Eric W. Biederman wrote:

> > In principle I like killing FS_USERNS_MOUNT flag, but when a method
> > is always either NULL or exact same function...
> 
> Either you are being dramatic or you read the patch much too quickly.

Or you've done the same to my reply.  I'm not saying that in the
posted form the instances are the same; I'm saying that they are
all equivalent to exact same code - ns_capable(fc->user_ns, CAP_SYS_ADMIN),
that is.

> userns_mount_permission covers the common case of FS_USERNS_MOUNT.
> Then there are the cases where you need to know how the filesystem is
> going to map current into the filesystem that will be mounted.  Those
> are: proc_mount_permission, sysfs_mount_permission,
> mqueue_mount_permission, cgroup_mount_permission,

+static int proc_mount_permission(void)
+{
+       struct pid_namespace *ns = task_active_pid_ns(current);
+       return ns_capable(ns->user_ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
+}

compare with
        ctx->pid_ns = get_pid_ns(task_active_pid_ns(current));
        put_user_ns(fc->user_ns);
        fc->user_ns = get_user_ns(ctx->pid_ns->user_ns);
in proc_init_fs_context().  Notice anything?  With those 'orrible
API changes proc_mount_permissions() and userns_mount_permission()
can actually become identical.  Because the default case is
to leave fc->user_ns set to current_user_ns().

cgroup case:
+static int cgroup_mount_permission(void)
+{
+       struct cgroup_namespace *ns = current->nsproxy->cgroup_ns;
+       return ns_capable(ns->user_ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
+}

with
        ctx->ns = current->nsproxy->cgroup_ns;
...
        put_user_ns(fc->user_ns);
        fc->user_ns = get_user_ns(ctx->ns->user_ns);
in cgroup_init_fs_context().  IOW, again your instances fold together.

mqueue:
        ctx->ipc_ns = get_ipc_ns(current->nsproxy->ipc_ns);
        put_user_ns(fc->user_ns);
        fc->user_ns = get_user_ns(ctx->ipc_ns->user_ns);
in mqueue_init_fs_context() and
+static int mqueue_mount_permission(void)
+{
+       struct ipc_namespace *ns = current->nsproxy->ipc_ns;
+       return ns_capable(ns->user_ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
+}

Same situation.

+static int sysfs_mount_permission(void)
+{
+       struct net *net = current->nsproxy->net_ns;
+       return ns_capable(net->user_ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
+}

with
        kfc->ns_tag = netns = kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
...
        if (netns) {
                put_user_ns(fc->user_ns);
                fc->user_ns = get_user_ns(netns->user_ns);
        }
Now, _that_ is interesting.  Here the variants diverge - in case
USER_NS && !NET_NS fc->user_ns is left at current_user_ns().

That, BTW, is the only case where the old checks are left in:
        if (!(fc->sb_flags & SB_KERNMOUNT)) {
                if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
                        return -EPERM;
        }
in sysfs_init_fs_context() papers over that case and I'd love to
get rid of that irregularity.  Which, AFAICS, can be done by
unconditional
	put_user_ns(fc->user_ns);
	fc->user_ns = get_user_ns(current->nsproxy->net_ns->user_ns);
whatever the .config we have.  On NET_NS ones it'll be identical
to ->user_ns of what kobj_ns_grab_current(KOBJ_NS_TYPE_NET) returns,
on !NET_NS it'll get mount_capable() do the right thing without
that wart with init_fs_context() doing that extra check.

> So yes I agree the function of interest is always capable in some form,
> we just need the filesystem specific logic to check to see if we will
> have capable over the filesystem that will be mounted.
> 
> I don't doubt that the new mount api has added a few new complexities.

So far it looks like *in this particular case* complexities would be
reduced - with one exception all your ->permission() instances become
identical.

Moreover, even in that case we still get the right overall behaviour
with the same instance...

So do you have any specific objections to behaviour in vfs.git #fixes
and/or to adding

diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index db81cfbab9d6..c5b3c7d4d360 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -57,11 +57,6 @@ static int sysfs_init_fs_context(struct fs_context *fc)
 	struct kernfs_fs_context *kfc;
 	struct net *netns;
 
-	if (!(fc->sb_flags & SB_KERNMOUNT)) {
-		if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
-			return -EPERM;
-	}
-
 	kfc = kzalloc(sizeof(struct kernfs_fs_context), GFP_KERNEL);
 	if (!kfc)
 		return -ENOMEM;
@@ -71,10 +66,8 @@ static int sysfs_init_fs_context(struct fs_context *fc)
 	kfc->magic = SYSFS_MAGIC;
 	fc->fs_private = kfc;
 	fc->ops = &sysfs_fs_context_ops;
-	if (netns) {
-		put_user_ns(fc->user_ns);
-		fc->user_ns = get_user_ns(netns->user_ns);
-	}
+	put_user_ns(fc->user_ns);
+	fc->user_ns = get_user_ns(current->nsproxy->net_ns->user_ns);
 	fc->global = true;
 	return 0;
 }

on top of that?  The last one shouldn't change the behaviour, but it
certainly looks like a nice wartectomy; not #fixes material, though.

next prev parent reply	other threads:[~2019-07-27 12:37 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-26 11:59 Regression in 5.3 for some FS_USERNS_MOUNT (aka user-namespace-mountable) filesystems Christian Brauner
2019-07-26 22:47 ` Linus Torvalds
2019-07-26 23:22   ` Al Viro
2019-07-27  0:46     ` Eric W. Biederman
2019-07-27  2:28       ` Al Viro
2019-07-27 11:20         ` Eric W. Biederman
2019-07-27 12:37           ` Al Viro [this message]
2019-07-27 13:17             ` Al Viro
2019-07-27  2:23     ` Al Viro

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:db81cfbab9d dfblob:c5b3c7d4d36 )
 OR (
bs:"Re: Regression in 5.3 for some FS_USERNS_MOUNT (aka user-namespace-mountable) filesystems" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190727123705.GP1131@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=christian@brauner.io \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).