[RFC PATCH 0/14] user namespaces: continue targetting capabilities

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH 0/14] user namespaces: continue targetting capabilities
@ 2011-07-12 23:30 Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 01/14] add Documentation/namespaces/user_namespace.txt Serge Hallyn
                   ` (13 more replies)
  0 siblings, 14 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm

Hi,

here is a set of patches to continue targetting capabilities
where appropriate.  This set goes about as far as is possible
without making the VFS user namespace aware, meaning that the
VFS can provide a namespaced view of userids, i.e init_user_ns
sees file owner 500, while child user ns sees file owner 0 or
1000.

With this set applied, you can create and configure veth netdevs
if your user namespace owns your network namespace (and you are
privileged), but not otherwise.

Some simple testcases can be found at
https://code.launchpad.net/~serge-hallyn/+junk/usernstests with
packages at https://launchpad.net/~serge-hallyn/+archive/userns-natty

Feedback very much appreciated.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 01/14] add Documentation/namespaces/user_namespace.txt
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-13 12:45   ` David Howells
  2011-07-12 23:30 ` [RFC PATCH 02/14] allow root in container to copy namespaces Serge Hallyn
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

This will hold some info about the design.  Currently it contains
future todos, issues and questions.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 Documentation/namespaces/user_namespace.txt |   93 +++++++++++++++++++++++++++
 1 files changed, 93 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/namespaces/user_namespace.txt

diff --git a/Documentation/namespaces/user_namespace.txt b/Documentation/namespaces/user_namespace.txt
new file mode 100644
index 0000000..24c894f
--- /dev/null
+++ b/Documentation/namespaces/user_namespace.txt
@@ -0,0 +1,93 @@
+Description
+===========
+
+Traditionally, each task is owned by a userid (uid) and belongs to one
+or more groups (gid).  Both are simple numeric ids, though userspace
+usually translates them to names.  The user namespace allows tasks to
+have different views of the uids and gids associated with tasks and
+other resources.
+
+The user namespace is a simple heirarchical one.  The system begins
+with all tasks belonging to the initial user namespace.  A task creates
+a new user namespace by passing the CLONE_NEWUSER flag to clone(2).
+To do so, the creating task needs the CAP_SETUID, CAP_SETGID, and
+CAP_CHOWN capabilities, but does not need to be root.  The clone(2)
+call will result in a new task which to the creator appears to have
+the same credentials as itself, but which sees itself as being uid
+and gid 0.  Any task in or resource belonging to the initial user
+namespace will, to this new task, appear to belong to uid and gid
+-1, which is usually known as 'nobody'.  Opening such files will
+result in obtaining the 'user other' permissions.  UID comparisons
+will return false, and privilege will be denied.
+
+When a task belonging to userid 500 in the initial user namespace
+creates a new user namespace, even though the new task will see itself
+as belonging to uid 0, any task in the initial user namespace
+will see it as belonging to uid 500.  Therefore, uid 500 in the
+initial user namespace will be able to kill the new task.  Files
+created by the new user will (eventually) be seen by tasks in its
+own user namespace as belonging to uid 0, but to tasks in the initial
+user namespace as belonging to uid 500.  Note that this userid
+mapping for the VFS is not yet implemented, though the lkml and
+containers mailing list archives will show several previous prototypes.
+In the end, those got hung up waiting on the concept of targeted
+capabilities to be developed, which, thanks to the insight of Eric
+Biederman, they finally did.
+
+Other namespaces, such as UTS and network, are owned by a user
+namespace.  When such a namespace is created, it is assigned to the user
+namespace by which it was created.  Therefore, attempts to exercise
+privilege to resources in a network namespace can be properly validated
+by checking whether the caller has the needed privilege targeted to the
+user namespace owning the network namespace.  This is called checking
+targeted capabilities, and is done using the 'ns_capable' function.
+
+As an example, if a new task is cloned with a private user namespace but
+no private network namespace, then the task's network namespace is owned
+by the parent user namespace.  The new task has no privilege to the
+parent user namespace, so it will not be able to create or configure
+network devices.  If, instead, the task were cloned with both private
+user and network namespaces, then the private network namespace is owned
+by the private user namespace, and so root in the new user namespace
+will have privilege targeted to the network namespace.  It will be able
+to create and configure network devices.
+
+Working notes
+=============
+capable checks for actions related to syslog must be against the
+init_user_ns until syslog is containerized.
+
+Same is true for reboot and power, control groups, devices, and time.
+
+Perf actions (kernel/event/core.c for instance) will always be
+constrained to init_user_ns.
+
+Q:
+Is accounting considered properly containerized wrt pidns?  (it
+appears to be).  If so, then we can change the capable check in
+kernel/acct.c to 'ns_capable(current_pid_ns()->user_ns, CAP_PACCT)'
+
+Q:
+For things like nice and schedaffinity, we could allow root in a
+container to control those, and leave only cgroups to constrain
+the container.  I'm not sure whether that is right, or whether it
+violates admin expectations.
+
+I punted on some of commoncap.c.  I'm punting on xattr stuff as
+they take dentries, not inodes.
+
+For drivers/tty/tty_io.c and drivers/tty/vt/vt.c, we'll want to (for
+some of them) target at the user_ns owning the tty.  That will have
+to wait until we get userns owning files straightened out.
+
+We need to figure out how to label devices.  Should we just toss a user_ns
+right into struct device?
+
+capable(CAP_MAC_ADMIN) checks are always to be against init_user_ns,
+unless some day LSMs were to be containerized, near zero chance.
+
+inode_owner_or_capable() should probably take an optional ns and
+cap paramter.  If cap is 0, then CAP_FOWNER is checked.  If ns is
+NULL, we derive the ns from inode.  But if ns is provided, then
+callers who need to derive inode_userns(inode) anyway can save a
+few cycles.
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 02/14] allow root in container to copy namespaces
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 01/14] add Documentation/namespaces/user_namespace.txt Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 03/14] keyctl: check capabilities against key's user_ns Serge Hallyn
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

Othewise nested containers with user namespaces won't be possible.

It's true that user namespaces are not yet fully isolated, but for
that same reason there are far worse things that root in a child
user ns can do.  Spawning a child user ns is not in itself bad.

This patch also allows setns for root in a container:
@Eric Biederman: are there gotchas in allowing setns from child
userns?

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 kernel/fork.c    |    4 ++--
 kernel/nsproxy.c |    6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 0276c30..01d7564 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1475,8 +1475,8 @@ long do_fork(unsigned long clone_flags,
 		/* hopefully this check will go away when userns support is
 		 * complete
 		 */
-		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
-				!capable(CAP_SETGID))
+		if (!nsown_capable(CAP_SYS_ADMIN) || !nsown_capable(CAP_SETUID) ||
+				!nsown_capable(CAP_SETGID))
 			return -EPERM;
 	}
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index d6a00f3..a687405 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -134,7 +134,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 				CLONE_NEWPID | CLONE_NEWNET)))
 		return 0;
 
-	if (!capable(CAP_SYS_ADMIN)) {
+	if (!nsown_capable(CAP_SYS_ADMIN)) {
 		err = -EPERM;
 		goto out;
 	}
@@ -191,7 +191,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
 			       CLONE_NEWNET)))
 		return 0;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!nsown_capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
 	*new_nsp = create_new_namespaces(unshare_flags, current,
@@ -241,7 +241,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
 	struct file *file;
 	int err;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!nsown_capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
 	file = proc_ns_fget(fd);
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 03/14] keyctl: check capabilities against key's user_ns
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 01/14] add Documentation/namespaces/user_namespace.txt Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 02/14] allow root in container to copy namespaces Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-13 16:04   ` David Howells
  2011-07-12 23:30 ` [RFC PATCH 04/14] user_ns: convert fs/attr.c to targeted capabilities Serge Hallyn
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

ATM, task should only be able to get his own user_ns's keys
anyway, so nsown_capable should also work, but there is no
advantage to doing that, while using key's user_ns is clearer.

changelog: jun 6:
	compile fix: keyctl.c (key_user, not key has user_ns)

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 security/keys/keyctl.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index eca5191..fa7d420 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -745,7 +745,7 @@ long keyctl_chown_key(key_serial_t id, uid_t uid, gid_t gid)
 	ret = -EACCES;
 	down_write(&key->sem);
 
-	if (!capable(CAP_SYS_ADMIN)) {
+	if (!ns_capable(key->user->user_ns, CAP_SYS_ADMIN)) {
 		/* only the sysadmin can chown a key to some other UID */
 		if (uid != (uid_t) -1 && key->uid != uid)
 			goto error_put;
@@ -852,7 +852,8 @@ long keyctl_setperm_key(key_serial_t id, key_perm_t perm)
 	down_write(&key->sem);
 
 	/* if we're not the sysadmin, we can only change a key that we own */
-	if (capable(CAP_SYS_ADMIN) || key->uid == current_fsuid()) {
+	if (ns_capable(key->user->user_ns, CAP_SYS_ADMIN) ||
+	    key->uid == current_fsuid()) {
 		key->perm = perm;
 		ret = 0;
 	}
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 04/14] user_ns: convert fs/attr.c to targeted capabilities
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
                   ` (2 preceding siblings ...)
  2011-07-12 23:30 ` [RFC PATCH 03/14] keyctl: check capabilities against key's user_ns Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 05/14] userns: clamp down users of cap_raised Serge Hallyn
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/attr.c |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index caf2aa5..c6d502d 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -29,6 +29,7 @@
 int inode_change_ok(const struct inode *inode, struct iattr *attr)
 {
 	unsigned int ia_valid = attr->ia_valid;
+	struct user_namespace *ns;
 
 	/*
 	 * First check size constraints.  These can't be overriden using
@@ -44,26 +45,28 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
 	if (ia_valid & ATTR_FORCE)
 		return 0;
 
+	ns = inode_userns(inode);
 	/* Make sure a caller can chown. */
 	if ((ia_valid & ATTR_UID) &&
-	    (current_fsuid() != inode->i_uid ||
-	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
+	    (ns != current_user_ns() || current_fsuid() != inode->i_uid ||
+	     attr->ia_uid != inode->i_uid) && !ns_capable(ns, CAP_CHOWN))
 		return -EPERM;
 
 	/* Make sure caller can chgrp. */
 	if ((ia_valid & ATTR_GID) &&
-	    (current_fsuid() != inode->i_uid ||
+	    (ns != current_user_ns() || current_fsuid() != inode->i_uid ||
 	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
-	    !capable(CAP_CHOWN))
+	    !ns_capable(ns, CAP_CHOWN))
 		return -EPERM;
 
 	/* Make sure a caller can chmod. */
 	if (ia_valid & ATTR_MODE) {
+		gid_t gid = (ia_valid & ATTR_GID) ? attr->ia_gid : inode->i_gid;
 		if (!inode_owner_or_capable(inode))
 			return -EPERM;
 		/* Also check the setgid bit! */
-		if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
-				inode->i_gid) && !capable(CAP_FSETID))
+		if ((ns != current_user_ns() || !in_group_p(gid)) &&
+		    !ns_capable(ns, CAP_FSETID))
 			attr->ia_mode &= ~S_ISGID;
 	}
 
@@ -154,9 +157,12 @@ void setattr_copy(struct inode *inode, const struct iattr *attr)
 						inode->i_sb->s_time_gran);
 	if (ia_valid & ATTR_MODE) {
 		umode_t mode = attr->ia_mode;
+		struct user_namespace *ns = inode_userns(inode);
 
-		if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
+		if ((ns != current_user_ns() || !in_group_p(inode->i_gid)) &&
+		    !ns_capable(ns, CAP_FSETID))
 			mode &= ~S_ISGID;
+
 		inode->i_mode = mode;
 	}
 }
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 05/14] userns: clamp down users of cap_raised
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
                   ` (3 preceding siblings ...)
  2011-07-12 23:30 ` [RFC PATCH 04/14] user_ns: convert fs/attr.c to targeted capabilities Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 06/14] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

A few modules are using cap_raised(current_cap(), cap) to authorize
actions, but the privilege should be applicable against the initial
user namespace.  Refuse privilege if the caller is not in init_user_ns.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/block/drbd/drbd_nl.c           |    5 +++++
 drivers/md/dm-log-userspace-transfer.c |    3 +++
 drivers/staging/pohmelfs/config.c      |    3 +++
 drivers/video/uvesafb.c                |    3 +++
 4 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 515bcd9..7717f8a 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2297,6 +2297,11 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
 		return;
 	}
 
+	if (current_user_ns() != &init_user_ns) {
+		retcode = ERR_PERM;
+		goto fail;
+	}
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
 		retcode = ERR_PERM;
 		goto fail;
diff --git a/drivers/md/dm-log-userspace-transfer.c b/drivers/md/dm-log-userspace-transfer.c
index 1f23e04..140ca81 100644
--- a/drivers/md/dm-log-userspace-transfer.c
+++ b/drivers/md/dm-log-userspace-transfer.c
@@ -134,6 +134,9 @@ static void cn_ulog_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
 {
 	struct dm_ulog_request *tfr = (struct dm_ulog_request *)(msg + 1);
 
+	if (current_user_ns() != &init_user_ns)
+		return;
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
diff --git a/drivers/staging/pohmelfs/config.c b/drivers/staging/pohmelfs/config.c
index b6c42cb..cd259d0 100644
--- a/drivers/staging/pohmelfs/config.c
+++ b/drivers/staging/pohmelfs/config.c
@@ -525,6 +525,9 @@ static void pohmelfs_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *n
 {
 	int err;
 
+	if (current_user_ns() != &init_user_ns)
+		return;
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
index 7f8472c..71dab8e 100644
--- a/drivers/video/uvesafb.c
+++ b/drivers/video/uvesafb.c
@@ -73,6 +73,9 @@ static void uvesafb_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *ns
 	struct uvesafb_task *utask;
 	struct uvesafb_ktask *task;
 
+	if (current_user_ns() != &init_user_ns)
+		return;
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 06/14] user namespace: make each net (net_ns) belong to a user_ns
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
                   ` (4 preceding siblings ...)
  2011-07-12 23:30 ` [RFC PATCH 05/14] userns: clamp down users of cap_raised Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 07/14] user namespace: use net->user_ns for some capable calls under net/ Serge Hallyn
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

This way we can target capabilites at the user_ns which created the
net ns.

Changelog:
   jul 8: nsproxy: don't assign netns->userns if not cloning.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 include/net/net_namespace.h |    2 ++
 kernel/nsproxy.c            |    2 ++
 net/core/net_namespace.c    |    3 +++
 3 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index aef430d..0664fff 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -29,6 +29,7 @@ struct ctl_table_header;
 struct net_generic;
 struct sock;
 struct netns_ipvs;
+struct user_namespace;
 
 
 #define NETDEV_HASHBITS    8
@@ -100,6 +101,7 @@ struct net {
 	struct netns_xfrm	xfrm;
 #endif
 	struct netns_ipvs	*ipvs;
+	struct user_namespace	*user_ns;
 };
 
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index a687405..d3e5b7e 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -95,6 +95,8 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
 		err = PTR_ERR(new_nsp->net_ns);
 		goto out_net;
 	}
+	if (flags & CLONE_NEWNET)
+		new_nsp->net_ns->user_ns = get_user_ns(task_cred_xxx(tsk, user_ns));
 
 	return new_nsp;
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index ea489db..2b5a4cb 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -10,6 +10,7 @@
 #include <linux/nsproxy.h>
 #include <linux/proc_fs.h>
 #include <linux/file.h>
+#include <linux/user_namespace.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -208,6 +209,7 @@ static void net_free(struct net *net)
 	}
 #endif
 	kfree(net->gen);
+	put_user_ns(net->user_ns);
 	kmem_cache_free(net_cachep, net);
 }
 
@@ -388,6 +390,7 @@ static int __init net_ns_init(void)
 	rcu_assign_pointer(init_net.gen, ng);
 
 	mutex_lock(&net_mutex);
+	init_net.user_ns = &init_user_ns;
 	if (setup_net(&init_net))
 		panic("Could not setup the initial network namespace");
 
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 07/14] user namespace: use net->user_ns for some capable calls under net/
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
                   ` (5 preceding siblings ...)
  2011-07-12 23:30 ` [RFC PATCH 06/14] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 08/14] af_netlink.c: make netlink_capable userns-aware Serge Hallyn
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge Hallyn

From: Serge Hallyn <serge.hallyn@ubuntu.com>

Just a partial conversion to show how the previous patch is expected to
be used.

Changelog:
  6/28/11: fix typo in net/core/sock.c
  7/08/11: don't target capability which authorizes module loading

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/dev.c  |    4 ++--
 net/core/sock.c |   14 ++++++++------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9c58c1e..63d25b3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5004,7 +5004,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCGMIIPHY:
 	case SIOCGMIIREG:
 	case SIOCSIFNAME:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		dev_load(net, ifr.ifr_name);
 		rtnl_lock();
@@ -5043,7 +5043,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCBRADDIF:
 	case SIOCBRDELIF:
 	case SIOCSHWTSTAMP:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		/* fall through */
 	case SIOCBONDSLAVEINFOQUERY:
diff --git a/net/core/sock.c b/net/core/sock.c
index 6e81978..965b1e7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -417,7 +417,7 @@ static int sock_bindtodevice(struct sock *sk, char __user *optval, int optlen)
 
 	/* Sorry... */
 	ret = -EPERM;
-	if (!capable(CAP_NET_RAW))
+	if (!ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out;
 
 	ret = -EINVAL;
@@ -485,6 +485,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 	int valbool;
 	struct linger ling;
 	int ret = 0;
+	struct net *net = sock_net(sk);
 
 	/*
 	 *	Options without arguments
@@ -505,7 +506,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 
 	switch (optname) {
 	case SO_DEBUG:
-		if (val && !capable(CAP_NET_ADMIN))
+		if (val && !ns_capable(net->user_ns, CAP_NET_ADMIN))
 			ret = -EACCES;
 		else
 			sock_valbool_flag(sk, SOCK_DBG, valbool);
@@ -548,7 +549,7 @@ set_sndbuf:
 		break;
 
 	case SO_SNDBUFFORCE:
-		if (!capable(CAP_NET_ADMIN)) {
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
 			ret = -EPERM;
 			break;
 		}
@@ -586,7 +587,7 @@ set_rcvbuf:
 		break;
 
 	case SO_RCVBUFFORCE:
-		if (!capable(CAP_NET_ADMIN)) {
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
 			ret = -EPERM;
 			break;
 		}
@@ -609,7 +610,8 @@ set_rcvbuf:
 		break;
 
 	case SO_PRIORITY:
-		if ((val >= 0 && val <= 6) || capable(CAP_NET_ADMIN))
+		if ((val >= 0 && val <= 6) ||
+		     ns_capable(net->user_ns, CAP_NET_ADMIN))
 			sk->sk_priority = val;
 		else
 			ret = -EPERM;
@@ -726,7 +728,7 @@ set_rcvbuf:
 			clear_bit(SOCK_PASSSEC, &sock->flags);
 		break;
 	case SO_MARK:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			ret = -EPERM;
 		else
 			sk->sk_mark = val;
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 08/14] af_netlink.c: make netlink_capable userns-aware
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
                   ` (6 preceding siblings ...)
  2011-07-12 23:30 ` [RFC PATCH 07/14] user namespace: use net->user_ns for some capable calls under net/ Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-13  1:33   ` Eric Dumazet
  2011-07-12 23:30 ` [RFC PATCH 09/14] user ns: convert ipv6 to targeted capabilities Serge Hallyn
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

netlink_capable should check for permissions against the user
namespace owning the socket in question.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/netlink/af_netlink.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6ef64ad..81c1099 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -580,8 +580,15 @@ retry:
 
 static inline int netlink_capable(struct socket *sock, unsigned int flag)
 {
-	return (nl_table[sock->sk->sk_protocol].nl_nonroot & flag) ||
-	       capable(CAP_NET_ADMIN);
+	struct net *net;
+	if (nl_table[sock->sk->sk_protocol].nl_nonroot & flag)
+		return 1;
+#ifdef CONFIG_NET_NS
+	net = sock->sk->sk_net;
+#else
+	net = &init_net;
+#endif
+	return ns_capable(net->user_ns, CAP_NET_ADMIN);
 }
 
 static void
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 09/14] user ns: convert ipv6 to targeted capabilities
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
                   ` (7 preceding siblings ...)
  2011-07-12 23:30 ` [RFC PATCH 08/14] af_netlink.c: make netlink_capable userns-aware Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 10/14] net/core/scm.c: target capable() calls to user_ns owning the net_ns Serge Hallyn
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/ipv6/addrconf.c             |    4 ++--
 net/ipv6/af_inet6.c             |    6 ++++--
 net/ipv6/datagram.c             |    6 +++---
 net/ipv6/ip6_flowlabel.c        |   24 ++++++++++++++----------
 net/ipv6/ip6_tunnel.c           |    4 ++--
 net/ipv6/ip6mr.c                |    2 +-
 net/ipv6/ipv6_sockglue.c        |    7 ++++---
 net/ipv6/netfilter/ip6_tables.c |    8 ++++----
 net/ipv6/route.c                |    2 +-
 net/ipv6/sit.c                  |   10 +++++-----
 10 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 498b927..42b783b 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2219,7 +2219,7 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
@@ -2238,7 +2238,7 @@ int addrconf_del_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 3b5669a..1854ffe 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -160,7 +160,8 @@ lookup_protocol:
 	}
 
 	err = -EPERM;
-	if (sock->type == SOCK_RAW && !kern && !capable(CAP_NET_RAW))
+	if (sock->type == SOCK_RAW && !kern &&
+	    !ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
@@ -281,7 +282,8 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		return -EINVAL;
 
 	snum = ntohs(addr->sin6_port);
-	if (snum && snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
+	if (snum && snum < PROT_SOCK &&
+	    !ns_capable(sock_net(sk)->user_ns, CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
 	lock_sock(sk);
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 1656033..7e38d8f 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -694,7 +694,7 @@ int datagram_send_ctl(struct net *net,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -714,7 +714,7 @@ int datagram_send_ctl(struct net *net,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -739,7 +739,7 @@ int datagram_send_ctl(struct net *net,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index f3caf1b..4726c02 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -294,21 +294,22 @@ struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions * opt_space,
 	return opt_space;
 }
 
-static unsigned long check_linger(unsigned long ttl)
+static unsigned long check_linger(unsigned long ttl, struct user_namespace *ns)
 {
 	if (ttl < FL_MIN_LINGER)
 		return FL_MIN_LINGER*HZ;
-	if (ttl > FL_MAX_LINGER && !capable(CAP_NET_ADMIN))
+	if (ttl > FL_MAX_LINGER && !ns_capable(ns, CAP_NET_ADMIN))
 		return 0;
 	return ttl*HZ;
 }
 
-static int fl6_renew(struct ip6_flowlabel *fl, unsigned long linger, unsigned long expires)
+static int fl6_renew(struct ip6_flowlabel *fl, unsigned long linger,
+		     unsigned long expires, struct user_namespace *ns)
 {
-	linger = check_linger(linger);
+	linger = check_linger(linger, ns);
 	if (!linger)
 		return -EPERM;
-	expires = check_linger(expires);
+	expires = check_linger(expires, ns);
 	if (!expires)
 		return -EPERM;
 	fl->lastuse = jiffies;
@@ -375,7 +376,7 @@ fl_create(struct net *net, struct in6_flowlabel_req *freq, char __user *optval,
 
 	fl->fl_net = hold_net(net);
 	fl->expires = jiffies;
-	err = fl6_renew(fl, freq->flr_linger, freq->flr_expires);
+	err = fl6_renew(fl, freq->flr_linger, freq->flr_expires, net->user_ns);
 	if (err)
 		goto done;
 	fl->share = freq->flr_share;
@@ -425,7 +426,7 @@ static int mem_check(struct sock *sk)
 	if (room <= 0 ||
 	    ((count >= FL_MAX_PER_SOCK ||
 	      (count > 0 && room < FL_MAX_SIZE/2) || room < FL_MAX_SIZE/4) &&
-	     !capable(CAP_NET_ADMIN)))
+	     !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)))
 		return -ENOBUFS;
 
 	return 0;
@@ -507,17 +508,20 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
 		read_lock_bh(&ip6_sk_fl_lock);
 		for (sfl = np->ipv6_fl_list; sfl; sfl = sfl->next) {
 			if (sfl->fl->label == freq.flr_label) {
-				err = fl6_renew(sfl->fl, freq.flr_linger, freq.flr_expires);
+				err = fl6_renew(sfl->fl, freq.flr_linger, freq.flr_expires,
+						net->user_ns);
 				read_unlock_bh(&ip6_sk_fl_lock);
 				return err;
 			}
 		}
 		read_unlock_bh(&ip6_sk_fl_lock);
 
-		if (freq.flr_share == IPV6_FL_S_NONE && capable(CAP_NET_ADMIN)) {
+		if (freq.flr_share == IPV6_FL_S_NONE &&
+		    ns_capable(net->user_ns, CAP_NET_ADMIN)) {
 			fl = fl_lookup(net, freq.flr_label);
 			if (fl) {
-				err = fl6_renew(fl, freq.flr_linger, freq.flr_expires);
+				err = fl6_renew(fl, freq.flr_linger, freq.flr_expires,
+						net->user_ns);
 				fl_release(fl);
 				return err;
 			}
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 36c2842..1a98c23 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1269,7 +1269,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof (p)))
@@ -1304,7 +1304,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 82a8099..29e992d 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1581,7 +1581,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 		return -ENOENT;
 
 	if (optname != MRT6_INIT) {
-		if (sk != mrt->mroute6_sk && !capable(CAP_NET_ADMIN))
+		if (sk != mrt->mroute6_sk && !ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 9cb191e..196b099 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -343,7 +343,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case IPV6_TRANSPARENT:
-		if (!capable(CAP_NET_ADMIN)) {
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
 			retv = -EPERM;
 			break;
 		}
@@ -381,7 +381,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 
 		/* hop-by-hop / destination options are privileged option */
 		retv = -EPERM;
-		if (optname != IPV6_RTHDR && !capable(CAP_NET_RAW))
+		if (optname != IPV6_RTHDR &&
+		    !ns_capable(net->user_ns, CAP_NET_RAW))
 			break;
 
 		opt = ipv6_renew_options(sk, np->opt, optname,
@@ -725,7 +726,7 @@ done:
 	case IPV6_IPSEC_POLICY:
 	case IPV6_XFRM_POLICY:
 		retv = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		retv = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 94874b0..7fce7d8 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1869,7 +1869,7 @@ compat_do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1984,7 +1984,7 @@ compat_do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2006,7 +2006,7 @@ do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2031,7 +2031,7 @@ do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0ef1f08..1b5c6df 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1915,7 +1915,7 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch(cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = copy_from_user(&rtmsg, arg,
 				     sizeof(struct in6_rtmsg));
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 1cca576..219e6c1 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -308,7 +308,7 @@ static int ipip6_tunnel_get_prl(struct ip_tunnel *t,
 	/* For simple GET or for root users,
 	 * we try harder to allocate.
 	 */
-	kp = (cmax <= 1 || capable(CAP_NET_ADMIN)) ?
+	kp = (cmax <= 1 || ns_capable(dev_net(t->dev)->user_ns, CAP_NET_ADMIN)) ?
 		kcalloc(cmax, sizeof(*kp), GFP_KERNEL) :
 		NULL;
 
@@ -926,7 +926,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -985,7 +985,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == sitn->fb_tunnel_dev) {
@@ -1018,7 +1018,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCDELPRL:
 	case SIOCCHGPRL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 		err = -EINVAL;
 		if (dev == sitn->fb_tunnel_dev)
@@ -1047,7 +1047,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCCHG6RD:
 	case SIOCDEL6RD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 10/14] net/core/scm.c: target capable() calls to user_ns owning the net_ns
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
                   ` (8 preceding siblings ...)
  2011-07-12 23:30 ` [RFC PATCH 09/14] user ns: convert ipv6 to targeted capabilities Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 11/14] userns: make some net-sysfs capable calls targeted Serge Hallyn
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

The uid/gid comparisons don't have to be pulled out.  This just seemed
more easily proved correct.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/scm.c |   41 ++++++++++++++++++++++++++++++++++-------
 1 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/net/core/scm.c b/net/core/scm.c
index 4c1ef02..21b5d0b 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -43,17 +43,44 @@
  *	setu(g)id.
  */
 
-static __inline__ int scm_check_creds(struct ucred *creds)
+static __inline__ bool uidequiv(struct cred *src, struct ucred *tgt,
+			       struct user_namespace *ns)
+{
+	if (src->user_ns != ns)
+		goto check_capable;
+	if (src->uid == tgt->uid || src->euid == tgt->uid ||
+	    src->suid == tgt->uid)
+		return true;
+check_capable:
+	if (ns_capable(ns, CAP_SETUID))
+		return true;
+	return false;
+}
+
+static __inline__ bool gidequiv(struct cred *src, struct ucred *tgt,
+			       struct user_namespace *ns)
+{
+	if (src->user_ns != ns)
+		goto check_capable;
+	if (src->gid == tgt->gid || src->egid == tgt->gid ||
+	    src->sgid == tgt->gid)
+		return true;
+check_capable:
+	if (ns_capable(ns, CAP_SETGID))
+		return true;
+	return false;
+}
+
+static __inline__ int scm_check_creds(struct ucred *creds, struct socket *sock)
 {
 	const struct cred *cred = current_cred();
+	struct user_namespace *ns = sock_net(sock->sk)->user_ns;
 
-	if ((creds->pid == task_tgid_vnr(current) || capable(CAP_SYS_ADMIN)) &&
-	    ((creds->uid == cred->uid   || creds->uid == cred->euid ||
-	      creds->uid == cred->suid) || capable(CAP_SETUID)) &&
-	    ((creds->gid == cred->gid   || creds->gid == cred->egid ||
-	      creds->gid == cred->sgid) || capable(CAP_SETGID))) {
+	if ((creds->pid == task_tgid_vnr(current) || ns_capable(ns, CAP_SYS_ADMIN)) &&
+	     uidequiv(cred, creds, ns) && gidequiv(cred, creds, ns)) {
 	       return 0;
 	}
+
 	return -EPERM;
 }
 
@@ -169,7 +196,7 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
 			if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct ucred)))
 				goto error;
 			memcpy(&p->creds, CMSG_DATA(cmsg), sizeof(struct ucred));
-			err = scm_check_creds(&p->creds);
+			err = scm_check_creds(&p->creds, sock);
 			if (err)
 				goto error;
 
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 11/14] userns: make some net-sysfs capable calls targeted
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
                   ` (9 preceding siblings ...)
  2011-07-12 23:30 ` [RFC PATCH 10/14] net/core/scm.c: target capable() calls to user_ns owning the net_ns Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 12/14] user_ns: target af_key capability check Serge Hallyn
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

Changelog: jul 1: fix compilation errors (net_device != net)

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/net-sysfs.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 33d2a1f..b651baf 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -76,7 +76,7 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
 	unsigned long new;
 	int ret = -EINVAL;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(net)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	new = simple_strtoul(buf, &endp, 0);
@@ -262,7 +262,7 @@ static ssize_t store_ifalias(struct device *dev, struct device_attribute *attr,
 	size_t count = len;
 	ssize_t ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(netdev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* ignore trailing newline */
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 12/14] user_ns: target af_key capability check
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
                   ` (10 preceding siblings ...)
  2011-07-12 23:30 ` [RFC PATCH 11/14] userns: make some net-sysfs capable calls targeted Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 13/14] userns: net: make many network capable calls targeted Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 14/14] net: pass user_ns to cap_netlink_recv() Serge Hallyn
  13 siblings, 0 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

This presumes that it really is complete wrt network namespaces.  Looking
at the code it appears to be.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/key/af_key.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 8f92cf8..414f409 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -141,7 +141,7 @@ static int pfkey_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 13/14] userns: net: make many network capable calls targeted
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
                   ` (11 preceding siblings ...)
  2011-07-12 23:30 ` [RFC PATCH 12/14] user_ns: target af_key capability check Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  2011-07-12 23:30 ` [RFC PATCH 14/14] net: pass user_ns to cap_netlink_recv() Serge Hallyn
  13 siblings, 0 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

When privilege is protected a namespaced network resource, then having
the required privilege targed toward the user namespace which owns the
resource suffices.

As with other patches, a big concern here is that we be cleanly separating
the cases where privilege protects a network resource from cases where
privilege can lead to laxer constraints on input and, subsequently,
the ability to corrupt, crash, or own the host kernel.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/8021q/vlan.c                  |   12 ++++++------
 net/bridge/br_ioctl.c             |   22 +++++++++++-----------
 net/bridge/br_sysfs_br.c          |    8 ++++----
 net/bridge/br_sysfs_if.c          |    2 +-
 net/bridge/netfilter/ebtables.c   |    8 ++++----
 net/core/ethtool.c                |    2 +-
 net/ipv4/arp.c                    |    2 +-
 net/ipv4/devinet.c                |    4 ++--
 net/ipv4/fib_frontend.c           |    2 +-
 net/ipv4/ip_options.c             |    6 +++---
 net/ipv4/ip_sockglue.c            |    4 ++--
 net/ipv4/ipip.c                   |    4 ++--
 net/ipv4/ipmr.c                   |    2 +-
 net/ipv4/netfilter/arp_tables.c   |    8 ++++----
 net/ipv4/netfilter/ip_tables.c    |    8 ++++----
 net/netfilter/ipset/ip_set_core.c |    2 +-
 net/netfilter/ipvs/ip_vs_ctl.c    |    4 ++--
 net/packet/af_packet.c            |    2 +-
 18 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 917ecb9..46a4801 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -561,7 +561,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 	switch (args.cmd) {
 	case SET_VLAN_INGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		vlan_dev_set_ingress_priority(dev,
 					      args.u.skb_priority,
@@ -571,7 +571,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_EGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_set_egress_priority(dev,
 						   args.u.skb_priority,
@@ -580,7 +580,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_FLAG_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_change_flags(dev,
 					    args.vlan_qos ? args.u.flag : 0,
@@ -589,7 +589,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_NAME_TYPE_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		if ((args.u.name_type >= 0) &&
 		    (args.u.name_type < VLAN_NAME_TYPE_HIGHEST)) {
@@ -605,14 +605,14 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case ADD_VLAN_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		err = register_vlan_device(dev, args.u.VID);
 		break;
 
 	case DEL_VLAN_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		unregister_vlan_dev(dev, NULL);
 		err = 0;
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index 7222fe1..c82f9cb 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -88,7 +88,7 @@ static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
 	struct net_device *dev;
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	dev = __dev_get_by_index(dev_net(br->dev), ifindex);
@@ -178,25 +178,25 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_FORWARD_DELAY:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		return br_set_forward_delay(br, args[1]);
 
 	case BRCTL_SET_BRIDGE_HELLO_TIME:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		return br_set_hello_time(br, args[1]);
 
 	case BRCTL_SET_BRIDGE_MAX_AGE:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		return br_set_max_age(br, args[1]);
 
 	case BRCTL_SET_AGEING_TIME:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br->ageing_time = clock_t_to_jiffies(args[1]);
@@ -236,14 +236,14 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	}
 
 	case BRCTL_SET_BRIDGE_STP_STATE:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		br_stp_set_enabled(br, args[1]);
 		return 0;
 
 	case BRCTL_SET_BRIDGE_PRIORITY:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -256,7 +256,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 		struct net_bridge_port *p;
 		int ret;
 
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -273,7 +273,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 		struct net_bridge_port *p;
 		int ret;
 
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		spin_lock_bh(&br->lock);
@@ -330,7 +330,7 @@ static int old_deviceless(struct net *net, void __user *uarg)
 	{
 		char buf[IFNAMSIZ];
 
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, (void __user *)args[1], IFNAMSIZ))
@@ -360,7 +360,7 @@ int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *uar
 	{
 		char buf[IFNAMSIZ];
 
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(buf, uarg, IFNAMSIZ))
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 68b893e..7f4fa3a 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -36,7 +36,7 @@ static ssize_t store_bridge_parm(struct device *d,
 	unsigned long val;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
@@ -132,7 +132,7 @@ static ssize_t store_stp_state(struct device *d,
 	char *endp;
 	unsigned long val;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
@@ -267,7 +267,7 @@ static ssize_t store_group_addr(struct device *d,
 	unsigned new_addr[6];
 	int i;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (sscanf(buf, "%x:%x:%x:%x:%x:%x",
@@ -304,7 +304,7 @@ static ssize_t store_flush(struct device *d,
 {
 	struct net_bridge *br = to_bridge(d);
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	br_fdb_flush(br);
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 6229b62..9cb4d2e 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -209,7 +209,7 @@ static ssize_t brport_store(struct kobject * kobj,
 	char *endp;
 	unsigned long val;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(p->br->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	val = simple_strtoul(buf, &endp, 0);
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 2b5ca1a..c403c45 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1462,7 +1462,7 @@ static int do_ebt_set_ctl(struct sock *sk,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch(cmd) {
@@ -1484,7 +1484,7 @@ static int do_ebt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	struct ebt_replace tmp;
 	struct ebt_table *t;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&tmp, user, sizeof(tmp)))
@@ -2275,7 +2275,7 @@ static int compat_do_ebt_set_ctl(struct sock *sk,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2298,7 +2298,7 @@ static int compat_do_ebt_get_ctl(struct sock *sk, int cmd,
 	struct compat_ebt_replace tmp;
 	struct ebt_table *t;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* try real handler in case userland supplied needed padding */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index fd14116..4ee227b 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1980,7 +1980,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GFEATURES:
 		break;
 	default:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	}
 
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 1b74d3b..b67c075 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1208,7 +1208,7 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCDARP:
 	case SIOCSARP:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	case SIOCGARP:
 		err = copy_from_user(&r, arg, sizeof(struct arpreq));
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 0d4a184..42d93a4 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -728,7 +728,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 
 	case SIOCSIFFLAGS:
 		ret = -EACCES;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto out;
 		break;
 	case SIOCSIFADDR:	/* Set interface address (and family) */
@@ -736,7 +736,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCSIFDSTADDR:	/* Set the destination address */
 	case SIOCSIFNETMASK: 	/* Set the netmask for the interface */
 		ret = -EACCES;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto out;
 		ret = -EINVAL;
 		if (sin->sin_family != AF_INET)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 2252471..1f30cf8 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -437,7 +437,7 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(&rt, arg, sizeof(rt)))
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index ec93335..21df700 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -396,7 +396,7 @@ int ip_options_compile(struct net *net,
 					optptr[2] += 8;
 					break;
 				      default:
-					if (!skb && !capable(CAP_NET_RAW)) {
+					if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
 						pp_ptr = optptr + 3;
 						goto error;
 					}
@@ -432,7 +432,7 @@ int ip_options_compile(struct net *net,
 				opt->router_alert = optptr - iph;
 			break;
 		      case IPOPT_CIPSO:
-			if ((!skb && !capable(CAP_NET_RAW)) || opt->cipso) {
+			if ((!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) || opt->cipso) {
 				pp_ptr = optptr;
 				goto error;
 			}
@@ -445,7 +445,7 @@ int ip_options_compile(struct net *net,
 		      case IPOPT_SEC:
 		      case IPOPT_SID:
 		      default:
-			if (!skb && !capable(CAP_NET_RAW)) {
+			if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
 				pp_ptr = optptr;
 				goto error;
 			}
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index ab0c9ef..972c65f 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -955,13 +955,13 @@ mc_msf_out:
 	case IP_IPSEC_POLICY:
 	case IP_XFRM_POLICY:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 			break;
 		err = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
 
 	case IP_TRANSPARENT:
-		if (!capable(CAP_NET_ADMIN)) {
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 378b20b..6725832 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -629,7 +629,7 @@ ipip_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -689,7 +689,7 @@ ipip_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == ipn->fb_tunnel_dev) {
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 30a7763..4e22804 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1204,7 +1204,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 
 	if (optname != MRT_INIT) {
 		if (sk != rcu_dereference_raw(mrt->mroute_sk) &&
-		    !capable(CAP_NET_ADMIN))
+		    !ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index fd7a3f6..acc908f 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1534,7 +1534,7 @@ static int compat_do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1678,7 +1678,7 @@ static int compat_do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1699,7 +1699,7 @@ static int do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1723,7 +1723,7 @@ static int do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 24e556e..72f2cde 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1847,7 +1847,7 @@ compat_do_ipt_set_ctl(struct sock *sk,	int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1962,7 +1962,7 @@ compat_do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1984,7 +1984,7 @@ do_ipt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2009,7 +2009,7 @@ do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 42aa64b..16933b2 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1568,7 +1568,7 @@ ip_set_sockfn_get(struct sock *sk, int optval, void __user *user, int *len)
 	void *data;
 	int copylen = *len, ret = 0;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (optval != SO_IP_SET)
 		return -EBADF;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 699c79a..65395a1 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2284,7 +2284,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 	struct ip_vs_dest_user *udest_compat;
 	struct ip_vs_dest_user_kern udest;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX)
@@ -2566,7 +2566,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	BUG_ON(!net);
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c0c3cda..ed798b0 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1496,7 +1496,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	__be16 proto = (__force __be16)protocol; /* weird, but documented */
 	int err;
 
-	if (!capable(CAP_NET_RAW))
+	if (!ns_capable(net->user_ns, CAP_NET_RAW))
 		return -EPERM;
 	if (sock->type != SOCK_DGRAM && sock->type != SOCK_RAW &&
 	    sock->type != SOCK_PACKET)
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 14/14] net: pass user_ns to cap_netlink_recv()
  2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
                   ` (12 preceding siblings ...)
  2011-07-12 23:30 ` [RFC PATCH 13/14] userns: net: make many network capable calls targeted Serge Hallyn
@ 2011-07-12 23:30 ` Serge Hallyn
  13 siblings, 0 replies; 20+ messages in thread
From: Serge Hallyn @ 2011-07-12 23:30 UTC (permalink / raw)
  To: linux-kernel, containers; +Cc: dhowells, ebiederm, Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

and make cap_netlink_recv() userns-aware

cap_netlink_recv() was granting privilege if a capability is in
current_cap(), regardless of the user namespace.  Fix that by
targeting the capability check against the user namespace which
owns the skb.

Because sock_net is static inline defined in net/sock.h, which we
don't want to #include at the cap_netlink_recv function (commoncap.h).

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/scsi/scsi_netlink.c     |    3 ++-
 include/linux/security.h        |   14 +++++++++-----
 kernel/audit.c                  |    6 ++++--
 net/core/rtnetlink.c            |    3 ++-
 net/decnet/netfilter/dn_rtmsg.c |    3 ++-
 net/ipv4/netfilter/ip_queue.c   |    3 ++-
 net/ipv6/netfilter/ip6_queue.c  |    3 ++-
 net/netfilter/nfnetlink.c       |    2 +-
 net/netlink/genetlink.c         |    2 +-
 net/xfrm/xfrm_user.c            |    2 +-
 security/commoncap.c            |    6 ++----
 security/security.c             |    4 ++--
 security/selinux/hooks.c        |    5 +++--
 13 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
index 26a8a45..0aa2e57 100644
--- a/drivers/scsi/scsi_netlink.c
+++ b/drivers/scsi/scsi_netlink.c
@@ -111,7 +111,8 @@ scsi_nl_rcv_msg(struct sk_buff *skb)
 			goto next_msg;
 		}
 
-		if (security_netlink_recv(skb, CAP_SYS_ADMIN)) {
+		if (security_netlink_recv(skb, CAP_SYS_ADMIN,
+					  sock_net(skb->sk)->user_ns)) {
 			err = -EPERM;
 			goto next_msg;
 		}
diff --git a/include/linux/security.h b/include/linux/security.h
index 8ce59ef..2d0a6c5 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -95,7 +95,8 @@ struct xfrm_user_sec_ctx;
 struct seq_file;
 
 extern int cap_netlink_send(struct sock *sk, struct sk_buff *skb);
-extern int cap_netlink_recv(struct sk_buff *skb, int cap);
+extern int cap_netlink_recv(struct sk_buff *skb, int cap,
+			    struct user_namespace *ns);
 
 void reset_security_ops(void);
 
@@ -797,6 +798,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  *	@skb.
  *	@skb contains the sk_buff structure for the netlink message.
  *	@cap indicates the capability required
+ *	@ns is the user namespace which owns skb
  *	Return 0 if permission is granted.
  *
  * Security hooks for Unix domain networking.
@@ -1557,7 +1559,8 @@ struct security_operations {
 			  struct sembuf *sops, unsigned nsops, int alter);
 
 	int (*netlink_send) (struct sock *sk, struct sk_buff *skb);
-	int (*netlink_recv) (struct sk_buff *skb, int cap);
+	int (*netlink_recv) (struct sk_buff *skb, int cap,
+			     struct user_namespace *ns);
 
 	void (*d_instantiate) (struct dentry *dentry, struct inode *inode);
 
@@ -1807,7 +1810,7 @@ void security_d_instantiate(struct dentry *dentry, struct inode *inode);
 int security_getprocattr(struct task_struct *p, char *name, char **value);
 int security_setprocattr(struct task_struct *p, char *name, void *value, size_t size);
 int security_netlink_send(struct sock *sk, struct sk_buff *skb);
-int security_netlink_recv(struct sk_buff *skb, int cap);
+int security_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns);
 int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen);
 int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
 void security_release_secctx(char *secdata, u32 seclen);
@@ -2505,9 +2508,10 @@ static inline int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return cap_netlink_send(sk, skb);
 }
 
-static inline int security_netlink_recv(struct sk_buff *skb, int cap)
+static inline int security_netlink_recv(struct sk_buff *skb, int cap,
+					struct user_namespace *ns)
 {
-	return cap_netlink_recv(skb, cap);
+	return cap_netlink_recv(skb, cap, ns);
 }
 
 static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
diff --git a/kernel/audit.c b/kernel/audit.c
index 9395003..abe5e66 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -598,13 +598,15 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
 	case AUDIT_TTY_SET:
 	case AUDIT_TRIM:
 	case AUDIT_MAKE_EQUIV:
-		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL))
+		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL,
+					  sock_net(skb->sk)->user_ns))
 			err = -EPERM;
 		break;
 	case AUDIT_USER:
 	case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
 	case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
-		if (security_netlink_recv(skb, CAP_AUDIT_WRITE))
+		if (security_netlink_recv(skb, CAP_AUDIT_WRITE,
+					  sock_net(skb->sk)->user_ns))
 			err = -EPERM;
 		break;
 	default:  /* bad msg */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index abd936d..889ea93 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1896,7 +1896,8 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	sz_idx = type>>2;
 	kind = type&3;
 
-	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN,
+					       net->user_ns))
 		return -EPERM;
 
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 64a7f39..d31c57b 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -108,7 +108,8 @@ static inline void dnrmg_receive_user_skb(struct sk_buff *skb)
 	if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+	    sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	/* Eventually we might send routing messages too */
diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
index 5c9b9d9..51d7c52 100644
--- a/net/ipv4/netfilter/ip_queue.c
+++ b/net/ipv4/netfilter/ip_queue.c
@@ -432,7 +432,8 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+				  sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
index 2493948..8206bf3 100644
--- a/net/ipv6/netfilter/ip6_queue.c
+++ b/net/ipv6/netfilter/ip6_queue.c
@@ -433,7 +433,8 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+				  sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index b4a4532..911054b 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -130,7 +130,7 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	const struct nfnetlink_subsystem *ss;
 	int type, err;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	/* All the messages must at least contain nfgenmsg */
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 1781d99..348f4c9 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -516,7 +516,7 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return -EOPNOTSUPP;
 
 	if ((ops->flags & GENL_ADMIN_PERM) &&
-	    security_netlink_recv(skb, CAP_NET_ADMIN))
+	    security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index c658cb3..0750bb9 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2290,7 +2290,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	link = &xfrm_dispatch[type];
 
 	/* All operations require privileges, even GET */
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	if ((type == (XFRM_MSG_GETSA - XFRM_MSG_BASE) ||
diff --git a/security/commoncap.c b/security/commoncap.c
index a93b3b7..1e48e6a 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -56,11 +56,9 @@ int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return 0;
 }
 
-int cap_netlink_recv(struct sk_buff *skb, int cap)
+int cap_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns)
 {
-	if (!cap_raised(current_cap(), cap))
-		return -EPERM;
-	return 0;
+	return security_capable(ns, current_cred(), cap);
 }
 EXPORT_SYMBOL(cap_netlink_recv);
 
diff --git a/security/security.c b/security/security.c
index 4ba6d4c..813c75b 100644
--- a/security/security.c
+++ b/security/security.c
@@ -948,9 +948,9 @@ int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return security_ops->netlink_send(sk, skb);
 }
 
-int security_netlink_recv(struct sk_buff *skb, int cap)
+int security_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns)
 {
-	return security_ops->netlink_recv(skb, cap);
+	return security_ops->netlink_recv(skb, cap, ns);
 }
 EXPORT_SYMBOL(security_netlink_recv);
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 20219ef..6c3a747 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4722,13 +4722,14 @@ static int selinux_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return selinux_nlmsg_perm(sk, skb);
 }
 
-static int selinux_netlink_recv(struct sk_buff *skb, int capability)
+static int selinux_netlink_recv(struct sk_buff *skb, int capability,
+				struct user_namespace *ns)
 {
 	int err;
 	struct common_audit_data ad;
 	u32 sid;
 
-	err = cap_netlink_recv(skb, capability);
+	err = cap_netlink_recv(skb, capability, ns);
 	if (err)
 		return err;
 
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 08/14] af_netlink.c: make netlink_capable userns-aware
  2011-07-12 23:30 ` [RFC PATCH 08/14] af_netlink.c: make netlink_capable userns-aware Serge Hallyn
@ 2011-07-13  1:33   ` Eric Dumazet
  2011-07-13  2:02     ` Serge E. Hallyn
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2011-07-13  1:33 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel, containers, dhowells, ebiederm, Serge E. Hallyn

Le mardi 12 juillet 2011 à 23:30 +0000, Serge Hallyn a écrit :
> From: Serge E. Hallyn <serge.hallyn@canonical.com>
> 
> netlink_capable should check for permissions against the user
> namespace owning the socket in question.
> 
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  net/netlink/af_netlink.c |   11 +++++++++--
>  1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 6ef64ad..81c1099 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -580,8 +580,15 @@ retry:
>  
>  static inline int netlink_capable(struct socket *sock, unsigned int flag)
>  {
> -	return (nl_table[sock->sk->sk_protocol].nl_nonroot & flag) ||
> -	       capable(CAP_NET_ADMIN);
> +	struct net *net;
> +	if (nl_table[sock->sk->sk_protocol].nl_nonroot & flag)
> +		return 1;
> +#ifdef CONFIG_NET_NS
> +	net = sock->sk->sk_net;
> +#else
> +	net = &init_net;
> +#endif

This is really ugly, please use :

	net = sock_net(sk);

And no more #ifdef




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 08/14] af_netlink.c: make netlink_capable userns-aware
  2011-07-13  1:33   ` Eric Dumazet
@ 2011-07-13  2:02     ` Serge E. Hallyn
  0 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2011-07-13  2:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, containers, dhowells, ebiederm, Serge E. Hallyn

Quoting Eric Dumazet (eric.dumazet@gmail.com):
> Le mardi 12 juillet 2011 à 23:30 +0000, Serge Hallyn a écrit :
> > From: Serge E. Hallyn <serge.hallyn@canonical.com>
> > 
> > netlink_capable should check for permissions against the user
> > namespace owning the socket in question.
> > 
> > Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > ---
> >  net/netlink/af_netlink.c |   11 +++++++++--
> >  1 files changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> > index 6ef64ad..81c1099 100644
> > --- a/net/netlink/af_netlink.c
> > +++ b/net/netlink/af_netlink.c
> > @@ -580,8 +580,15 @@ retry:
> >  
> >  static inline int netlink_capable(struct socket *sock, unsigned int flag)
> >  {
> > -	return (nl_table[sock->sk->sk_protocol].nl_nonroot & flag) ||
> > -	       capable(CAP_NET_ADMIN);
> > +	struct net *net;
> > +	if (nl_table[sock->sk->sk_protocol].nl_nonroot & flag)
> > +		return 1;
> > +#ifdef CONFIG_NET_NS
> > +	net = sock->sk->sk_net;
> > +#else
> > +	net = &init_net;
> > +#endif
> 
> This is really ugly, please use :
> 
> 	net = sock_net(sk);
> 
> And no more #ifdef

thanks, will do!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 01/14] add Documentation/namespaces/user_namespace.txt
  2011-07-12 23:30 ` [RFC PATCH 01/14] add Documentation/namespaces/user_namespace.txt Serge Hallyn
@ 2011-07-13 12:45   ` David Howells
  2011-07-14  2:37     ` Serge E. Hallyn
  0 siblings, 1 reply; 20+ messages in thread
From: David Howells @ 2011-07-13 12:45 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: dhowells, linux-kernel, containers, ebiederm, Serge E. Hallyn

Serge Hallyn <serge@hallyn.com> wrote:

> +Traditionally, each task is owned by a userid (uid) and belongs to one

"userid" -> "user ID" perhaps.

> +or more groups (gid).  Both are simple numeric ids, though userspace

Expanding the wrap column to 79 chars would reduce the number of lines, though
I grant for this text you will get a more ragged margin.

"ids" -> "IDs"?

> +... The user namespace allows tasks to
> +have different views of the uids and gids associated with tasks and
> +other resources.

How does this relate to UIDs/GIDs stored on disk?

> +The user namespace is a simple heirarchical one.  The system begins

Or even a "hierarchical" one.

I'd recommend "starts" rather than "begins".

> +To do so, the creating task needs the CAP_SETUID, CAP_SETGID, and
> +CAP_CHOWN capabilities, but does not need to be root.

How about "This requires the creating task to have the ... but it does not
need to be running as root."?

> ... The clone(2) call will result in a new task which to the creator
> appears to have the same credentials as itself, but which sees itself as
> being uid and gid 0.

How about "The clone(2) call will result in a new task which to itself appears
to be running as uid and gid 0, but to its creator seems to have the creator's
credentials."

> ... Any task in or resource belonging to the initial user
> namespace will, to this new task, appear to belong to uid and gid
> -1, which is usually known as 'nobody'.

I think ", which is usually" should probably be " - which is usually".

> ... Opening such files will result in obtaining the 'user other'
> permissions.

How about "Permission to open such files will be granted according to the
'user other' permissions."?

Do you mean 'user other' or just 'other'?

> ... UID comparisons will return false, and privilege will be denied.

UID and GID both?

You should probably be consistent about using all 'UID/GID' or all 'uid/gid'.
I prefer the former as it's an acronym, but that's up to you.

> When a task belonging to userid 500 in the initial user namespace

Is 500 special?  Or is this just a worked example?

> creates a new user namespace, even though the new task will see itself
> as belonging to uid 0, any task in the initial user namespace
> will see it as belonging to uid 500.  Therefore, uid 500 in the
> initial user namespace will be able to kill the new task.  Files
> created by the new user will (eventually) be seen by tasks in its
> own user namespace as belonging to uid 0, but to tasks in the initial
> user namespace as belonging to uid 500.

The next bit should probably be a new paragraph (or possibly inline note).

> Note that this userid
> mapping for the VFS is not yet implemented, though the lkml and
> containers mailing list archives will show several previous prototypes.
> In the end, those got hung up waiting on the concept of targeted
> capabilities to be developed, which, thanks to the insight of Eric
> Biederman, they finally did.

Section heading here?

> Other namespaces, such as UTS and network, are owned by a user
> namespace.

I think this is awkward because you're now overloading the term 'namespace' to
mean two different things.  I wonder if you should hyphenate "user namespace".
Let me think about that.

> ...  When such a namespace is created, it is assigned to the user
> namespace by which it was created.  Therefore, attempts to exercise
> privilege to resources in a network namespace can be properly validated
> by checking whether the caller has the needed privilege targeted to the
> user namespace owning the network namespace.

That's a very convoluted sentence.

> ...  This is called checking
> targeted capabilities, and is done using the 'ns_capable' function.
> 
> As an example, if a new task is cloned with a private user namespace but
> no private network namespace, then the task's network namespace is owned
> by the parent user namespace.  The new task has no privilege to the
> parent user namespace, so it will not be able to create or configure
> network devices.  If, instead, the task were cloned with both private
> user and network namespaces, then the private network namespace is owned
> by the private user namespace, and so root in the new user namespace
> will have privilege targeted to the network namespace.  It will be able
> to create and configure network devices.
> 
> Working notes
> =============
> capable checks for actions related to syslog must be against the
> init_user_ns until syslog is containerized.

Do you mean the 'capable' function?  If so, I recommend you suffix it with
'()'.  Or did you mean 'Capability checks'?

> Same is true for reboot and power, control groups, devices, and time.
> 
> Perf actions (kernel/event/core.c for instance) will always be
> constrained to init_user_ns.
> 
> Q:
> Is accounting considered properly containerized wrt pidns?  (it
> appears to be).  If so, then we can change the capable check in

'capability check' or 'capable() call'?  Anyone reading this ought to know what
capable() does.

> kernel/acct.c to 'ns_capable(current_pid_ns()->user_ns, CAP_PACCT)'
> 
> Q:
> For things like nice and schedaffinity, we could allow root in a
> container to control those, and leave only cgroups to constrain
> the container.  I'm not sure whether that is right, or whether it
> violates admin expectations.
> 
> I punted on some of commoncap.c.  I'm punting on xattr stuff as
> they take dentries, not inodes.

Rather than 'punted' you might want to use 'deferred' if that's what you meant.

> For drivers/tty/tty_io.c and drivers/tty/vt/vt.c, we'll want to (for
> some of them) target at the user_ns owning the tty.  That will have
> to wait until we get userns owning files straightened out.

Target what at the user_ns?

> We need to figure out how to label devices.  Should we just toss a user_ns
> right into struct device?

Would that isolate a device and make it exclusively accessible by that user_ns?

> capable(CAP_MAC_ADMIN) checks are always to be against init_user_ns,
> unless some day LSMs were to be containerized, near zero chance.

'Containerized'?  Yuck:-)

> inode_owner_or_capable() should probably take an optional ns and
> cap paramter.  If cap is 0, then CAP_FOWNER is checked.  If ns is

'parameter'.

> NULL, we derive the ns from inode.  But if ns is provided, then
> callers who need to derive inode_userns(inode) anyway can save a
> few cycles.

David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 03/14] keyctl: check capabilities against key's user_ns
  2011-07-12 23:30 ` [RFC PATCH 03/14] keyctl: check capabilities against key's user_ns Serge Hallyn
@ 2011-07-13 16:04   ` David Howells
  0 siblings, 0 replies; 20+ messages in thread
From: David Howells @ 2011-07-13 16:04 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: dhowells, linux-kernel, containers, ebiederm, Serge E. Hallyn

Serge Hallyn <serge@hallyn.com> wrote:

> From: Serge E. Hallyn <serge.hallyn@canonical.com>
> 
> ATM, task should only be able to get his own user_ns's keys
> anyway, so nsown_capable should also work, but there is no
> advantage to doing that, while using key's user_ns is clearer.
> 
> changelog: jun 6:
> 	compile fix: keyctl.c (key_user, not key has user_ns)
> 
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: David Howells <dhowells@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 01/14] add Documentation/namespaces/user_namespace.txt
  2011-07-13 12:45   ` David Howells
@ 2011-07-14  2:37     ` Serge E. Hallyn
  0 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2011-07-14  2:37 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel, containers, ebiederm, Serge E. Hallyn

Quoting David Howells (dhowells@redhat.com):
> Serge Hallyn <serge@hallyn.com> wrote:

Thanks for the detailed comments, David.

> > +... The user namespace allows tasks to
> > +have different views of the uids and gids associated with tasks and
> > +other resources.
> 
> How does this relate to UIDs/GIDs stored on disk?

The current plan (see 'flexible uid mapping' at
https://wiki.ubuntu.com/UserNamespace) is:

The uid/gid stored on disk will be that in the init_user_ns.  Most
likely uid/gid in other namespaces will be stored in xattrs.  But
Eric was advocating (a few years ago) leaving the details up to
filesystems while providing a lib/ stock implementation.  See the
thread around here
http://www.mail-archive.com/devel@openvz.org/msg09331.html

...

> > ... Opening such files will result in obtaining the 'user other'
> > permissions.
> 
> How about "Permission to open such files will be granted according to the
> 'user other' permissions."?
> 
> Do you mean 'user other' or just 'other'?

'user other'

> > ... UID comparisons will return false, and privilege will be denied.
> 
> UID and GID both?

Right, GID also part of the user namespace.

> You should probably be consistent about using all 'UID/GID' or all 'uid/gid'.
> I prefer the former as it's an acronym, but that's up to you.

ok.

> > When a task belonging to userid 500 in the initial user namespace
> 
> Is 500 special?  Or is this just a worked example?

example.

...

> > Working notes
> > =============
> > capable checks for actions related to syslog must be against the
> > init_user_ns until syslog is containerized.
> 
> Do you mean the 'capable' function?  If so, I recommend you suffix it with
> '()'.  Or did you mean 'Capability checks'?

I meant capability checks.

> > Same is true for reboot and power, control groups, devices, and time.
> > 
> > Perf actions (kernel/event/core.c for instance) will always be
> > constrained to init_user_ns.
> > 
> > Q:
> > Is accounting considered properly containerized wrt pidns?  (it
> > appears to be).  If so, then we can change the capable check in
> 
> 'capability check' or 'capable() call'?  Anyone reading this ought to know what
> capable() does.

Here I meant capable().  I definately see I need to be clearer.

> > kernel/acct.c to 'ns_capable(current_pid_ns()->user_ns, CAP_PACCT)'
> > 
...

> > For drivers/tty/tty_io.c and drivers/tty/vt/vt.c, we'll want to (for
> > some of them) target at the user_ns owning the tty.  That will have
> > to wait until we get userns owning files straightened out.
> 
> Target what at the user_ns?

Target the capability check at the user_ns.

> > We need to figure out how to label devices.  Should we just toss a user_ns
> > right into struct device?
> 
> Would that isolate a device and make it exclusively accessible by that user_ns?

I think so, which is probably too restrictive until a devices namespace
can help us work around it when needed.

Thanks again,

-serge

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2011-07-14  2:37 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-07-12 23:30 [RFC PATCH 0/14] user namespaces: continue targetting capabilities Serge Hallyn
2011-07-12 23:30 ` [RFC PATCH 01/14] add Documentation/namespaces/user_namespace.txt Serge Hallyn
2011-07-13 12:45   ` David Howells
2011-07-14  2:37     ` Serge E. Hallyn
2011-07-12 23:30 ` [RFC PATCH 02/14] allow root in container to copy namespaces Serge Hallyn
2011-07-12 23:30 ` [RFC PATCH 03/14] keyctl: check capabilities against key's user_ns Serge Hallyn
2011-07-13 16:04   ` David Howells
2011-07-12 23:30 ` [RFC PATCH 04/14] user_ns: convert fs/attr.c to targeted capabilities Serge Hallyn
2011-07-12 23:30 ` [RFC PATCH 05/14] userns: clamp down users of cap_raised Serge Hallyn
2011-07-12 23:30 ` [RFC PATCH 06/14] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
2011-07-12 23:30 ` [RFC PATCH 07/14] user namespace: use net->user_ns for some capable calls under net/ Serge Hallyn
2011-07-12 23:30 ` [RFC PATCH 08/14] af_netlink.c: make netlink_capable userns-aware Serge Hallyn
2011-07-13  1:33   ` Eric Dumazet
2011-07-13  2:02     ` Serge E. Hallyn
2011-07-12 23:30 ` [RFC PATCH 09/14] user ns: convert ipv6 to targeted capabilities Serge Hallyn
2011-07-12 23:30 ` [RFC PATCH 10/14] net/core/scm.c: target capable() calls to user_ns owning the net_ns Serge Hallyn
2011-07-12 23:30 ` [RFC PATCH 11/14] userns: make some net-sysfs capable calls targeted Serge Hallyn
2011-07-12 23:30 ` [RFC PATCH 12/14] user_ns: target af_key capability check Serge Hallyn
2011-07-12 23:30 ` [RFC PATCH 13/14] userns: net: make many network capable calls targeted Serge Hallyn
2011-07-12 23:30 ` [RFC PATCH 14/14] net: pass user_ns to cap_netlink_recv() Serge Hallyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox