user namespaces: fix some uid/privilege leaks

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* user namespaces: fix some uid/privilege leaks
@ 2011-11-04 22:24 Serge Hallyn
  2011-11-04 22:24 ` [PATCH 1/6] user namespace: make signal.c respect user namespaces (v4) Serge Hallyn
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Serge Hallyn @ 2011-11-04 22:24 UTC (permalink / raw)
  To: linux-kernel, containers
  Cc: oleg, richard, akpm, ebiederm, serge, dhowells, eparis

The previous submission of these patches, and review comments, can be
seen in the thread starting here: https://lkml.org/lkml/2011/10/18/463 .
Since then, patches 
0001-pid_ns-ensure-pid-is-not-freed-during-kill_pid_info_.patch
and
0002-user-namespace-usb-make-usb-urbs-user-namespace-awar.patch
have gone upstream, and I've reverted
0009-make-net-core-scm.c-uid-comparisons-user-namespace-a.patch
because it relaxes checks, and right now we want to focus on
fixing leaks.

The set includes:

0001-user-namespace-make-signal.c-respect-user-namespaces.patch
	This convers the uid for the task sending a signal to the
	user namespace of the receiver.  It is somewhat analogous
	to what is done with the sender's pid.
	Waiting on feedback from Oleg, but I believe this patch is
	ready.

0002-User-namespace-don-t-allow-sysctl-in-non-init-user-n.patch
	This prevents root in a child user namespace from man-handling
	sysctls.  With this patch, a task in a child user namespace
	will only get the world access rights to sysctls.

0003-user-namespace-clamp-down-users-of-cap_raised.patch
	This clamps down on cases where privilege to your own user
	namespace were checked for access to the initial user namespace.

0004-Add-Documentation-namespaces-user_namespace.txt-v3.patch
	Documentation.

0005-user-namespace-make-each-net-net_ns-belong-to-a-user.patch
	This adds a struct user_namespace pointer to the net_ns for use
	by later patches.

0006-protect-cap_netlink_recv-from-user-namespaces.patch
	Now that net_ns is owned by a user_ns, cap_netlink_recv() can
	target privilege checks to the user_ns owning the resource.  The
	current check against current_cap() is unsafe.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] user namespace: make signal.c respect user namespaces (v4)
  2011-11-04 22:24 user namespaces: fix some uid/privilege leaks Serge Hallyn
@ 2011-11-04 22:24 ` Serge Hallyn
  2011-11-09  0:22   ` Andrew Morton
  2011-11-04 22:24 ` [PATCH 2/6] User namespace: don't allow sysctl in non-init user ns (v2) Serge Hallyn
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Serge Hallyn @ 2011-11-04 22:24 UTC (permalink / raw)
  To: linux-kernel, containers
  Cc: oleg, richard, akpm, ebiederm, serge, dhowells, eparis,
	Serge Hallyn

From: Serge Hallyn <serge.hallyn@canonical.com>

ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
user namespace. (new, thanks Oleg)

__send_signal: convert current's uid to the recipient's user namespace
for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
again :)

do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
user namespace

ptrace_signal maps parent's uid into current's user namespace before
including in signal to current.  IIUC Oleg has argued that this shouldn't
matter as the debugger will play with it, but it seems like not converting
the value currently being set is misleading.

Changelog:
Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
	simplify callers and help make clear what we are translating
        (which uid into which namespace).  Passing the target task would
	make callers even easier to read, but we pass in user_ns because
	current_user_ns() != task_cred_xxx(current, user_ns).
Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
	in ptrace_signal().
Sep 23: In send_signal(), detect when (user) signal is coming from an
	ancestor or unrelated user namespace.  Pass that on to __send_signal,
	which sets si_uid to 0 or overflowuid if needed.
Oct 12: Base on Oleg's fixup_uid() patch.  On top of that, handle all
	SI_FROMKERNEL cases at callers, because we can't assume sender is
	current in those cases.

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 ipc/mqueue.c    |    7 ++++++-
 kernel/signal.c |   38 +++++++++++++++++++++++++++++++++++---
 2 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 2e0ecfc..4dbf3cb 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -32,6 +32,7 @@
 #include <linux/nsproxy.h>
 #include <linux/pid.h>
 #include <linux/ipc_namespace.h>
+#include <linux/user_namespace.h>
 #include <linux/slab.h>
 
 #include <net/sock.h>
@@ -543,9 +544,13 @@ static void __do_notify(struct mqueue_inode_info *info)
 			sig_i.si_errno = 0;
 			sig_i.si_code = SI_MESGQ;
 			sig_i.si_value = info->notify.sigev_value;
+			/* map current pid/uid into info->owner's namespaces */
+			rcu_read_lock();
 			sig_i.si_pid = task_tgid_nr_ns(current,
 						ns_of_pid(info->notify_owner));
-			sig_i.si_uid = current_uid();
+			sig_i.si_uid = user_ns_map_uid(info->user->user_ns,
+						current_cred(), current_uid());
+			rcu_read_lock();
 
 			kill_pid_info(info->notify.sigev_signo,
 				      &sig_i, info->notify_owner);
diff --git a/kernel/signal.c b/kernel/signal.c
index d252be2..6a6c2f4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -28,6 +28,7 @@
 #include <linux/freezer.h>
 #include <linux/pid_namespace.h>
 #include <linux/nsproxy.h>
+#include <linux/user_namespace.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/signal.h>
 
@@ -1019,6 +1020,29 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
+/*
+ * map the uid in struct cred into user namespace *ns
+ */
+static inline uid_t map_cred_ns(const struct cred *cred,
+				struct user_namespace *ns)
+{
+	return user_ns_map_uid(ns, cred, cred->uid);
+}
+
+static inline void fixup_uid(struct siginfo *info, struct task_struct *t)
+{
+#ifdef CONFIG_USER_NS
+	if (current_user_ns() == task_cred_xxx(t, user_ns))
+#endif
+		return;
+
+	if (SI_FROMKERNEL(info))
+		return;
+
+	info->si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
+					current_cred(), info->si_uid);
+}
+
 static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			int group, int from_ancestor_ns)
 {
@@ -1088,6 +1112,9 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 				q->info.si_pid = 0;
 			break;
 		}
+
+		fixup_uid(info, t);
+
 	} else if (!is_si_special(info)) {
 		if (sig >= SIGRTMIN && info->si_code != SI_USER) {
 			/*
@@ -1626,7 +1653,8 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 */
 	rcu_read_lock();
 	info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns);
-	info.si_uid = __task_cred(tsk)->uid;
+	info.si_uid = map_cred_ns(__task_cred(tsk),
+			task_cred_xxx(tsk->parent, user_ns));
 	rcu_read_unlock();
 
 	info.si_utime = cputime_to_clock_t(cputime_add(tsk->utime,
@@ -1711,7 +1739,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	 */
 	rcu_read_lock();
 	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
-	info.si_uid = __task_cred(tsk)->uid;
+	info.si_uid = map_cred_ns(__task_cred(tsk),
+			task_cred_xxx(parent, user_ns));
 	rcu_read_unlock();
 
 	info.si_utime = cputime_to_clock_t(tsk->utime);
@@ -2129,8 +2158,11 @@ static int ptrace_signal(int signr, siginfo_t *info,
 		info->si_signo = signr;
 		info->si_errno = 0;
 		info->si_code = SI_USER;
+		rcu_read_lock();
 		info->si_pid = task_pid_vnr(current->parent);
-		info->si_uid = task_uid(current->parent);
+		info->si_uid = map_cred_ns(__task_cred(current->parent),
+				current_user_ns());
+		rcu_read_unlock();
 	}
 
 	/* If the (new) signal is now blocked, requeue it.  */
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] User namespace: don't allow sysctl in non-init user ns (v2)
  2011-11-04 22:24 user namespaces: fix some uid/privilege leaks Serge Hallyn
  2011-11-04 22:24 ` [PATCH 1/6] user namespace: make signal.c respect user namespaces (v4) Serge Hallyn
@ 2011-11-04 22:24 ` Serge Hallyn
  2011-11-04 22:24 ` [PATCH 3/6] user namespace: clamp down users of cap_raised Serge Hallyn
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Serge Hallyn @ 2011-11-04 22:24 UTC (permalink / raw)
  To: linux-kernel, containers
  Cc: oleg, richard, akpm, ebiederm, serge, dhowells, eparis,
	Serge Hallyn, Vasiliy Kulikov, Miquel van Smoorenburg

From: Serge Hallyn <serge.hallyn@canonical.com>

sysctl.c has its own custom uid check, which is not user namespace
aware.  As discovered by Richard, that allows root in a container
privileged access to set all sysctls.

To fix that, don't compare uid or groups if current is not in the
initial user namespace.  We may at some point want to relax that check
so that some sysctls are allowed - for instance dmesg_restrict when
syslog is containerized.

Changelog:
Sep 22: As Miquel van Smoorenburg pointed out, rather than always
	refusing access if not in initial user_ns, we should allow
	world access rights to sysctl files.  We just want to prevent
	a task in a non-init user namespace from getting the root user
	or group access rights.

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: richard@nod.at
Cc: Miquel van Smoorenburg <mikevs@xs4all.net>
---
 kernel/sysctl.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ae27196..473df41 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1708,10 +1708,12 @@ void register_sysctl_root(struct ctl_table_root *root)
 
 static int test_perm(int mode, int op)
 {
-	if (!current_euid())
-		mode >>= 6;
-	else if (in_egroup_p(0))
-		mode >>= 3;
+	if (current_user_ns() == &init_user_ns) {
+		if (!current_euid())
+			mode >>= 6;
+		else if (in_egroup_p(0))
+			mode >>= 3;
+	}
 	if ((op & ~mode & (MAY_READ|MAY_WRITE|MAY_EXEC)) == 0)
 		return 0;
 	return -EACCES;
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] user namespace: clamp down users of cap_raised
  2011-11-04 22:24 user namespaces: fix some uid/privilege leaks Serge Hallyn
  2011-11-04 22:24 ` [PATCH 1/6] user namespace: make signal.c respect user namespaces (v4) Serge Hallyn
  2011-11-04 22:24 ` [PATCH 2/6] User namespace: don't allow sysctl in non-init user ns (v2) Serge Hallyn
@ 2011-11-04 22:24 ` Serge Hallyn
  2011-11-06  1:14   ` Andrew G. Morgan
  2011-11-04 22:24 ` [PATCH 4/6] Add Documentation/namespaces/user_namespace.txt (v3) Serge Hallyn
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Serge Hallyn @ 2011-11-04 22:24 UTC (permalink / raw)
  To: linux-kernel, containers
  Cc: oleg, richard, akpm, ebiederm, serge, dhowells, eparis,
	Serge E. Hallyn, Andrew Morgan, Vasiliy Kulikov

From: Serge E. Hallyn <serge.hallyn@canonical.com>

A few modules are using cap_raised(current_cap(), cap) to authorize
actions.  This means that tasks which are privileged in non-initial
user namespaces will be deemed privileged.  The privilege should only
be granted if the task is in the initial user namespace.

Switching the calls to capable() would change the behavior - it would
cause the LSM capable hooks to be called, and set PF_SUPERPRIV if
the capability was used.  So instead, put in an explicit check and
refuse privilege if the caller is not in init_user_ns.

Changelog:
Oct 23: Use a nice macro to make the code shorter and easier to read,
	per advice from Andrew Morgan and David Howells.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morgan <morgan@kernel.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: David Howells <dhowells@redhat.com>
---
 drivers/block/drbd/drbd_nl.c           |    2 +-
 drivers/md/dm-log-userspace-transfer.c |    2 +-
 drivers/staging/pohmelfs/config.c      |    2 +-
 drivers/video/uvesafb.c                |    2 +-
 include/linux/cred.h                   |    2 ++
 5 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index af2a250..b7b19b8 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2297,7 +2297,7 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
 		return;
 	}
 
-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
+	if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN)) {
 		retcode = ERR_PERM;
 		goto fail;
 	}
diff --git a/drivers/md/dm-log-userspace-transfer.c b/drivers/md/dm-log-userspace-transfer.c
index 1f23e04..126a79b 100644
--- a/drivers/md/dm-log-userspace-transfer.c
+++ b/drivers/md/dm-log-userspace-transfer.c
@@ -134,7 +134,7 @@ static void cn_ulog_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
 {
 	struct dm_ulog_request *tfr = (struct dm_ulog_request *)(msg + 1);
 
-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
+	if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
 	spin_lock(&receiving_list_lock);
diff --git a/drivers/staging/pohmelfs/config.c b/drivers/staging/pohmelfs/config.c
index b6c42cb..327c047 100644
--- a/drivers/staging/pohmelfs/config.c
+++ b/drivers/staging/pohmelfs/config.c
@@ -525,7 +525,7 @@ static void pohmelfs_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *n
 {
 	int err;
 
-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
+	if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
 	switch (msg->flags) {
diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
index 7f8472c..94e0e9d 100644
--- a/drivers/video/uvesafb.c
+++ b/drivers/video/uvesafb.c
@@ -73,7 +73,7 @@ static void uvesafb_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *ns
 	struct uvesafb_task *utask;
 	struct uvesafb_ktask *task;
 
-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
+	if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
 	if (msg->seq >= UVESAFB_TASKS_MAX)
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 4030896..2f75da7 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -359,9 +359,11 @@ static inline void put_cred(const struct cred *_cred)
 
 #ifdef CONFIG_USER_NS
 #define current_user_ns() (current_cred_xxx(user_ns))
+#define IN_ROOT_USER_NS() (current_user_ns() == &init_user_ns)
 #else
 extern struct user_namespace init_user_ns;
 #define current_user_ns() (&init_user_ns)
+#define IN_ROOT_USER_NS() (1)
 #endif
 
 
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] Add Documentation/namespaces/user_namespace.txt (v3)
  2011-11-04 22:24 user namespaces: fix some uid/privilege leaks Serge Hallyn
                   ` (2 preceding siblings ...)
  2011-11-04 22:24 ` [PATCH 3/6] user namespace: clamp down users of cap_raised Serge Hallyn
@ 2011-11-04 22:24 ` Serge Hallyn
  2011-11-04 22:24 ` [PATCH 5/6] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Serge Hallyn @ 2011-11-04 22:24 UTC (permalink / raw)
  To: linux-kernel, containers
  Cc: oleg, richard, akpm, ebiederm, serge, dhowells, eparis,
	Serge E. Hallyn, Randy Dunlap

From: Serge E. Hallyn <serge@hallyn.com>

Provide a description of user namespaces

Changelog:
   jul 26: incorporate feedback from David Howells.
   jul 29: incorporate feedback from Randy Dunlap.
   sep 15: remove information which is not yet certain.
   Oct 26: add changes suggested by David on Oct 19

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
---
 Documentation/namespaces/user_namespace.txt |   54 +++++++++++++++++++++++++++
 1 files changed, 54 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/namespaces/user_namespace.txt

diff --git a/Documentation/namespaces/user_namespace.txt b/Documentation/namespaces/user_namespace.txt
new file mode 100644
index 0000000..805f0ef
--- /dev/null
+++ b/Documentation/namespaces/user_namespace.txt
@@ -0,0 +1,54 @@
+Description
+===========
+
+Traditionally, each task is owned by a user ID (UID) and belongs to one or more
+groups (GID).  Both are simple numeric IDs, though userspace usually translates
+them to names.  The user namespace allows tasks to have different views of the
+UIDs and GIDs associated with tasks and other resources.  (See 'UID mapping'
+below for more.)
+
+The user namespace is a simple hierarchical one.  The system starts with all
+tasks belonging to the initial user namespace.  A task creates a new user
+namespace by passing the CLONE_NEWUSER flag to clone(2).  This requires the
+creating task to have the CAP_SETUID, CAP_SETGID, and CAP_CHOWN capabilities,
+but it does not need to be running as root.  The clone(2) call will result in a
+new task which to itself appears to be running as UID and GID 0, but to its
+creator seems to have the creator's credentials.
+
+To this new task, any resource belonging to the initial user namespace will
+appear to belong to user and group 'nobody', which are UID and GID -1.
+Permission to open such files will be granted according to world access
+permissions.  UID comparisons and group membership checks will fail, and
+privilege will be denied.
+
+When a task belonging to (for example) UID 500 in the initial user namespace
+creates a new user namespace, even though the new task will see itself as
+belonging to UID 0, any task in the initial user namespace will see it as
+belonging to UID 500.  Therefore, UID 500 in the initial user namespace will be
+able to kill the new task.
+
+UID mapping for the VFS is not yet implemented, though prototypes exist.
+
+Relationship between the User namespace and other namespaces
+============================================================
+
+Other namespaces, such as UTS and network, are owned by a user namespace.  When
+such a namespace is created, it is assigned to the user namespace of the task
+by which it was created.  Therefore attempts to exercise privilege over a
+resource can be properly validated by checking for privilege targeted to the
+user namespace which owns the resource.  For instance, a check for the special
+privilege to change a network interface address could be done by checking for
+CAP_NET_ADMIN against the user namespace which created the network namespace
+owning the network interface.
+
+( XXX TODO: add a list of capabilities corresponding to different namespaces)
+
+As an example, if a new task is cloned with a private user namespace but
+not a private network namespace, then the task's network namespace is owned
+by the parent user namespace.  The new task has no special privilege over the
+parent user namespace, so it will not be able to create or configure the
+network devices therein.  If instead the task were cloned with both private
+user and network namespaces, then the private network namespace is owned
+by the private user namespace, and so root in the new user namespace
+will have privilege over resources owned by the network namespace.  It will
+be able to create and configure network devices.
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] user namespace: make each net (net_ns) belong to a user_ns
  2011-11-04 22:24 user namespaces: fix some uid/privilege leaks Serge Hallyn
                   ` (3 preceding siblings ...)
  2011-11-04 22:24 ` [PATCH 4/6] Add Documentation/namespaces/user_namespace.txt (v3) Serge Hallyn
@ 2011-11-04 22:24 ` Serge Hallyn
  2011-11-04 22:24 ` [PATCH 6/6] protect cap_netlink_recv from user namespaces Serge Hallyn
  2011-11-11  4:13 ` user namespaces: fix some uid/privilege leaks Serge E. Hallyn
  6 siblings, 0 replies; 19+ messages in thread
From: Serge Hallyn @ 2011-11-04 22:24 UTC (permalink / raw)
  To: linux-kernel, containers
  Cc: oleg, richard, akpm, ebiederm, serge, dhowells, eparis,
	Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

The user namespace which creates a new network namespace owns that
namespace and all resources created in it.  This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives.  Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.

Changelog:
   jul 8: nsproxy: don't assign netns->userns if not cloning.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 include/net/net_namespace.h |    2 ++
 kernel/nsproxy.c            |    2 ++
 net/core/net_namespace.c    |    3 +++
 3 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 3bb6fa0..d91fe5f 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -29,6 +29,7 @@ struct ctl_table_header;
 struct net_generic;
 struct sock;
 struct netns_ipvs;
+struct user_namespace;
 
 
 #define NETDEV_HASHBITS    8
@@ -101,6 +102,7 @@ struct net {
 	struct netns_xfrm	xfrm;
 #endif
 	struct netns_ipvs	*ipvs;
+	struct user_namespace	*user_ns;
 };
 
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 9aeab4b..0d5bf8d 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -95,6 +95,8 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
 		err = PTR_ERR(new_nsp->net_ns);
 		goto out_net;
 	}
+	if (flags & CLONE_NEWNET)
+		new_nsp->net_ns->user_ns = get_user_ns(task_cred_xxx(tsk, user_ns));
 
 	return new_nsp;
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 5bbdbf0..791c19c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -10,6 +10,7 @@
 #include <linux/nsproxy.h>
 #include <linux/proc_fs.h>
 #include <linux/file.h>
+#include <linux/user_namespace.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -209,6 +210,7 @@ static void net_free(struct net *net)
 	}
 #endif
 	kfree(net->gen);
+	put_user_ns(net->user_ns);
 	kmem_cache_free(net_cachep, net);
 }
 
@@ -389,6 +391,7 @@ static int __init net_ns_init(void)
 	rcu_assign_pointer(init_net.gen, ng);
 
 	mutex_lock(&net_mutex);
+	init_net.user_ns = &init_user_ns;
 	if (setup_net(&init_net))
 		panic("Could not setup the initial network namespace");
 
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] protect cap_netlink_recv from user namespaces
  2011-11-04 22:24 user namespaces: fix some uid/privilege leaks Serge Hallyn
                   ` (4 preceding siblings ...)
  2011-11-04 22:24 ` [PATCH 5/6] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
@ 2011-11-04 22:24 ` Serge Hallyn
  2011-11-07 19:35   ` Eric Paris
  2011-11-11  4:13 ` user namespaces: fix some uid/privilege leaks Serge E. Hallyn
  6 siblings, 1 reply; 19+ messages in thread
From: Serge Hallyn @ 2011-11-04 22:24 UTC (permalink / raw)
  To: linux-kernel, containers
  Cc: oleg, richard, akpm, ebiederm, serge, dhowells, eparis,
	Serge E. Hallyn

From: Serge E. Hallyn <serge.hallyn@canonical.com>

cap_netlink_recv() was granting privilege if a capability is in
current_cap(), regardless of the user namespace.  Fix that by
targeting the capability check against the user namespace which
owns the skb.

Caller passes the user ns down because sock_net is static inline defined in
net/sock.h, which we'd rather not #include at the cap_netlink_recv function.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/scsi/scsi_netlink.c     |    3 ++-
 include/linux/security.h        |   14 +++++++++-----
 kernel/audit.c                  |    6 ++++--
 net/core/rtnetlink.c            |    3 ++-
 net/decnet/netfilter/dn_rtmsg.c |    3 ++-
 net/ipv4/netfilter/ip_queue.c   |    3 ++-
 net/ipv6/netfilter/ip6_queue.c  |    3 ++-
 net/netfilter/nfnetlink.c       |    2 +-
 net/netlink/genetlink.c         |    2 +-
 net/xfrm/xfrm_user.c            |    2 +-
 security/commoncap.c            |    6 ++----
 security/security.c             |    4 ++--
 security/selinux/hooks.c        |    5 +++--
 13 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
index 26a8a45..0aa2e57 100644
--- a/drivers/scsi/scsi_netlink.c
+++ b/drivers/scsi/scsi_netlink.c
@@ -111,7 +111,8 @@ scsi_nl_rcv_msg(struct sk_buff *skb)
 			goto next_msg;
 		}
 
-		if (security_netlink_recv(skb, CAP_SYS_ADMIN)) {
+		if (security_netlink_recv(skb, CAP_SYS_ADMIN,
+					  sock_net(skb->sk)->user_ns)) {
 			err = -EPERM;
 			goto next_msg;
 		}
diff --git a/include/linux/security.h b/include/linux/security.h
index 19d8e04..ac37e1c 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -96,7 +96,8 @@ struct xfrm_user_sec_ctx;
 struct seq_file;
 
 extern int cap_netlink_send(struct sock *sk, struct sk_buff *skb);
-extern int cap_netlink_recv(struct sk_buff *skb, int cap);
+extern int cap_netlink_recv(struct sk_buff *skb, int cap,
+			    struct user_namespace *ns);
 
 void reset_security_ops(void);
 
@@ -802,6 +803,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  *	@skb.
  *	@skb contains the sk_buff structure for the netlink message.
  *	@cap indicates the capability required
+ *	@ns is the user namespace which owns skb
  *	Return 0 if permission is granted.
  *
  * Security hooks for Unix domain networking.
@@ -1562,7 +1564,8 @@ struct security_operations {
 			  struct sembuf *sops, unsigned nsops, int alter);
 
 	int (*netlink_send) (struct sock *sk, struct sk_buff *skb);
-	int (*netlink_recv) (struct sk_buff *skb, int cap);
+	int (*netlink_recv) (struct sk_buff *skb, int cap,
+			     struct user_namespace *ns);
 
 	void (*d_instantiate) (struct dentry *dentry, struct inode *inode);
 
@@ -1816,7 +1819,7 @@ void security_d_instantiate(struct dentry *dentry, struct inode *inode);
 int security_getprocattr(struct task_struct *p, char *name, char **value);
 int security_setprocattr(struct task_struct *p, char *name, void *value, size_t size);
 int security_netlink_send(struct sock *sk, struct sk_buff *skb);
-int security_netlink_recv(struct sk_buff *skb, int cap);
+int security_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns);
 int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen);
 int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
 void security_release_secctx(char *secdata, u32 seclen);
@@ -2516,9 +2519,10 @@ static inline int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return cap_netlink_send(sk, skb);
 }
 
-static inline int security_netlink_recv(struct sk_buff *skb, int cap)
+static inline int security_netlink_recv(struct sk_buff *skb, int cap,
+					struct user_namespace *ns)
 {
-	return cap_netlink_recv(skb, cap);
+	return cap_netlink_recv(skb, cap, ns);
 }
 
 static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
diff --git a/kernel/audit.c b/kernel/audit.c
index 0a1355c..48144c4 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -601,13 +601,15 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
 	case AUDIT_TTY_SET:
 	case AUDIT_TRIM:
 	case AUDIT_MAKE_EQUIV:
-		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL))
+		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL,
+					  sock_net(skb->sk)->user_ns))
 			err = -EPERM;
 		break;
 	case AUDIT_USER:
 	case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
 	case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
-		if (security_netlink_recv(skb, CAP_AUDIT_WRITE))
+		if (security_netlink_recv(skb, CAP_AUDIT_WRITE,
+					  sock_net(skb->sk)->user_ns))
 			err = -EPERM;
 		break;
 	default:  /* bad msg */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9083e82..58c6f02 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1957,7 +1957,8 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	sz_idx = type>>2;
 	kind = type&3;
 
-	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN,
+					       net->user_ns))
 		return -EPERM;
 
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 69975e0..2d052ab 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -108,7 +108,8 @@ static inline void dnrmg_receive_user_skb(struct sk_buff *skb)
 	if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+	    sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	/* Eventually we might send routing messages too */
diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
index e59aabd..d20bede 100644
--- a/net/ipv4/netfilter/ip_queue.c
+++ b/net/ipv4/netfilter/ip_queue.c
@@ -430,7 +430,8 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+				  sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
index e63c397..09db01c 100644
--- a/net/ipv6/netfilter/ip6_queue.c
+++ b/net/ipv6/netfilter/ip6_queue.c
@@ -431,7 +431,8 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+				  sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index c879c1a..ea7b382 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -130,7 +130,7 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	const struct nfnetlink_subsystem *ss;
 	int type, err;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	/* All the messages must at least contain nfgenmsg */
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 482fa57..00a101c 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -516,7 +516,7 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return -EOPNOTSUPP;
 
 	if ((ops->flags & GENL_ADMIN_PERM) &&
-	    security_netlink_recv(skb, CAP_NET_ADMIN))
+	    security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d0a42df..fb3f309 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2290,7 +2290,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	link = &xfrm_dispatch[type];
 
 	/* All operations require privileges, even GET */
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	if ((type == (XFRM_MSG_GETSA - XFRM_MSG_BASE) ||
diff --git a/security/commoncap.c b/security/commoncap.c
index ee4f848..e65c681 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -56,11 +56,9 @@ int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return 0;
 }
 
-int cap_netlink_recv(struct sk_buff *skb, int cap)
+int cap_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns)
 {
-	if (!cap_raised(current_cap(), cap))
-		return -EPERM;
-	return 0;
+	return security_capable(ns, current_cred(), cap);
 }
 EXPORT_SYMBOL(cap_netlink_recv);
 
diff --git a/security/security.c b/security/security.c
index 0c6cc69..ac0dfaf 100644
--- a/security/security.c
+++ b/security/security.c
@@ -995,9 +995,9 @@ int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return security_ops->netlink_send(sk, skb);
 }
 
-int security_netlink_recv(struct sk_buff *skb, int cap)
+int security_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns)
 {
-	return security_ops->netlink_recv(skb, cap);
+	return security_ops->netlink_recv(skb, cap, ns);
 }
 EXPORT_SYMBOL(security_netlink_recv);
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e545b9f..61b8f54 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4716,13 +4716,14 @@ static int selinux_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return selinux_nlmsg_perm(sk, skb);
 }
 
-static int selinux_netlink_recv(struct sk_buff *skb, int capability)
+static int selinux_netlink_recv(struct sk_buff *skb, int capability,
+				struct user_namespace *ns)
 {
 	int err;
 	struct common_audit_data ad;
 	u32 sid;
 
-	err = cap_netlink_recv(skb, capability);
+	err = cap_netlink_recv(skb, capability, ns);
 	if (err)
 		return err;
 
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] user namespace: clamp down users of cap_raised
  2011-11-04 22:24 ` [PATCH 3/6] user namespace: clamp down users of cap_raised Serge Hallyn
@ 2011-11-06  1:14   ` Andrew G. Morgan
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew G. Morgan @ 2011-11-06  1:14 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel, containers, oleg, richard, akpm, ebiederm, dhowells,
	eparis, Serge E. Hallyn, Vasiliy Kulikov

Acked-by: Andrew G. Morgan <morgan@kernel.org>

Cheers

Andrew

On Fri, Nov 4, 2011 at 3:24 PM, Serge Hallyn <serge@hallyn.com> wrote:
> From: Serge E. Hallyn <serge.hallyn@canonical.com>
>
> A few modules are using cap_raised(current_cap(), cap) to authorize
> actions.  This means that tasks which are privileged in non-initial
> user namespaces will be deemed privileged.  The privilege should only
> be granted if the task is in the initial user namespace.
>
> Switching the calls to capable() would change the behavior - it would
> cause the LSM capable hooks to be called, and set PF_SUPERPRIV if
> the capability was used.  So instead, put in an explicit check and
> refuse privilege if the caller is not in init_user_ns.
>
> Changelog:
> Oct 23: Use a nice macro to make the code shorter and easier to read,
>        per advice from Andrew Morgan and David Howells.
>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Andrew Morgan <morgan@kernel.org>
> Cc: Vasiliy Kulikov <segoon@openwall.com>
> Cc: David Howells <dhowells@redhat.com>
> ---
>  drivers/block/drbd/drbd_nl.c           |    2 +-
>  drivers/md/dm-log-userspace-transfer.c |    2 +-
>  drivers/staging/pohmelfs/config.c      |    2 +-
>  drivers/video/uvesafb.c                |    2 +-
>  include/linux/cred.h                   |    2 ++
>  5 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
> index af2a250..b7b19b8 100644
> --- a/drivers/block/drbd/drbd_nl.c
> +++ b/drivers/block/drbd/drbd_nl.c
> @@ -2297,7 +2297,7 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
>                return;
>        }
>
> -       if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
> +       if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN)) {
>                retcode = ERR_PERM;
>                goto fail;
>        }
> diff --git a/drivers/md/dm-log-userspace-transfer.c b/drivers/md/dm-log-userspace-transfer.c
> index 1f23e04..126a79b 100644
> --- a/drivers/md/dm-log-userspace-transfer.c
> +++ b/drivers/md/dm-log-userspace-transfer.c
> @@ -134,7 +134,7 @@ static void cn_ulog_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
>  {
>        struct dm_ulog_request *tfr = (struct dm_ulog_request *)(msg + 1);
>
> -       if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
> +       if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
>                return;
>
>        spin_lock(&receiving_list_lock);
> diff --git a/drivers/staging/pohmelfs/config.c b/drivers/staging/pohmelfs/config.c
> index b6c42cb..327c047 100644
> --- a/drivers/staging/pohmelfs/config.c
> +++ b/drivers/staging/pohmelfs/config.c
> @@ -525,7 +525,7 @@ static void pohmelfs_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *n
>  {
>        int err;
>
> -       if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
> +       if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
>                return;
>
>        switch (msg->flags) {
> diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
> index 7f8472c..94e0e9d 100644
> --- a/drivers/video/uvesafb.c
> +++ b/drivers/video/uvesafb.c
> @@ -73,7 +73,7 @@ static void uvesafb_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *ns
>        struct uvesafb_task *utask;
>        struct uvesafb_ktask *task;
>
> -       if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
> +       if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
>                return;
>
>        if (msg->seq >= UVESAFB_TASKS_MAX)
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 4030896..2f75da7 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -359,9 +359,11 @@ static inline void put_cred(const struct cred *_cred)
>
>  #ifdef CONFIG_USER_NS
>  #define current_user_ns() (current_cred_xxx(user_ns))
> +#define IN_ROOT_USER_NS() (current_user_ns() == &init_user_ns)
>  #else
>  extern struct user_namespace init_user_ns;
>  #define current_user_ns() (&init_user_ns)
> +#define IN_ROOT_USER_NS() (1)
>  #endif
>
>
> --
> 1.7.0.4
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] protect cap_netlink_recv from user namespaces
  2011-11-04 22:24 ` [PATCH 6/6] protect cap_netlink_recv from user namespaces Serge Hallyn
@ 2011-11-07 19:35   ` Eric Paris
  2011-11-08  3:29     ` Serge E. Hallyn
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Paris @ 2011-11-07 19:35 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel, containers, oleg, richard, akpm, ebiederm, dhowells,
	Serge E. Hallyn

On Fri, 2011-11-04 at 22:24 +0000, Serge Hallyn wrote:
> From: Serge E. Hallyn <serge.hallyn@canonical.com>
> 
> cap_netlink_recv() was granting privilege if a capability is in
> current_cap(), regardless of the user namespace.  Fix that by
> targeting the capability check against the user namespace which
> owns the skb.
> 
> Caller passes the user ns down because sock_net is static inline defined in
> net/sock.h, which we'd rather not #include at the cap_netlink_recv function.

This is wrong at least in relation to audit.  I don't know the other
code well enough to know if I think it's ok there.  Lets say I have
(CAP_SYS_ADMIN | CAP_SETUID | CAP_SETGID) and I create a new task with
CLONE_NEWNAME.  This task then immediately does the needful to remove
all audit rules (which supposedly requires CAP_AUDIT_CONTROL).  That's
going to succeed because the task is init in it's namespace, aka:

        /* The creator of the user namespace has all caps. */
        if (targ_ns != &init_user_ns && targ_ns->creator == cred->user)
                return 0;

But it just screwed with a global resource.  aka audit.  I don't know
the meaning of these others, but it seems to me probably most or all of
them should be against the init_user_ns, not the namespace the skb came
from....

What am I missing?

-Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] protect cap_netlink_recv from user namespaces
  2011-11-07 19:35   ` Eric Paris
@ 2011-11-08  3:29     ` Serge E. Hallyn
  2011-11-09 14:19       ` Eric Paris
  0 siblings, 1 reply; 19+ messages in thread
From: Serge E. Hallyn @ 2011-11-08  3:29 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, containers, oleg, richard, akpm, ebiederm, dhowells,
	Serge E. Hallyn

Quoting Eric Paris (eparis@redhat.com):
> On Fri, 2011-11-04 at 22:24 +0000, Serge Hallyn wrote:
> > From: Serge E. Hallyn <serge.hallyn@canonical.com>
> > 
> > cap_netlink_recv() was granting privilege if a capability is in
> > current_cap(), regardless of the user namespace.  Fix that by
> > targeting the capability check against the user namespace which
> > owns the skb.
> > 
> > Caller passes the user ns down because sock_net is static inline defined in
> > net/sock.h, which we'd rather not #include at the cap_netlink_recv function.
> 
> This is wrong at least in relation to audit.

You may well be right, but before I proceed, I want to make sure that
you do see the current code is even more wrong?  But I was (as with the other
patch I reverted) probably overzealous and should have switched this a
different way (see bottom).

Regarding audit:  if I create a new network namespace, I can't open a
netlink socket to talk to the kernel.  At least,

	unshare -n /bin/bash
	# strace -f auditctl -l

shows -ECONNREFUSED from the sendto.  So I'm really not sure that the
kernel/audit.c patch is wrong.  (I want to make sure it's understood,
below, that I'm proposing tightening down the checks because that's more
in keeping with the purpose of this patchset, not because I agree the
checks are or will be, in the end, wrong)

>   I don't know the other
> code well enough to know if I think it's ok there.  Lets say I have
> (CAP_SYS_ADMIN | CAP_SETUID | CAP_SETGID) and I create a new task with
> CLONE_NEWNAME.

(just to be sure, do you mean CLONE_NEWUSER?)

>   This task then immediately does the needful to remove
> all audit rules (which supposedly requires CAP_AUDIT_CONTROL).  That's
> going to succeed because the task is init in it's namespace, aka:
> 
>         /* The creator of the user namespace has all caps. */
>         if (targ_ns != &init_user_ns && targ_ns->creator == cred->user)
>                 return 0;

That will only happen if the current user created the target user_ns.
So if task p1 created task p2 through clone(CLONE_NEWUSER), then p1 would
pass this on access check to p2's user_ns, but p2 would not.

But that really isn't related to the issue you raise.

> But it just screwed with a global resource.  aka audit.  I don't know
> the meaning of these others, but it seems to me probably most or all of
> them should be against the init_user_ns, not the namespace the skb came
> from....
> 
> What am I missing?

I'm not sure, depends on how much we're misunderstanding each other :)

But, regardless, your point is valid in that I'm not tightening down as
much as I could.  So how about I don't change the security_netlink_recv()
and callers yet, and instead I change cap_netlink_recv() to do:

	if (!IN_ROOT_USER_NS && !cap_raised(current_cap(), cap))

?

thanks,
-serge

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] user namespace: make signal.c respect user namespaces (v4)
  2011-11-04 22:24 ` [PATCH 1/6] user namespace: make signal.c respect user namespaces (v4) Serge Hallyn
@ 2011-11-09  0:22   ` Andrew Morton
  2011-11-09 14:18     ` Serge E. Hallyn
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2011-11-09  0:22 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel, containers, oleg, richard, ebiederm, dhowells,
	eparis, Serge Hallyn

On Fri,  4 Nov 2011 22:24:37 +0000
Serge Hallyn <serge@hallyn.com> wrote:

> +static inline void fixup_uid(struct siginfo *info, struct task_struct *t)
> +{
> +#ifdef CONFIG_USER_NS
> +	if (current_user_ns() == task_cred_xxx(t, user_ns))
> +#endif
> +		return;
> +
> +	if (SI_FROMKERNEL(info))
> +		return;
> +
> +	info->si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
> +					current_cred(), info->si_uid);
> +}

err, this function is a no-op if CONFIG_USER_NS=n.  If that was
intentional then why on earth do this in such a weird fashion?  If
unintentional then it makes me wonder how well tested all this was with
CONFIG_USER_NS=n?

I vaguely remember that I've forgotten how all this stuff works.  Some
additional review input would be nice (cough-oleg-cough).


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] user namespace: make signal.c respect user namespaces (v4)
  2011-11-09  0:22   ` Andrew Morton
@ 2011-11-09 14:18     ` Serge E. Hallyn
  2011-11-10  1:41       ` Matt Helsley
  0 siblings, 1 reply; 19+ messages in thread
From: Serge E. Hallyn @ 2011-11-09 14:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Serge Hallyn, linux-kernel, containers, oleg, richard, ebiederm,
	dhowells, eparis

Quoting Andrew Morton (akpm@linux-foundation.org):
> On Fri,  4 Nov 2011 22:24:37 +0000
> Serge Hallyn <serge@hallyn.com> wrote:
> 
> > +static inline void fixup_uid(struct siginfo *info, struct task_struct *t)
> > +{
> > +#ifdef CONFIG_USER_NS
> > +	if (current_user_ns() == task_cred_xxx(t, user_ns))
> > +#endif
> > +		return;
> > +
> > +	if (SI_FROMKERNEL(info))
> > +		return;
> > +
> > +	info->si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
> > +					current_cred(), info->si_uid);
> > +}
> 
> err, this function is a no-op if CONFIG_USER_NS=n.  If that was
> intentional then why on earth do this in such a weird fashion?  If

(I didn't write it, but)  I think the assumption was that the compiler
would optimize it all away if !CONFIG_USER_NS, and in this way the
call site didn't need to be obscured with the #ifdef.  Is there a way
that's better?  Would you prefer something like:

+#ifdef CONFIG_USER_NS
+static inline void fixup_uid(struct siginfo *info, struct task_struct *t)
+{
+	if (current_user_ns() == task_cred_xxx(t, user_ns))
+		return;
+
+	if (SI_FROMKERNEL(info))
+		return;
+
+	info->si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
+					current_cred(), info->si_uid);
+}
+#else
+static inline void fixup_uid(struct siginfo *info, struct task_struct *t)
+{
+		return;
+}
+#endif

?  It's less sneaky at least.

> unintentional then it makes me wonder how well tested all this was with
> CONFIG_USER_NS=n?
> 
> I vaguely remember that I've forgotten how all this stuff works.  Some
> additional review input would be nice (cough-oleg-cough).
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] protect cap_netlink_recv from user namespaces
  2011-11-08  3:29     ` Serge E. Hallyn
@ 2011-11-09 14:19       ` Eric Paris
  2011-11-09 14:44         ` Serge E. Hallyn
  2011-11-19  9:10         ` Eric W. Biederman
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Paris @ 2011-11-09 14:19 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, containers, oleg, richard, akpm, ebiederm, dhowells,
	Serge E. Hallyn

On Tue, 2011-11-08 at 03:29 +0000, Serge E. Hallyn wrote:
> Quoting Eric Paris (eparis@redhat.com):

> But, regardless, your point is valid in that I'm not tightening down as
> much as I could.  So how about I don't change the security_netlink_recv()
> and callers yet, and instead I change cap_netlink_recv() to do:
> 
> 	if (!IN_ROOT_USER_NS && !cap_raised(current_cap(), cap))

Actually better thought.  Remove the LSM hook altogether and just use
capable() in the callers.  This hook, being used this way, was
introduced in c7bdb545 back when we took the effective perms from the
skb.  We don't use the skb any more since netlink is synchronous.  This
is functionally equivalent except the capabilities code checks against
the init_user_ns (something we want) and it will now set PF_PRIV (which
also seems like a good thing) Something like:

security: remove the security_netlink_recv hook as it is equivalent to capable()
    
Once upon a time netlink was not sync and we had to get the effective
capabilities from the skb that was being received.  Today we instead get
the capabilities from the current task.  This has rendered the entire
purpose of the hook moot as it is now functionally equivalent to the
capable() call.
    
Signed-off-by: Eric Paris <eparis@redhat.com>

---

 drivers/scsi/scsi_netlink.c     |    2 +-
 include/linux/security.h        |   14 --------------
 kernel/audit.c                  |    4 ++--
 net/core/rtnetlink.c            |    2 +-
 net/decnet/netfilter/dn_rtmsg.c |    2 +-
 net/ipv4/netfilter/ip_queue.c   |    2 +-
 net/ipv6/netfilter/ip6_queue.c  |    2 +-
 net/netfilter/nfnetlink.c       |    2 +-
 net/netlink/genetlink.c         |    2 +-
 net/xfrm/xfrm_user.c            |    2 +-
 security/capability.c           |    1 -
 security/commoncap.c            |    8 --------
 security/security.c             |    6 ------
 security/selinux/hooks.c        |   19 -------------------
 14 files changed, 10 insertions(+), 58 deletions(-)

diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
index 44f76e8..c77628a 100644
--- a/drivers/scsi/scsi_netlink.c
+++ b/drivers/scsi/scsi_netlink.c
@@ -112,7 +112,7 @@ scsi_nl_rcv_msg(struct sk_buff *skb)
 			goto next_msg;
 		}
 
-		if (security_netlink_recv(skb, CAP_SYS_ADMIN)) {
+		if (!capable(CAP_SYS_ADMIN)) {
 			err = -EPERM;
 			goto next_msg;
 		}
diff --git a/include/linux/security.h b/include/linux/security.h
index 1bb742b..bb7e8a0 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -96,7 +96,6 @@ struct xfrm_user_sec_ctx;
 struct seq_file;
 
 extern int cap_netlink_send(struct sock *sk, struct sk_buff *skb);
-extern int cap_netlink_recv(struct sk_buff *skb, int cap);
 
 void reset_security_ops(void);
 
@@ -798,12 +797,6 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  *	@skb contains the sk_buff structure for the netlink message.
  *	Return 0 if the information was successfully saved and message
  *	is allowed to be transmitted.
- * @netlink_recv:
- *	Check permission before processing the received netlink message in
- *	@skb.
- *	@skb contains the sk_buff structure for the netlink message.
- *	@cap indicates the capability required
- *	Return 0 if permission is granted.
  *
  * Security hooks for Unix domain networking.
  *
@@ -1564,7 +1557,6 @@ struct security_operations {
 			  struct sembuf *sops, unsigned nsops, int alter);
 
 	int (*netlink_send) (struct sock *sk, struct sk_buff *skb);
-	int (*netlink_recv) (struct sk_buff *skb, int cap);
 
 	void (*d_instantiate) (struct dentry *dentry, struct inode *inode);
 
@@ -1817,7 +1809,6 @@ void security_d_instantiate(struct dentry *dentry, struct inode *inode);
 int security_getprocattr(struct task_struct *p, char *name, char **value);
 int security_setprocattr(struct task_struct *p, char *name, void *value, size_t size);
 int security_netlink_send(struct sock *sk, struct sk_buff *skb);
-int security_netlink_recv(struct sk_buff *skb, int cap);
 int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen);
 int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
 void security_release_secctx(char *secdata, u32 seclen);
@@ -2505,11 +2496,6 @@ static inline int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return cap_netlink_send(sk, skb);
 }
 
-static inline int security_netlink_recv(struct sk_buff *skb, int cap)
-{
-	return cap_netlink_recv(skb, cap);
-}
-
 static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
 {
 	return -EOPNOTSUPP;
diff --git a/kernel/audit.c b/kernel/audit.c
index 2c1d6ab..57e3f51 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -601,13 +601,13 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
 	case AUDIT_TTY_SET:
 	case AUDIT_TRIM:
 	case AUDIT_MAKE_EQUIV:
-		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL))
+		if (!capable(CAP_AUDIT_CONTROL))
 			err = -EPERM;
 		break;
 	case AUDIT_USER:
 	case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
 	case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
-		if (security_netlink_recv(skb, CAP_AUDIT_WRITE))
+		if (!capable(CAP_AUDIT_WRITE))
 			err = -EPERM;
 		break;
 	default:  /* bad msg */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 39f8dd6..98ee1b6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1930,7 +1930,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	sz_idx = type>>2;
 	kind = type&3;
 
-	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (kind != 2 && !capable(CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 69975e0..1531135 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -108,7 +108,7 @@ static inline void dnrmg_receive_user_skb(struct sk_buff *skb)
 	if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (!capable(CAP_NET_ADMIN))
 		RCV_SKB_FAIL(-EPERM);
 
 	/* Eventually we might send routing messages too */
diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
index e59aabd..ffabb26 100644
--- a/net/ipv4/netfilter/ip_queue.c
+++ b/net/ipv4/netfilter/ip_queue.c
@@ -430,7 +430,7 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (!capable(CAP_NET_ADMIN))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
index e63c397..5e5ce77 100644
--- a/net/ipv6/netfilter/ip6_queue.c
+++ b/net/ipv6/netfilter/ip6_queue.c
@@ -431,7 +431,7 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (!capable(CAP_NET_ADMIN))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index c879c1a..9da4fc5 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -130,7 +130,7 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	const struct nfnetlink_subsystem *ss;
 	int type, err;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* All the messages must at least contain nfgenmsg */
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 482fa57..05fedbf 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -516,7 +516,7 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return -EOPNOTSUPP;
 
 	if ((ops->flags & GENL_ADMIN_PERM) &&
-	    security_netlink_recv(skb, CAP_NET_ADMIN))
+	    !capable(CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d0a42df..7ea716d 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2290,7 +2290,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	link = &xfrm_dispatch[type];
 
 	/* All operations require privileges, even GET */
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
 	if ((type == (XFRM_MSG_GETSA - XFRM_MSG_BASE) ||
diff --git a/security/capability.c b/security/capability.c
index 3cf5ae3..5c1aea0 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -1004,7 +1004,6 @@ void __init security_fixup_ops(struct security_operations *ops)
 	set_to_cap_if_null(ops, sem_semctl);
 	set_to_cap_if_null(ops, sem_semop);
 	set_to_cap_if_null(ops, netlink_send);
-	set_to_cap_if_null(ops, netlink_recv);
 	set_to_cap_if_null(ops, d_instantiate);
 	set_to_cap_if_null(ops, getprocattr);
 	set_to_cap_if_null(ops, setprocattr);
diff --git a/security/commoncap.c b/security/commoncap.c
index 90fdf97..454e974 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -56,14 +56,6 @@ int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return 0;
 }
 
-int cap_netlink_recv(struct sk_buff *skb, int cap)
-{
-	if (!cap_raised(current_cap(), cap))
-		return -EPERM;
-	return 0;
-}
-EXPORT_SYMBOL(cap_netlink_recv);
-
 /**
  * cap_capable - Determine whether a task has a particular effective capability
  * @cred: The credentials to use
diff --git a/security/security.c b/security/security.c
index 5560472..17ee1c0 100644
--- a/security/security.c
+++ b/security/security.c
@@ -983,12 +983,6 @@ int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return security_ops->netlink_send(sk, skb);
 }
 
-int security_netlink_recv(struct sk_buff *skb, int cap)
-{
-	return security_ops->netlink_recv(skb, cap);
-}
-EXPORT_SYMBOL(security_netlink_recv);
-
 int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
 {
 	return security_ops->secid_to_secctx(secid, secdata, seclen);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 3deee07..668fe48 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4731,24 +4731,6 @@ static int selinux_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return selinux_nlmsg_perm(sk, skb);
 }
 
-static int selinux_netlink_recv(struct sk_buff *skb, int capability)
-{
-	int err;
-	struct common_audit_data ad;
-	u32 sid;
-
-	err = cap_netlink_recv(skb, capability);
-	if (err)
-		return err;
-
-	COMMON_AUDIT_DATA_INIT(&ad, CAP);
-	ad.u.cap = capability;
-
-	security_task_getsecid(current, &sid);
-	return avc_has_perm(sid, sid, SECCLASS_CAPABILITY,
-			    CAP_TO_MASK(capability), &ad);
-}
-
 static int ipc_alloc_security(struct task_struct *task,
 			      struct kern_ipc_perm *perm,
 			      u16 sclass)
@@ -5477,7 +5459,6 @@ static struct security_operations selinux_ops = {
 	.vm_enough_memory =		selinux_vm_enough_memory,
 
 	.netlink_send =			selinux_netlink_send,
-	.netlink_recv =			selinux_netlink_recv,
 
 	.bprm_set_creds =		selinux_bprm_set_creds,
 	.bprm_committing_creds =	selinux_bprm_committing_creds,



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] protect cap_netlink_recv from user namespaces
  2011-11-09 14:19       ` Eric Paris
@ 2011-11-09 14:44         ` Serge E. Hallyn
  2011-11-19  9:10         ` Eric W. Biederman
  1 sibling, 0 replies; 19+ messages in thread
From: Serge E. Hallyn @ 2011-11-09 14:44 UTC (permalink / raw)
  To: Eric Paris
  Cc: Serge E. Hallyn, linux-kernel, containers, oleg, richard, akpm,
	ebiederm, dhowells

Quoting Eric Paris (eparis@redhat.com):
> On Tue, 2011-11-08 at 03:29 +0000, Serge E. Hallyn wrote:
> > Quoting Eric Paris (eparis@redhat.com):
> 
> > But, regardless, your point is valid in that I'm not tightening down as
> > much as I could.  So how about I don't change the security_netlink_recv()
> > and callers yet, and instead I change cap_netlink_recv() to do:
> > 
> > 	if (!IN_ROOT_USER_NS && !cap_raised(current_cap(), cap))
> 
> Actually better thought.  Remove the LSM hook altogether and just use
> capable() in the callers.  This hook, being used this way, was

Heh, that sounds terrific!  Only concern I have is it will cause
PF_SUPERPRIV to be set when successful, which it didn't use to.
Now it looks like that might be more correct, but it's a change.
Are people who care about accounting stats going to notice and be
surprised?

> introduced in c7bdb545 back when we took the effective perms from the
> skb.  We don't use the skb any more since netlink is synchronous.  This
> is functionally equivalent except the capabilities code checks against
> the init_user_ns (something we want) and it will now set PF_PRIV (which
> also seems like a good thing) Something like:
> 
> security: remove the security_netlink_recv hook as it is equivalent to capable()
>     
> Once upon a time netlink was not sync and we had to get the effective
> capabilities from the skb that was being received.  Today we instead get
> the capabilities from the current task.  This has rendered the entire
> purpose of the hook moot as it is now functionally equivalent to the
> capable() call.
>     
> Signed-off-by: Eric Paris <eparis@redhat.com>

Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>

thanks,
-serge

> 
> ---
> 
>  drivers/scsi/scsi_netlink.c     |    2 +-
>  include/linux/security.h        |   14 --------------
>  kernel/audit.c                  |    4 ++--
>  net/core/rtnetlink.c            |    2 +-
>  net/decnet/netfilter/dn_rtmsg.c |    2 +-
>  net/ipv4/netfilter/ip_queue.c   |    2 +-
>  net/ipv6/netfilter/ip6_queue.c  |    2 +-
>  net/netfilter/nfnetlink.c       |    2 +-
>  net/netlink/genetlink.c         |    2 +-
>  net/xfrm/xfrm_user.c            |    2 +-
>  security/capability.c           |    1 -
>  security/commoncap.c            |    8 --------
>  security/security.c             |    6 ------
>  security/selinux/hooks.c        |   19 -------------------
>  14 files changed, 10 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
> index 44f76e8..c77628a 100644
> --- a/drivers/scsi/scsi_netlink.c
> +++ b/drivers/scsi/scsi_netlink.c
> @@ -112,7 +112,7 @@ scsi_nl_rcv_msg(struct sk_buff *skb)
>  			goto next_msg;
>  		}
>  
> -		if (security_netlink_recv(skb, CAP_SYS_ADMIN)) {
> +		if (!capable(CAP_SYS_ADMIN)) {
>  			err = -EPERM;
>  			goto next_msg;
>  		}
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 1bb742b..bb7e8a0 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -96,7 +96,6 @@ struct xfrm_user_sec_ctx;
>  struct seq_file;
>  
>  extern int cap_netlink_send(struct sock *sk, struct sk_buff *skb);
> -extern int cap_netlink_recv(struct sk_buff *skb, int cap);
>  
>  void reset_security_ops(void);
>  
> @@ -798,12 +797,6 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
>   *	@skb contains the sk_buff structure for the netlink message.
>   *	Return 0 if the information was successfully saved and message
>   *	is allowed to be transmitted.
> - * @netlink_recv:
> - *	Check permission before processing the received netlink message in
> - *	@skb.
> - *	@skb contains the sk_buff structure for the netlink message.
> - *	@cap indicates the capability required
> - *	Return 0 if permission is granted.
>   *
>   * Security hooks for Unix domain networking.
>   *
> @@ -1564,7 +1557,6 @@ struct security_operations {
>  			  struct sembuf *sops, unsigned nsops, int alter);
>  
>  	int (*netlink_send) (struct sock *sk, struct sk_buff *skb);
> -	int (*netlink_recv) (struct sk_buff *skb, int cap);
>  
>  	void (*d_instantiate) (struct dentry *dentry, struct inode *inode);
>  
> @@ -1817,7 +1809,6 @@ void security_d_instantiate(struct dentry *dentry, struct inode *inode);
>  int security_getprocattr(struct task_struct *p, char *name, char **value);
>  int security_setprocattr(struct task_struct *p, char *name, void *value, size_t size);
>  int security_netlink_send(struct sock *sk, struct sk_buff *skb);
> -int security_netlink_recv(struct sk_buff *skb, int cap);
>  int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen);
>  int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
>  void security_release_secctx(char *secdata, u32 seclen);
> @@ -2505,11 +2496,6 @@ static inline int security_netlink_send(struct sock *sk, struct sk_buff *skb)
>  	return cap_netlink_send(sk, skb);
>  }
>  
> -static inline int security_netlink_recv(struct sk_buff *skb, int cap)
> -{
> -	return cap_netlink_recv(skb, cap);
> -}
> -
>  static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
>  {
>  	return -EOPNOTSUPP;
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 2c1d6ab..57e3f51 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -601,13 +601,13 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
>  	case AUDIT_TTY_SET:
>  	case AUDIT_TRIM:
>  	case AUDIT_MAKE_EQUIV:
> -		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL))
> +		if (!capable(CAP_AUDIT_CONTROL))
>  			err = -EPERM;
>  		break;
>  	case AUDIT_USER:
>  	case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
>  	case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
> -		if (security_netlink_recv(skb, CAP_AUDIT_WRITE))
> +		if (!capable(CAP_AUDIT_WRITE))
>  			err = -EPERM;
>  		break;
>  	default:  /* bad msg */
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 39f8dd6..98ee1b6 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1930,7 +1930,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>  	sz_idx = type>>2;
>  	kind = type&3;
>  
> -	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN))
> +	if (kind != 2 && !capable(CAP_NET_ADMIN))
>  		return -EPERM;
>  
>  	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
> diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
> index 69975e0..1531135 100644
> --- a/net/decnet/netfilter/dn_rtmsg.c
> +++ b/net/decnet/netfilter/dn_rtmsg.c
> @@ -108,7 +108,7 @@ static inline void dnrmg_receive_user_skb(struct sk_buff *skb)
>  	if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len)
>  		return;
>  
> -	if (security_netlink_recv(skb, CAP_NET_ADMIN))
> +	if (!capable(CAP_NET_ADMIN))
>  		RCV_SKB_FAIL(-EPERM);
>  
>  	/* Eventually we might send routing messages too */
> diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
> index e59aabd..ffabb26 100644
> --- a/net/ipv4/netfilter/ip_queue.c
> +++ b/net/ipv4/netfilter/ip_queue.c
> @@ -430,7 +430,7 @@ __ipq_rcv_skb(struct sk_buff *skb)
>  	if (type <= IPQM_BASE)
>  		return;
>  
> -	if (security_netlink_recv(skb, CAP_NET_ADMIN))
> +	if (!capable(CAP_NET_ADMIN))
>  		RCV_SKB_FAIL(-EPERM);
>  
>  	spin_lock_bh(&queue_lock);
> diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
> index e63c397..5e5ce77 100644
> --- a/net/ipv6/netfilter/ip6_queue.c
> +++ b/net/ipv6/netfilter/ip6_queue.c
> @@ -431,7 +431,7 @@ __ipq_rcv_skb(struct sk_buff *skb)
>  	if (type <= IPQM_BASE)
>  		return;
>  
> -	if (security_netlink_recv(skb, CAP_NET_ADMIN))
> +	if (!capable(CAP_NET_ADMIN))
>  		RCV_SKB_FAIL(-EPERM);
>  
>  	spin_lock_bh(&queue_lock);
> diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
> index c879c1a..9da4fc5 100644
> --- a/net/netfilter/nfnetlink.c
> +++ b/net/netfilter/nfnetlink.c
> @@ -130,7 +130,7 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>  	const struct nfnetlink_subsystem *ss;
>  	int type, err;
>  
> -	if (security_netlink_recv(skb, CAP_NET_ADMIN))
> +	if (!capable(CAP_NET_ADMIN))
>  		return -EPERM;
>  
>  	/* All the messages must at least contain nfgenmsg */
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index 482fa57..05fedbf 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -516,7 +516,7 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>  		return -EOPNOTSUPP;
>  
>  	if ((ops->flags & GENL_ADMIN_PERM) &&
> -	    security_netlink_recv(skb, CAP_NET_ADMIN))
> +	    !capable(CAP_NET_ADMIN))
>  		return -EPERM;
>  
>  	if (nlh->nlmsg_flags & NLM_F_DUMP) {
> diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
> index d0a42df..7ea716d 100644
> --- a/net/xfrm/xfrm_user.c
> +++ b/net/xfrm/xfrm_user.c
> @@ -2290,7 +2290,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>  	link = &xfrm_dispatch[type];
>  
>  	/* All operations require privileges, even GET */
> -	if (security_netlink_recv(skb, CAP_NET_ADMIN))
> +	if (!capable(CAP_NET_ADMIN))
>  		return -EPERM;
>  
>  	if ((type == (XFRM_MSG_GETSA - XFRM_MSG_BASE) ||
> diff --git a/security/capability.c b/security/capability.c
> index 3cf5ae3..5c1aea0 100644
> --- a/security/capability.c
> +++ b/security/capability.c
> @@ -1004,7 +1004,6 @@ void __init security_fixup_ops(struct security_operations *ops)
>  	set_to_cap_if_null(ops, sem_semctl);
>  	set_to_cap_if_null(ops, sem_semop);
>  	set_to_cap_if_null(ops, netlink_send);
> -	set_to_cap_if_null(ops, netlink_recv);
>  	set_to_cap_if_null(ops, d_instantiate);
>  	set_to_cap_if_null(ops, getprocattr);
>  	set_to_cap_if_null(ops, setprocattr);
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 90fdf97..454e974 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -56,14 +56,6 @@ int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
>  	return 0;
>  }
>  
> -int cap_netlink_recv(struct sk_buff *skb, int cap)
> -{
> -	if (!cap_raised(current_cap(), cap))
> -		return -EPERM;
> -	return 0;
> -}
> -EXPORT_SYMBOL(cap_netlink_recv);
> -
>  /**
>   * cap_capable - Determine whether a task has a particular effective capability
>   * @cred: The credentials to use
> diff --git a/security/security.c b/security/security.c
> index 5560472..17ee1c0 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -983,12 +983,6 @@ int security_netlink_send(struct sock *sk, struct sk_buff *skb)
>  	return security_ops->netlink_send(sk, skb);
>  }
>  
> -int security_netlink_recv(struct sk_buff *skb, int cap)
> -{
> -	return security_ops->netlink_recv(skb, cap);
> -}
> -EXPORT_SYMBOL(security_netlink_recv);
> -
>  int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
>  {
>  	return security_ops->secid_to_secctx(secid, secdata, seclen);
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3deee07..668fe48 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -4731,24 +4731,6 @@ static int selinux_netlink_send(struct sock *sk, struct sk_buff *skb)
>  	return selinux_nlmsg_perm(sk, skb);
>  }
>  
> -static int selinux_netlink_recv(struct sk_buff *skb, int capability)
> -{
> -	int err;
> -	struct common_audit_data ad;
> -	u32 sid;
> -
> -	err = cap_netlink_recv(skb, capability);
> -	if (err)
> -		return err;
> -
> -	COMMON_AUDIT_DATA_INIT(&ad, CAP);
> -	ad.u.cap = capability;
> -
> -	security_task_getsecid(current, &sid);
> -	return avc_has_perm(sid, sid, SECCLASS_CAPABILITY,
> -			    CAP_TO_MASK(capability), &ad);
> -}
> -
>  static int ipc_alloc_security(struct task_struct *task,
>  			      struct kern_ipc_perm *perm,
>  			      u16 sclass)
> @@ -5477,7 +5459,6 @@ static struct security_operations selinux_ops = {
>  	.vm_enough_memory =		selinux_vm_enough_memory,
>  
>  	.netlink_send =			selinux_netlink_send,
> -	.netlink_recv =			selinux_netlink_recv,
>  
>  	.bprm_set_creds =		selinux_bprm_set_creds,
>  	.bprm_committing_creds =	selinux_bprm_committing_creds,
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] user namespace: make signal.c respect user namespaces (v4)
  2011-11-09 14:18     ` Serge E. Hallyn
@ 2011-11-10  1:41       ` Matt Helsley
  2011-11-10 14:27         ` Serge E. Hallyn
  0 siblings, 1 reply; 19+ messages in thread
From: Matt Helsley @ 2011-11-10  1:41 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Andrew Morton, richard, containers, oleg, eparis, linux-kernel,
	dhowells, ebiederm

On Wed, Nov 09, 2011 at 08:18:53AM -0600, Serge E. Hallyn wrote:
> Quoting Andrew Morton (akpm@linux-foundation.org):
> > On Fri,  4 Nov 2011 22:24:37 +0000
> > Serge Hallyn <serge@hallyn.com> wrote:
> > 
> > > +static inline void fixup_uid(struct siginfo *info, struct task_struct *t)

nit: Seems like the function name itself needs a fixup. I can imagine a
user-ns-unaware kernel developer thinking this when they see code calling
this function:

<naive>
Gosh, why is the uid broken? I have no idea what this does or when it
would be needed. Why hasn't it been splatted all over the kernel? Who broke
it?
</naive>

You can't even tell from the name that it's for user namespaces.

Cheers,
	-Matt

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] user namespace: make signal.c respect user namespaces (v4)
  2011-11-10  1:41       ` Matt Helsley
@ 2011-11-10 14:27         ` Serge E. Hallyn
  0 siblings, 0 replies; 19+ messages in thread
From: Serge E. Hallyn @ 2011-11-10 14:27 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Andrew Morton, richard, containers, oleg, eparis, linux-kernel,
	dhowells, ebiederm

Quoting Matt Helsley (matthltc@us.ibm.com):
> On Wed, Nov 09, 2011 at 08:18:53AM -0600, Serge E. Hallyn wrote:
> > Quoting Andrew Morton (akpm@linux-foundation.org):
> > > On Fri,  4 Nov 2011 22:24:37 +0000
> > > Serge Hallyn <serge@hallyn.com> wrote:
> > > 
> > > > +static inline void fixup_uid(struct siginfo *info, struct task_struct *t)
> 
> nit: Seems like the function name itself needs a fixup. I can imagine a
> user-ns-unaware kernel developer thinking this when they see code calling
> this function:
> 
> <naive>
> Gosh, why is the uid broken? I have no idea what this does or when it
> would be needed. Why hasn't it been splatted all over the kernel? Who broke
> it?
> </naive>
> 
> You can't even tell from the name that it's for user namespaces.

Heh, my first reaction was that you're over-reacting, but you're probably
right.  I don't want to conflict with the 'user_ns_map_uids' function,
so how about userns_fixup_signal_uids()?  Pls let me know if you have a
better name, otherwise I'll switch to this one.

thanks,
-serge

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: user namespaces: fix some uid/privilege leaks
  2011-11-04 22:24 user namespaces: fix some uid/privilege leaks Serge Hallyn
                   ` (5 preceding siblings ...)
  2011-11-04 22:24 ` [PATCH 6/6] protect cap_netlink_recv from user namespaces Serge Hallyn
@ 2011-11-11  4:13 ` Serge E. Hallyn
  6 siblings, 0 replies; 19+ messages in thread
From: Serge E. Hallyn @ 2011-11-11  4:13 UTC (permalink / raw)
  To: linux-kernel, containers
  Cc: oleg, richard, akpm, ebiederm, dhowells, eparis, matthltc

For what it's worth, and for anyone interested in taking a look, a
tree with feedback addressed is at

http://kernel.ubuntu.com/git?p=serge/linux-2.6.git;a=shortlog;h=refs/heads/userns.9

I'll re-test with and without userns next week, then resend.

thanks,
-serge

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] protect cap_netlink_recv from user namespaces
  2011-11-09 14:19       ` Eric Paris
  2011-11-09 14:44         ` Serge E. Hallyn
@ 2011-11-19  9:10         ` Eric W. Biederman
  2011-11-19 23:25           ` Serge E. Hallyn
  1 sibling, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2011-11-19  9:10 UTC (permalink / raw)
  To: Eric Paris
  Cc: Serge E. Hallyn, richard, containers, linux-kernel, oleg,
	dhowells, akpm

Eric Paris <eparis@redhat.com> writes:

> On Tue, 2011-11-08 at 03:29 +0000, Serge E. Hallyn wrote:
>> Quoting Eric Paris (eparis@redhat.com):
>
>> But, regardless, your point is valid in that I'm not tightening down as
>> much as I could.  So how about I don't change the security_netlink_recv()
>> and callers yet, and instead I change cap_netlink_recv() to do:
>> 
>> 	if (!IN_ROOT_USER_NS && !cap_raised(current_cap(), cap))
>
> Actually better thought.  Remove the LSM hook altogether and just use
> capable() in the callers.  This hook, being used this way, was
> introduced in c7bdb545 back when we took the effective perms from the
> skb.  We don't use the skb any more since netlink is synchronous.  This
> is functionally equivalent except the capabilities code checks against
> the init_user_ns (something we want) and it will now set PF_PRIV (which
> also seems like a good thing) Something like:
>
> security: remove the security_netlink_recv hook as it is equivalent to capable()
>     
> Once upon a time netlink was not sync and we had to get the effective
> capabilities from the skb that was being received.  Today we instead get
> the capabilities from the current task.  This has rendered the entire
> purpose of the hook moot as it is now functionally equivalent to the
> capable() call.
>
> Signed-off-by: Eric Paris <eparis@redhat.com>

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Darn.  I missed this one went it went past the first time.  Let's do
this.

As Serge pointed out checking against the user namespace of the network
namespace happens to be safe because the subsystems that are brittle
really have problems don't support multiple network namespaces.

Still I like the idea of erring on the conservative side here and
making everything safe.  We can open relax the restrictions later
by using ns_capable.  I want to get to a point where it is safe
for an unprivileged user to create their own user namespace,
and most of that is just getting the capable calls correct.

Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] protect cap_netlink_recv from user namespaces
  2011-11-19  9:10         ` Eric W. Biederman
@ 2011-11-19 23:25           ` Serge E. Hallyn
  0 siblings, 0 replies; 19+ messages in thread
From: Serge E. Hallyn @ 2011-11-19 23:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Eric Paris, richard, containers, linux-kernel, oleg, dhowells,
	akpm

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Eric Paris <eparis@redhat.com> writes:
> 
> > On Tue, 2011-11-08 at 03:29 +0000, Serge E. Hallyn wrote:
> >> Quoting Eric Paris (eparis@redhat.com):
> >
> >> But, regardless, your point is valid in that I'm not tightening down as
> >> much as I could.  So how about I don't change the security_netlink_recv()
> >> and callers yet, and instead I change cap_netlink_recv() to do:
> >> 
> >> 	if (!IN_ROOT_USER_NS && !cap_raised(current_cap(), cap))
> >
> > Actually better thought.  Remove the LSM hook altogether and just use
> > capable() in the callers.  This hook, being used this way, was
> > introduced in c7bdb545 back when we took the effective perms from the
> > skb.  We don't use the skb any more since netlink is synchronous.  This
> > is functionally equivalent except the capabilities code checks against
> > the init_user_ns (something we want) and it will now set PF_PRIV (which
> > also seems like a good thing) Something like:
> >
> > security: remove the security_netlink_recv hook as it is equivalent to capable()
> >     
> > Once upon a time netlink was not sync and we had to get the effective
> > capabilities from the skb that was being received.  Today we instead get
> > the capabilities from the current task.  This has rendered the entire
> > purpose of the hook moot as it is now functionally equivalent to the
> > capable() call.
> >
> > Signed-off-by: Eric Paris <eparis@redhat.com>
> 
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Darn.  I missed this one went it went past the first time.  Let's do
> this.
> 
> As Serge pointed out checking against the user namespace of the network
> namespace happens to be safe because the subsystems that are brittle
> really have problems don't support multiple network namespaces.
> 
> Still I like the idea of erring on the conservative side here and
> making everything safe.  We can open relax the restrictions later
> by using ns_capable.  I want to get to a point where it is safe
> for an unprivileged user to create their own user namespace,
> and most of that is just getting the capable calls correct.
> 
> Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

Ok.  Oh!  This was a part of my new 6-patch set I was going to send, but
when you pursuaded me that some were not worth it if you're rethinking the
nature of uids, I only sent patch 1.

I did queue up the patch at http://kernel.ubuntu.com/git?p=serge/linux-2.6.git;a=commit;h=b97c54cb518160466095db8f8d2ecf5bd4f81ce2
and tested it a bit in ltp (with userns both on and off).

Eric Paris, would you like to resend it separately (with Eric's and my
ack's, as above and at http://lkml.org/lkml/2011/11/9/195), or would you
like me to do so?

thanks,
-serge

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2011-11-19 23:25 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-04 22:24 user namespaces: fix some uid/privilege leaks Serge Hallyn
2011-11-04 22:24 ` [PATCH 1/6] user namespace: make signal.c respect user namespaces (v4) Serge Hallyn
2011-11-09  0:22   ` Andrew Morton
2011-11-09 14:18     ` Serge E. Hallyn
2011-11-10  1:41       ` Matt Helsley
2011-11-10 14:27         ` Serge E. Hallyn
2011-11-04 22:24 ` [PATCH 2/6] User namespace: don't allow sysctl in non-init user ns (v2) Serge Hallyn
2011-11-04 22:24 ` [PATCH 3/6] user namespace: clamp down users of cap_raised Serge Hallyn
2011-11-06  1:14   ` Andrew G. Morgan
2011-11-04 22:24 ` [PATCH 4/6] Add Documentation/namespaces/user_namespace.txt (v3) Serge Hallyn
2011-11-04 22:24 ` [PATCH 5/6] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
2011-11-04 22:24 ` [PATCH 6/6] protect cap_netlink_recv from user namespaces Serge Hallyn
2011-11-07 19:35   ` Eric Paris
2011-11-08  3:29     ` Serge E. Hallyn
2011-11-09 14:19       ` Eric Paris
2011-11-09 14:44         ` Serge E. Hallyn
2011-11-19  9:10         ` Eric W. Biederman
2011-11-19 23:25           ` Serge E. Hallyn
2011-11-11  4:13 ` user namespaces: fix some uid/privilege leaks Serge E. Hallyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox