user namespaces: fix some uid/privilege leaks

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* user namespaces: fix some uid/privilege leaks
@ 2011-10-18 21:54 Serge Hallyn
  2011-10-18 21:54 ` [PATCH 1/9] pid_ns: ensure pid is not freed during kill_pid_info_as_uid Serge Hallyn
                   ` (9 more replies)
  0 siblings, 10 replies; 34+ messages in thread
From: Serge Hallyn @ 2011-10-18 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh, dhowells,
	eparis

The set of patches which I'm currently aiming to get upstream is queued at:
http://kernel.ubuntu.com/git?p=serge/linux-2.6.git;a=shortlog;h=refs/heads/userns

In the below descriptions, it helps to remember that a task can have more
privilege to his own, child, user namespace, than he does in his parent or
the initial user namespace.  In fact a task created in a new user namespace
receives all capabilities (within bounding set) in the new user namespace.
So checks for privilege in a task's own user ns can not be safely used in place
of checks against another user ns.

The set includes:

0001-pid_ns-ensure-pid-is-not-freed-during-kill_pid_info_.patch
	Fix a case where a pid could be referenced after being freed.
	(This is in Greg's usb tree, but not yet in Linus' tree; it's
	here just to show full context)

0002-user-namespace-usb-make-usb-urbs-user-namespace-awar.patch
	Take the user namespace into account when comparing uids when
	signals are sent by usb urbs.
	(This is in Greg's usb tree, but not yet in Linus' tree; it's
	here just to show full context)

0003-user-namespace-make-signal.c-respect-user-namespaces.patch
	This convers the uid for the task sending a signal to the
	user namespace of the receiver.  It is somewhat analogous
	to what is done with the sender's pid.
	Waiting on feedback from Oleg, but I believe this patch is
	ready.

0004-User-namespace-don-t-allow-sysctl-in-non-init-user-n.patch
	This prevents root in a child user namespace from man-handling
	sysctls.  With this patch, a task in a child user namespace
	will only get the world access rights to sysctls.

0005-user-namespace-clamp-down-users-of-cap_raised.patch
	This clamps down on cases where privilege to your own user
	namespace were checked for access to the initial user namespace.

0006-Add-Documentation-namespaces-user_namespace.txt-v3.patch
	Documentation.

0007-user-namespace-make-each-net-net_ns-belong-to-a-user.patch
	This adds a struct user_namespace pointer to the net_ns for use
	by later patches.

0008-protect-cap_netlink_recv-from-user-namespaces.patch
	Now that net_ns is owned by a user_ns, cap_netlink_recv() can
	target privilege checks to the user_ns owning the resource.  The
	current check against current_cap() is unsafe.

0009-make-net-core-scm.c-uid-comparisons-user-namespace-a.patch
	In scm_send, uids of sender/receiver are being compared without
	accounting for different user namespaces.  Fix that.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/9] pid_ns: ensure pid is not freed during kill_pid_info_as_uid
  2011-10-18 21:54 user namespaces: fix some uid/privilege leaks Serge Hallyn
@ 2011-10-18 21:54 ` Serge Hallyn
  2011-10-18 21:54 ` [PATCH 2/9] user namespace: usb: make usb urbs user namespace aware (v2) Serge Hallyn
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Serge Hallyn @ 2011-10-18 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh, dhowells,
	eparis, Serge Hallyn, linux-usb

From: Serge Hallyn <serge.hallyn@canonical.com>

[ This is in Greg's USB tree, but not yet in Linus' tree;  it's here
just to give a full context. ]

Alan Stern points out that after spin_unlock(&ps->lock) there is no
guarantee that ps->pid won't be freed.  Since kill_pid_info_as_uid() is
called after the spin_unlock(), the pid passed to it must be pinned.

Reported-by: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-usb@vger.kernel.org
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 drivers/usb/core/devio.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 37518df..eea53eb 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -407,7 +407,7 @@ static void async_completed(struct urb *urb)
 		sinfo.si_errno = as->status;
 		sinfo.si_code = SI_ASYNCIO;
 		sinfo.si_addr = as->userurb;
-		pid = as->pid;
+		pid = get_pid(as->pid);
 		uid = as->uid;
 		euid = as->euid;
 		secid = as->secid;
@@ -422,9 +422,11 @@ static void async_completed(struct urb *urb)
 		cancel_bulk_urbs(ps, as->bulk_addr);
 	spin_unlock(&ps->lock);
 
-	if (signr)
+	if (signr) {
 		kill_pid_info_as_uid(sinfo.si_signo, &sinfo, pid, uid,
 				      euid, secid);
+		put_pid(pid);
+	}
 
 	wake_up(&ps->wait);
 }
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/9] user namespace: usb: make usb urbs user namespace aware (v2)
  2011-10-18 21:54 user namespaces: fix some uid/privilege leaks Serge Hallyn
  2011-10-18 21:54 ` [PATCH 1/9] pid_ns: ensure pid is not freed during kill_pid_info_as_uid Serge Hallyn
@ 2011-10-18 21:54 ` Serge Hallyn
  2011-10-18 21:54 ` [PATCH 3/9] user namespace: make signal.c respect user namespaces (v4) Serge Hallyn
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Serge Hallyn @ 2011-10-18 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh, dhowells,
	eparis, Serge Hallyn, Alan Stern, linux-usb

From: Serge Hallyn <serge.hallyn@canonical.com>

[ This is in Greg's USB tree, but not yet in Linus' tree;  it's here
just to give a full context. ]

Add to the dev_state and alloc_async structures the user namespace
corresponding to the uid and euid.  Pass these to kill_pid_info_as_uid(),
which can then implement a proper, user-namespace-aware uid check.

Changelog:
Sep 20: Per Oleg's suggestion: Instead of caching and passing user namespace,
	uid, and euid each separately, pass a struct cred.
Sep 26: Address Alan Stern's comments: don't define a struct cred at
	usbdev_open(), and take and put a cred at async_completed() to
	ensure it lasts for the duration of kill_pid_info_as_cred().

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-usb@vger.kernel.org
---
 drivers/usb/core/devio.c |   30 +++++++++++++-----------------
 include/linux/sched.h    |    3 ++-
 kernel/signal.c          |   24 ++++++++++++++++--------
 3 files changed, 31 insertions(+), 26 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index eea53eb..000c25d 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -46,6 +46,7 @@
 #include <linux/cdev.h>
 #include <linux/notifier.h>
 #include <linux/security.h>
+#include <linux/user_namespace.h>
 #include <asm/uaccess.h>
 #include <asm/byteorder.h>
 #include <linux/moduleparam.h>
@@ -68,7 +69,7 @@ struct dev_state {
 	wait_queue_head_t wait;     /* wake up if a request completed */
 	unsigned int discsignr;
 	struct pid *disc_pid;
-	uid_t disc_uid, disc_euid;
+	const struct cred *cred;
 	void __user *disccontext;
 	unsigned long ifclaimed;
 	u32 secid;
@@ -79,7 +80,7 @@ struct async {
 	struct list_head asynclist;
 	struct dev_state *ps;
 	struct pid *pid;
-	uid_t uid, euid;
+	const struct cred *cred;
 	unsigned int signr;
 	unsigned int ifnum;
 	void __user *userbuffer;
@@ -248,6 +249,7 @@ static struct async *alloc_async(unsigned int numisoframes)
 static void free_async(struct async *as)
 {
 	put_pid(as->pid);
+	put_cred(as->cred);
 	kfree(as->urb->transfer_buffer);
 	kfree(as->urb->setup_packet);
 	usb_free_urb(as->urb);
@@ -393,9 +395,8 @@ static void async_completed(struct urb *urb)
 	struct dev_state *ps = as->ps;
 	struct siginfo sinfo;
 	struct pid *pid = NULL;
-	uid_t uid = 0;
-	uid_t euid = 0;
 	u32 secid = 0;
+	const struct cred *cred = NULL;
 	int signr;
 
 	spin_lock(&ps->lock);
@@ -408,8 +409,7 @@ static void async_completed(struct urb *urb)
 		sinfo.si_code = SI_ASYNCIO;
 		sinfo.si_addr = as->userurb;
 		pid = get_pid(as->pid);
-		uid = as->uid;
-		euid = as->euid;
+		cred = get_cred(as->cred);
 		secid = as->secid;
 	}
 	snoop(&urb->dev->dev, "urb complete\n");
@@ -423,9 +423,9 @@ static void async_completed(struct urb *urb)
 	spin_unlock(&ps->lock);
 
 	if (signr) {
-		kill_pid_info_as_uid(sinfo.si_signo, &sinfo, pid, uid,
-				      euid, secid);
+		kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred, secid);
 		put_pid(pid);
+		put_cred(cred);
 	}
 
 	wake_up(&ps->wait);
@@ -658,7 +658,6 @@ static int usbdev_open(struct inode *inode, struct file *file)
 {
 	struct usb_device *dev = NULL;
 	struct dev_state *ps;
-	const struct cred *cred = current_cred();
 	int ret;
 
 	ret = -ENOMEM;
@@ -708,8 +707,7 @@ static int usbdev_open(struct inode *inode, struct file *file)
 	init_waitqueue_head(&ps->wait);
 	ps->discsignr = 0;
 	ps->disc_pid = get_pid(task_pid(current));
-	ps->disc_uid = cred->uid;
-	ps->disc_euid = cred->euid;
+	ps->cred = get_current_cred();
 	ps->disccontext = NULL;
 	ps->ifclaimed = 0;
 	security_task_getsecid(current, &ps->secid);
@@ -751,6 +749,7 @@ static int usbdev_release(struct inode *inode, struct file *file)
 	usb_unlock_device(dev);
 	usb_put_dev(dev);
 	put_pid(ps->disc_pid);
+	put_cred(ps->cred);
 
 	as = async_getcompleted(ps);
 	while (as) {
@@ -1050,7 +1049,6 @@ static int proc_do_submiturb(struct dev_state *ps, struct usbdevfs_urb *uurb,
 	struct usb_host_endpoint *ep;
 	struct async *as;
 	struct usb_ctrlrequest *dr = NULL;
-	const struct cred *cred = current_cred();
 	unsigned int u, totlen, isofrmlen;
 	int ret, ifnum = -1;
 	int is_in;
@@ -1264,8 +1262,7 @@ static int proc_do_submiturb(struct dev_state *ps, struct usbdevfs_urb *uurb,
 	as->signr = uurb->signr;
 	as->ifnum = ifnum;
 	as->pid = get_pid(task_pid(current));
-	as->uid = cred->uid;
-	as->euid = cred->euid;
+	as->cred = get_current_cred();
 	security_task_getsecid(current, &as->secid);
 	if (!is_in && uurb->buffer_length > 0) {
 		if (copy_from_user(as->urb->transfer_buffer, uurb->buffer,
@@ -1983,9 +1980,8 @@ static void usbdev_remove(struct usb_device *udev)
 			sinfo.si_errno = EPIPE;
 			sinfo.si_code = SI_ASYNCIO;
 			sinfo.si_addr = ps->disccontext;
-			kill_pid_info_as_uid(ps->discsignr, &sinfo,
-					ps->disc_pid, ps->disc_uid,
-					ps->disc_euid, ps->secid);
+			kill_pid_info_as_cred(ps->discsignr, &sinfo,
+					ps->disc_pid, ps->cred, ps->secid);
 		}
 	}
 }
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 41d0237..600eb0a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2165,7 +2165,8 @@ extern int force_sigsegv(int, struct task_struct *);
 extern int force_sig_info(int, struct siginfo *, struct task_struct *);
 extern int __kill_pgrp_info(int sig, struct siginfo *info, struct pid *pgrp);
 extern int kill_pid_info(int sig, struct siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_uid(int, struct siginfo *, struct pid *, uid_t, uid_t, u32);
+extern int kill_pid_info_as_cred(int, struct siginfo *, struct pid *,
+				const struct cred *, u32);
 extern int kill_pgrp(struct pid *pid, int sig, int priv);
 extern int kill_pid(struct pid *pid, int sig, int priv);
 extern int kill_proc_info(int, struct siginfo *, pid_t);
diff --git a/kernel/signal.c b/kernel/signal.c
index 291c970..d252be2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1344,13 +1344,24 @@ int kill_proc_info(int sig, struct siginfo *info, pid_t pid)
 	return error;
 }
 
+static int kill_as_cred_perm(const struct cred *cred,
+			     struct task_struct *target)
+{
+	const struct cred *pcred = __task_cred(target);
+	if (cred->user_ns != pcred->user_ns)
+		return 0;
+	if (cred->euid != pcred->suid && cred->euid != pcred->uid &&
+	    cred->uid  != pcred->suid && cred->uid  != pcred->uid)
+		return 0;
+	return 1;
+}
+
 /* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid,
-		      uid_t uid, uid_t euid, u32 secid)
+int kill_pid_info_as_cred(int sig, struct siginfo *info, struct pid *pid,
+			 const struct cred *cred, u32 secid)
 {
 	int ret = -EINVAL;
 	struct task_struct *p;
-	const struct cred *pcred;
 	unsigned long flags;
 
 	if (!valid_signal(sig))
@@ -1362,10 +1373,7 @@ int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid,
 		ret = -ESRCH;
 		goto out_unlock;
 	}
-	pcred = __task_cred(p);
-	if (si_fromuser(info) &&
-	    euid != pcred->suid && euid != pcred->uid &&
-	    uid  != pcred->suid && uid  != pcred->uid) {
+	if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
 		ret = -EPERM;
 		goto out_unlock;
 	}
@@ -1384,7 +1392,7 @@ out_unlock:
 	rcu_read_unlock();
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kill_pid_info_as_uid);
+EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
 
 /*
  * kill_something_info() interprets pid in interesting ways just like kill(2).
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/9] user namespace: make signal.c respect user namespaces (v4)
  2011-10-18 21:54 user namespaces: fix some uid/privilege leaks Serge Hallyn
  2011-10-18 21:54 ` [PATCH 1/9] pid_ns: ensure pid is not freed during kill_pid_info_as_uid Serge Hallyn
  2011-10-18 21:54 ` [PATCH 2/9] user namespace: usb: make usb urbs user namespace aware (v2) Serge Hallyn
@ 2011-10-18 21:54 ` Serge Hallyn
  2011-10-18 21:54 ` [PATCH 4/9] User namespace: don't allow sysctl in non-init user ns (v2) Serge Hallyn
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Serge Hallyn @ 2011-10-18 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh, dhowells,
	eparis, Serge Hallyn

From: Serge Hallyn <serge.hallyn@canonical.com>

ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
user namespace. (new, thanks Oleg)

__send_signal: convert current's uid to the recipient's user namespace
for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
again :)

do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
user namespace

ptrace_signal maps parent's uid into current's user namespace before
including in signal to current.  IIUC Oleg has argued that this shouldn't
matter as the debugger will play with it, but it seems like not converting
the value currently being set is misleading.

Changelog:
Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
	simplify callers and help make clear what we are translating
        (which uid into which namespace).  Passing the target task would
	make callers even easier to read, but we pass in user_ns because
	current_user_ns() != task_cred_xxx(current, user_ns).
Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
	in ptrace_signal().
Sep 23: In send_signal(), detect when (user) signal is coming from an
	ancestor or unrelated user namespace.  Pass that on to __send_signal,
	which sets si_uid to 0 or overflowuid if needed.
Oct 12: Base on Oleg's fixup_uid() patch.  On top of that, handle all
	SI_FROMKERNEL cases at callers, because we can't assume sender is
	current in those cases.

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 ipc/mqueue.c    |    7 ++++++-
 kernel/signal.c |   38 +++++++++++++++++++++++++++++++++++---
 2 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index ed049ea..bdf57df 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -32,6 +32,7 @@
 #include <linux/nsproxy.h>
 #include <linux/pid.h>
 #include <linux/ipc_namespace.h>
+#include <linux/user_namespace.h>
 #include <linux/slab.h>
 
 #include <net/sock.h>
@@ -543,9 +544,13 @@ static void __do_notify(struct mqueue_inode_info *info)
 			sig_i.si_errno = 0;
 			sig_i.si_code = SI_MESGQ;
 			sig_i.si_value = info->notify.sigev_value;
+			/* map current pid/uid into info->owner's namespaces */
+			rcu_read_lock();
 			sig_i.si_pid = task_tgid_nr_ns(current,
 						ns_of_pid(info->notify_owner));
-			sig_i.si_uid = current_uid();
+			sig_i.si_uid = user_ns_map_uid(info->user->user_ns,
+						current_cred(), current_uid());
+			rcu_read_lock();
 
 			kill_pid_info(info->notify.sigev_signo,
 				      &sig_i, info->notify_owner);
diff --git a/kernel/signal.c b/kernel/signal.c
index d252be2..6a6c2f4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -28,6 +28,7 @@
 #include <linux/freezer.h>
 #include <linux/pid_namespace.h>
 #include <linux/nsproxy.h>
+#include <linux/user_namespace.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/signal.h>
 
@@ -1019,6 +1020,29 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
+/*
+ * map the uid in struct cred into user namespace *ns
+ */
+static inline uid_t map_cred_ns(const struct cred *cred,
+				struct user_namespace *ns)
+{
+	return user_ns_map_uid(ns, cred, cred->uid);
+}
+
+static inline void fixup_uid(struct siginfo *info, struct task_struct *t)
+{
+#ifdef CONFIG_USER_NS
+	if (current_user_ns() == task_cred_xxx(t, user_ns))
+#endif
+		return;
+
+	if (SI_FROMKERNEL(info))
+		return;
+
+	info->si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
+					current_cred(), info->si_uid);
+}
+
 static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			int group, int from_ancestor_ns)
 {
@@ -1088,6 +1112,9 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 				q->info.si_pid = 0;
 			break;
 		}
+
+		fixup_uid(info, t);
+
 	} else if (!is_si_special(info)) {
 		if (sig >= SIGRTMIN && info->si_code != SI_USER) {
 			/*
@@ -1626,7 +1653,8 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 */
 	rcu_read_lock();
 	info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns);
-	info.si_uid = __task_cred(tsk)->uid;
+	info.si_uid = map_cred_ns(__task_cred(tsk),
+			task_cred_xxx(tsk->parent, user_ns));
 	rcu_read_unlock();
 
 	info.si_utime = cputime_to_clock_t(cputime_add(tsk->utime,
@@ -1711,7 +1739,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	 */
 	rcu_read_lock();
 	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
-	info.si_uid = __task_cred(tsk)->uid;
+	info.si_uid = map_cred_ns(__task_cred(tsk),
+			task_cred_xxx(parent, user_ns));
 	rcu_read_unlock();
 
 	info.si_utime = cputime_to_clock_t(tsk->utime);
@@ -2129,8 +2158,11 @@ static int ptrace_signal(int signr, siginfo_t *info,
 		info->si_signo = signr;
 		info->si_errno = 0;
 		info->si_code = SI_USER;
+		rcu_read_lock();
 		info->si_pid = task_pid_vnr(current->parent);
-		info->si_uid = task_uid(current->parent);
+		info->si_uid = map_cred_ns(__task_cred(current->parent),
+				current_user_ns());
+		rcu_read_unlock();
 	}
 
 	/* If the (new) signal is now blocked, requeue it.  */
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 4/9] User namespace: don't allow sysctl in non-init user ns (v2)
  2011-10-18 21:54 user namespaces: fix some uid/privilege leaks Serge Hallyn
                   ` (2 preceding siblings ...)
  2011-10-18 21:54 ` [PATCH 3/9] user namespace: make signal.c respect user namespaces (v4) Serge Hallyn
@ 2011-10-18 21:54 ` Serge Hallyn
  2011-10-18 21:54 ` [PATCH 5/9] user namespace: clamp down users of cap_raised Serge Hallyn
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Serge Hallyn @ 2011-10-18 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh, dhowells,
	eparis, Serge Hallyn

From: Serge Hallyn <serge.hallyn@canonical.com>

sysctl.c has its own custom uid check, which is not user namespace
aware.  As discovered by Richard, that allows root in a container
privileged access to set all sysctls.

To fix that, don't compare uid or groups if current is not in the
initial user namespace.  We may at some point want to relax that check
so that some sysctls are allowed - for instance dmesg_restrict when
syslog is containerized.

Changelog:
Sep 22: As Miquel van Smoorenburg pointed out, rather than always
	refusing access if not in initial user_ns, we should allow
	world access rights to sysctl files.  We just want to prevent
	a task in a non-init user namespace from getting the root user
	or group access rights.

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: richard@nod.at
Cc: Miquel van Smoorenburg <mikevs@xs4all.net>
---
 kernel/sysctl.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 11d65b5..95988dc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1697,10 +1697,12 @@ void register_sysctl_root(struct ctl_table_root *root)
 
 static int test_perm(int mode, int op)
 {
-	if (!current_euid())
-		mode >>= 6;
-	else if (in_egroup_p(0))
-		mode >>= 3;
+	if (current_user_ns() == &init_user_ns) {
+		if (!current_euid())
+			mode >>= 6;
+		else if (in_egroup_p(0))
+			mode >>= 3;
+	}
 	if ((op & ~mode & (MAY_READ|MAY_WRITE|MAY_EXEC)) == 0)
 		return 0;
 	return -EACCES;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 5/9] user namespace: clamp down users of cap_raised
  2011-10-18 21:54 user namespaces: fix some uid/privilege leaks Serge Hallyn
                   ` (3 preceding siblings ...)
  2011-10-18 21:54 ` [PATCH 4/9] User namespace: don't allow sysctl in non-init user ns (v2) Serge Hallyn
@ 2011-10-18 21:54 ` Serge Hallyn
  2011-10-19  4:33   ` Andrew G. Morgan
  2011-10-19  9:01   ` David Howells
  2011-10-18 21:54 ` [PATCH 6/9] Add Documentation/namespaces/user_namespace.txt (v3) Serge Hallyn
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 34+ messages in thread
From: Serge Hallyn @ 2011-10-18 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh, dhowells,
	eparis, Serge E. Hallyn, Andrew Morgan

From: "Serge E. Hallyn" <serge.hallyn@canonical.com>

A few modules are using cap_raised(current_cap(), cap) to authorize
actions.  This means that tasks which are privileged in non-initial
user namespaces will be deemed privileged.  The privilege should only
be granted if the task is in the initial user namespace.

Switching the calls to capable() would change the behavior - it would
cause the LSM capable hooks to be called, and set PF_SUPERPRIV if
the capability was used.  So instead, put in an explicit check and
refuse privilege if the caller is not in init_user_ns.

Vasiliy had suggested introducing a new helper for this.  I'm open
to suggestions, but for four callers and for a discouraged idiom,
I'd rather not pollute the capable* function namespace with a bad name.
(even has_capability goes through the LSM hooks)

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morgan <morgan@kernel.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
---
 drivers/block/drbd/drbd_nl.c           |    5 +++++
 drivers/md/dm-log-userspace-transfer.c |    3 +++
 drivers/staging/pohmelfs/config.c      |    3 +++
 drivers/video/uvesafb.c                |    3 +++
 4 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 0feab26..9a87a14 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2297,6 +2297,11 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
 		return;
 	}
 
+	if (current_user_ns() != &init_user_ns) {
+		retcode = ERR_PERM;
+		goto fail;
+	}
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
 		retcode = ERR_PERM;
 		goto fail;
diff --git a/drivers/md/dm-log-userspace-transfer.c b/drivers/md/dm-log-userspace-transfer.c
index 1f23e04..140ca81 100644
--- a/drivers/md/dm-log-userspace-transfer.c
+++ b/drivers/md/dm-log-userspace-transfer.c
@@ -134,6 +134,9 @@ static void cn_ulog_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
 {
 	struct dm_ulog_request *tfr = (struct dm_ulog_request *)(msg + 1);
 
+	if (current_user_ns() != &init_user_ns)
+		return;
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
diff --git a/drivers/staging/pohmelfs/config.c b/drivers/staging/pohmelfs/config.c
index b6c42cb..cd259d0 100644
--- a/drivers/staging/pohmelfs/config.c
+++ b/drivers/staging/pohmelfs/config.c
@@ -525,6 +525,9 @@ static void pohmelfs_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *n
 {
 	int err;
 
+	if (current_user_ns() != &init_user_ns)
+		return;
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
index 7f8472c..71dab8e 100644
--- a/drivers/video/uvesafb.c
+++ b/drivers/video/uvesafb.c
@@ -73,6 +73,9 @@ static void uvesafb_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *ns
 	struct uvesafb_task *utask;
 	struct uvesafb_ktask *task;
 
+	if (current_user_ns() != &init_user_ns)
+		return;
+
 	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 6/9] Add Documentation/namespaces/user_namespace.txt (v3)
  2011-10-18 21:54 user namespaces: fix some uid/privilege leaks Serge Hallyn
                   ` (4 preceding siblings ...)
  2011-10-18 21:54 ` [PATCH 5/9] user namespace: clamp down users of cap_raised Serge Hallyn
@ 2011-10-18 21:54 ` Serge Hallyn
  2011-10-18 21:54 ` [PATCH 7/9] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Serge Hallyn @ 2011-10-18 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh, dhowells,
	eparis, Serge E. Hallyn, Serge E. Hallyn, Randy Dunlap

From: "Serge E. Hallyn" <serge@hallyn.com>

Provide a description of user namespaces

Changelog:
   jul 26: incorporate feedback from David Howells.
   jul 29: incorporate feedback from Randy Dunlap.
   sep 15: remove information which is not yet certain.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
---
 Documentation/namespaces/user_namespace.txt |   51 +++++++++++++++++++++++++++
 1 files changed, 51 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/namespaces/user_namespace.txt

diff --git a/Documentation/namespaces/user_namespace.txt b/Documentation/namespaces/user_namespace.txt
new file mode 100644
index 0000000..0ca6564
--- /dev/null
+++ b/Documentation/namespaces/user_namespace.txt
@@ -0,0 +1,51 @@
+Description
+===========
+
+Traditionally, each task is owned by a user ID (UID) and belongs to one or more
+groups (GID).  Both are simple numeric IDs, though userspace usually translates
+them to names.  The user namespace allows tasks to have different views of the
+UIDs and GIDs associated with tasks and other resources.  (See 'UID mapping'
+below for more.)
+
+The user namespace is a simple hierarchical one.  The system starts with all
+tasks belonging to the initial user namespace.  A task creates a new user
+namespace by passing the CLONE_NEWUSER flag to clone(2).  This requires the
+creating task to have the CAP_SETUID, CAP_SETGID, and CAP_CHOWN capabilities,
+but it does not need to be running as root.  The clone(2) call will result in a
+new task which to itself appears to be running as UID and GID 0, but to its
+creator seems to have the creator's credentials.
+
+To this new task, any resource belonging to the initial user namespace will
+appear to belong to user and group 'nobody', which are UID and GID -1.
+Permission to open such files will be granted according to world access
+permissions.  UID comparisons and group membership checks will return false,
+and privilege will be denied.
+
+When a task belonging to (for example) userid 500 in the initial user namespace
+creates a new user namespace, even though the new task will see itself as
+belonging to UID 0, any task in the initial user namespace will see it as
+belonging to UID 500.  Therefore, UID 500 in the initial user namespace will be
+able to kill the new task.
+
+Userid mapping for the VFS is not yet implemented, though prototypes exist.
+
+Relationship between the User namespace and other namespaces
+============================================================
+
+Other namespaces, such as UTS and network, are owned by a user namespace.  When
+such a namespace is created, it is assigned to the user namespace of the task
+by which it was created.  Therefore, attempts to exercise privilege to
+resources in, for instance, a particular network namespace, can be properly
+validated by checking whether the caller has the needed privilege (i.e.
+CAP_NET_ADMIN) targeted to the user namespace which owns the network namespace.
+This is done using the ns_capable() function.
+
+As an example, if a new task is cloned with a private user namespace but
+no private network namespace, then the task's network namespace is owned
+by the parent user namespace.  The new task has no privilege to the
+parent user namespace, so it will not be able to create or configure
+network devices.  If, instead, the task were cloned with both private
+user and network namespaces, then the private network namespace is owned
+by the private user namespace, and so root in the new user namespace
+will have privilege targeted to the network namespace.  It will be able
+to create and configure network devices.
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 7/9] user namespace: make each net (net_ns) belong to a user_ns
  2011-10-18 21:54 user namespaces: fix some uid/privilege leaks Serge Hallyn
                   ` (5 preceding siblings ...)
  2011-10-18 21:54 ` [PATCH 6/9] Add Documentation/namespaces/user_namespace.txt (v3) Serge Hallyn
@ 2011-10-18 21:54 ` Serge Hallyn
  2011-10-18 21:54 ` [PATCH 8/9] protect cap_netlink_recv from user namespaces Serge Hallyn
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Serge Hallyn @ 2011-10-18 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh, dhowells,
	eparis, Serge E. Hallyn, netdev

From: "Serge E. Hallyn" <serge.hallyn@canonical.com>

The user namespace which creates a new network namespace owns that
namespace and all resources created in it.  This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives.  Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.

Changelog:
   jul 8: nsproxy: don't assign netns->userns if not cloning.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: netdev@vger.kernel.org
---
 include/net/net_namespace.h |    2 ++
 kernel/nsproxy.c            |    2 ++
 net/core/net_namespace.c    |    3 +++
 3 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 3bb6fa0..d91fe5f 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -29,6 +29,7 @@ struct ctl_table_header;
 struct net_generic;
 struct sock;
 struct netns_ipvs;
+struct user_namespace;
 
 
 #define NETDEV_HASHBITS    8
@@ -101,6 +102,7 @@ struct net {
 	struct netns_xfrm	xfrm;
 #endif
 	struct netns_ipvs	*ipvs;
+	struct user_namespace	*user_ns;
 };
 
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 9aeab4b..0d5bf8d 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -95,6 +95,8 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
 		err = PTR_ERR(new_nsp->net_ns);
 		goto out_net;
 	}
+	if (flags & CLONE_NEWNET)
+		new_nsp->net_ns->user_ns = get_user_ns(task_cred_xxx(tsk, user_ns));
 
 	return new_nsp;
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 5bbdbf0..791c19c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -10,6 +10,7 @@
 #include <linux/nsproxy.h>
 #include <linux/proc_fs.h>
 #include <linux/file.h>
+#include <linux/user_namespace.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -209,6 +210,7 @@ static void net_free(struct net *net)
 	}
 #endif
 	kfree(net->gen);
+	put_user_ns(net->user_ns);
 	kmem_cache_free(net_cachep, net);
 }
 
@@ -389,6 +391,7 @@ static int __init net_ns_init(void)
 	rcu_assign_pointer(init_net.gen, ng);
 
 	mutex_lock(&net_mutex);
+	init_net.user_ns = &init_user_ns;
 	if (setup_net(&init_net))
 		panic("Could not setup the initial network namespace");
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 8/9] protect cap_netlink_recv from user namespaces
  2011-10-18 21:54 user namespaces: fix some uid/privilege leaks Serge Hallyn
                   ` (6 preceding siblings ...)
  2011-10-18 21:54 ` [PATCH 7/9] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
@ 2011-10-18 21:54 ` Serge Hallyn
  2011-10-18 21:54 ` [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware Serge Hallyn
  2011-10-19  9:36 ` [PATCH 6/9] Add Documentation/namespaces/user_namespace.txt (v3) David Howells
  9 siblings, 0 replies; 34+ messages in thread
From: Serge Hallyn @ 2011-10-18 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh, dhowells,
	eparis, Serge E. Hallyn, netdev

From: "Serge E. Hallyn" <serge.hallyn@canonical.com>

cap_netlink_recv() was granting privilege if a capability is in
current_cap(), regardless of the user namespace.  Fix that by
targeting the capability check against the user namespace which
owns the skb.

Caller passes the user ns down because sock_net is static inline defined in
net/sock.h, which we'd rather not #include at the cap_netlink_recv function.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: netdev@vger.kernel.org
---
 drivers/scsi/scsi_netlink.c     |    3 ++-
 include/linux/security.h        |   14 +++++++++-----
 kernel/audit.c                  |    6 ++++--
 net/core/rtnetlink.c            |    3 ++-
 net/decnet/netfilter/dn_rtmsg.c |    3 ++-
 net/ipv4/netfilter/ip_queue.c   |    3 ++-
 net/ipv6/netfilter/ip6_queue.c  |    3 ++-
 net/netfilter/nfnetlink.c       |    2 +-
 net/netlink/genetlink.c         |    2 +-
 net/xfrm/xfrm_user.c            |    2 +-
 security/commoncap.c            |    6 ++----
 security/security.c             |    4 ++--
 security/selinux/hooks.c        |    5 +++--
 13 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
index 26a8a45..0aa2e57 100644
--- a/drivers/scsi/scsi_netlink.c
+++ b/drivers/scsi/scsi_netlink.c
@@ -111,7 +111,8 @@ scsi_nl_rcv_msg(struct sk_buff *skb)
 			goto next_msg;
 		}
 
-		if (security_netlink_recv(skb, CAP_SYS_ADMIN)) {
+		if (security_netlink_recv(skb, CAP_SYS_ADMIN,
+					  sock_net(skb->sk)->user_ns)) {
 			err = -EPERM;
 			goto next_msg;
 		}
diff --git a/include/linux/security.h b/include/linux/security.h
index ebd2a53..cfa1f47 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -95,7 +95,8 @@ struct xfrm_user_sec_ctx;
 struct seq_file;
 
 extern int cap_netlink_send(struct sock *sk, struct sk_buff *skb);
-extern int cap_netlink_recv(struct sk_buff *skb, int cap);
+extern int cap_netlink_recv(struct sk_buff *skb, int cap,
+			    struct user_namespace *ns);
 
 void reset_security_ops(void);
 
@@ -797,6 +798,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  *	@skb.
  *	@skb contains the sk_buff structure for the netlink message.
  *	@cap indicates the capability required
+ *	@ns is the user namespace which owns skb
  *	Return 0 if permission is granted.
  *
  * Security hooks for Unix domain networking.
@@ -1557,7 +1559,8 @@ struct security_operations {
 			  struct sembuf *sops, unsigned nsops, int alter);
 
 	int (*netlink_send) (struct sock *sk, struct sk_buff *skb);
-	int (*netlink_recv) (struct sk_buff *skb, int cap);
+	int (*netlink_recv) (struct sk_buff *skb, int cap,
+			     struct user_namespace *ns);
 
 	void (*d_instantiate) (struct dentry *dentry, struct inode *inode);
 
@@ -1806,7 +1809,7 @@ void security_d_instantiate(struct dentry *dentry, struct inode *inode);
 int security_getprocattr(struct task_struct *p, char *name, char **value);
 int security_setprocattr(struct task_struct *p, char *name, void *value, size_t size);
 int security_netlink_send(struct sock *sk, struct sk_buff *skb);
-int security_netlink_recv(struct sk_buff *skb, int cap);
+int security_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns);
 int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen);
 int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
 void security_release_secctx(char *secdata, u32 seclen);
@@ -2498,9 +2501,10 @@ static inline int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return cap_netlink_send(sk, skb);
 }
 
-static inline int security_netlink_recv(struct sk_buff *skb, int cap)
+static inline int security_netlink_recv(struct sk_buff *skb, int cap,
+					struct user_namespace *ns)
 {
-	return cap_netlink_recv(skb, cap);
+	return cap_netlink_recv(skb, cap, ns);
 }
 
 static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
diff --git a/kernel/audit.c b/kernel/audit.c
index 0a1355c..48144c4 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -601,13 +601,15 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
 	case AUDIT_TTY_SET:
 	case AUDIT_TRIM:
 	case AUDIT_MAKE_EQUIV:
-		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL))
+		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL,
+					  sock_net(skb->sk)->user_ns))
 			err = -EPERM;
 		break;
 	case AUDIT_USER:
 	case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
 	case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
-		if (security_netlink_recv(skb, CAP_AUDIT_WRITE))
+		if (security_netlink_recv(skb, CAP_AUDIT_WRITE,
+					  sock_net(skb->sk)->user_ns))
 			err = -EPERM;
 		break;
 	default:  /* bad msg */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 99d9e95..4a444de 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1931,7 +1931,8 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	sz_idx = type>>2;
 	kind = type&3;
 
-	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN,
+					       net->user_ns))
 		return -EPERM;
 
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 69975e0..2d052ab 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -108,7 +108,8 @@ static inline void dnrmg_receive_user_skb(struct sk_buff *skb)
 	if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+	    sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	/* Eventually we might send routing messages too */
diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
index e59aabd..d20bede 100644
--- a/net/ipv4/netfilter/ip_queue.c
+++ b/net/ipv4/netfilter/ip_queue.c
@@ -430,7 +430,8 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+				  sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
index e63c397..09db01c 100644
--- a/net/ipv6/netfilter/ip6_queue.c
+++ b/net/ipv6/netfilter/ip6_queue.c
@@ -431,7 +431,8 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+				  sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 1905976..bcaff9d 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -130,7 +130,7 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	const struct nfnetlink_subsystem *ss;
 	int type, err;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	/* All the messages must at least contain nfgenmsg */
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 482fa57..00a101c 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -516,7 +516,7 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return -EOPNOTSUPP;
 
 	if ((ops->flags & GENL_ADMIN_PERM) &&
-	    security_netlink_recv(skb, CAP_NET_ADMIN))
+	    security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 0256b8a..1808e1e 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2290,7 +2290,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	link = &xfrm_dispatch[type];
 
 	/* All operations require privileges, even GET */
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	if ((type == (XFRM_MSG_GETSA - XFRM_MSG_BASE) ||
diff --git a/security/commoncap.c b/security/commoncap.c
index a93b3b7..1e48e6a 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -56,11 +56,9 @@ int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return 0;
 }
 
-int cap_netlink_recv(struct sk_buff *skb, int cap)
+int cap_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns)
 {
-	if (!cap_raised(current_cap(), cap))
-		return -EPERM;
-	return 0;
+	return security_capable(ns, current_cred(), cap);
 }
 EXPORT_SYMBOL(cap_netlink_recv);
 
diff --git a/security/security.c b/security/security.c
index 0e4fccf..0a1453e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -941,9 +941,9 @@ int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return security_ops->netlink_send(sk, skb);
 }
 
-int security_netlink_recv(struct sk_buff *skb, int cap)
+int security_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns)
 {
-	return security_ops->netlink_recv(skb, cap);
+	return security_ops->netlink_recv(skb, cap, ns);
 }
 EXPORT_SYMBOL(security_netlink_recv);
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 266a229..fe290bb 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4723,13 +4723,14 @@ static int selinux_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return selinux_nlmsg_perm(sk, skb);
 }
 
-static int selinux_netlink_recv(struct sk_buff *skb, int capability)
+static int selinux_netlink_recv(struct sk_buff *skb, int capability,
+				struct user_namespace *ns)
 {
 	int err;
 	struct common_audit_data ad;
 	u32 sid;
 
-	err = cap_netlink_recv(skb, capability);
+	err = cap_netlink_recv(skb, capability, ns);
 	if (err)
 		return err;
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware
  2011-10-18 21:54 user namespaces: fix some uid/privilege leaks Serge Hallyn
                   ` (7 preceding siblings ...)
  2011-10-18 21:54 ` [PATCH 8/9] protect cap_netlink_recv from user namespaces Serge Hallyn
@ 2011-10-18 21:54 ` Serge Hallyn
  2011-10-18 22:14   ` Joe Perches
  2011-10-19 13:52   ` [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware Eric W. Biederman
  2011-10-19  9:36 ` [PATCH 6/9] Add Documentation/namespaces/user_namespace.txt (v3) David Howells
  9 siblings, 2 replies; 34+ messages in thread
From: Serge Hallyn @ 2011-10-18 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh, dhowells,
	eparis, Serge E. Hallyn, netdev

From: "Serge E. Hallyn" <serge.hallyn@canonical.com>

Currently uids are compared without regard for the user namespace.
Fix that to prevent tasks in a different user namespace from
wrongly matching on SCM_CREDENTIALS.

In the past, either your uids had to match, or you had to have
CAP_SETXID.  In a namespaced world, you must either (both be in the
same user namespace and have your uids match), or you must have
CAP_SETXID targeted at the other user namespace.  The latter can
happen for instance if uid 500 created a new user namespace and
now interacts with uid 0 in it.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: netdev@vger.kernel.org
---
 net/core/scm.c |   41 ++++++++++++++++++++++++++++++++++-------
 1 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/net/core/scm.c b/net/core/scm.c
index 811b53f..4f376bf 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -43,17 +43,44 @@
  *	setu(g)id.
  */
 
-static __inline__ int scm_check_creds(struct ucred *creds)
+static __inline__ bool uidequiv(const struct cred *src, struct ucred *tgt,
+			       struct user_namespace *ns)
+{
+	if (src->user_ns != ns)
+		goto check_capable;
+	if (src->uid == tgt->uid || src->euid == tgt->uid ||
+	    src->suid == tgt->uid)
+		return true;
+check_capable:
+	if (ns_capable(ns, CAP_SETUID))
+		return true;
+	return false;
+}
+
+static __inline__ bool gidequiv(const struct cred *src, struct ucred *tgt,
+			       struct user_namespace *ns)
+{
+	if (src->user_ns != ns)
+		goto check_capable;
+	if (src->gid == tgt->gid || src->egid == tgt->gid ||
+	    src->sgid == tgt->gid)
+		return true;
+check_capable:
+	if (ns_capable(ns, CAP_SETGID))
+		return true;
+	return false;
+}
+
+static __inline__ int scm_check_creds(struct ucred *creds, struct socket *sock)
 {
 	const struct cred *cred = current_cred();
+	struct user_namespace *ns = sock_net(sock->sk)->user_ns;
 
-	if ((creds->pid == task_tgid_vnr(current) || capable(CAP_SYS_ADMIN)) &&
-	    ((creds->uid == cred->uid   || creds->uid == cred->euid ||
-	      creds->uid == cred->suid) || capable(CAP_SETUID)) &&
-	    ((creds->gid == cred->gid   || creds->gid == cred->egid ||
-	      creds->gid == cred->sgid) || capable(CAP_SETGID))) {
+	if ((creds->pid == task_tgid_vnr(current) || ns_capable(ns, CAP_SYS_ADMIN)) &&
+	     uidequiv(cred, creds, ns) && gidequiv(cred, creds, ns)) {
 	       return 0;
 	}
+
 	return -EPERM;
 }
 
@@ -169,7 +196,7 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
 			if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct ucred)))
 				goto error;
 			memcpy(&p->creds, CMSG_DATA(cmsg), sizeof(struct ucred));
-			err = scm_check_creds(&p->creds);
+			err = scm_check_creds(&p->creds, sock);
 			if (err)
 				goto error;
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware
  2011-10-18 21:54 ` [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware Serge Hallyn
@ 2011-10-18 22:14   ` Joe Perches
  2011-10-18 23:22     ` Serge E. Hallyn
  2011-10-19 13:52   ` [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware Eric W. Biederman
  1 sibling, 1 reply; 34+ messages in thread
From: Joe Perches @ 2011-10-18 22:14 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel, ebiederm, akpm, oleg, richard, mikevs, segoon,
	gregkh, dhowells, eparis, Serge E. Hallyn, netdev

On Tue, 2011-10-18 at 21:54 +0000, Serge Hallyn wrote:
> From: "Serge E. Hallyn" <serge.hallyn@canonical.com>

Hi Serge.

Just some trivial style notes.

> Currently uids are compared without regard for the user namespace.
> Fix that to prevent tasks in a different user namespace from
> wrongly matching on SCM_CREDENTIALS.
[]
> diff --git a/net/core/scm.c b/net/core/scm.c

> -static __inline__ int scm_check_creds(struct ucred *creds)
> +static __inline__ bool uidequiv(const struct cred *src, struct ucred *tgt,
> +			       struct user_namespace *ns)

Perhaps inline is better than __inline__ and do these
functions really need to be marked inline at all?

> +{
> +	if (src->user_ns != ns)
> +		goto check_capable;
> +	if (src->uid == tgt->uid || src->euid == tgt->uid ||
> +	    src->suid == tgt->uid)

Perhaps this is less prone to typo errors and are a bit
more readable as:

	if (tgt->uid == src->uid ||
	    tgt->uid == src->euid ||
	    tgt->uid == src->suid)



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware
  2011-10-18 22:14   ` Joe Perches
@ 2011-10-18 23:22     ` Serge E. Hallyn
  2011-10-19  2:25       ` [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware (v2) Serge E. Hallyn
  0 siblings, 1 reply; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-18 23:22 UTC (permalink / raw)
  To: Joe Perches
  Cc: linux-kernel, ebiederm, akpm, oleg, richard, mikevs, segoon,
	gregkh, dhowells, eparis, Serge E. Hallyn, netdev

Quoting Joe Perches (joe@perches.com):
> On Tue, 2011-10-18 at 21:54 +0000, Serge Hallyn wrote:
> > From: "Serge E. Hallyn" <serge.hallyn@canonical.com>
> 
> Hi Serge.
> 
> Just some trivial style notes.
> 
> > Currently uids are compared without regard for the user namespace.
> > Fix that to prevent tasks in a different user namespace from
> > wrongly matching on SCM_CREDENTIALS.
> []
> > diff --git a/net/core/scm.c b/net/core/scm.c
> 
> > -static __inline__ int scm_check_creds(struct ucred *creds)
> > +static __inline__ bool uidequiv(const struct cred *src, struct ucred *tgt,
> > +			       struct user_namespace *ns)
> 
> Perhaps inline is better than __inline__ and do these
> functions really need to be marked inline at all?

Dunno, I was just sticking with the current style.

> > +{
> > +	if (src->user_ns != ns)
> > +		goto check_capable;
> > +	if (src->uid == tgt->uid || src->euid == tgt->uid ||
> > +	    src->suid == tgt->uid)
> 
> Perhaps this is less prone to typo errors and are a bit
> more readable as:
> 
> 	if (tgt->uid == src->uid ||
> 	    tgt->uid == src->euid ||
> 	    tgt->uid == src->suid)

I do like that better.

thanks,
-serge

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware (v2)
  2011-10-18 23:22     ` Serge E. Hallyn
@ 2011-10-19  2:25       ` Serge E. Hallyn
  0 siblings, 0 replies; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-19  2:25 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Joe Perches, linux-kernel, ebiederm, akpm, oleg, richard, mikevs,
	segoon, gregkh, dhowells, eparis, netdev

(Thanks for the suggestions, Joe.)

Currently uids are compared without regard for the user namespace.
Fix that to prevent tasks in a different user namespace from
wrongly matching on SCM_CREDENTIALS.

In the past, either your uids had to match, or you had to have
CAP_SETXID.  In a namespaced world, you must either (both be in the
same user namespace and have your uids match), or you must have
CAP_SETXID targeted at the other user namespace.  The latter can
happen for instance if uid 500 created a new user namespace and
now interacts with uid 0 in it.

Changelog: Oct 18:
	Per Joe Perches: don't mark uidequiv and gidequiv fns inline
	(let the compiler do that if appropriate), and change the flow
	of id comparisons to make it clearer.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Joe Perches <joe@perches.com>
---
 net/core/scm.c |   43 ++++++++++++++++++++++++++++++++++++-------
 1 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/net/core/scm.c b/net/core/scm.c
index 811b53f..2261607 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -43,17 +43,46 @@
  *	setu(g)id.
  */
 
-static __inline__ int scm_check_creds(struct ucred *creds)
+static bool uidequiv(const struct cred *src, struct ucred *tgt,
+			       struct user_namespace *ns)
+{
+	if (src->user_ns != ns)
+		goto check_capable;
+	if (tgt->uid == src->uid ||
+	    tgt->uid == src->euid ||
+	    tgt->uid == src->suid)
+		return true;
+check_capable:
+	if (ns_capable(ns, CAP_SETUID))
+		return true;
+	return false;
+}
+
+static bool gidequiv(const struct cred *src, struct ucred *tgt,
+			       struct user_namespace *ns)
+{
+	if (src->user_ns != ns)
+		goto check_capable;
+	if (tgt->gid == src->gid ||
+	    tgt->gid == src->egid ||
+	    tgt->gid == src->sgid)
+		return true;
+check_capable:
+	if (ns_capable(ns, CAP_SETGID))
+		return true;
+	return false;
+}
+
+static int scm_check_creds(struct ucred *creds, struct socket *sock)
 {
 	const struct cred *cred = current_cred();
+	struct user_namespace *ns = sock_net(sock->sk)->user_ns;
 
-	if ((creds->pid == task_tgid_vnr(current) || capable(CAP_SYS_ADMIN)) &&
-	    ((creds->uid == cred->uid   || creds->uid == cred->euid ||
-	      creds->uid == cred->suid) || capable(CAP_SETUID)) &&
-	    ((creds->gid == cred->gid   || creds->gid == cred->egid ||
-	      creds->gid == cred->sgid) || capable(CAP_SETGID))) {
+	if ((creds->pid == task_tgid_vnr(current) || ns_capable(ns, CAP_SYS_ADMIN)) &&
+	     uidequiv(cred, creds, ns) && gidequiv(cred, creds, ns)) {
 	       return 0;
 	}
+
 	return -EPERM;
 }
 
@@ -169,7 +198,7 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
 			if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct ucred)))
 				goto error;
 			memcpy(&p->creds, CMSG_DATA(cmsg), sizeof(struct ucred));
-			err = scm_check_creds(&p->creds);
+			err = scm_check_creds(&p->creds, sock);
 			if (err)
 				goto error;
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/9] user namespace: clamp down users of cap_raised
  2011-10-18 21:54 ` [PATCH 5/9] user namespace: clamp down users of cap_raised Serge Hallyn
@ 2011-10-19  4:33   ` Andrew G. Morgan
  2011-10-20 13:01     ` Serge E. Hallyn
  2011-10-19  9:01   ` David Howells
  1 sibling, 1 reply; 34+ messages in thread
From: Andrew G. Morgan @ 2011-10-19  4:33 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel, ebiederm, akpm, oleg, richard, mikevs, segoon,
	gregkh, dhowells, eparis, Serge E. Hallyn

On Tue, Oct 18, 2011 at 2:54 PM, Serge Hallyn <serge@hallyn.com> wrote:
> From: "Serge E. Hallyn" <serge.hallyn@canonical.com>
>
> A few modules are using cap_raised(current_cap(), cap) to authorize
> actions.  This means that tasks which are privileged in non-initial
> user namespaces will be deemed privileged.  The privilege should only
> be granted if the task is in the initial user namespace.
>
> Switching the calls to capable() would change the behavior - it would
> cause the LSM capable hooks to be called, and set PF_SUPERPRIV if
> the capability was used.  So instead, put in an explicit check and
> refuse privilege if the caller is not in init_user_ns.
>
> Vasiliy had suggested introducing a new helper for this.  I'm open
> to suggestions, but for four callers and for a discouraged idiom,
> I'd rather not pollute the capable* function namespace with a bad name.
> (even has_capability goes through the LSM hooks)
>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Andrew Morgan <morgan@kernel.org>
> Cc: Vasiliy Kulikov <segoon@openwall.com>
> ---
>  drivers/block/drbd/drbd_nl.c           |    5 +++++
>  drivers/md/dm-log-userspace-transfer.c |    3 +++
>  drivers/staging/pohmelfs/config.c      |    3 +++
>  drivers/video/uvesafb.c                |    3 +++
>  4 files changed, 14 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
> index 0feab26..9a87a14 100644
> --- a/drivers/block/drbd/drbd_nl.c
> +++ b/drivers/block/drbd/drbd_nl.c
> @@ -2297,6 +2297,11 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
>                return;
>        }
>
> +       if (current_user_ns() != &init_user_ns) {

I'd feel much happier with this if it did use some inline or macro here.

#define NS_IS_NON_DEFAULT (current_user_ns() != &init_user_ns)

if (NS_IS_NON_DEFAULT || !cap_raised(current_cap(), CAP_SYS_ADMIN)) {
        retcode = ERR_PERM;
        goto fail;
}

Is it important to support the situation that NS support is not configured?

Cheers

Andrew


> +               retcode = ERR_PERM;
> +               goto fail;
> +       }
> +
>        if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
>                retcode = ERR_PERM;
>                goto fail;
> diff --git a/drivers/md/dm-log-userspace-transfer.c b/drivers/md/dm-log-userspace-transfer.c
> index 1f23e04..140ca81 100644
> --- a/drivers/md/dm-log-userspace-transfer.c
> +++ b/drivers/md/dm-log-userspace-transfer.c
> @@ -134,6 +134,9 @@ static void cn_ulog_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
>  {
>        struct dm_ulog_request *tfr = (struct dm_ulog_request *)(msg + 1);
>
> +       if (current_user_ns() != &init_user_ns)
> +               return;
> +
>        if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
>                return;
>
> diff --git a/drivers/staging/pohmelfs/config.c b/drivers/staging/pohmelfs/config.c
> index b6c42cb..cd259d0 100644
> --- a/drivers/staging/pohmelfs/config.c
> +++ b/drivers/staging/pohmelfs/config.c
> @@ -525,6 +525,9 @@ static void pohmelfs_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *n
>  {
>        int err;
>
> +       if (current_user_ns() != &init_user_ns)
> +               return;
> +
>        if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
>                return;
>
> diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
> index 7f8472c..71dab8e 100644
> --- a/drivers/video/uvesafb.c
> +++ b/drivers/video/uvesafb.c
> @@ -73,6 +73,9 @@ static void uvesafb_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *ns
>        struct uvesafb_task *utask;
>        struct uvesafb_ktask *task;
>
> +       if (current_user_ns() != &init_user_ns)
> +               return;
> +
>        if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
>                return;
>
> --
> 1.7.5.4
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/9] user namespace: clamp down users of cap_raised
  2011-10-18 21:54 ` [PATCH 5/9] user namespace: clamp down users of cap_raised Serge Hallyn
  2011-10-19  4:33   ` Andrew G. Morgan
@ 2011-10-19  9:01   ` David Howells
  2011-10-20 13:16     ` Serge E. Hallyn
  2011-10-24 14:43     ` [PATCH 05/10] " Serge E. Hallyn
  1 sibling, 2 replies; 34+ messages in thread
From: David Howells @ 2011-10-19  9:01 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: dhowells, Serge Hallyn, linux-kernel, ebiederm, akpm, oleg,
	richard, mikevs, segoon, gregkh, eparis, Serge E. Hallyn


> #define NS_IS_NON_DEFAULT (current_user_ns() != &init_user_ns)

How about:

	#define IN_ROOT_USER_NS (current_user_ns() == &init_user_ns)

And then:

	if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN)) {

I think it reads better.

David

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/9] Add Documentation/namespaces/user_namespace.txt (v3)
  2011-10-18 21:54 user namespaces: fix some uid/privilege leaks Serge Hallyn
                   ` (8 preceding siblings ...)
  2011-10-18 21:54 ` [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware Serge Hallyn
@ 2011-10-19  9:36 ` David Howells
  2011-10-20 12:58   ` Serge E. Hallyn
  2011-10-26 20:33   ` [PATCH 06/10] Add Documentation/namespaces/user_namespace.txt (v4) Serge E. Hallyn
  9 siblings, 2 replies; 34+ messages in thread
From: David Howells @ 2011-10-19  9:36 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: dhowells, linux-kernel, ebiederm, akpm, oleg, richard, mikevs,
	segoon, gregkh, eparis, Serge E. Hallyn, Randy Dunlap

Serge Hallyn <serge@hallyn.com> wrote:

> To this new task, any resource belonging to the initial user namespace will
> appear to belong to user and group 'nobody', which are UID and GID -1.
> Permission to open such files will be granted according to world access
> permissions.  UID comparisons and group membership checks will return false,
> and privilege will be denied.

The last comma there is unnecessary, I think.  You might also want to say
'will fail' rather than 'will return false', but I'm not sure that sums it up
correctly.

> When a task belonging to (for example) userid 500 in the initial user namespace

Why switch to talking about 'userid'?  This should probably be 'UID'.

> Userid mapping for the VFS is not yet implemented, though prototypes exist.

Ditto.

> ... Therefore, attempts to exercise privilege to resources in, for instance,
> a particular network namespace, can be properly validated by checking whether
> the caller has the needed privilege (i.e.  CAP_NET_ADMIN) targeted to the
> user namespace which owns the network namespace.

That sentence looks rather clumsy.  I think you need to split the statement
from the example.

  Other namespaces, such as UTS and network, are owned by a user namespace.
  When such a namespace is created, it is assigned to [owned by? associated
  with?] the user namespace of the task by which it was created.  Attempts to
  exercise privilege in the new namespace are properly validated by checking
  whether the caller has the needed privilege targeted to [granted by?] the
  user namespace that owns the new namespace.  For instance, to use the
  resources in a network namespace, a check must be made that the caller has
  [has been granted?] the CAP_NET_ADMIN privilege.  This is done using the
  ns_capable() function.

You may want to list here what CAPs correspond to what namespaces.

> As an example, if a new task is cloned with a private user namespace but
> no

'not a' instead of 'no'?

> private network namespace, then the task's network namespace is owned
> by the parent user namespace.  The new task has no

Insert 'special' here?

> privilege to

s/to/over/ perhaps?

> the
> parent user namespace, so it will not be able to create or configure

'the'

> network devices

Insert 'therein'?

> .  If,

I don't think you need the comma here.  The 'instead' is the if condition.

> instead, the task were cloned with both private
> user and network namespaces, then the private network namespace is owned
> by the private user namespace, and so root in the new user namespace
> will have privilege targeted to

Interestingly, in these two paragraphs, you've used 'targeted to' in both
directions.

	whether the caller has the needed privilege (...) targeted to the user
	namespace

vs

	the new user namespace will have privilege targeted to the network
	namespace

You might want to consider changing one of them.  I would suggest 'granted by'
for the first and 'targeted at [users of]' for the second.

> the network namespace.  It will be able
> to create and configure network devices.

David

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware
  2011-10-18 21:54 ` [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware Serge Hallyn
  2011-10-18 22:14   ` Joe Perches
@ 2011-10-19 13:52   ` Eric W. Biederman
  2011-10-20 12:58     ` Serge E. Hallyn
  1 sibling, 1 reply; 34+ messages in thread
From: Eric W. Biederman @ 2011-10-19 13:52 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel, akpm, oleg, richard, mikevs, segoon, gregkh,
	dhowells, eparis, Serge E. Hallyn, netdev

Serge Hallyn <serge@hallyn.com> writes:

> From: "Serge E. Hallyn" <serge.hallyn@canonical.com>
>
> Currently uids are compared without regard for the user namespace.
> Fix that to prevent tasks in a different user namespace from
> wrongly matching on SCM_CREDENTIALS.
>
> In the past, either your uids had to match, or you had to have
> CAP_SETXID.  In a namespaced world, you must either (both be in the
> same user namespace and have your uids match), or you must have
> CAP_SETXID targeted at the other user namespace.  The latter can
> happen for instance if uid 500 created a new user namespace and
> now interacts with uid 0 in it.

Serge this approach is wrong.

Because we pass the cred and the pid through the socket socket itself
is just a conduit and should be ignored in this context.

The only interesting test should be are you allowed to impersonate other
users in your current userk namespace.

So it should be possible to simplify the entire patch to just:
 static __inline__ int scm_check_creds(struct ucred *creds)
 {
 	const struct cred *cred = current_cred();
+	struct user_namespace *ns = cred->user_ns;

-	if ((creds->pid == task_tgid_vnr(current) || capable(CAP_SYS_ADMIN)) &&
-	    ((creds->uid == cred->uid   || creds->uid == cred->euid ||
-	      creds->uid == cred->suid) || capable(CAP_SETUID)) &&
-	    ((creds->gid == cred->gid   || creds->gid == cred->egid ||
-	      creds->gid == cred->sgid) || capable(CAP_SETGID))) {
+	if ((creds->pid == task_tgid_vnr(current) || ns_capable(ns, CAP_SYS_ADMIN)) &&
+	    ((creds->uid == cred->uid   || creds->uid == cred->euid ||
+	      creds->uid == cred->suid) || ns_capable(ns, CAP_SETUID)) &&
+	    ((creds->gid == cred->gid   || creds->gid == cred->egid ||
+	      creds->gid == cred->sgid) || ns_capable(ns, CAP_SETGID))) {
  	       return 0;
  	}
  	return -EPERM;
  }

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware
  2011-10-19 13:52   ` [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware Eric W. Biederman
@ 2011-10-20 12:58     ` Serge E. Hallyn
  2011-10-20 13:35       ` Eric W. Biederman
  0 siblings, 1 reply; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-20 12:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, akpm, oleg, richard, mikevs, segoon, gregkh,
	dhowells, eparis, Serge E. Hallyn, netdev

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Serge Hallyn <serge@hallyn.com> writes:
> 
> > From: "Serge E. Hallyn" <serge.hallyn@canonical.com>
> >
> > Currently uids are compared without regard for the user namespace.
> > Fix that to prevent tasks in a different user namespace from
> > wrongly matching on SCM_CREDENTIALS.
> >
> > In the past, either your uids had to match, or you had to have
> > CAP_SETXID.  In a namespaced world, you must either (both be in the
> > same user namespace and have your uids match), or you must have
> > CAP_SETXID targeted at the other user namespace.  The latter can
> > happen for instance if uid 500 created a new user namespace and
> > now interacts with uid 0 in it.
> 
> Serge this approach is wrong.

Thanks for looking, Eric.

> Because we pass the cred and the pid through the socket socket itself
> is just a conduit and should be ignored in this context.

Ok, that makes sense, but

> The only interesting test should be are you allowed to impersonate other
> users in your current userk namespace.

Why in your current user namespace?  Shouldn't it be in the
target user ns?  I understand it could be wrong to tie the
user ns owning the socket to the target userns (though I still
kind of like it), but just because I have CAP_SETUID in my
own user_ns doesn't mean I should be able to pose as another
uid in your user_ns.

(Now I also see that cred_to_ucred() translates to the current
user_ns, so that should have been a hint to me before about
your intent, but I'm not convinced I agree with your intent).

And you do the same with the pid.  Why is that a valid assumption?

(I've got that feeling that I'll feel like a dunce once you explain :)

> So it should be possible to simplify the entire patch to just:
>  static __inline__ int scm_check_creds(struct ucred *creds)
>  {
>  	const struct cred *cred = current_cred();
> +	struct user_namespace *ns = cred->user_ns;
> 
> -	if ((creds->pid == task_tgid_vnr(current) || capable(CAP_SYS_ADMIN)) &&
> -	    ((creds->uid == cred->uid   || creds->uid == cred->euid ||
> -	      creds->uid == cred->suid) || capable(CAP_SETUID)) &&
> -	    ((creds->gid == cred->gid   || creds->gid == cred->egid ||
> -	      creds->gid == cred->sgid) || capable(CAP_SETGID))) {
> +	if ((creds->pid == task_tgid_vnr(current) || ns_capable(ns, CAP_SYS_ADMIN)) &&
> +	    ((creds->uid == cred->uid   || creds->uid == cred->euid ||
> +	      creds->uid == cred->suid) || ns_capable(ns, CAP_SETUID)) &&
> +	    ((creds->gid == cred->gid   || creds->gid == cred->egid ||
> +	      creds->gid == cred->sgid) || ns_capable(ns, CAP_SETGID))) {
>   	       return 0;
>   	}
>   	return -EPERM;
>   }

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/9] Add Documentation/namespaces/user_namespace.txt (v3)
  2011-10-19  9:36 ` [PATCH 6/9] Add Documentation/namespaces/user_namespace.txt (v3) David Howells
@ 2011-10-20 12:58   ` Serge E. Hallyn
  2011-10-26 20:33   ` [PATCH 06/10] Add Documentation/namespaces/user_namespace.txt (v4) Serge E. Hallyn
  1 sibling, 0 replies; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-20 12:58 UTC (permalink / raw)
  To: David Howells
  Cc: linux-kernel, ebiederm, akpm, oleg, richard, mikevs, segoon,
	gregkh, eparis, Serge E. Hallyn, Randy Dunlap

Quoting David Howells (dhowells@redhat.com):
> Serge Hallyn <serge@hallyn.com> wrote:

Thanks, David, I'll incorporate these.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/9] user namespace: clamp down users of cap_raised
  2011-10-19  4:33   ` Andrew G. Morgan
@ 2011-10-20 13:01     ` Serge E. Hallyn
  0 siblings, 0 replies; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-20 13:01 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: linux-kernel, ebiederm, akpm, oleg, richard, mikevs, segoon,
	gregkh, dhowells, eparis, Serge E. Hallyn

Quoting Andrew G. Morgan (morgan@kernel.org):
> On Tue, Oct 18, 2011 at 2:54 PM, Serge Hallyn <serge@hallyn.com> wrote:
> > From: "Serge E. Hallyn" <serge.hallyn@canonical.com>
> >
> > A few modules are using cap_raised(current_cap(), cap) to authorize
> > actions.  This means that tasks which are privileged in non-initial
> > user namespaces will be deemed privileged.  The privilege should only
> > be granted if the task is in the initial user namespace.
> >
> > Switching the calls to capable() would change the behavior - it would
> > cause the LSM capable hooks to be called, and set PF_SUPERPRIV if
> > the capability was used.  So instead, put in an explicit check and
> > refuse privilege if the caller is not in init_user_ns.
> >
> > Vasiliy had suggested introducing a new helper for this.  I'm open
> > to suggestions, but for four callers and for a discouraged idiom,
> > I'd rather not pollute the capable* function namespace with a bad name.
> > (even has_capability goes through the LSM hooks)
> >
> > Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: Andrew Morgan <morgan@kernel.org>
> > Cc: Vasiliy Kulikov <segoon@openwall.com>
> > ---
> >  drivers/block/drbd/drbd_nl.c           |    5 +++++
> >  drivers/md/dm-log-userspace-transfer.c |    3 +++
> >  drivers/staging/pohmelfs/config.c      |    3 +++
> >  drivers/video/uvesafb.c                |    3 +++
> >  4 files changed, 14 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
> > index 0feab26..9a87a14 100644
> > --- a/drivers/block/drbd/drbd_nl.c
> > +++ b/drivers/block/drbd/drbd_nl.c
> > @@ -2297,6 +2297,11 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
> >                return;
> >        }
> >
> > +       if (current_user_ns() != &init_user_ns) {
> 
> I'd feel much happier with this if it did use some inline or macro here.
> 
> #define NS_IS_NON_DEFAULT (current_user_ns() != &init_user_ns)
> 
> if (NS_IS_NON_DEFAULT || !cap_raised(current_cap(), CAP_SYS_ADMIN)) {
>         retcode = ERR_PERM;
>         goto fail;
> }

Thanks, I'll do something like this.

> Is it important to support the situation that NS support is not configured?

I'm not sure I understand your question right;  but if I do - when
NS support is not configured, that just means that all tasks are
in init_user_ns.  We sometimes want to shortcut the checks in that
case to speed things up, but NS_IS_NON_DEFAULT() is a valid check
without NS support.

-serge

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/9] user namespace: clamp down users of cap_raised
  2011-10-19  9:01   ` David Howells
@ 2011-10-20 13:16     ` Serge E. Hallyn
  2011-10-24 14:43     ` [PATCH 05/10] " Serge E. Hallyn
  1 sibling, 0 replies; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-20 13:16 UTC (permalink / raw)
  To: David Howells
  Cc: Andrew G. Morgan, linux-kernel, ebiederm, akpm, oleg, richard,
	mikevs, segoon, gregkh, eparis, Serge E. Hallyn

Quoting David Howells (dhowells@redhat.com):
> 
> > #define NS_IS_NON_DEFAULT (current_user_ns() != &init_user_ns)
> 
> How about:
> 
> 	#define IN_ROOT_USER_NS (current_user_ns() == &init_user_ns)
> 
> And then:
> 
> 	if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN)) {
> 
> I think it reads better.

My feeble brain does seem to process this faster.

I might make it !IN_INIT_USER_NS(), to be more consistent with
&init_user_ns.

Vasiliy had months ago suggested moving the whole idiom into its own
helper function.  Somehow I couldn't make the jump to this then...

thanks all,
-serge

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware
  2011-10-20 12:58     ` Serge E. Hallyn
@ 2011-10-20 13:35       ` Eric W. Biederman
  2011-10-20 14:14         ` Serge E. Hallyn
  2011-10-20 14:24         ` Serge E. Hallyn
  0 siblings, 2 replies; 34+ messages in thread
From: Eric W. Biederman @ 2011-10-20 13:35 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, akpm, oleg, richard, mikevs, segoon, gregkh,
	dhowells, eparis, Serge E. Hallyn, netdev

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> Serge Hallyn <serge@hallyn.com> writes:
>> 
>> > From: "Serge E. Hallyn" <serge.hallyn@canonical.com>
>> >
>> > Currently uids are compared without regard for the user namespace.
>> > Fix that to prevent tasks in a different user namespace from
>> > wrongly matching on SCM_CREDENTIALS.
>> >
>> > In the past, either your uids had to match, or you had to have
>> > CAP_SETXID.  In a namespaced world, you must either (both be in the
>> > same user namespace and have your uids match), or you must have
>> > CAP_SETXID targeted at the other user namespace.  The latter can
>> > happen for instance if uid 500 created a new user namespace and
>> > now interacts with uid 0 in it.
>> 
>> Serge this approach is wrong.
>
> Thanks for looking, Eric.
>
>> Because we pass the cred and the pid through the socket socket itself
>> is just a conduit and should be ignored in this context.
>
> Ok, that makes sense, but
>
>> The only interesting test should be are you allowed to impersonate other
>> users in your current userk namespace.
>
> Why in your current user namespace?  Shouldn't it be in the
> target user ns?  I understand it could be wrong to tie the
> user ns owning the socket to the target userns (though I still
> kind of like it), but just because I have CAP_SETUID in my
> own user_ns doesn't mean I should be able to pose as another
> uid in your user_ns.

First and foremost it is important that you be able if you have the
capability to impersonate other users in your current user namespace.
That is what the capability actually controls.

None of this allows you to impersonate any user in any other user
namespace.  The translation between users prevents that.

> (Now I also see that cred_to_ucred() translates to the current
> user_ns, so that should have been a hint to me before about
> your intent, but I'm not convinced I agree with your intent).
>
> And you do the same with the pid.  Why is that a valid assumption?

Yes.  Basically all the code is allow you to impersonate people you
would have been able to impersonate before.  If your target is in
another namespace you can not fool them.

With pids the logic should be a lot clearer.  Pretend to be a pid you can
see in your current pid namespace.  Lookup and convert to struct pid aka
the namespace agnostic object.  On output return the pid value that
the target process will know you as.

Ultimately I think we need a ns_capable for the current user namespace
instead of a global one.  But I don't see any rush to introduce
ns_capable here.

Eric


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware
  2011-10-20 13:35       ` Eric W. Biederman
@ 2011-10-20 14:14         ` Serge E. Hallyn
  2011-10-24  4:15           ` Serge E. Hallyn
  2011-10-20 14:24         ` Serge E. Hallyn
  1 sibling, 1 reply; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-20 14:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, linux-kernel, akpm, oleg, richard, mikevs,
	segoon, gregkh, dhowells, eparis, netdev

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serge@hallyn.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> Serge Hallyn <serge@hallyn.com> writes:
> >> 
> >> > From: "Serge E. Hallyn" <serge.hallyn@canonical.com>
> >> >
> >> > Currently uids are compared without regard for the user namespace.
> >> > Fix that to prevent tasks in a different user namespace from
> >> > wrongly matching on SCM_CREDENTIALS.
> >> >
> >> > In the past, either your uids had to match, or you had to have
> >> > CAP_SETXID.  In a namespaced world, you must either (both be in the
> >> > same user namespace and have your uids match), or you must have
> >> > CAP_SETXID targeted at the other user namespace.  The latter can
> >> > happen for instance if uid 500 created a new user namespace and
> >> > now interacts with uid 0 in it.
> >> 
> >> Serge this approach is wrong.
> >
> > Thanks for looking, Eric.
> >
> >> Because we pass the cred and the pid through the socket socket itself
> >> is just a conduit and should be ignored in this context.
> >
> > Ok, that makes sense, but
> >
> >> The only interesting test should be are you allowed to impersonate other
> >> users in your current userk namespace.
> >
> > Why in your current user namespace?  Shouldn't it be in the
> > target user ns?  I understand it could be wrong to tie the
> > user ns owning the socket to the target userns (though I still
> > kind of like it), but just because I have CAP_SETUID in my
> > own user_ns doesn't mean I should be able to pose as another
> > uid in your user_ns.
> 
> First and foremost it is important that you be able if you have the
> capability to impersonate other users in your current user namespace.
> That is what the capability actually controls.
> 
> None of this allows you to impersonate any user in any other user
> namespace.  The translation between users prevents that.
> 
> > (Now I also see that cred_to_ucred() translates to the current
> > user_ns, so that should have been a hint to me before about
> > your intent, but I'm not convinced I agree with your intent).
> >
> > And you do the same with the pid.  Why is that a valid assumption?
> 
> Yes.  Basically all the code is allow you to impersonate people you
> would have been able to impersonate before.  If your target is in
> another namespace you can not fool them.
> 
> With pids the logic should be a lot clearer.  Pretend to be a pid you can
> see in your current pid namespace.  Lookup and convert to struct pid aka
> the namespace agnostic object.  On output return the pid value that

No.  That conversion is happending before the user-specified pid is
set.

> the target process will know you as.
> 
> Ultimately I think we need a ns_capable for the current user namespace
> instead of a global one.  But I don't see any rush to introduce
> ns_capable here.
> 
> Eric
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware
  2011-10-20 13:35       ` Eric W. Biederman
  2011-10-20 14:14         ` Serge E. Hallyn
@ 2011-10-20 14:24         ` Serge E. Hallyn
  1 sibling, 0 replies; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-20 14:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, akpm, oleg, richard, mikevs, segoon, gregkh,
	dhowells, eparis, Serge E. Hallyn, netdev

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serge@hallyn.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> Serge Hallyn <serge@hallyn.com> writes:
> >> 
> >> > From: "Serge E. Hallyn" <serge.hallyn@canonical.com>
> >> >
> >> > Currently uids are compared without regard for the user namespace.
> >> > Fix that to prevent tasks in a different user namespace from
> >> > wrongly matching on SCM_CREDENTIALS.
> >> >
> >> > In the past, either your uids had to match, or you had to have
> >> > CAP_SETXID.  In a namespaced world, you must either (both be in the
> >> > same user namespace and have your uids match), or you must have
> >> > CAP_SETXID targeted at the other user namespace.  The latter can
> >> > happen for instance if uid 500 created a new user namespace and
> >> > now interacts with uid 0 in it.
> >> 
> >> Serge this approach is wrong.
> >
> > Thanks for looking, Eric.
> >
> >> Because we pass the cred and the pid through the socket socket itself
> >> is just a conduit and should be ignored in this context.
> >
> > Ok, that makes sense, but
> >
> >> The only interesting test should be are you allowed to impersonate other
> >> users in your current userk namespace.
> >
> > Why in your current user namespace?  Shouldn't it be in the
> > target user ns?  I understand it could be wrong to tie the
> > user ns owning the socket to the target userns (though I still
> > kind of like it), but just because I have CAP_SETUID in my
> > own user_ns doesn't mean I should be able to pose as another
> > uid in your user_ns.
> 
> First and foremost it is important that you be able if you have the
> capability to impersonate other users in your current user namespace.
> That is what the capability actually controls.
> 
> None of this allows you to impersonate any user in any other user
> namespace.  The translation between users prevents that.
> 
> > (Now I also see that cred_to_ucred() translates to the current
> > user_ns, so that should have been a hint to me before about
> > your intent, but I'm not convinced I agree with your intent).
> >
> > And you do the same with the pid.  Why is that a valid assumption?
> 
> Yes.  Basically all the code is allow you to impersonate people you
> would have been able to impersonate before.  If your target is in
> another namespace you can not fool them.
> 
> With pids the logic should be a lot clearer.  Pretend to be a pid you can
> see in your current pid namespace.  Lookup and convert to struct pid aka
> the namespace agnostic object.  On output return the pid value that
> the target process will know you as.
> 
> Ultimately I think we need a ns_capable for the current user namespace
> instead of a global one.  But I don't see any rush to introduce
> ns_capable here.

I think I agree - I was mistakenly thinking that without this patch
there is an opportunity for a less privileged task in child user ns
to impersonate, but that's not possible, so let's drop this patch
for now!

thanks,
-serge

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware
  2011-10-20 14:14         ` Serge E. Hallyn
@ 2011-10-24  4:15           ` Serge E. Hallyn
  2011-10-24  4:27             ` Eric W. Biederman
  0 siblings, 1 reply; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-24  4:15 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, linux-kernel, akpm, oleg, richard, mikevs,
	segoon, gregkh, dhowells, eparis, netdev

Quoting Serge E. Hallyn (serge.hallyn@canonical.com):
> Quoting Eric W. Biederman (ebiederm@xmission.com):
> > "Serge E. Hallyn" <serge@hallyn.com> writes:
> > 
> > > Quoting Eric W. Biederman (ebiederm@xmission.com):
> > >> Serge Hallyn <serge@hallyn.com> writes:
> > >> 
> > >> > From: "Serge E. Hallyn" <serge.hallyn@canonical.com>
> > >> >
> > >> > Currently uids are compared without regard for the user namespace.
> > >> > Fix that to prevent tasks in a different user namespace from
> > >> > wrongly matching on SCM_CREDENTIALS.
> > >> >
> > >> > In the past, either your uids had to match, or you had to have
> > >> > CAP_SETXID.  In a namespaced world, you must either (both be in the
> > >> > same user namespace and have your uids match), or you must have
> > >> > CAP_SETXID targeted at the other user namespace.  The latter can
> > >> > happen for instance if uid 500 created a new user namespace and
> > >> > now interacts with uid 0 in it.
> > >> 
> > >> Serge this approach is wrong.
> > >
> > > Thanks for looking, Eric.
> > >
> > >> Because we pass the cred and the pid through the socket socket itself
> > >> is just a conduit and should be ignored in this context.
> > >
> > > Ok, that makes sense, but
> > >
> > >> The only interesting test should be are you allowed to impersonate other
> > >> users in your current userk namespace.
> > >
> > > Why in your current user namespace?  Shouldn't it be in the
> > > target user ns?  I understand it could be wrong to tie the
> > > user ns owning the socket to the target userns (though I still
> > > kind of like it), but just because I have CAP_SETUID in my
> > > own user_ns doesn't mean I should be able to pose as another
> > > uid in your user_ns.
> > 
> > First and foremost it is important that you be able if you have the
> > capability to impersonate other users in your current user namespace.
> > That is what the capability actually controls.
> > 
> > None of this allows you to impersonate any user in any other user
> > namespace.  The translation between users prevents that.
> > 
> > > (Now I also see that cred_to_ucred() translates to the current
> > > user_ns, so that should have been a hint to me before about
> > > your intent, but I'm not convinced I agree with your intent).
> > >
> > > And you do the same with the pid.  Why is that a valid assumption?
> > 
> > Yes.  Basically all the code is allow you to impersonate people you
> > would have been able to impersonate before.  If your target is in
> > another namespace you can not fool them.
> > 
> > With pids the logic should be a lot clearer.  Pretend to be a pid you can
> > see in your current pid namespace.  Lookup and convert to struct pid aka
> > the namespace agnostic object.  On output return the pid value that
> 
> No.  That conversion is happending before the user-specified pid is
> set.

Never mind, it all gets a little convoluted, but I see how it works,
and - when the time comes - how to do it right for userns.  :)  Sorry
about that.

thanks,
-serge

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware
  2011-10-24  4:15           ` Serge E. Hallyn
@ 2011-10-24  4:27             ` Eric W. Biederman
  0 siblings, 0 replies; 34+ messages in thread
From: Eric W. Biederman @ 2011-10-24  4:27 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Serge E. Hallyn, linux-kernel, akpm, oleg, richard, mikevs,
	segoon, gregkh, dhowells, eparis, netdev

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Never mind, it all gets a little convoluted, but I see how it works,
> and - when the time comes - how to do it right for userns.  :)  Sorry
> about that.

No problem.

Eric

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 05/10] user namespace: clamp down users of cap_raised
  2011-10-19  9:01   ` David Howells
  2011-10-20 13:16     ` Serge E. Hallyn
@ 2011-10-24 14:43     ` Serge E. Hallyn
  2011-10-24 15:47       ` Andrew G. Morgan
  1 sibling, 1 reply; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-24 14:43 UTC (permalink / raw)
  To: David Howells
  Cc: Andrew G. Morgan, linux-kernel, ebiederm, akpm, oleg, richard,
	mikevs, segoon, gregkh, eparis, Serge E. Hallyn

A few modules are using cap_raised(current_cap(), cap) to authorize
actions.  This means that tasks which are privileged in non-initial
user namespaces will be deemed privileged.  The privilege should only
be granted if the task is in the initial user namespace.

Switching the calls to capable() would change the behavior - it would
cause the LSM capable hooks to be called, and set PF_SUPERPRIV if
the capability was used.  So instead, put in an explicit check and
refuse privilege if the caller is not in init_user_ns.

Changelog:
Oct 23: Use a nice macro to make the code shorter and easier to read,
	per advice from Andrew Morgan and David Howells.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morgan <morgan@kernel.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: David Howells <dhowells@redhat.com>
---
 drivers/block/drbd/drbd_nl.c           |    2 +-
 drivers/md/dm-log-userspace-transfer.c |    2 +-
 drivers/staging/pohmelfs/config.c      |    2 +-
 drivers/video/uvesafb.c                |    2 +-
 include/linux/cred.h                   |    2 ++
 5 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 0feab26..f799b15 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2297,7 +2297,7 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
 		return;
 	}
 
-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
+	if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN)) {
 		retcode = ERR_PERM;
 		goto fail;
 	}
diff --git a/drivers/md/dm-log-userspace-transfer.c b/drivers/md/dm-log-userspace-transfer.c
index 1f23e04..126a79b 100644
--- a/drivers/md/dm-log-userspace-transfer.c
+++ b/drivers/md/dm-log-userspace-transfer.c
@@ -134,7 +134,7 @@ static void cn_ulog_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
 {
 	struct dm_ulog_request *tfr = (struct dm_ulog_request *)(msg + 1);
 
-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
+	if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
 	spin_lock(&receiving_list_lock);
diff --git a/drivers/staging/pohmelfs/config.c b/drivers/staging/pohmelfs/config.c
index b6c42cb..327c047 100644
--- a/drivers/staging/pohmelfs/config.c
+++ b/drivers/staging/pohmelfs/config.c
@@ -525,7 +525,7 @@ static void pohmelfs_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *n
 {
 	int err;
 
-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
+	if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
 	switch (msg->flags) {
diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
index 7f8472c..94e0e9d 100644
--- a/drivers/video/uvesafb.c
+++ b/drivers/video/uvesafb.c
@@ -73,7 +73,7 @@ static void uvesafb_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *ns
 	struct uvesafb_task *utask;
 	struct uvesafb_ktask *task;
 
-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
+	if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
 		return;
 
 	if (msg->seq >= UVESAFB_TASKS_MAX)
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 4030896..2f75da7 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -359,9 +359,11 @@ static inline void put_cred(const struct cred *_cred)
 
 #ifdef CONFIG_USER_NS
 #define current_user_ns() (current_cred_xxx(user_ns))
+#define IN_ROOT_USER_NS() (current_user_ns() == &init_user_ns)
 #else
 extern struct user_namespace init_user_ns;
 #define current_user_ns() (&init_user_ns)
+#define IN_ROOT_USER_NS() (1)
 #endif
 
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/10] user namespace: clamp down users of cap_raised
  2011-10-24 14:43     ` [PATCH 05/10] " Serge E. Hallyn
@ 2011-10-24 15:47       ` Andrew G. Morgan
  2011-10-24 17:28         ` Serge E. Hallyn
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew G. Morgan @ 2011-10-24 15:47 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: David Howells, linux-kernel, ebiederm, akpm, oleg, richard,
	mikevs, segoon, gregkh, eparis, Serge E. Hallyn

Serge,

It seems as if this whole thing is really idiomatic. How about?

#define IN_ROOT_USER_NS_CAPABLE(cap)  \
   ((current_user_ns() == &init_user_ns) && cap_raised(current_cap(), cap))

Cheers

Andrew

On Mon, Oct 24, 2011 at 7:43 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> A few modules are using cap_raised(current_cap(), cap) to authorize
> actions.  This means that tasks which are privileged in non-initial
> user namespaces will be deemed privileged.  The privilege should only
> be granted if the task is in the initial user namespace.
>
> Switching the calls to capable() would change the behavior - it would
> cause the LSM capable hooks to be called, and set PF_SUPERPRIV if
> the capability was used.  So instead, put in an explicit check and
> refuse privilege if the caller is not in init_user_ns.
>
> Changelog:
> Oct 23: Use a nice macro to make the code shorter and easier to read,
>        per advice from Andrew Morgan and David Howells.
>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Andrew Morgan <morgan@kernel.org>
> Cc: Vasiliy Kulikov <segoon@openwall.com>
> Cc: David Howells <dhowells@redhat.com>
> ---
>  drivers/block/drbd/drbd_nl.c           |    2 +-
>  drivers/md/dm-log-userspace-transfer.c |    2 +-
>  drivers/staging/pohmelfs/config.c      |    2 +-
>  drivers/video/uvesafb.c                |    2 +-
>  include/linux/cred.h                   |    2 ++
>  5 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
> index 0feab26..f799b15 100644
> --- a/drivers/block/drbd/drbd_nl.c
> +++ b/drivers/block/drbd/drbd_nl.c
> @@ -2297,7 +2297,7 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
>                return;
>        }
>
> -       if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
> +       if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN)) {
>                retcode = ERR_PERM;
>                goto fail;
>        }
> diff --git a/drivers/md/dm-log-userspace-transfer.c b/drivers/md/dm-log-userspace-transfer.c
> index 1f23e04..126a79b 100644
> --- a/drivers/md/dm-log-userspace-transfer.c
> +++ b/drivers/md/dm-log-userspace-transfer.c
> @@ -134,7 +134,7 @@ static void cn_ulog_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
>  {
>        struct dm_ulog_request *tfr = (struct dm_ulog_request *)(msg + 1);
>
> -       if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
> +       if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
>                return;
>
>        spin_lock(&receiving_list_lock);
> diff --git a/drivers/staging/pohmelfs/config.c b/drivers/staging/pohmelfs/config.c
> index b6c42cb..327c047 100644
> --- a/drivers/staging/pohmelfs/config.c
> +++ b/drivers/staging/pohmelfs/config.c
> @@ -525,7 +525,7 @@ static void pohmelfs_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *n
>  {
>        int err;
>
> -       if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
> +       if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
>                return;
>
>        switch (msg->flags) {
> diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
> index 7f8472c..94e0e9d 100644
> --- a/drivers/video/uvesafb.c
> +++ b/drivers/video/uvesafb.c
> @@ -73,7 +73,7 @@ static void uvesafb_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *ns
>        struct uvesafb_task *utask;
>        struct uvesafb_ktask *task;
>
> -       if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
> +       if (!IN_ROOT_USER_NS() || !cap_raised(current_cap(), CAP_SYS_ADMIN))
>                return;
>
>        if (msg->seq >= UVESAFB_TASKS_MAX)
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 4030896..2f75da7 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -359,9 +359,11 @@ static inline void put_cred(const struct cred *_cred)
>
>  #ifdef CONFIG_USER_NS
>  #define current_user_ns() (current_cred_xxx(user_ns))
> +#define IN_ROOT_USER_NS() (current_user_ns() == &init_user_ns)
>  #else
>  extern struct user_namespace init_user_ns;
>  #define current_user_ns() (&init_user_ns)
> +#define IN_ROOT_USER_NS() (1)
>  #endif
>
>
> --
> 1.7.5.4
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/10] user namespace: clamp down users of cap_raised
  2011-10-24 15:47       ` Andrew G. Morgan
@ 2011-10-24 17:28         ` Serge E. Hallyn
  2011-10-25  0:43           ` Andrew G. Morgan
  0 siblings, 1 reply; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-24 17:28 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: Serge E. Hallyn, David Howells, linux-kernel, ebiederm, akpm,
	oleg, richard, mikevs, segoon, gregkh, eparis

Quoting Andrew G. Morgan (morgan@kernel.org):
> Serge,
> 
> It seems as if this whole thing is really idiomatic. How about?
> 
> #define IN_ROOT_USER_NS_CAPABLE(cap)  \
>    ((current_user_ns() == &init_user_ns) && cap_raised(current_cap(), cap))

My objection to this was that it seems to encourage others to use it :)  I'm
not sure we want that.  Also, IN_ROOT_USER_NS seems more generally useful.

But if I'm the only one who feels this way I'll go ahead and do it...

thanks,
-serge

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/10] user namespace: clamp down users of cap_raised
  2011-10-24 17:28         ` Serge E. Hallyn
@ 2011-10-25  0:43           ` Andrew G. Morgan
  2011-10-25  3:03             ` Serge E. Hallyn
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew G. Morgan @ 2011-10-25  0:43 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Serge E. Hallyn, David Howells, linux-kernel, ebiederm, akpm,
	oleg, richard, mikevs, segoon, gregkh, eparis

On Mon, Oct 24, 2011 at 10:28 AM, Serge E. Hallyn
<serge.hallyn@canonical.com> wrote:
> Quoting Andrew G. Morgan (morgan@kernel.org):
>> Serge,
>>
>> It seems as if this whole thing is really idiomatic. How about?
>>
>> #define IN_ROOT_USER_NS_CAPABLE(cap)  \
>>    ((current_user_ns() == &init_user_ns) && cap_raised(current_cap(), cap))
>
> My objection to this was that it seems to encourage others to use it :)  I'm
> not sure we want that.  Also, IN_ROOT_USER_NS seems more generally useful.

What is driving the choice of when its appropriate? How can a
developer determine this? If you make it hard, presumably folk won't
do it by default, but will that create a burdon on others to go round
patching things like this up?

> But if I'm the only one who feels this way I'll go ahead and do it...

I'm more of a optimize for a human to read the source code (ie. debug
a problem) kind of person. If IN_ROOT_USER_NS is useful, you could
always define IN_ROOT_USER_NS_CAPABLE in terms of IN_ROOT_USER_NS &&
... and provide both.

I guess I'm unclear, however, when you want developers to use one or
the other variant of the basic capable() functionality. Since I'm not
clear, I'm suspecting this is a fragile situation.

Cheers

Andrew

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/10] user namespace: clamp down users of cap_raised
  2011-10-25  0:43           ` Andrew G. Morgan
@ 2011-10-25  3:03             ` Serge E. Hallyn
  2011-10-25 17:33               ` Eric Paris
  0 siblings, 1 reply; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-25  3:03 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: Serge E. Hallyn, David Howells, linux-kernel, ebiederm, akpm,
	oleg, richard, mikevs, segoon, gregkh, eparis

Quoting Andrew G. Morgan (morgan@kernel.org):
> On Mon, Oct 24, 2011 at 10:28 AM, Serge E. Hallyn
> <serge.hallyn@canonical.com> wrote:
> > Quoting Andrew G. Morgan (morgan@kernel.org):
> >> Serge,
> >>
> >> It seems as if this whole thing is really idiomatic. How about?
> >>
> >> #define IN_ROOT_USER_NS_CAPABLE(cap)  \
> >>    ((current_user_ns() == &init_user_ns) && cap_raised(current_cap(), cap))
> >
> > My objection to this was that it seems to encourage others to use it :)  I'm
> > not sure we want that.  Also, IN_ROOT_USER_NS seems more generally useful.
> 
> What is driving the choice of when its appropriate? How can a

I'd like to say it's never appropriate.  The reason is that it bypasses
the whole security_ops->capable() sequence, so for instance SELinux is
kept in the dark.

> developer determine this? If you make it hard, presumably folk won't
> do it by default, but will that create a burdon on others to go round
> patching things like this up?
> 
> > But if I'm the only one who feels this way I'll go ahead and do it...
> 
> I'm more of a optimize for a human to read the source code (ie. debug
> a problem) kind of person. If IN_ROOT_USER_NS is useful, you could
> always define IN_ROOT_USER_NS_CAPABLE in terms of IN_ROOT_USER_NS &&

My other objection is that, in contrast to IN_ROOT_USER_NS(), which is
very clear, IN_ROOT_USER_NS_CAPABLE() is not as helpful.  I'm sure a
better name is out there somewhere, though.

> ... and provide both.
> 
> I guess I'm unclear, however, when you want developers to use one or
> the other variant of the basic capable() functionality. Since I'm not
> clear, I'm suspecting this is a fragile situation.

I think only security code (LSMs) should be using cap_raised directly.
Everything else should go through the capable()/has_capability() family
of functions.  Which, incidentally, have been (or are about to be) made
less of a mess and thus less fragile by Eric Paris' patchset starting at
http://www.spinics.net/linux/fedora/linux-security-module/msg11896.html

-serge

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/10] user namespace: clamp down users of cap_raised
  2011-10-25  3:03             ` Serge E. Hallyn
@ 2011-10-25 17:33               ` Eric Paris
  2011-10-25 20:09                 ` Serge E. Hallyn
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Paris @ 2011-10-25 17:33 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Andrew G. Morgan, Serge E. Hallyn, David Howells, linux-kernel,
	ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh

On Mon, 2011-10-24 at 22:03 -0500, Serge E. Hallyn wrote:
> Quoting Andrew G. Morgan (morgan@kernel.org):
> > On Mon, Oct 24, 2011 at 10:28 AM, Serge E. Hallyn
> > <serge.hallyn@canonical.com> wrote:
> > > Quoting Andrew G. Morgan (morgan@kernel.org):
> > >> Serge,
> > >>
> > >> It seems as if this whole thing is really idiomatic. How about?
> > >>
> > >> #define IN_ROOT_USER_NS_CAPABLE(cap)  \
> > >>    ((current_user_ns() == &init_user_ns) && cap_raised(current_cap(), cap))
> > >
> > > My objection to this was that it seems to encourage others to use it :)  I'm
> > > not sure we want that.  Also, IN_ROOT_USER_NS seems more generally useful.
> > 
> > What is driving the choice of when its appropriate? How can a
> 
> I'd like to say it's never appropriate.  The reason is that it bypasses
> the whole security_ops->capable() sequence, so for instance SELinux is
> kept in the dark.
> 
> > developer determine this? If you make it hard, presumably folk won't
> > do it by default, but will that create a burdon on others to go round
> > patching things like this up?
> > 
> > > But if I'm the only one who feels this way I'll go ahead and do it...
> > 
> > I'm more of a optimize for a human to read the source code (ie. debug
> > a problem) kind of person. If IN_ROOT_USER_NS is useful, you could
> > always define IN_ROOT_USER_NS_CAPABLE in terms of IN_ROOT_USER_NS &&
> 
> My other objection is that, in contrast to IN_ROOT_USER_NS(), which is
> very clear, IN_ROOT_USER_NS_CAPABLE() is not as helpful.  I'm sure a
> better name is out there somewhere, though.
> 
> > ... and provide both.
> > 
> > I guess I'm unclear, however, when you want developers to use one or
> > the other variant of the basic capable() functionality. Since I'm not
> > clear, I'm suspecting this is a fragile situation.
> 
> I think only security code (LSMs) should be using cap_raised directly.
> Everything else should go through the capable()/has_capability() family
> of functions.  Which, incidentally, have been (or are about to be) made
> less of a mess and thus less fragile by Eric Paris' patchset starting at
> http://www.spinics.net/linux/fedora/linux-security-module/msg11896.html

(sorry out all last week)

I was going to ask why we have these user at all.  Is there a reason
they are bypassing the LSM and not setting PF_PRIV?  Is the best
solution to just bring them back into the capable fold?

-Eric


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/10] user namespace: clamp down users of cap_raised
  2011-10-25 17:33               ` Eric Paris
@ 2011-10-25 20:09                 ` Serge E. Hallyn
  0 siblings, 0 replies; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-25 20:09 UTC (permalink / raw)
  To: Eric Paris
  Cc: Andrew G. Morgan, Serge E. Hallyn, David Howells, linux-kernel,
	ebiederm, akpm, oleg, richard, mikevs, segoon, gregkh

Quoting Eric Paris (eparis@redhat.com):
> On Mon, 2011-10-24 at 22:03 -0500, Serge E. Hallyn wrote:
> > Quoting Andrew G. Morgan (morgan@kernel.org):
> > > On Mon, Oct 24, 2011 at 10:28 AM, Serge E. Hallyn
> > > <serge.hallyn@canonical.com> wrote:
> > > > Quoting Andrew G. Morgan (morgan@kernel.org):
> > > >> Serge,
> > > >>
> > > >> It seems as if this whole thing is really idiomatic. How about?
> > > >>
> > > >> #define IN_ROOT_USER_NS_CAPABLE(cap)  \
> > > >>    ((current_user_ns() == &init_user_ns) && cap_raised(current_cap(), cap))
> > > >
> > > > My objection to this was that it seems to encourage others to use it :)  I'm
> > > > not sure we want that.  Also, IN_ROOT_USER_NS seems more generally useful.
> > > 
> > > What is driving the choice of when its appropriate? How can a
> > 
> > I'd like to say it's never appropriate.  The reason is that it bypasses
> > the whole security_ops->capable() sequence, so for instance SELinux is
> > kept in the dark.
> > 
> > > developer determine this? If you make it hard, presumably folk won't
> > > do it by default, but will that create a burdon on others to go round
> > > patching things like this up?
> > > 
> > > > But if I'm the only one who feels this way I'll go ahead and do it...
> > > 
> > > I'm more of a optimize for a human to read the source code (ie. debug
> > > a problem) kind of person. If IN_ROOT_USER_NS is useful, you could
> > > always define IN_ROOT_USER_NS_CAPABLE in terms of IN_ROOT_USER_NS &&
> > 
> > My other objection is that, in contrast to IN_ROOT_USER_NS(), which is
> > very clear, IN_ROOT_USER_NS_CAPABLE() is not as helpful.  I'm sure a
> > better name is out there somewhere, though.
> > 
> > > ... and provide both.
> > > 
> > > I guess I'm unclear, however, when you want developers to use one or
> > > the other variant of the basic capable() functionality. Since I'm not
> > > clear, I'm suspecting this is a fragile situation.
> > 
> > I think only security code (LSMs) should be using cap_raised directly.
> > Everything else should go through the capable()/has_capability() family
> > of functions.  Which, incidentally, have been (or are about to be) made
> > less of a mess and thus less fragile by Eric Paris' patchset starting at
> > http://www.spinics.net/linux/fedora/linux-security-module/msg11896.html
> 
> (sorry out all last week)
> 
> I was going to ask why we have these user at all.  Is there a reason
> they are bypassing the LSM and not setting PF_PRIV?  Is the best
> solution to just bring them back into the capable fold?

Probably, but I didn't really want to tackle that - and risk
regressions - just then.  The most important issue for now is
that anyone who can create a new user namespace can pass this
test, so I wanted to fix that first.  (Creating a new user ns
takes quite a bit of privilege, but of course userspace loves
to bypass that with setuid)

Maybe, since everyone is calling me on it, I'm wrong and we should
consider it after all :)

-serge

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 06/10] Add Documentation/namespaces/user_namespace.txt (v4)
  2011-10-19  9:36 ` [PATCH 6/9] Add Documentation/namespaces/user_namespace.txt (v3) David Howells
  2011-10-20 12:58   ` Serge E. Hallyn
@ 2011-10-26 20:33   ` Serge E. Hallyn
  1 sibling, 0 replies; 34+ messages in thread
From: Serge E. Hallyn @ 2011-10-26 20:33 UTC (permalink / raw)
  To: David Howells
  Cc: linux-kernel, ebiederm, akpm, oleg, richard, mikevs, segoon,
	gregkh, eparis, Serge E. Hallyn, Randy Dunlap

(Thanks for the feedback, David)

Provide a description of user namespaces

Changelog:
   jul 26: incorporate feedback from David Howells.
   jul 29: incorporate feedback from Randy Dunlap.
   sep 15: remove information which is not yet certain.
   Oct 26: add changes suggested by David on Oct 19

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
---
 Documentation/namespaces/user_namespace.txt |   54 +++++++++++++++++++++++++++
 1 files changed, 54 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/namespaces/user_namespace.txt

diff --git a/Documentation/namespaces/user_namespace.txt b/Documentation/namespaces/user_namespace.txt
new file mode 100644
index 0000000..805f0ef
--- /dev/null
+++ b/Documentation/namespaces/user_namespace.txt
@@ -0,0 +1,54 @@
+Description
+===========
+
+Traditionally, each task is owned by a user ID (UID) and belongs to one or more
+groups (GID).  Both are simple numeric IDs, though userspace usually translates
+them to names.  The user namespace allows tasks to have different views of the
+UIDs and GIDs associated with tasks and other resources.  (See 'UID mapping'
+below for more.)
+
+The user namespace is a simple hierarchical one.  The system starts with all
+tasks belonging to the initial user namespace.  A task creates a new user
+namespace by passing the CLONE_NEWUSER flag to clone(2).  This requires the
+creating task to have the CAP_SETUID, CAP_SETGID, and CAP_CHOWN capabilities,
+but it does not need to be running as root.  The clone(2) call will result in a
+new task which to itself appears to be running as UID and GID 0, but to its
+creator seems to have the creator's credentials.
+
+To this new task, any resource belonging to the initial user namespace will
+appear to belong to user and group 'nobody', which are UID and GID -1.
+Permission to open such files will be granted according to world access
+permissions.  UID comparisons and group membership checks will fail, and
+privilege will be denied.
+
+When a task belonging to (for example) UID 500 in the initial user namespace
+creates a new user namespace, even though the new task will see itself as
+belonging to UID 0, any task in the initial user namespace will see it as
+belonging to UID 500.  Therefore, UID 500 in the initial user namespace will be
+able to kill the new task.
+
+UID mapping for the VFS is not yet implemented, though prototypes exist.
+
+Relationship between the User namespace and other namespaces
+============================================================
+
+Other namespaces, such as UTS and network, are owned by a user namespace.  When
+such a namespace is created, it is assigned to the user namespace of the task
+by which it was created.  Therefore attempts to exercise privilege over a
+resource can be properly validated by checking for privilege targeted to the
+user namespace which owns the resource.  For instance, a check for the special
+privilege to change a network interface address could be done by checking for
+CAP_NET_ADMIN against the user namespace which created the network namespace
+owning the network interface.
+
+( XXX TODO: add a list of capabilities corresponding to different namespaces)
+
+As an example, if a new task is cloned with a private user namespace but
+not a private network namespace, then the task's network namespace is owned
+by the parent user namespace.  The new task has no special privilege over the
+parent user namespace, so it will not be able to create or configure the
+network devices therein.  If instead the task were cloned with both private
+user and network namespaces, then the private network namespace is owned
+by the private user namespace, and so root in the new user namespace
+will have privilege over resources owned by the network namespace.  It will
+be able to create and configure network devices.
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2011-10-26 20:32 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-18 21:54 user namespaces: fix some uid/privilege leaks Serge Hallyn
2011-10-18 21:54 ` [PATCH 1/9] pid_ns: ensure pid is not freed during kill_pid_info_as_uid Serge Hallyn
2011-10-18 21:54 ` [PATCH 2/9] user namespace: usb: make usb urbs user namespace aware (v2) Serge Hallyn
2011-10-18 21:54 ` [PATCH 3/9] user namespace: make signal.c respect user namespaces (v4) Serge Hallyn
2011-10-18 21:54 ` [PATCH 4/9] User namespace: don't allow sysctl in non-init user ns (v2) Serge Hallyn
2011-10-18 21:54 ` [PATCH 5/9] user namespace: clamp down users of cap_raised Serge Hallyn
2011-10-19  4:33   ` Andrew G. Morgan
2011-10-20 13:01     ` Serge E. Hallyn
2011-10-19  9:01   ` David Howells
2011-10-20 13:16     ` Serge E. Hallyn
2011-10-24 14:43     ` [PATCH 05/10] " Serge E. Hallyn
2011-10-24 15:47       ` Andrew G. Morgan
2011-10-24 17:28         ` Serge E. Hallyn
2011-10-25  0:43           ` Andrew G. Morgan
2011-10-25  3:03             ` Serge E. Hallyn
2011-10-25 17:33               ` Eric Paris
2011-10-25 20:09                 ` Serge E. Hallyn
2011-10-18 21:54 ` [PATCH 6/9] Add Documentation/namespaces/user_namespace.txt (v3) Serge Hallyn
2011-10-18 21:54 ` [PATCH 7/9] user namespace: make each net (net_ns) belong to a user_ns Serge Hallyn
2011-10-18 21:54 ` [PATCH 8/9] protect cap_netlink_recv from user namespaces Serge Hallyn
2011-10-18 21:54 ` [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware Serge Hallyn
2011-10-18 22:14   ` Joe Perches
2011-10-18 23:22     ` Serge E. Hallyn
2011-10-19  2:25       ` [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware (v2) Serge E. Hallyn
2011-10-19 13:52   ` [PATCH 9/9] make net/core/scm.c uid comparisons user namespace aware Eric W. Biederman
2011-10-20 12:58     ` Serge E. Hallyn
2011-10-20 13:35       ` Eric W. Biederman
2011-10-20 14:14         ` Serge E. Hallyn
2011-10-24  4:15           ` Serge E. Hallyn
2011-10-24  4:27             ` Eric W. Biederman
2011-10-20 14:24         ` Serge E. Hallyn
2011-10-19  9:36 ` [PATCH 6/9] Add Documentation/namespaces/user_namespace.txt (v3) David Howells
2011-10-20 12:58   ` Serge E. Hallyn
2011-10-26 20:33   ` [PATCH 06/10] Add Documentation/namespaces/user_namespace.txt (v4) Serge E. Hallyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).