* [PATCH RFC] User namespaces: general cleanups
@ 2008-10-10 1:19 Serge E. Hallyn
[not found] ` <20081010011917.GA8046-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-10-10 12:58 ` Keys and namespaces David Howells
0 siblings, 2 replies; 14+ messages in thread
From: Serge E. Hallyn @ 2008-10-10 1:19 UTC (permalink / raw)
To: David Howells; +Cc: Linux Containers, Eric W. Biederman
David, here is a new version taking into account your feedback from
yesterday.
Eric, it has changed quite a bit since you last looked at it, so if you
have a chance to take another look that'd be great. It's still just the
general cleanup, no handling of cross-userns capabilities or file
accesses :)
thanks,
-serge
From da249dbe92c1b218f8194a7fd386e5fba91049c1 Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Date: Thu, 9 Oct 2008 13:21:07 -0500
Subject: [PATCH 1/1] User namespaces: general cleanups
Move the user namespace from the nsproxy to the struct user_struct, because
1. otherwise a struct cred or struct user_struct is insufficient to
unambiguously determine uid equivalence
2. this nicely fits the hierarchical user namespace+struct user_struct
model.
When a user_namespace is created, the user which created it is
marked as its 'creator'. The user_namespace pins the creator.
Each userid in a user_ns pins the user_ns. This keeps refcounting
nice and simple. Refcounting rules are as follows:
The task pins the user struct.
The user struct pins its user namespace.
The user namespace pins the struct user which created it.
User namespaces are cloned during copy_creds(). Unsharing a new user_ns
is no longer possible. (We could re-add that, but it'll cause code
duplication and doesn't seem useful if PAM doesn't need to clone user
namespaces).
Open question: do we need to sched_switch_user(task) for the new
user namespace? My inclination is to say that ideally we want
scheduling to be based on the userid in the top level user_ns
anyway, so ignore the question for now.
Todo: remove debug printks
Changelog
Oct 9: Move create_new_userns from create_new_namespaces
to copy_creds().
Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
include/linux/init_task.h | 1 -
include/linux/nsproxy.h | 1 -
include/linux/sched.h | 1 +
include/linux/user_namespace.h | 13 ++------
kernel/cred.c | 28 ++++++++++++++++++-
kernel/fork.c | 5 +--
kernel/nsproxy.c | 15 +--------
kernel/sys.c | 4 +-
kernel/user.c | 52 ++++++++++++----------------------
kernel/user_namespace.c | 60 +++++++++++----------------------------
10 files changed, 74 insertions(+), 106 deletions(-)
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index edf0285..126c3c4 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -57,7 +57,6 @@ extern struct nsproxy init_nsproxy;
.mnt_ns = NULL, \
INIT_NET_NS(net_ns) \
INIT_IPC_NS(ipc_ns) \
- .user_ns = &init_user_ns, \
}
#define INIT_SIGHAND(sighand) { \
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index c8a768e..afad7de 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -27,7 +27,6 @@ struct nsproxy {
struct ipc_namespace *ipc_ns;
struct mnt_namespace *mnt_ns;
struct pid_namespace *pid_ns;
- struct user_namespace *user_ns;
struct net *net_ns;
};
extern struct nsproxy init_nsproxy;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 86ba19c..86acc5f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -595,6 +595,7 @@ struct user_struct {
/* Hash table maintenance information */
struct hlist_node uidhash_node;
uid_t uid;
+ struct user_namespace *user_ns;
#ifdef CONFIG_USER_SCHED
struct task_group *tg;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index b5f41d4..d6e61a2 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -12,7 +12,7 @@
struct user_namespace {
struct kref kref;
struct hlist_head uidhash_table[UIDHASH_SZ];
- struct user_struct *root_user;
+ struct user_struct *creator;
};
extern struct user_namespace init_user_ns;
@@ -26,8 +26,7 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
return ns;
}
-extern struct user_namespace *copy_user_ns(int flags,
- struct user_namespace *old_ns);
+extern struct user_struct *create_new_userns(struct task_struct *tsk);
extern void free_user_ns(struct kref *kref);
static inline void put_user_ns(struct user_namespace *ns)
@@ -43,13 +42,9 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
return &init_user_ns;
}
-static inline struct user_namespace *copy_user_ns(int flags,
- struct user_namespace *old_ns)
+static inline struct user_struct *create_new_userns(struct task_struct *tsk)
{
- if (flags & CLONE_NEWUSER)
- return ERR_PTR(-EINVAL);
-
- return old_ns;
+ return ERR_PTR(-EINVAL);
}
static inline void put_user_ns(struct user_namespace *ns)
diff --git a/kernel/cred.c b/kernel/cred.c
index 13697ca..abaf318 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -274,6 +274,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
struct thread_group_cred *tgcred;
#endif
struct cred *new;
+ struct user_struct *new_root = NULL;
mutex_init(&p->cred_exec_mutex);
@@ -289,9 +290,34 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
return 0;
}
+ if (clone_flags & CLONE_NEWUSER) {
+ /*
+ * hopefully the capability check goes away when userns support
+ * is complete
+ */
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+ if (clone_flags & CLONE_THREAD)
+ return -EINVAL;
+ new_root = create_new_userns(p);
+ if (IS_ERR(new_root))
+ return PTR_ERR(new_root);
+ }
+
new = prepare_creds();
- if (!new)
+ if (!new) {
+ free_uid(new_root);
return -ENOMEM;
+ }
+
+ /* If we created a new user_ns, make its root user
+ * our user */
+ if (new_root) {
+ new->uid = new->euid = new->suid = new->fsuid = 0;
+ new->gid = new->egid = new->sgid = new->fsgid = 0;
+ free_uid(new->user);
+ new->user = new_root;
+ }
#ifdef CONFIG_KEYS
/* new threads get their own thread keyrings if their parent already
diff --git a/kernel/fork.c b/kernel/fork.c
index e65f424..b533278 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -933,7 +933,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
if (atomic_read(&p->real_cred->user->processes) >=
p->signal->rlim[RLIMIT_NPROC].rlim_cur) {
if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
- p->real_cred->user != current->nsproxy->user_ns->root_user)
+ p->real_cred->user != INIT_USER)
goto bad_fork_free;
}
@@ -1553,8 +1553,7 @@ asmlinkage long sys_unshare(unsigned long unshare_flags)
err = -EINVAL;
if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
- CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|
- CLONE_NEWNET))
+ CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET))
goto bad_unshare_out;
/*
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 1d3ef29..63598dc 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -80,12 +80,6 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
goto out_pid;
}
- new_nsp->user_ns = copy_user_ns(flags, tsk->nsproxy->user_ns);
- if (IS_ERR(new_nsp->user_ns)) {
- err = PTR_ERR(new_nsp->user_ns);
- goto out_user;
- }
-
new_nsp->net_ns = copy_net_ns(flags, tsk->nsproxy->net_ns);
if (IS_ERR(new_nsp->net_ns)) {
err = PTR_ERR(new_nsp->net_ns);
@@ -95,9 +89,6 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
return new_nsp;
out_net:
- if (new_nsp->user_ns)
- put_user_ns(new_nsp->user_ns);
-out_user:
if (new_nsp->pid_ns)
put_pid_ns(new_nsp->pid_ns);
out_pid:
@@ -130,7 +121,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
get_nsproxy(old_ns);
if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
- CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET)))
+ CLONE_NEWPID | CLONE_NEWNET)))
return 0;
if (!capable(CAP_SYS_ADMIN)) {
@@ -173,8 +164,6 @@ void free_nsproxy(struct nsproxy *ns)
put_ipc_ns(ns->ipc_ns);
if (ns->pid_ns)
put_pid_ns(ns->pid_ns);
- if (ns->user_ns)
- put_user_ns(ns->user_ns);
put_net(ns->net_ns);
kmem_cache_free(nsproxy_cachep, ns);
}
@@ -189,7 +178,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
int err = 0;
if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
- CLONE_NEWUSER | CLONE_NEWNET)))
+ CLONE_NEWNET)))
return 0;
if (!capable(CAP_SYS_ADMIN))
diff --git a/kernel/sys.c b/kernel/sys.c
index 68ce096..d2bed52 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -565,13 +565,13 @@ static int set_user(struct cred *new)
{
struct user_struct *new_user;
- new_user = alloc_uid(current->nsproxy->user_ns, new->uid);
+ new_user = alloc_uid(current_user()->user_ns, new->uid);
if (!new_user)
return -EAGAIN;
if (atomic_read(&new_user->processes) >=
current->signal->rlim[RLIMIT_NPROC].rlim_cur &&
- new_user != current->nsproxy->user_ns->root_user) {
+ new_user != INIT_USER) {
free_uid(new_user);
return -EAGAIN;
}
diff --git a/kernel/user.c b/kernel/user.c
index ed4dc57..1390387 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -20,9 +20,9 @@
struct user_namespace init_user_ns = {
.kref = {
- .refcount = ATOMIC_INIT(2),
+ .refcount = ATOMIC_INIT(1),
},
- .root_user = &root_user,
+ .creator = &root_user,
};
EXPORT_SYMBOL_GPL(init_user_ns);
@@ -48,12 +48,14 @@ static struct kmem_cache *uid_cachep;
*/
static DEFINE_SPINLOCK(uidhash_lock);
+/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->creator */
struct user_struct root_user = {
- .__count = ATOMIC_INIT(1),
+ .__count = ATOMIC_INIT(2),
.processes = ATOMIC_INIT(1),
.files = ATOMIC_INIT(0),
.sigpending = ATOMIC_INIT(0),
.locked_shm = 0,
+ .user_ns = &init_user_ns,
#ifdef CONFIG_USER_SCHED
.tg = &init_task_group,
#endif
@@ -314,12 +316,15 @@ done:
* IRQ state (as stored in flags) is restored and uidhash_lock released
* upon function exit.
*/
-static inline void free_user(struct user_struct *up, unsigned long flags)
+static void free_user(struct user_struct *up, unsigned long flags)
{
+ printk(KERN_NOTICE "%s: freeing uid (%d) in ns (%p)\n",
+ __func__, up->uid, up->user_ns);
/* restore back the count */
atomic_inc(&up->__count);
spin_unlock_irqrestore(&uidhash_lock, flags);
+ put_user_ns(up->user_ns);
INIT_WORK(&up->work, remove_user_sysfs_dir);
schedule_work(&up->work);
}
@@ -335,13 +340,16 @@ static inline void uids_mutex_unlock(void) { }
* IRQ state (as stored in flags) is restored and uidhash_lock released
* upon function exit.
*/
-static inline void free_user(struct user_struct *up, unsigned long flags)
+static void free_user(struct user_struct *up, unsigned long flags)
{
+ printk(KERN_NOTICE "%s: freeing uid (%d) in ns (%p)\n",
+ __func__, up->uid, up->user_ns);
uid_hash_remove(up);
spin_unlock_irqrestore(&uidhash_lock, flags);
sched_destroy_user(up);
key_put(up->uid_keyring);
key_put(up->session_keyring);
+ put_user_ns(up->user_ns);
kmem_cache_free(uid_cachep, up);
}
@@ -357,7 +365,7 @@ struct user_struct *find_user(uid_t uid)
{
struct user_struct *ret;
unsigned long flags;
- struct user_namespace *ns = current->nsproxy->user_ns;
+ struct user_namespace *ns = current_user()->user_ns;
spin_lock_irqsave(&uidhash_lock, flags);
ret = uid_hash_find(uid, uidhashentry(ns, uid));
@@ -404,6 +412,8 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
if (sched_create_user(new) < 0)
goto out_free_user;
+ new->user_ns = get_user_ns(ns);
+
if (uids_user_create(new))
goto out_destoy_sched;
@@ -432,10 +442,13 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
uids_mutex_unlock();
+ printk(KERN_NOTICE "%s: allocated a uid (%d) in user_ns (%p)\n",
+ __func__, uid, ns);
return up;
out_destoy_sched:
sched_destroy_user(new);
+ put_user_ns(new->user_ns);
out_free_user:
kmem_cache_free(uid_cachep, new);
out_unlock:
@@ -443,33 +456,6 @@ out_unlock:
return NULL;
}
-#ifdef CONFIG_USER_NS
-void release_uids(struct user_namespace *ns)
-{
- int i;
- unsigned long flags;
- struct hlist_head *head;
- struct hlist_node *nd;
-
- spin_lock_irqsave(&uidhash_lock, flags);
- /*
- * collapse the chains so that the user_struct-s will
- * be still alive, but not in hashes. subsequent free_uid()
- * will free them.
- */
- for (i = 0; i < UIDHASH_SZ; i++) {
- head = ns->uidhash_table + i;
- while (!hlist_empty(head)) {
- nd = head->first;
- hlist_del_init(nd);
- }
- }
- spin_unlock_irqrestore(&uidhash_lock, flags);
-
- free_uid(ns->root_user);
-}
-#endif
-
static int __init uid_cache_init(void)
{
int n;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 0d9c51d..86200b1 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -9,17 +9,16 @@
#include <linux/nsproxy.h>
#include <linux/slab.h>
#include <linux/user_namespace.h>
+#include <linux/cred.h>
/*
- * Clone a new ns copying an original user ns, setting refcount to 1
- * @old_ns: namespace to clone
- * Return NULL on error (failure to kmalloc), new ns otherwise
+ * Return a new user_struct which is root in a new user_ns. This is called by
+ * copy_creds(), which will finish setting the target task's credentials.
*/
-static struct user_namespace *clone_user_ns(struct user_namespace *old_ns)
+struct user_struct *create_new_userns(struct task_struct *tsk)
{
struct user_namespace *ns;
- struct user_struct *new_user;
- struct cred *new;
+ struct user_struct *root_user;
int n;
ns = kmalloc(sizeof(struct user_namespace), GFP_KERNEL);
@@ -31,48 +30,22 @@ static struct user_namespace *clone_user_ns(struct user_namespace *old_ns)
for (n = 0; n < UIDHASH_SZ; ++n)
INIT_HLIST_HEAD(ns->uidhash_table + n);
- /* Insert new root user. */
- ns->root_user = alloc_uid(ns, 0);
- if (!ns->root_user) {
+ /* Alloc new root user. */
+ root_user = alloc_uid(ns, 0);
+ if (!root_user) {
kfree(ns);
return ERR_PTR(-ENOMEM);
}
- /* Reset current->user with a new one */
- new_user = alloc_uid(ns, current_uid());
- if (!new_user) {
- free_uid(ns->root_user);
- kfree(ns);
- return ERR_PTR(-ENOMEM);
- }
-
- /* Install the new user */
- new = prepare_creds();
- if (!new) {
- free_uid(new_user);
- free_uid(ns->root_user);
- kfree(ns);
- }
- free_uid(new->user);
- new->user = new_user;
- commit_creds(new);
- return ns;
-}
-
-struct user_namespace * copy_user_ns(int flags, struct user_namespace *old_ns)
-{
- struct user_namespace *new_ns;
-
- BUG_ON(!old_ns);
- get_user_ns(old_ns);
-
- if (!(flags & CLONE_NEWUSER))
- return old_ns;
+ /* save away and pin the creating user */
+ ns->creator = tsk->cred->user; /* tsk is still being created */
+ get_uid(ns->creator);
- new_ns = clone_user_ns(old_ns);
+ /* alloc_uid() incremented the userns refcount. Just set it to 1 */
+ kref_set(&ns->kref, 1);
- put_user_ns(old_ns);
- return new_ns;
+ printk(KERN_NOTICE "allocated a user_ns (%p)\n", ns);
+ return root_user;
}
void free_user_ns(struct kref *kref)
@@ -80,7 +53,8 @@ void free_user_ns(struct kref *kref)
struct user_namespace *ns;
ns = container_of(kref, struct user_namespace, kref);
- release_uids(ns);
+ free_uid(ns->creator);
+ printk(KERN_NOTICE "freeing a user_ns (%p)\n", ns);
kfree(ns);
}
EXPORT_SYMBOL(free_user_ns);
--
1.5.4.3
^ permalink raw reply related [flat|nested] 14+ messages in thread[parent not found: <20081010011917.GA8046-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH RFC] User namespaces: general cleanups [not found] ` <20081010011917.GA8046-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-10-10 10:06 ` David Howells [not found] ` <30854.1223633214-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: David Howells @ 2008-10-10 10:06 UTC (permalink / raw) To: Serge E. Hallyn Cc: dhowells-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, Eric W. Biederman Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote: > + new->uid = new->euid = new->suid = new->fsuid = 0; > + new->gid = new->egid = new->sgid = new->fsgid = 0; Should the supplementary groups be zapped too? Do the GIDs therein still have meaning in the new user namespace? Note also that eCryptFS is broken by your patch. I suggest adding the attached incremental patch. It makes the following changes: (1) Provides a current_user_ns() macro to wrap accesses to current's user namespace. (2) Fixes eCryptFS. (3) Renames create_new_userns() to create_user_ns() to be more consistent with the other associated functions and because the 'new' in the name is superfluous. (4) Moves the argument and permission checks made for CLONE_NEWUSER to the beginning of do_fork() so that they're done prior to making any attempts at allocation. (5) Calls create_user_ns() after prepare_creds(), and gives it the new creds to fill in rather than have it return the new root user. I don't imagine the new root user being used for anything other than filling in a cred struct. This also permits me to get rid of a get_uid() and a free_uid(), as the reference the creds were holding on the old user_struct can just be transferred to the new namespace's creator pointer. (6) Makes create_user_ns() reset the UIDs and GIDs of the creds under preparation rather than doing it in copy_creds(). David --- diff --git a/fs/ecryptfs/messaging.c b/fs/ecryptfs/messaging.c index 92bf606..eecb8b5 100644 --- a/fs/ecryptfs/messaging.c +++ b/fs/ecryptfs/messaging.c @@ -376,7 +376,7 @@ int ecryptfs_process_response(struct ecryptfs_message *msg, uid_t euid, struct ecryptfs_msg_ctx *msg_ctx; size_t msg_size; struct nsproxy *nsproxy; - struct user_namespace *current_user_ns; + struct user_namespace *tsk_user_ns; uid_t ctx_euid; int rc; @@ -401,9 +401,9 @@ int ecryptfs_process_response(struct ecryptfs_message *msg, uid_t euid, mutex_unlock(&ecryptfs_daemon_hash_mux); goto wake_up; } - current_user_ns = nsproxy->user_ns; + tsk_user_ns = __task_cred(msg_ctx->task)->user->user_ns; ctx_euid = task_euid(msg_ctx->task); - rc = ecryptfs_find_daemon_by_euid(&daemon, ctx_euid, current_user_ns); + rc = ecryptfs_find_daemon_by_euid(&daemon, ctx_euid, tsk_user_ns); rcu_read_unlock(); mutex_unlock(&ecryptfs_daemon_hash_mux); if (rc) { @@ -421,11 +421,11 @@ int ecryptfs_process_response(struct ecryptfs_message *msg, uid_t euid, euid, ctx_euid); goto unlock; } - if (current_user_ns != user_ns) { + if (tsk_user_ns != user_ns) { rc = -EBADMSG; printk(KERN_WARNING "%s: Received message from user_ns " "[0x%p]; expected message from user_ns [0x%p]\n", - __func__, user_ns, nsproxy->user_ns); + __func__, user_ns, tsk_user_ns); goto unlock; } if (daemon->pid != pid) { @@ -486,8 +486,7 @@ ecryptfs_send_message_locked(unsigned int transport, char *data, int data_len, uid_t euid = current_euid(); int rc; - rc = ecryptfs_find_daemon_by_euid(&daemon, euid, - current->nsproxy->user_ns); + rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); if (rc || !daemon) { rc = -ENOTCONN; printk(KERN_ERR "%s: User [%d] does not have a daemon " diff --git a/fs/ecryptfs/miscdev.c b/fs/ecryptfs/miscdev.c index 047ac60..efd95a0 100644 --- a/fs/ecryptfs/miscdev.c +++ b/fs/ecryptfs/miscdev.c @@ -47,8 +47,7 @@ ecryptfs_miscdev_poll(struct file *file, poll_table *pt) mutex_lock(&ecryptfs_daemon_hash_mux); /* TODO: Just use file->private_data? */ - rc = ecryptfs_find_daemon_by_euid(&daemon, euid, - current->nsproxy->user_ns); + rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); BUG_ON(rc || !daemon); mutex_lock(&daemon->mux); mutex_unlock(&ecryptfs_daemon_hash_mux); @@ -95,11 +94,9 @@ ecryptfs_miscdev_open(struct inode *inode, struct file *file) "count; rc = [%d]\n", __func__, rc); goto out_unlock_daemon_list; } - rc = ecryptfs_find_daemon_by_euid(&daemon, euid, - current->nsproxy->user_ns); + rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); if (rc || !daemon) { - rc = ecryptfs_spawn_daemon(&daemon, euid, - current->nsproxy->user_ns, + rc = ecryptfs_spawn_daemon(&daemon, euid, current_user_ns(), task_pid(current)); if (rc) { printk(KERN_ERR "%s: Error attempting to spawn daemon; " @@ -153,8 +150,7 @@ ecryptfs_miscdev_release(struct inode *inode, struct file *file) int rc; mutex_lock(&ecryptfs_daemon_hash_mux); - rc = ecryptfs_find_daemon_by_euid(&daemon, euid, - current->nsproxy->user_ns); + rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); BUG_ON(rc || !daemon); mutex_lock(&daemon->mux); BUG_ON(daemon->pid != task_pid(current)); @@ -254,8 +250,7 @@ ecryptfs_miscdev_read(struct file *file, char __user *buf, size_t count, mutex_lock(&ecryptfs_daemon_hash_mux); /* TODO: Just use file->private_data? */ - rc = ecryptfs_find_daemon_by_euid(&daemon, euid, - current->nsproxy->user_ns); + rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); BUG_ON(rc || !daemon); mutex_lock(&daemon->mux); if (daemon->flags & ECRYPTFS_DAEMON_ZOMBIE) { @@ -295,7 +290,7 @@ check_list: goto check_list; } BUG_ON(euid != daemon->euid); - BUG_ON(current->nsproxy->user_ns != daemon->user_ns); + BUG_ON(current_user_ns() != daemon->user_ns); BUG_ON(task_pid(current) != daemon->pid); msg_ctx = list_first_entry(&daemon->msg_ctx_out_queue, struct ecryptfs_msg_ctx, daemon_out_list); @@ -468,7 +463,7 @@ ecryptfs_miscdev_write(struct file *file, const char __user *buf, goto out_free; } rc = ecryptfs_miscdev_response(&data[i], packet_size, - euid, current->nsproxy->user_ns, + euid, current_user_ns(), task_pid(current), seq); if (rc) printk(KERN_WARNING "%s: Failed to deliver miscdev " diff --git a/include/linux/cred.h b/include/linux/cred.h index 26c1ab1..7db0049 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -315,6 +315,7 @@ static inline void put_cred(const struct cred *_cred) #define current_fsgid() (current_cred_xxx(fsgid)) #define current_cap() (current_cred_xxx(cap_effective)) #define current_user() (current_cred_xxx(user)) +#define current_user_ns() (current_cred_xxx(user)->user_ns) #define current_security() (current_cred_xxx(security)) #define current_uid_gid(_uid, _gid) \ diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index d6e61a2..315bcd3 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -26,7 +26,7 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns) return ns; } -extern struct user_struct *create_new_userns(struct task_struct *tsk); +extern int create_user_ns(struct cred *new); extern void free_user_ns(struct kref *kref); static inline void put_user_ns(struct user_namespace *ns) @@ -42,9 +42,9 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns) return &init_user_ns; } -static inline struct user_struct *create_new_userns(struct task_struct *tsk) +static inline int create_user_ns(struct cred *new) { - return ERR_PTR(-EINVAL); + return -EINVAL; } static inline void put_user_ns(struct user_namespace *ns) diff --git a/kernel/cred.c b/kernel/cred.c index e98106e..d0f99d8 100644 --- a/kernel/cred.c +++ b/kernel/cred.c @@ -274,7 +274,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags) struct thread_group_cred *tgcred; #endif struct cred *new; - struct user_struct *new_root = NULL; + int ret; mutex_init(&p->cred_exec_mutex); @@ -289,33 +289,14 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags) return 0; } - if (clone_flags & CLONE_NEWUSER) { - /* - * hopefully the capability check goes away when userns support - * is complete - */ - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; - if (clone_flags & CLONE_THREAD) - return -EINVAL; - new_root = create_new_userns(p); - if (IS_ERR(new_root)) - return PTR_ERR(new_root); - } - new = prepare_creds(); - if (!new) { - free_uid(new_root); + if (!new) return -ENOMEM; - } - /* If we created a new user_ns, make its root user - * our user */ - if (new_root) { - new->uid = new->euid = new->suid = new->fsuid = 0; - new->gid = new->egid = new->sgid = new->fsgid = 0; - free_uid(new->user); - new->user = new_root; + if (clone_flags & CLONE_NEWUSER) { + ret = create_user_ns(new); + if (ret < 0) + goto error_put; } #ifdef CONFIG_KEYS @@ -333,10 +314,8 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags) * bit */ if (!(clone_flags & CLONE_THREAD)) { tgcred = kmalloc(sizeof(*tgcred), GFP_KERNEL); - if (!tgcred) { - put_cred(new); - return -ENOMEM; - } + if (!tgcred) + goto nomem_put; atomic_set(&tgcred->usage, 1); spin_lock_init(&tgcred->lock); tgcred->process_keyring = NULL; @@ -350,6 +329,12 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags) atomic_inc(&new->user->processes); p->cred = p->real_cred = get_cred(new); return 0; + +nomem_put: + ret = -ENOMEM; +error_put: + put_cred(new); + return ret; } /** diff --git a/kernel/fork.c b/kernel/fork.c index c3bb673..2e167d5 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1318,6 +1318,20 @@ long do_fork(unsigned long clone_flags, long nr; /* + * Do some preliminary argument and permissions checking before we + * actually start allocating stuff + */ + if (clone_flags & CLONE_NEWUSER) { + if (clone_flags & CLONE_THREAD) + return -EINVAL; + /* hopefully this check will go away when userns support is + * complete + */ + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + } + + /* * We hope to recycle these flags after 2.6.26 */ if (unlikely(clone_flags & CLONE_STOPPED)) { diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 86200b1..5da3c41 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -12,10 +12,14 @@ #include <linux/cred.h> /* - * Return a new user_struct which is root in a new user_ns. This is called by - * copy_creds(), which will finish setting the target task's credentials. + * Create a new user namespace, deriving the creator from the user in the + * passed credentials, and replacing that user with the new root user for the + * new namespace. + * + * This is called by copy_creds(), which will finish setting the target task's + * credentials. */ -struct user_struct *create_new_userns(struct task_struct *tsk) +int create_user_ns(struct cred *new) { struct user_namespace *ns; struct user_struct *root_user; @@ -23,7 +27,7 @@ struct user_struct *create_new_userns(struct task_struct *tsk) ns = kmalloc(sizeof(struct user_namespace), GFP_KERNEL); if (!ns) - return ERR_PTR(-ENOMEM); + return -ENOMEM; kref_init(&ns->kref); @@ -34,18 +38,20 @@ struct user_struct *create_new_userns(struct task_struct *tsk) root_user = alloc_uid(ns, 0); if (!root_user) { kfree(ns); - return ERR_PTR(-ENOMEM); + return -ENOMEM; } - /* save away and pin the creating user */ - ns->creator = tsk->cred->user; /* tsk is still being created */ - get_uid(ns->creator); + /* set the new root user in the credentials under preparation */ + ns->creator = new->user; + new->user = root_user; + new->uid = new->euid = new->suid = new->fsuid = 0; + new->gid = new->egid = new->sgid = new->fsgid = 0; /* alloc_uid() incremented the userns refcount. Just set it to 1 */ kref_set(&ns->kref, 1); printk(KERN_NOTICE "allocated a user_ns (%p)\n", ns); - return root_user; + return 0; } void free_user_ns(struct kref *kref) ^ permalink raw reply related [flat|nested] 14+ messages in thread
[parent not found: <30854.1223633214-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH RFC] User namespaces: general cleanups [not found] ` <30854.1223633214-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2008-10-10 17:06 ` Serge E. Hallyn 2008-10-13 16:01 ` Serge E. Hallyn 1 sibling, 0 replies; 14+ messages in thread From: Serge E. Hallyn @ 2008-10-10 17:06 UTC (permalink / raw) To: David Howells; +Cc: Linux Containers, Eric W. Biederman Quoting David Howells (dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org): > Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote: > > > + new->uid = new->euid = new->suid = new->fsuid = 0; > > + new->gid = new->egid = new->sgid = new->fsgid = 0; > > Should the supplementary groups be zapped too? Yup. At one point I was doing that at least in the user_struct, but seem to have lost that, plus need to do it in struct cred. > Do the GIDs therein still have > meaning in the new user namespace? undefined > Note also that eCryptFS is broken by your patch. Ouch. > I suggest adding the attached incremental patch. It makes the following > changes: > > (1) Provides a current_user_ns() macro to wrap accesses to current's user > namespace. > > (2) Fixes eCryptFS. > > (3) Renames create_new_userns() to create_user_ns() to be more consistent > with the other associated functions and because the 'new' in the name is > superfluous. > > (4) Moves the argument and permission checks made for CLONE_NEWUSER to the > beginning of do_fork() so that they're done prior to making any attempts > at allocation. > > (5) Calls create_user_ns() after prepare_creds(), and gives it the new creds > to fill in rather than have it return the new root user. I don't imagine > the new root user being used for anything other than filling in a cred > struct. > > This also permits me to get rid of a get_uid() and a free_uid(), as the > reference the creds were holding on the old user_struct can just be > transferred to the new namespace's creator pointer. > > (6) Makes create_user_ns() reset the UIDs and GIDs of the creds under > preparation rather than doing it in copy_creds(). > > David > --- > diff --git a/fs/ecryptfs/messaging.c b/fs/ecryptfs/messaging.c > index 92bf606..eecb8b5 100644 > --- a/fs/ecryptfs/messaging.c > +++ b/fs/ecryptfs/messaging.c > @@ -376,7 +376,7 @@ int ecryptfs_process_response(struct ecryptfs_message *msg, uid_t euid, > struct ecryptfs_msg_ctx *msg_ctx; > size_t msg_size; > struct nsproxy *nsproxy; > - struct user_namespace *current_user_ns; > + struct user_namespace *tsk_user_ns; > uid_t ctx_euid; > int rc; > > @@ -401,9 +401,9 @@ int ecryptfs_process_response(struct ecryptfs_message *msg, uid_t euid, > mutex_unlock(&ecryptfs_daemon_hash_mux); > goto wake_up; > } > - current_user_ns = nsproxy->user_ns; > + tsk_user_ns = __task_cred(msg_ctx->task)->user->user_ns; > ctx_euid = task_euid(msg_ctx->task); > - rc = ecryptfs_find_daemon_by_euid(&daemon, ctx_euid, current_user_ns); > + rc = ecryptfs_find_daemon_by_euid(&daemon, ctx_euid, tsk_user_ns); > rcu_read_unlock(); > mutex_unlock(&ecryptfs_daemon_hash_mux); > if (rc) { > @@ -421,11 +421,11 @@ int ecryptfs_process_response(struct ecryptfs_message *msg, uid_t euid, > euid, ctx_euid); > goto unlock; > } > - if (current_user_ns != user_ns) { > + if (tsk_user_ns != user_ns) { > rc = -EBADMSG; > printk(KERN_WARNING "%s: Received message from user_ns " > "[0x%p]; expected message from user_ns [0x%p]\n", > - __func__, user_ns, nsproxy->user_ns); > + __func__, user_ns, tsk_user_ns); > goto unlock; > } > if (daemon->pid != pid) { > @@ -486,8 +486,7 @@ ecryptfs_send_message_locked(unsigned int transport, char *data, int data_len, > uid_t euid = current_euid(); > int rc; > > - rc = ecryptfs_find_daemon_by_euid(&daemon, euid, > - current->nsproxy->user_ns); > + rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); > if (rc || !daemon) { > rc = -ENOTCONN; > printk(KERN_ERR "%s: User [%d] does not have a daemon " > diff --git a/fs/ecryptfs/miscdev.c b/fs/ecryptfs/miscdev.c > index 047ac60..efd95a0 100644 > --- a/fs/ecryptfs/miscdev.c > +++ b/fs/ecryptfs/miscdev.c > @@ -47,8 +47,7 @@ ecryptfs_miscdev_poll(struct file *file, poll_table *pt) > > mutex_lock(&ecryptfs_daemon_hash_mux); > /* TODO: Just use file->private_data? */ > - rc = ecryptfs_find_daemon_by_euid(&daemon, euid, > - current->nsproxy->user_ns); > + rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); > BUG_ON(rc || !daemon); > mutex_lock(&daemon->mux); > mutex_unlock(&ecryptfs_daemon_hash_mux); > @@ -95,11 +94,9 @@ ecryptfs_miscdev_open(struct inode *inode, struct file *file) > "count; rc = [%d]\n", __func__, rc); > goto out_unlock_daemon_list; > } > - rc = ecryptfs_find_daemon_by_euid(&daemon, euid, > - current->nsproxy->user_ns); > + rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); > if (rc || !daemon) { > - rc = ecryptfs_spawn_daemon(&daemon, euid, > - current->nsproxy->user_ns, > + rc = ecryptfs_spawn_daemon(&daemon, euid, current_user_ns(), > task_pid(current)); > if (rc) { > printk(KERN_ERR "%s: Error attempting to spawn daemon; " > @@ -153,8 +150,7 @@ ecryptfs_miscdev_release(struct inode *inode, struct file *file) > int rc; > > mutex_lock(&ecryptfs_daemon_hash_mux); > - rc = ecryptfs_find_daemon_by_euid(&daemon, euid, > - current->nsproxy->user_ns); > + rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); > BUG_ON(rc || !daemon); > mutex_lock(&daemon->mux); > BUG_ON(daemon->pid != task_pid(current)); > @@ -254,8 +250,7 @@ ecryptfs_miscdev_read(struct file *file, char __user *buf, size_t count, > > mutex_lock(&ecryptfs_daemon_hash_mux); > /* TODO: Just use file->private_data? */ > - rc = ecryptfs_find_daemon_by_euid(&daemon, euid, > - current->nsproxy->user_ns); > + rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); > BUG_ON(rc || !daemon); > mutex_lock(&daemon->mux); > if (daemon->flags & ECRYPTFS_DAEMON_ZOMBIE) { > @@ -295,7 +290,7 @@ check_list: > goto check_list; > } > BUG_ON(euid != daemon->euid); > - BUG_ON(current->nsproxy->user_ns != daemon->user_ns); > + BUG_ON(current_user_ns() != daemon->user_ns); > BUG_ON(task_pid(current) != daemon->pid); > msg_ctx = list_first_entry(&daemon->msg_ctx_out_queue, > struct ecryptfs_msg_ctx, daemon_out_list); > @@ -468,7 +463,7 @@ ecryptfs_miscdev_write(struct file *file, const char __user *buf, > goto out_free; > } > rc = ecryptfs_miscdev_response(&data[i], packet_size, > - euid, current->nsproxy->user_ns, > + euid, current_user_ns(), > task_pid(current), seq); > if (rc) > printk(KERN_WARNING "%s: Failed to deliver miscdev " > diff --git a/include/linux/cred.h b/include/linux/cred.h > index 26c1ab1..7db0049 100644 > --- a/include/linux/cred.h > +++ b/include/linux/cred.h > @@ -315,6 +315,7 @@ static inline void put_cred(const struct cred *_cred) > #define current_fsgid() (current_cred_xxx(fsgid)) > #define current_cap() (current_cred_xxx(cap_effective)) > #define current_user() (current_cred_xxx(user)) > +#define current_user_ns() (current_cred_xxx(user)->user_ns) > #define current_security() (current_cred_xxx(security)) > > #define current_uid_gid(_uid, _gid) \ > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h > index d6e61a2..315bcd3 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -26,7 +26,7 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns) > return ns; > } > > -extern struct user_struct *create_new_userns(struct task_struct *tsk); > +extern int create_user_ns(struct cred *new); > extern void free_user_ns(struct kref *kref); > > static inline void put_user_ns(struct user_namespace *ns) > @@ -42,9 +42,9 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns) > return &init_user_ns; > } > > -static inline struct user_struct *create_new_userns(struct task_struct *tsk) > +static inline int create_user_ns(struct cred *new) > { > - return ERR_PTR(-EINVAL); > + return -EINVAL; > } > > static inline void put_user_ns(struct user_namespace *ns) > diff --git a/kernel/cred.c b/kernel/cred.c > index e98106e..d0f99d8 100644 > --- a/kernel/cred.c > +++ b/kernel/cred.c > @@ -274,7 +274,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags) > struct thread_group_cred *tgcred; > #endif > struct cred *new; > - struct user_struct *new_root = NULL; > + int ret; > > mutex_init(&p->cred_exec_mutex); > > @@ -289,33 +289,14 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags) > return 0; > } > > - if (clone_flags & CLONE_NEWUSER) { > - /* > - * hopefully the capability check goes away when userns support > - * is complete > - */ > - if (!capable(CAP_SYS_ADMIN)) > - return -EPERM; > - if (clone_flags & CLONE_THREAD) > - return -EINVAL; > - new_root = create_new_userns(p); > - if (IS_ERR(new_root)) > - return PTR_ERR(new_root); > - } > - > new = prepare_creds(); > - if (!new) { > - free_uid(new_root); > + if (!new) > return -ENOMEM; > - } > > - /* If we created a new user_ns, make its root user > - * our user */ > - if (new_root) { > - new->uid = new->euid = new->suid = new->fsuid = 0; > - new->gid = new->egid = new->sgid = new->fsgid = 0; > - free_uid(new->user); > - new->user = new_root; > + if (clone_flags & CLONE_NEWUSER) { > + ret = create_user_ns(new); I keep thinking that I need to pass the task_struct to set some of it's credentials :) > + if (ret < 0) > + goto error_put; > } > > #ifdef CONFIG_KEYS > @@ -333,10 +314,8 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags) > * bit */ > if (!(clone_flags & CLONE_THREAD)) { > tgcred = kmalloc(sizeof(*tgcred), GFP_KERNEL); > - if (!tgcred) { > - put_cred(new); > - return -ENOMEM; > - } > + if (!tgcred) > + goto nomem_put; > atomic_set(&tgcred->usage, 1); > spin_lock_init(&tgcred->lock); > tgcred->process_keyring = NULL; > @@ -350,6 +329,12 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags) > atomic_inc(&new->user->processes); > p->cred = p->real_cred = get_cred(new); > return 0; > + > +nomem_put: > + ret = -ENOMEM; > +error_put: > + put_cred(new); > + return ret; > } > > /** > diff --git a/kernel/fork.c b/kernel/fork.c > index c3bb673..2e167d5 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1318,6 +1318,20 @@ long do_fork(unsigned long clone_flags, > long nr; > > /* > + * Do some preliminary argument and permissions checking before we > + * actually start allocating stuff > + */ > + if (clone_flags & CLONE_NEWUSER) { > + if (clone_flags & CLONE_THREAD) > + return -EINVAL; > + /* hopefully this check will go away when userns support is > + * complete > + */ > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + } > + > + /* > * We hope to recycle these flags after 2.6.26 > */ > if (unlikely(clone_flags & CLONE_STOPPED)) { > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index 86200b1..5da3c41 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -12,10 +12,14 @@ > #include <linux/cred.h> > > /* > - * Return a new user_struct which is root in a new user_ns. This is called by > - * copy_creds(), which will finish setting the target task's credentials. > + * Create a new user namespace, deriving the creator from the user in the > + * passed credentials, and replacing that user with the new root user for the > + * new namespace. > + * > + * This is called by copy_creds(), which will finish setting the target task's > + * credentials. > */ > -struct user_struct *create_new_userns(struct task_struct *tsk) > +int create_user_ns(struct cred *new) > { > struct user_namespace *ns; > struct user_struct *root_user; > @@ -23,7 +27,7 @@ struct user_struct *create_new_userns(struct task_struct *tsk) > > ns = kmalloc(sizeof(struct user_namespace), GFP_KERNEL); > if (!ns) > - return ERR_PTR(-ENOMEM); > + return -ENOMEM; > > kref_init(&ns->kref); > > @@ -34,18 +38,20 @@ struct user_struct *create_new_userns(struct task_struct *tsk) > root_user = alloc_uid(ns, 0); > if (!root_user) { > kfree(ns); > - return ERR_PTR(-ENOMEM); > + return -ENOMEM; > } > > - /* save away and pin the creating user */ > - ns->creator = tsk->cred->user; /* tsk is still being created */ > - get_uid(ns->creator); Sigh i was about to comment about this get_uid not being needed anymore. The leading - is too small in this font... Patch looks great. Thanks, David. I'll give it a test and do the clearing of group_info on top of it this weekend. > + /* set the new root user in the credentials under preparation */ > + ns->creator = new->user; > + new->user = root_user; > + new->uid = new->euid = new->suid = new->fsuid = 0; > + new->gid = new->egid = new->sgid = new->fsgid = 0; > > /* alloc_uid() incremented the userns refcount. Just set it to 1 */ > kref_set(&ns->kref, 1); > > printk(KERN_NOTICE "allocated a user_ns (%p)\n", ns); > - return root_user; > + return 0; > } > > void free_user_ns(struct kref *kref) thanks, -serge ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] User namespaces: general cleanups [not found] ` <30854.1223633214-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2008-10-10 17:06 ` Serge E. Hallyn @ 2008-10-13 16:01 ` Serge E. Hallyn [not found] ` <20081013160144.GA10359-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> [not found] ` <20081013214108.GA4701-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 1 sibling, 2 replies; 14+ messages in thread From: Serge E. Hallyn @ 2008-10-13 16:01 UTC (permalink / raw) To: David Howells; +Cc: Linux Containers, Eric W. Biederman Quoting David Howells (dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org): > Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote: > > > + new->uid = new->euid = new->suid = new->fsuid = 0; > > + new->gid = new->egid = new->sgid = new->fsgid = 0; > > Should the supplementary groups be zapped too? Do the GIDs therein still have > meaning in the new user namespace? > > Note also that eCryptFS is broken by your patch. > > I suggest adding the attached incremental patch. It makes the following > changes: > > (1) Provides a current_user_ns() macro to wrap accesses to current's user > namespace. > > (2) Fixes eCryptFS. > > (3) Renames create_new_userns() to create_user_ns() to be more consistent > with the other associated functions and because the 'new' in the name is > superfluous. > > (4) Moves the argument and permission checks made for CLONE_NEWUSER to the > beginning of do_fork() so that they're done prior to making any attempts > at allocation. > > (5) Calls create_user_ns() after prepare_creds(), and gives it the new creds > to fill in rather than have it return the new root user. I don't imagine > the new root user being used for anything other than filling in a cred > struct. > > This also permits me to get rid of a get_uid() and a free_uid(), as the > reference the creds were holding on the old user_struct can just be > transferred to the new namespace's creator pointer. > > (6) Makes create_user_ns() reset the UIDs and GIDs of the creds under > preparation rather than doing it in copy_creds(). Hmm, with this patch, with CONFIG_KEYS=y users in child user_namespaces never get freed. Ones in the init_user_ns do, and with CONFIG_KEYS=n, those in child user_namespaces do as well. I don't see anything obvious in copy_creds() that would cause this... (also, when CONFIG_KEYS=n, then the enomem label in copy_creds() is unused - it might be kind of ugly to put just those two lines under #ifdef, though) -serge ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20081013160144.GA10359-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH RFC] User namespaces: general cleanups [not found] ` <20081013160144.GA10359-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-10-13 21:41 ` Serge E. Hallyn 2008-10-14 17:50 ` David Howells 1 sibling, 0 replies; 14+ messages in thread From: Serge E. Hallyn @ 2008-10-13 21:41 UTC (permalink / raw) To: David Howells; +Cc: Linux Containers, Eric W. Biederman Quoting Serge E. Hallyn (serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org): > Quoting David Howells (dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org): > > Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote: > > > > > + new->uid = new->euid = new->suid = new->fsuid = 0; > > > + new->gid = new->egid = new->sgid = new->fsgid = 0; > > > > Should the supplementary groups be zapped too? Do the GIDs therein still have > > meaning in the new user namespace? > > > > Note also that eCryptFS is broken by your patch. > > > > I suggest adding the attached incremental patch. It makes the following > > changes: > > > > (1) Provides a current_user_ns() macro to wrap accesses to current's user > > namespace. > > > > (2) Fixes eCryptFS. > > > > (3) Renames create_new_userns() to create_user_ns() to be more consistent > > with the other associated functions and because the 'new' in the name is > > superfluous. > > > > (4) Moves the argument and permission checks made for CLONE_NEWUSER to the > > beginning of do_fork() so that they're done prior to making any attempts > > at allocation. > > > > (5) Calls create_user_ns() after prepare_creds(), and gives it the new creds > > to fill in rather than have it return the new root user. I don't imagine > > the new root user being used for anything other than filling in a cred > > struct. > > > > This also permits me to get rid of a get_uid() and a free_uid(), as the > > reference the creds were holding on the old user_struct can just be > > transferred to the new namespace's creator pointer. > > > > (6) Makes create_user_ns() reset the UIDs and GIDs of the creds under > > preparation rather than doing it in copy_creds(). > > Hmm, with this patch, with CONFIG_KEYS=y users in child user_namespaces > never get freed. Ones in the init_user_ns do, and with CONFIG_KEYS=n, > those in child user_namespaces do as well. > > I don't see anything obvious in copy_creds() that would cause this... Ok, actually I was mis-interpreting through inconsistent testing. User namespaces have nothing to do with it. With the following patch applied to the base next-creds-subsys branch, you can track that if you login as root, then do 'su hallyn', where hallyn is uid 500, then uid 500 gets allocated twice. So free is never called on it. I assume the problem is somewhere in kernel/sys.c and kernel/cred.c, but haven't quite figured it out yet. -serge ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC] User namespaces: general cleanups [not found] ` <20081013160144.GA10359-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2008-10-13 21:41 ` Serge E. Hallyn @ 2008-10-14 17:50 ` David Howells [not found] ` <29703.1224006618-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> [not found] ` <20081014214327.GA28545-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 1 sibling, 2 replies; 14+ messages in thread From: David Howells @ 2008-10-14 17:50 UTC (permalink / raw) To: Serge E. Hallyn Cc: dhowells-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, Eric W. Biederman Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote: > Hmm, with this patch, with CONFIG_KEYS=y users in child user_namespaces > never get freed. Ones in the init_user_ns do, and with CONFIG_KEYS=n, > those in child user_namespaces do as well. > > I don't see anything obvious in copy_creds() that would cause this... Try looking in lookup_user_key(). Also, can you try the attached patch? I've also attached a better version of your debugging patch, one that differentiates between allocated and reused user_structs. David --- From: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> CRED: Fix creds refcounting in lookup_user_key() Make lookup_user_key() drop at all return points the reference to the current creds that it took at the top of the function Signed-off-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- security/keys/process_keys.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c index e40f61d..2d6076d 100644 --- a/security/keys/process_keys.c +++ b/security/keys/process_keys.c @@ -667,6 +667,7 @@ try_again: goto invalid_key; error: + put_cred(cred); return key_ref; invalid_key: --- From e00a2d98dd1086b0c863d8b416df33280c7c2574 Mon Sep 17 00:00:00 2001 From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Date: Mon, 13 Oct 2008 16:36:05 -0500 Subject: [PATCH 1/1] creds: print user_struct refcounts print user_struct refcounts at alloc, and print msg at uid free. Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Signed-off-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- kernel/user.c | 12 +++++++++++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/kernel/user.c b/kernel/user.c index d476307..073296e 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -317,6 +317,7 @@ done: static inline void free_user(struct user_struct *up, unsigned long flags) { /* restore back the count */ + printk(KERN_NOTICE "%s: freeing a uid (%d)\n", __func__, up->uid); atomic_inc(&up->__count); spin_unlock_irqrestore(&uidhash_lock, flags); @@ -337,6 +338,7 @@ static inline void uids_mutex_unlock(void) { } */ static inline void free_user(struct user_struct *up, unsigned long flags) { + printk(KERN_NOTICE "%s: freeing a uid (%d)\n", __func__, up->uid); uid_hash_remove(up); spin_unlock_irqrestore(&uidhash_lock, flags); sched_destroy_user(up); @@ -422,16 +424,24 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid) key_put(new->uid_keyring); key_put(new->session_keyring); kmem_cache_free(uid_cachep, new); + printk(KERN_NOTICE "%s: reuse a uid (%d) (cnt %u)\n", + __func__, uid, atomic_read(&up->__count)); + } else { uid_hash_insert(new, hashent); up = new; + printk(KERN_NOTICE "%s: alloced a uid (%d) (cnt %u)\n", + __func__, uid, atomic_read(&up->__count)); + } spin_unlock_irq(&uidhash_lock); + } else { + printk(KERN_NOTICE "%s: reuse a uid (%d) (cnt %u)\n", + __func__, uid, atomic_read(&up->__count)); } uids_mutex_unlock(); - return up; out_destoy_sched: ^ permalink raw reply related [flat|nested] 14+ messages in thread
[parent not found: <29703.1224006618-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH RFC] User namespaces: general cleanups [not found] ` <29703.1224006618-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2008-10-14 21:43 ` Serge E. Hallyn 0 siblings, 0 replies; 14+ messages in thread From: Serge E. Hallyn @ 2008-10-14 21:43 UTC (permalink / raw) To: David Howells; +Cc: Linux Containers, Eric W. Biederman Quoting David Howells (dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org): > Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote: > > > Hmm, with this patch, with CONFIG_KEYS=y users in child user_namespaces > > never get freed. Ones in the init_user_ns do, and with CONFIG_KEYS=n, > > those in child user_namespaces do as well. > > > > I don't see anything obvious in copy_creds() that would cause this... > > Try looking in lookup_user_key(). Also, can you try the attached patch? That patch seems to fix it! Also seemed fine under ltp. > I've also attached a better version of your debugging patch, one that > differentiates between allocated and reused user_structs. > > David > --- > From: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > CRED: Fix creds refcounting in lookup_user_key() > > Make lookup_user_key() drop at all return points the reference to the current > creds that it took at the top of the function > > Signed-off-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Tested-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> > --- > > security/keys/process_keys.c | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > > diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c > index e40f61d..2d6076d 100644 > --- a/security/keys/process_keys.c > +++ b/security/keys/process_keys.c > @@ -667,6 +667,7 @@ try_again: > goto invalid_key; > > error: > + put_cred(cred); > return key_ref; > > invalid_key: > > --- > >From e00a2d98dd1086b0c863d8b416df33280c7c2574 Mon Sep 17 00:00:00 2001 > From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> > Date: Mon, 13 Oct 2008 16:36:05 -0500 > Subject: [PATCH 1/1] creds: print user_struct refcounts > > print user_struct refcounts at alloc, and print msg at uid free. Yup, except you've tossed the user namespace refcounting checks :) (But that's ok, we all know user namespaces are bug-free.) Anyway, will you push the refcounting fix to James? Then I'll resend the user namespaces patch merged with your patch (if you don't mind) plus the clearing of groups and keyrings for a new user-namespace tomorrow. thanks, -serge > Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> > Signed-off-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > --- > > kernel/user.c | 12 +++++++++++- > 1 files changed, 11 insertions(+), 1 deletions(-) > > > diff --git a/kernel/user.c b/kernel/user.c > index d476307..073296e 100644 > --- a/kernel/user.c > +++ b/kernel/user.c > @@ -317,6 +317,7 @@ done: > static inline void free_user(struct user_struct *up, unsigned long flags) > { > /* restore back the count */ > + printk(KERN_NOTICE "%s: freeing a uid (%d)\n", __func__, up->uid); > atomic_inc(&up->__count); > spin_unlock_irqrestore(&uidhash_lock, flags); > > @@ -337,6 +338,7 @@ static inline void uids_mutex_unlock(void) { } > */ > static inline void free_user(struct user_struct *up, unsigned long flags) > { > + printk(KERN_NOTICE "%s: freeing a uid (%d)\n", __func__, up->uid); > uid_hash_remove(up); > spin_unlock_irqrestore(&uidhash_lock, flags); > sched_destroy_user(up); > @@ -422,16 +424,24 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid) > key_put(new->uid_keyring); > key_put(new->session_keyring); > kmem_cache_free(uid_cachep, new); > + printk(KERN_NOTICE "%s: reuse a uid (%d) (cnt %u)\n", > + __func__, uid, atomic_read(&up->__count)); > + > } else { > uid_hash_insert(new, hashent); > up = new; > + printk(KERN_NOTICE "%s: alloced a uid (%d) (cnt %u)\n", > + __func__, uid, atomic_read(&up->__count)); > + > } > spin_unlock_irq(&uidhash_lock); > > + } else { > + printk(KERN_NOTICE "%s: reuse a uid (%d) (cnt %u)\n", > + __func__, uid, atomic_read(&up->__count)); > } > > uids_mutex_unlock(); > - > return up; > > out_destoy_sched: ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20081014214327.GA28545-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH RFC] User namespaces: general cleanups [not found] ` <20081014214327.GA28545-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-10-14 21:47 ` David Howells 0 siblings, 0 replies; 14+ messages in thread From: David Howells @ 2008-10-14 21:47 UTC (permalink / raw) To: Serge E. Hallyn Cc: dhowells-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, Eric W. Biederman Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote: > That patch seems to fix it! Also seemed fine under ltp. Also for me. > Yup, except you've tossed the user namespace refcounting checks :) What checks? Your patch just had three printks in it. > Anyway, will you push the refcounting fix to James? Yes. David ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20081013214108.GA4701-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH RFC] User namespaces: general cleanups [not found] ` <20081013214108.GA4701-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-10-13 23:10 ` David Howells [not found] ` <5306.1223939456-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: David Howells @ 2008-10-13 23:10 UTC (permalink / raw) To: Serge E. Hallyn Cc: dhowells-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, Eric W. Biederman Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote: > With the following patch applied to the base next-creds-subsys branch, > you can track that if you login as root, then do 'su hallyn', where > hallyn is uid 500, then uid 500 gets allocated twice. So free is never > called on it. Which following patch? Actually, I've suspected that the user_struct accounting is not quite right for a while. Even before I did my creds stuff, I'd occasionally multiple per-UID keyrings cropping up with the same ID - indicating multiple user_structs for the same UID. David ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <5306.1223939456-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH RFC] User namespaces: general cleanups [not found] ` <5306.1223939456-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2008-10-14 14:33 ` Serge E. Hallyn 0 siblings, 0 replies; 14+ messages in thread From: Serge E. Hallyn @ 2008-10-14 14:33 UTC (permalink / raw) To: David Howells; +Cc: Linux Containers, Eric W. Biederman Quoting David Howells (dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org): > Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote: > > > With the following patch applied to the base next-creds-subsys branch, > > you can track that if you login as root, then do 'su hallyn', where > > hallyn is uid 500, then uid 500 gets allocated twice. So free is never > > called on it. > > Which following patch? Argh. The one below :) > Actually, I've suspected that the user_struct accounting is not quite right for > a while. Even before I did my creds stuff, I'd occasionally multiple per-UID > keyrings cropping up with the same ID - indicating multiple user_structs for > the same UID. > > David From e00a2d98dd1086b0c863d8b416df33280c7c2574 Mon Sep 17 00:00:00 2001 From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Date: Mon, 13 Oct 2008 16:36:05 -0500 Subject: [PATCH 1/1] creds: print user_struct refcounts print user_struct refcounts at alloc, and print msg at uid free. Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> --- kernel/user.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/kernel/user.c b/kernel/user.c index ed4dc57..3b9fd14 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -317,6 +317,7 @@ done: static inline void free_user(struct user_struct *up, unsigned long flags) { /* restore back the count */ + printk(KERN_NOTICE "%s: freeing a uid (%d)\n", __func__, up->uid); atomic_inc(&up->__count); spin_unlock_irqrestore(&uidhash_lock, flags); @@ -337,6 +338,7 @@ static inline void uids_mutex_unlock(void) { } */ static inline void free_user(struct user_struct *up, unsigned long flags) { + printk(KERN_NOTICE "%s: freeing a uid (%d)\n", __func__, up->uid); uid_hash_remove(up); spin_unlock_irqrestore(&uidhash_lock, flags); sched_destroy_user(up); @@ -431,6 +433,8 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid) } uids_mutex_unlock(); + printk(KERN_NOTICE "%s: alloced a uid (%d) (cnt %lu)\n", __func__, + uid, atomic_read(&up->__count)); return up; -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Keys and namespaces 2008-10-10 1:19 [PATCH RFC] User namespaces: general cleanups Serge E. Hallyn [not found] ` <20081010011917.GA8046-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2008-10-10 12:58 ` David Howells [not found] ` <414.1223643503-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 14+ messages in thread From: David Howells @ 2008-10-10 12:58 UTC (permalink / raw) To: Serge E. Hallyn Cc: dhowells-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, Eric W. Biederman On the subject of namespaces: I still need to look at providing a key ID and keyring name namespace. Is it worth me just using the user_namespace? A number of parameters are per-UID (such as the key quotas), so it might very well make sense to do that. That way, user_namespace could actually be a credentials namespace. If that is the case, CLONE_NEWUSER should also set up (clone?) the keys and keyrings attached to the parent. This possibly needs to be done anyway as the keys have UID and GID references that may be invalid in the new namespace. How do the UIDs and GIDs in different namespaces map, anyway? Furthermore, some keys may actually represent foreign user details; perhaps NTFS or CIFS user IDs for example. Should those be discarded on CLONE_NEWUSER? David ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <414.1223643503-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: Keys and namespaces [not found] ` <414.1223643503-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2008-10-10 16:46 ` Serge E. Hallyn 2008-10-10 22:30 ` Eric W. Biederman 1 sibling, 0 replies; 14+ messages in thread From: Serge E. Hallyn @ 2008-10-10 16:46 UTC (permalink / raw) To: David Howells; +Cc: Linux Containers, Eric W. Biederman Quoting David Howells (dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org): > > On the subject of namespaces: I still need to look at providing a key ID and > keyring name namespace. > > Is it worth me just using the user_namespace? A number of parameters are > per-UID (such as the key quotas), so it might very well make sense to do that. I think it is. The semantics will be interesting though. > That way, user_namespace could actually be a credentials namespace. > > If that is the case, CLONE_NEWUSER should also set up (clone?) the keys and > keyrings attached to the parent. This possibly needs to be done anyway as the > keys have UID and GID references that may be invalid in the new namespace. > > How do the UIDs and GIDs in different namespaces map, anyway? Here is how we want it to work. I will refer to the initial user namespace as 0, and (500,0) will refer to uid 500 in the initial user namespace. If (500,0) creates a new user namespace and that user namespace is 1, then the cloned task will be owned by uid (0,1). The creator of userns 1 is (500,0). The cloned task, being root, will be able to exercise capabilities over objects in its own namespace, but not, of course, over objects outside its own namespace. So let's say it clones a task which does setuid(501). Now the task owned by (0,1) can kill the task owned by (501,1). But it can't kill tasks in user_ns 0, even those owned by (500,0), its creator. > Furthermore, some keys may actually represent foreign user details; perhaps > NTFS or CIFS user IDs for example. Should those be discarded on CLONE_NEWUSER? The intent is to implement roughly the following for file accesses: * If (501,1) creates a file in the above scenario, then it is owned by both (501,1) and (500,0), just as the task was. * File access may be granted at any level in the userns hierarchy, so (500,0) may access the above file. If (0,0) executs a setuid-(0,1) file, it will do so as a setuid(500,0) file. * Cross-userns access defaults to user-other. So (0,1) will receive the 'other' permissions to any files owned by (0,0) which it happens to be able to get to. I've taken a stab at the file accesses part, but will be starting over from scratch after I try to address the capabilities part. A great deal of discussion, mainly with Subject including 'user namespaces' or user_ns, is in the containers mailing list archive. So now the question is how should we treat keys :) I'm actually confused about the current (in next-creds) approach. The user_struct has uid_keyring and session_keyring. But then struct cred has its own 4 fields under CONFIG_KEYS - what are they? Semantically, I think it makes sense at clone(CLONE_NEWUSER) to give the new root user_struct in the new user_ns a copy of the userns->creator's keychains. Then that task can decide whether to clear out the keychains or not. This way the user can start a container in his ecryptfs partition, and, if he wants to, unlink the keys out before starting a user shell in his container, for instance (not that that really makes sense). Does that make sense? (I imagine it'll have to be more complicated than that :) -serge ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Keys and namespaces [not found] ` <414.1223643503-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2008-10-10 16:46 ` Serge E. Hallyn @ 2008-10-10 22:30 ` Eric W. Biederman [not found] ` <m1hc7k13s2.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> 1 sibling, 1 reply; 14+ messages in thread From: Eric W. Biederman @ 2008-10-10 22:30 UTC (permalink / raw) To: David Howells; +Cc: Linux Containers David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes: > On the subject of namespaces: I still need to look at providing a key ID and > keyring name namespace. > > Is it worth me just using the user_namespace? A number of parameters are > per-UID (such as the key quotas), so it might very well make sense to do that. Yes. > That way, user_namespace could actually be a credentials namespace. I have always thought of it as a security token namespace. I'm not certain about the term credential as that usually means something like password. In the context of keys where you are caching something like a kerberos token it makes more sense. Even there it isn't a secret key that is being cached but more of a capability that says that the secret key was known at some point. > If that is the case, CLONE_NEWUSER should also set up (clone?) the keys and > keyrings attached to the parent. This possibly needs to be done anyway as the > keys have UID and GID references that may be invalid in the new namespace. > > How do the UIDs and GIDs in different namespaces map, anyway? At the simplest level they do not map between namespaces. At a practical level the users in a user namespace should have no more capabilities than the creator of the user namespace and the creator of the user namespace should have the power to kill all processes and remove all objects created by users in that user namespace. For filesystem access where we have talked about this in some detail. The current idea is to have a user namespace parameter on struct inode. All security credential comparisons will need to become comparisons of both user_namespace and uid to match. > Furthermore, some keys may actually represent foreign user details; perhaps > NTFS or CIFS user IDs for example. Should those be discarded on CLONE_NEWUSER? After CLONE_NEWUSER the user id and guid should be 0, and no keys should be held. With respect to holding credentials it should be essentially the same state as when init starts. At least as a general rule. If keys are the kind of thing that can be passed between namespaces allowing user space to control which keys get passed makes sense. We allow the controlling tty and process groups to remain the same across pid namespace creation and require processes to change them explicitly if needed. It is definitely a requirement to get to the state init starts in, either by kernel fiat or normal user space operations. A word about foreign user details. In my conception a user namespace is essentially a single administration domain of uids and gids, and as such can span and does today span several machines. user namespaces are unique in that their tokens are assigned administratively and the kernel has no control over lifetime of tokens. In that concept foreign users doesn't make a lot of sense to me. Keys certainly provide the ability to several different implementations of user identification. How that interacts with a web of administration of user permissions to objects not living in the kernel I don't know. Ultimately I think putting keys in the user namespace is a good match. As it is a way to talk about stored user identification either in memory or on disk, and how that maps to what a user is allowed to do. Eric ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <m1hc7k13s2.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>]
* Re: Keys and namespaces [not found] ` <m1hc7k13s2.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org> @ 2008-10-13 16:27 ` Serge E. Hallyn 0 siblings, 0 replies; 14+ messages in thread From: Serge E. Hallyn @ 2008-10-13 16:27 UTC (permalink / raw) To: Eric W. Biederman; +Cc: David Howells, Linux Containers Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org): > David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes: > > > On the subject of namespaces: I still need to look at providing a key ID and > > keyring name namespace. > > > > Is it worth me just using the user_namespace? A number of parameters are > > per-UID (such as the key quotas), so it might very well make sense to do that. > > Yes. > > > That way, user_namespace could actually be a credentials namespace. > > I have always thought of it as a security token namespace. > > I'm not certain about the term credential as that usually means something > like password. In the context of keys where you are caching something > like a kerberos token it makes more sense. Even there it isn't a > secret key that is being cached but more of a capability that says > that the secret key was known at some point. > > > If that is the case, CLONE_NEWUSER should also set up (clone?) the keys and > > keyrings attached to the parent. This possibly needs to be done anyway as the > > keys have UID and GID references that may be invalid in the new namespace. > > > > How do the UIDs and GIDs in different namespaces map, anyway? > > At the simplest level they do not map between namespaces. > > At a practical level the users in a user namespace should have > no more capabilities than the creator of the user namespace and > the creator of the user namespace should have the power to kill > all processes and remove all objects created by users in that > user namespace. > > For filesystem access where we have talked about this in some detail. > The current idea is to have a user namespace parameter on struct inode. > All security credential comparisons will need to become comparisons > of both user_namespace and uid to match. > > > Furthermore, some keys may actually represent foreign user details; perhaps > > NTFS or CIFS user IDs for example. Should those be discarded on CLONE_NEWUSER? > > After CLONE_NEWUSER the user id and guid should be 0, and no keys should be > held. With respect to holding credentials it should be essentially the same > state as when init starts. At least as a general rule. > > If keys are the kind of thing that can be passed between namespaces > allowing user space to control which keys get passed makes sense. We > allow the controlling tty and process groups to remain the same across > pid namespace creation and require processes to change them explicitly > if needed. It is definitely a requirement to get to the state init > starts in, either by kernel fiat or normal user space operations. Ok so if I read this right, you're ok with either one of: 1. on clone(CLONE_NEWUSER), the new task get empty keyrings. If userspace wants the new task to have its creator's keys, it reloads them. 2. on clone(CLONE_NEWUSER), the new task gets a copy of the creator's keys. If userspace wants the new task to have an empty keyring, it can clear it. I had been thinking 2 might be easier in the face of ecryptfs, but I think 1 is the way to go. > A word about foreign user details. > > In my conception a user namespace is essentially a single administration > domain of uids and gids, and as such can span and does today span several machines. > user namespaces are unique in that their tokens are assigned administratively > and the kernel has no control over lifetime of tokens. > > In that concept foreign users doesn't make a lot of sense to me. > > Keys certainly provide the ability to several different implementations > of user identification. How that interacts with a web of administration > of user permissions to objects not living in the kernel I don't know. > > Ultimately I think putting keys in the user namespace is a good match. > As it is a way to talk about stored user identification either in memory > or on disk, and how that maps to what a user is allowed to do. > > Eric ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2008-10-14 21:47 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-10 1:19 [PATCH RFC] User namespaces: general cleanups Serge E. Hallyn
[not found] ` <20081010011917.GA8046-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-10-10 10:06 ` David Howells
[not found] ` <30854.1223633214-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-10-10 17:06 ` Serge E. Hallyn
2008-10-13 16:01 ` Serge E. Hallyn
[not found] ` <20081013160144.GA10359-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-10-13 21:41 ` Serge E. Hallyn
2008-10-14 17:50 ` David Howells
[not found] ` <29703.1224006618-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-10-14 21:43 ` Serge E. Hallyn
[not found] ` <20081014214327.GA28545-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-10-14 21:47 ` David Howells
[not found] ` <20081013214108.GA4701-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-10-13 23:10 ` David Howells
[not found] ` <5306.1223939456-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-10-14 14:33 ` Serge E. Hallyn
2008-10-10 12:58 ` Keys and namespaces David Howells
[not found] ` <414.1223643503-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-10-10 16:46 ` Serge E. Hallyn
2008-10-10 22:30 ` Eric W. Biederman
[not found] ` <m1hc7k13s2.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-10-13 16:27 ` Serge E. Hallyn
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.