* [PATCH 0/6] user namespaces: introduction
@ 2008-07-26 0:27 Serge E. Hallyn
[not found] ` <20080726002700.GA29686-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 20+ messages in thread
From: Serge E. Hallyn @ 2008-07-26 0:27 UTC (permalink / raw)
To: Linux Containers
Following is a set of user namespace patches I've been playing with
this week.
The first two patches are I believe fixes which should go in regardless
of which direction user namespaces take.
The rest of the patches are one approach to providing default cross-userns
isolation for files. Any filesystem can provide its own intelligent
cross-userns userid equivalence checks by defining its own permission
function, which is what Eric and I have been talking about doing.
The next step is probably to handle some of the task-to-task
cross-userns checks.
Comments appreciated.
thanks,
-serge
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/6] user namespaces: introduce user_struct->user_namespace relationship
[not found] ` <20080726002700.GA29686-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-07-26 0:27 ` Serge E. Hallyn
[not found] ` <20080726002725.GA29874-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-26 0:27 ` [PATCH 2/6] user namespaces: move user_ns from nsproxy into user struct Serge E. Hallyn
` (4 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Serge E. Hallyn @ 2008-07-26 0:27 UTC (permalink / raw)
To: Linux Containers
From 9382f22a6c751e90baa4e7f3ba24c509e50a47a8 Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Date: Tue, 22 Jul 2008 13:31:37 -0500
Subject: [PATCH 1/6] user namespaces: introduce user_struct->user_namespace relationship
When a task does clone(CLONE_NEWNS), the task's user is the 'creator' of the
new user_namespace, and the user_namespace is tacked onto a list of those
created by this user.
Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
include/linux/sched.h | 1 +
include/linux/user_namespace.h | 1 +
kernel/user.c | 7 +++++++
kernel/user_namespace.c | 20 +++++++++++---------
4 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dc7e592..cf36e14 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -600,6 +600,7 @@ struct user_struct {
/* Hash table maintenance information */
struct hlist_node uidhash_node;
uid_t uid;
+ struct user_namespace *user_namespace;
#ifdef CONFIG_USER_SCHED
struct task_group *tg;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index b5f41d4..f9477c3 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -13,6 +13,7 @@ struct user_namespace {
struct kref kref;
struct hlist_head uidhash_table[UIDHASH_SZ];
struct user_struct *root_user;
+ struct user_struct *creator;
};
extern struct user_namespace init_user_ns;
diff --git a/kernel/user.c b/kernel/user.c
index 865ecf5..cfe8309 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -22,6 +22,7 @@ struct user_namespace init_user_ns = {
.refcount = ATOMIC_INIT(2),
},
.root_user = &root_user,
+ .creator = &root_user,
};
EXPORT_SYMBOL_GPL(init_user_ns);
@@ -53,6 +54,7 @@ struct user_struct root_user = {
.files = ATOMIC_INIT(0),
.sigpending = ATOMIC_INIT(0),
.locked_shm = 0,
+ .user_namespace = &init_user_ns,
#ifdef CONFIG_USER_SCHED
.tg = &init_task_group,
#endif
@@ -321,6 +323,7 @@ done:
*/
static inline void free_user(struct user_struct *up, unsigned long flags)
{
+ put_user_ns(up->user_namespace);
/* restore back the count */
atomic_inc(&up->__count);
spin_unlock_irqrestore(&uidhash_lock, flags);
@@ -347,6 +350,7 @@ static inline void free_user(struct user_struct *up, unsigned long flags)
sched_destroy_user(up);
key_put(up->uid_keyring);
key_put(up->session_keyring);
+ put_user_ns(up->user_namespace);
kmem_cache_free(uid_cachep, up);
}
@@ -409,6 +413,8 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
if (sched_create_user(new) < 0)
goto out_free_user;
+ new->user_namespace = get_user_ns(ns);
+
if (uids_user_create(new))
goto out_destoy_sched;
@@ -441,6 +447,7 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
out_destoy_sched:
sched_destroy_user(new);
+ put_user_ns(new->user_namespace);
out_free_user:
kmem_cache_free(uid_cachep, new);
out_unlock:
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index a9ab059..e8db443 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -19,7 +19,6 @@
static struct user_namespace *clone_user_ns(struct user_namespace *old_ns)
{
struct user_namespace *ns;
- struct user_struct *new_user;
int n;
ns = kmalloc(sizeof(struct user_namespace), GFP_KERNEL);
@@ -38,15 +37,17 @@ static struct user_namespace *clone_user_ns(struct user_namespace *old_ns)
return ERR_PTR(-ENOMEM);
}
- /* Reset current->user with a new one */
- new_user = alloc_uid(ns, current->uid);
- if (!new_user) {
- free_uid(ns->root_user);
- kfree(ns);
- return ERR_PTR(-ENOMEM);
- }
+ /* pin the creating user */
+ ns->creator = current->user;
+ atomic_inc(&ns->creator->__count);
+
+ /*
+ * The alloc_uid() incremented the userns refcount,
+ * so drop it again
+ */
+ put_user_ns(ns);
- switch_uid(new_user);
+ switch_uid(ns->root_user);
return ns;
}
@@ -72,6 +73,7 @@ void free_user_ns(struct kref *kref)
ns = container_of(kref, struct user_namespace, kref);
release_uids(ns);
+ free_uid(ns->creator);
kfree(ns);
}
EXPORT_SYMBOL(free_user_ns);
--
1.5.4.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 2/6] user namespaces: move user_ns from nsproxy into user struct
[not found] ` <20080726002700.GA29686-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-26 0:27 ` [PATCH 1/6] user namespaces: introduce user_struct->user_namespace relationship Serge E. Hallyn
@ 2008-07-26 0:27 ` Serge E. Hallyn
[not found] ` <20080726002735.GB29874-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-26 0:27 ` [PATCH 3/6] user namespaces: rig generic_permission for simple userns check Serge E. Hallyn
` (3 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Serge E. Hallyn @ 2008-07-26 0:27 UTC (permalink / raw)
To: Linux Containers
From ec5f54faf5afd16cb6cef40ebaaf3da25989d185 Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Date: Thu, 24 Jul 2008 17:52:41 -0500
Subject: [PATCH 2/6] user namespaces: move user_ns from nsproxy into user struct
When we get the sysfs support needed to support fair user scheduling
along with user namespaces, then we will need to be able to get the
user namespace from the user struct.
So we need the user_ns to be a part of struct user. Once we can
access it from tsk->user, we no longer have a use for
tsk->nsproxy->user_ns.
When a user_namespace is created, the user which created it is
marked as its 'creator'. The user_namespace pins the creator.
Each userid in a user_ns pins the user_ns. This keeps refcounting
nice and simple.
At the same time, this patch simplifies the refcounting. The current
user and userns locking works as follows:
The task pins the user struct.
The task pins the nsproxy, the nsproxy pins the user_ns.
When a new user_ns is created, it creates a root user for
it, and pins it. When the nsproxy releases the user_ns,
the userns tries to release all its user structs.
So you see that the refcounting "works" for now, but only
because the nsproxy (and therefore usr_ns) and user structs
will be freed at the same time - when the last task using
them is released.
Now we need to put the user_ns in the struct user. You can
see that will mess up the refcounting.
Fortunately, once the user_ns is available from tsk->user,
we don't need it in nsproxy.
So here is how the refcounting *should* be done:
The task pins the user struct.
The user struct pins its user namespace.
The user namespace pins the struct user which created it.
A user namespace now doesn't need to release its userids,
because it is only released when its last user disappears.
This patch makes those changes.
Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
include/linux/init_task.h | 1 -
include/linux/key.h | 3 ++
include/linux/nsproxy.h | 1 -
include/linux/sched.h | 3 +-
include/linux/user_namespace.h | 10 +++----
kernel/fork.c | 3 +-
kernel/nsproxy.c | 10 +------
kernel/sys.c | 4 +-
kernel/user.c | 52 +++++++++++----------------------------
kernel/user_namespace.c | 36 +++++++--------------------
security/keys/process_keys.c | 7 ++++-
11 files changed, 45 insertions(+), 85 deletions(-)
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 93c45ac..ce87b8a 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -57,7 +57,6 @@ extern struct nsproxy init_nsproxy;
.mnt_ns = NULL, \
INIT_NET_NS(net_ns) \
INIT_IPC_NS(ipc_ns) \
- .user_ns = &init_user_ns, \
}
#define INIT_SIGHAND(sighand) { \
diff --git a/include/linux/key.h b/include/linux/key.h
index c45c962..ba53aef 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -277,6 +277,8 @@ extern ctl_table key_sysctls[];
* the userspace interface
*/
extern void switch_uid_keyring(struct user_struct *new_user);
+extern void switch_uid_keyring_task(struct task_struct *tsk,
+ struct user_struct *new_user);
extern int copy_keys(unsigned long clone_flags, struct task_struct *tsk);
extern int copy_thread_group_keys(struct task_struct *tsk);
extern void exit_keys(struct task_struct *tsk);
@@ -305,6 +307,7 @@ extern void key_init(void);
#define key_ref_to_ptr(k) ({ NULL; })
#define is_key_possessed(k) 0
#define switch_uid_keyring(u) do { } while(0)
+#define switch_uid_keyring_task(t,u) do { } while(0)
#define __install_session_keyring(t, k) ({ NULL; })
#define copy_keys(f,t) 0
#define copy_thread_group_keys(t) 0
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index 0e66b57..7c7fb20 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -27,7 +27,6 @@ struct nsproxy {
struct ipc_namespace *ipc_ns;
struct mnt_namespace *mnt_ns;
struct pid_namespace *pid_ns;
- struct user_namespace *user_ns;
struct net *net_ns;
};
extern struct nsproxy init_nsproxy;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index cf36e14..2c2f036 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -600,7 +600,7 @@ struct user_struct {
/* Hash table maintenance information */
struct hlist_node uidhash_node;
uid_t uid;
- struct user_namespace *user_namespace;
+ struct user_namespace *user_ns;
#ifdef CONFIG_USER_SCHED
struct task_group *tg;
@@ -1744,6 +1744,7 @@ static inline struct user_struct *get_uid(struct user_struct *u)
}
extern void free_uid(struct user_struct *);
extern void switch_uid(struct user_struct *);
+extern void task_switch_uid(struct task_struct *tsk, struct user_struct *);
extern void release_uids(struct user_namespace *ns);
#include <asm/current.h>
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index f9477c3..1b4959d 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -27,8 +27,7 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
return ns;
}
-extern struct user_namespace *copy_user_ns(int flags,
- struct user_namespace *old_ns);
+extern int create_new_userns(int flags, struct task_struct *tsk);
extern void free_user_ns(struct kref *kref);
static inline void put_user_ns(struct user_namespace *ns)
@@ -44,13 +43,12 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
return &init_user_ns;
}
-static inline struct user_namespace *copy_user_ns(int flags,
- struct user_namespace *old_ns)
+static inline int create_new_userns(int flags, struct task_struct *tsk)
{
if (flags & CLONE_NEWUSER)
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
- return old_ns;
+ return 0;
}
static inline void put_user_ns(struct user_namespace *ns)
diff --git a/kernel/fork.c b/kernel/fork.c
index adefc11..42cc45f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -917,8 +917,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
retval = -EAGAIN;
if (atomic_read(&p->user->processes) >=
p->signal->rlim[RLIMIT_NPROC].rlim_cur) {
- if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
- p->user != current->nsproxy->user_ns->root_user)
+ if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE))
goto bad_fork_free;
}
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index adc7851..861d6f4 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -81,11 +81,9 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
goto out_pid;
}
- new_nsp->user_ns = copy_user_ns(flags, tsk->nsproxy->user_ns);
- if (IS_ERR(new_nsp->user_ns)) {
- err = PTR_ERR(new_nsp->user_ns);
+ err = create_new_userns(flags, tsk);
+ if (err)
goto out_user;
- }
new_nsp->net_ns = copy_net_ns(flags, tsk->nsproxy->net_ns);
if (IS_ERR(new_nsp->net_ns)) {
@@ -96,8 +94,6 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
return new_nsp;
out_net:
- if (new_nsp->user_ns)
- put_user_ns(new_nsp->user_ns);
out_user:
if (new_nsp->pid_ns)
put_pid_ns(new_nsp->pid_ns);
@@ -180,8 +176,6 @@ void free_nsproxy(struct nsproxy *ns)
put_ipc_ns(ns->ipc_ns);
if (ns->pid_ns)
put_pid_ns(ns->pid_ns);
- if (ns->user_ns)
- put_user_ns(ns->user_ns);
put_net(ns->net_ns);
kmem_cache_free(nsproxy_cachep, ns);
}
diff --git a/kernel/sys.c b/kernel/sys.c
index 14e9728..b8623be 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -568,13 +568,13 @@ static int set_user(uid_t new_ruid, int dumpclear)
{
struct user_struct *new_user;
- new_user = alloc_uid(current->nsproxy->user_ns, new_ruid);
+ new_user = alloc_uid(current->user->user_ns, new_ruid);
if (!new_user)
return -EAGAIN;
if (atomic_read(&new_user->processes) >=
current->signal->rlim[RLIMIT_NPROC].rlim_cur &&
- new_user != current->nsproxy->user_ns->root_user) {
+ !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) {
free_uid(new_user);
return -EAGAIN;
}
diff --git a/kernel/user.c b/kernel/user.c
index cfe8309..034dae5 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -54,7 +54,7 @@ struct user_struct root_user = {
.files = ATOMIC_INIT(0),
.sigpending = ATOMIC_INIT(0),
.locked_shm = 0,
- .user_namespace = &init_user_ns,
+ .user_ns = &init_user_ns,
#ifdef CONFIG_USER_SCHED
.tg = &init_task_group,
#endif
@@ -323,7 +323,7 @@ done:
*/
static inline void free_user(struct user_struct *up, unsigned long flags)
{
- put_user_ns(up->user_namespace);
+ put_user_ns(up->user_ns);
/* restore back the count */
atomic_inc(&up->__count);
spin_unlock_irqrestore(&uidhash_lock, flags);
@@ -350,7 +350,7 @@ static inline void free_user(struct user_struct *up, unsigned long flags)
sched_destroy_user(up);
key_put(up->uid_keyring);
key_put(up->session_keyring);
- put_user_ns(up->user_namespace);
+ put_user_ns(up->user_ns);
kmem_cache_free(uid_cachep, up);
}
@@ -366,7 +366,7 @@ struct user_struct *find_user(uid_t uid)
{
struct user_struct *ret;
unsigned long flags;
- struct user_namespace *ns = current->nsproxy->user_ns;
+ struct user_namespace *ns = current->user->user_ns;
spin_lock_irqsave(&uidhash_lock, flags);
ret = uid_hash_find(uid, uidhashentry(ns, uid));
@@ -413,7 +413,7 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
if (sched_create_user(new) < 0)
goto out_free_user;
- new->user_namespace = get_user_ns(ns);
+ new->user_ns = get_user_ns(ns);
if (uids_user_create(new))
goto out_destoy_sched;
@@ -447,7 +447,7 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
out_destoy_sched:
sched_destroy_user(new);
- put_user_ns(new->user_namespace);
+ put_user_ns(new->user_ns);
out_free_user:
kmem_cache_free(uid_cachep, new);
out_unlock:
@@ -455,7 +455,7 @@ out_unlock:
return NULL;
}
-void switch_uid(struct user_struct *new_user)
+void task_switch_uid(struct task_struct *tsk, struct user_struct *new_user)
{
struct user_struct *old_user;
@@ -464,12 +464,12 @@ void switch_uid(struct user_struct *new_user)
* cheaply with the new uid cache, so if it matters
* we should be checking for it. -DaveM
*/
- old_user = current->user;
+ old_user = tsk->user;
atomic_inc(&new_user->processes);
atomic_dec(&old_user->processes);
- switch_uid_keyring(new_user);
- current->user = new_user;
- sched_switch_user(current);
+ switch_uid_keyring_task(tsk, new_user);
+ tsk->user = new_user;
+ sched_switch_user(tsk);
/*
* We need to synchronize with __sigqueue_alloc()
@@ -479,38 +479,16 @@ void switch_uid(struct user_struct *new_user)
* structure.
*/
smp_mb();
- spin_unlock_wait(¤t->sighand->siglock);
+ spin_unlock_wait(&tsk->sighand->siglock);
free_uid(old_user);
- suid_keys(current);
+ suid_keys(tsk);
}
-#ifdef CONFIG_USER_NS
-void release_uids(struct user_namespace *ns)
+void switch_uid(struct user_struct *new_user)
{
- int i;
- unsigned long flags;
- struct hlist_head *head;
- struct hlist_node *nd;
-
- spin_lock_irqsave(&uidhash_lock, flags);
- /*
- * collapse the chains so that the user_struct-s will
- * be still alive, but not in hashes. subsequent free_uid()
- * will free them.
- */
- for (i = 0; i < UIDHASH_SZ; i++) {
- head = ns->uidhash_table + i;
- while (!hlist_empty(head)) {
- nd = head->first;
- hlist_del_init(nd);
- }
- }
- spin_unlock_irqrestore(&uidhash_lock, flags);
-
- free_uid(ns->root_user);
+ task_switch_uid(current, new_user);
}
-#endif
static int __init uid_cache_init(void)
{
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index e8db443..0045dd0 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -16,14 +16,17 @@
* @old_ns: namespace to clone
* Return NULL on error (failure to kmalloc), new ns otherwise
*/
-static struct user_namespace *clone_user_ns(struct user_namespace *old_ns)
+int create_new_userns(int flags, struct task_struct *tsk)
{
struct user_namespace *ns;
int n;
+ if (!(flags & CLONE_NEWUSER))
+ return 0;
+
ns = kmalloc(sizeof(struct user_namespace), GFP_KERNEL);
if (!ns)
- return ERR_PTR(-ENOMEM);
+ return -ENOMEM;
kref_init(&ns->kref);
@@ -34,37 +37,19 @@ static struct user_namespace *clone_user_ns(struct user_namespace *old_ns)
ns->root_user = alloc_uid(ns, 0);
if (!ns->root_user) {
kfree(ns);
- return ERR_PTR(-ENOMEM);
+ return -ENOMEM;
}
/* pin the creating user */
- ns->creator = current->user;
+ ns->creator = tsk->user;
atomic_inc(&ns->creator->__count);
- /*
- * The alloc_uid() incremented the userns refcount,
- * so drop it again
- */
+ /* alloc_uid() incremented the userns refcount. drop it again */
put_user_ns(ns);
- switch_uid(ns->root_user);
- return ns;
-}
-
-struct user_namespace * copy_user_ns(int flags, struct user_namespace *old_ns)
-{
- struct user_namespace *new_ns;
-
- BUG_ON(!old_ns);
- get_user_ns(old_ns);
-
- if (!(flags & CLONE_NEWUSER))
- return old_ns;
-
- new_ns = clone_user_ns(old_ns);
+ task_switch_uid(tsk, ns->root_user);
- put_user_ns(old_ns);
- return new_ns;
+ return 0;
}
void free_user_ns(struct kref *kref)
@@ -72,7 +57,6 @@ void free_user_ns(struct kref *kref)
struct user_namespace *ns;
ns = container_of(kref, struct user_namespace, kref);
- release_uids(ns);
free_uid(ns->creator);
kfree(ns);
}
diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index 5be6d01..8d1cfb2 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -119,7 +119,8 @@ error:
/*
* deal with the UID changing
*/
-void switch_uid_keyring(struct user_struct *new_user)
+void switch_uid_keyring_task(struct task_struct *task,
+ struct user_struct *new_user)
{
#if 0 /* do nothing for now */
struct key *old;
@@ -142,6 +143,10 @@ void switch_uid_keyring(struct user_struct *new_user)
} /* end switch_uid_keyring() */
+void switch_uid_keyring(struct user_struct *new_user)
+{
+ switch_uid_keyring_task(current, new_user);
+}
/*****************************************************************************/
/*
* install a fresh thread keyring, discarding the old one
--
1.5.4.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 3/6] user namespaces: rig generic_permission for simple userns check
[not found] ` <20080726002700.GA29686-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-26 0:27 ` [PATCH 1/6] user namespaces: introduce user_struct->user_namespace relationship Serge E. Hallyn
2008-07-26 0:27 ` [PATCH 2/6] user namespaces: move user_ns from nsproxy into user struct Serge E. Hallyn
@ 2008-07-26 0:27 ` Serge E. Hallyn
2008-07-26 0:27 ` [PATCH 4/6] user namespaces: add user_ns to super block Serge E. Hallyn
` (2 subsequent siblings)
5 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2008-07-26 0:27 UTC (permalink / raw)
To: Linux Containers
From f6d09b06a1106936010bffd420267f5b7ee66238 Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Date: Wed, 23 Jul 2008 17:01:09 -0500
Subject: [PATCH 3/6] user namespaces: rig generic_permission for simple userns check
Filesystems can provide their own permission() functions to do
advanced inter-user_namespace userid equivalence checks.
For those filesystems which do not support that, we will do
a simple check that current's user namespace is equivalent to
the user_namespace which mounted the filesystem. If it is
not equivalent, then the task can only have user nobody (that
is, the 'other') permissions to a file.
For now, we actually just compare the user's user_ns to the
init_user_ns. Next we will set the sb->user_ns to that of
the task mounting a filesystem, and use inode->i_sb->user_ns
instead of init_user_ns. By punting even on that, the
implications, and therefore (in)correctness of this patch should
be all the easier to verify.
Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
fs/namei.c | 14 +++++++++++++-
1 files changed, 13 insertions(+), 1 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 01e67dd..d5336fd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -31,6 +31,7 @@
#include <linux/file.h>
#include <linux/fcntl.h>
#include <linux/device_cgroup.h>
+#include <linux/nsproxy.h>
#include <asm/namei.h>
#include <asm/uaccess.h>
@@ -168,7 +169,7 @@ void putname(const char *name)
EXPORT_SYMBOL(putname);
#endif
-
+extern struct user_namespace init_user_ns;
/**
* generic_permission - check for access rights on a Posix-like filesystem
* @inode: inode to check access rights for
@@ -184,7 +185,15 @@ int generic_permission(struct inode *inode, int mask,
int (*check_acl)(struct inode *inode, int mask))
{
umode_t mode = inode->i_mode;
+ int same_userns = (current->user->user_ns == &init_user_ns);
+ /*
+ * If we're not in the inode's user namespace, we get
+ * user nobody permissions, and we ignore acls
+ * (bc serge doesn't know how to handle acls in this case)
+ */
+ if (!same_userns)
+ goto check;
if (current->fsuid == inode->i_uid)
mode >>= 6;
else {
@@ -200,11 +209,14 @@ int generic_permission(struct inode *inode, int mask,
mode >>= 3;
}
+check:
/*
* If the DACs are ok we don't need any capability check.
*/
if (((mode & mask & (MAY_READ|MAY_WRITE|MAY_EXEC)) == mask))
return 0;
+ if (!same_userns)
+ return -EACCES;
check_capabilities:
/*
--
1.5.4.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 4/6] user namespaces: add user_ns to super block
[not found] ` <20080726002700.GA29686-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
` (2 preceding siblings ...)
2008-07-26 0:27 ` [PATCH 3/6] user namespaces: rig generic_permission for simple userns check Serge E. Hallyn
@ 2008-07-26 0:27 ` Serge E. Hallyn
[not found] ` <20080726002754.GD29874-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-26 0:28 ` [PATCH 5/6] user namespaces: refuse create in other user_ns Serge E. Hallyn
2008-07-26 0:28 ` [PATCH 6/6] user_namespace: move put_user_ns outside lock Serge E. Hallyn
5 siblings, 1 reply; 20+ messages in thread
From: Serge E. Hallyn @ 2008-07-26 0:27 UTC (permalink / raw)
To: Linux Containers
From 420d6e81ce29d7a6fe3ab7b43c1171e105f8b697 Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Date: Thu, 24 Jul 2008 18:00:54 -0500
Subject: [PATCH 4/6] user namespaces: add user_ns to super block
Add a user_ns to the super_block, and set it to the user_ns of
the process which mounted the fs.
In generic_permission() compare the current user_ns to that
of the user_ns which mounted the inode's filesystem.
Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
fs/namei.c | 6 ++++--
fs/super.c | 3 +++
include/linux/fs.h | 5 +++++
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index d5336fd..adf5f1b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -169,7 +169,6 @@ void putname(const char *name)
EXPORT_SYMBOL(putname);
#endif
-extern struct user_namespace init_user_ns;
/**
* generic_permission - check for access rights on a Posix-like filesystem
* @inode: inode to check access rights for
@@ -185,7 +184,10 @@ int generic_permission(struct inode *inode, int mask,
int (*check_acl)(struct inode *inode, int mask))
{
umode_t mode = inode->i_mode;
- int same_userns = (current->user->user_ns == &init_user_ns);
+ int same_userns = 1;
+
+ if (current->user->user_ns != inode->i_sb->user_ns)
+ same_userns = 0;
/*
* If we're not in the inode's user namespace, we get
diff --git a/fs/super.c b/fs/super.c
index 453877c..12865ac 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -39,6 +39,7 @@
#include <linux/mutex.h>
#include <linux/file.h>
#include <asm/uaccess.h>
+#include <linux/user_namespace.h>
#include "internal.h"
@@ -92,6 +93,7 @@ static struct super_block *alloc_super(struct file_system_type *type)
s->s_qcop = sb_quotactl_ops;
s->s_op = &default_op;
s->s_time_gran = 1000000000;
+ s->user_ns = get_user_ns(current->user->user_ns);
}
out:
return s;
@@ -108,6 +110,7 @@ static inline void destroy_super(struct super_block *s)
security_sb_free(s);
kfree(s->s_subtype);
kfree(s->s_options);
+ put_user_ns(s->user_ns);
kfree(s);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9c2ac5c..68066a4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1092,6 +1092,11 @@ struct super_block {
* generic_show_options()
*/
char *s_options;
+
+ /*
+ * namespace of the user which mounted the sb
+ */
+ struct user_namespace *user_ns;
};
extern struct timespec current_fs_time(struct super_block *sb);
--
1.5.4.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 5/6] user namespaces: refuse create in other user_ns
[not found] ` <20080726002700.GA29686-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
` (3 preceding siblings ...)
2008-07-26 0:27 ` [PATCH 4/6] user namespaces: add user_ns to super block Serge E. Hallyn
@ 2008-07-26 0:28 ` Serge E. Hallyn
2008-07-26 0:28 ` [PATCH 6/6] user_namespace: move put_user_ns outside lock Serge E. Hallyn
5 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2008-07-26 0:28 UTC (permalink / raw)
To: Linux Containers
From 4d2c23452a67e25856893ab16fefd0f6e5aa58df Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Date: Thu, 24 Jul 2008 06:37:43 -0500
Subject: [PATCH 5/6] user namespaces: refuse create in other user_ns
Refuse writing to a directory in another user_ns. We can't
support file creation because we wouldn't know who should own
the file. This refuses file deletion as well - which I think
is the sensible thing to do.
File writing is still allowed if the 'user other' permissions
include write. That again probably makes sense for logging
and such.
Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
fs/namei.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index adf5f1b..b39a990 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -213,6 +213,12 @@ int generic_permission(struct inode *inode, int mask,
check:
/*
+ * Can't write to a directory in another user_ns
+ * We wouldn't know who to make the owner!
+ */
+ if (!same_userns && S_ISDIR(inode->i_mode) && (mask&MAY_WRITE))
+ return -EACCES;
+ /*
* If the DACs are ok we don't need any capability check.
*/
if (((mode & mask & (MAY_READ|MAY_WRITE|MAY_EXEC)) == mask))
--
1.5.4.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 6/6] user_namespace: move put_user_ns outside lock
[not found] ` <20080726002700.GA29686-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
` (4 preceding siblings ...)
2008-07-26 0:28 ` [PATCH 5/6] user namespaces: refuse create in other user_ns Serge E. Hallyn
@ 2008-07-26 0:28 ` Serge E. Hallyn
5 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2008-07-26 0:28 UTC (permalink / raw)
To: Linux Containers
Oops, this one was supposed to get folded into the first patch,
obviously. Sorry.
-serge
From daa76939c72adf146ded8a39e0211886e2bc943a Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Date: Fri, 25 Jul 2008 14:24:32 -0500
Subject: [PATCH 6/6] user_namespace: move put_user_ns outside lock
Since put_user_ns calls free_user on the creator, we shouldn't
call it from inside the lock in free_user.
Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
kernel/user.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/user.c b/kernel/user.c
index 034dae5..d3dd353 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -323,11 +323,11 @@ done:
*/
static inline void free_user(struct user_struct *up, unsigned long flags)
{
- put_user_ns(up->user_ns);
/* restore back the count */
atomic_inc(&up->__count);
spin_unlock_irqrestore(&uidhash_lock, flags);
+ put_user_ns(up->user_ns);
INIT_WORK(&up->work, remove_user_sysfs_dir);
schedule_work(&up->work);
}
--
1.5.4.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [Devel] [PATCH 1/6] user namespaces: introduce user_struct->user_namespace relationship
[not found] ` <20080726002725.GA29874-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-07-26 2:07 ` Alexey Dobriyan
[not found] ` <20080726020731.GA5115-QDJVlCTZ4KWTKS93B3g+7KFoa47nwP16@public.gmane.org>
0 siblings, 1 reply; 20+ messages in thread
From: Alexey Dobriyan @ 2008-07-26 2:07 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Linux Containers
On Fri, Jul 25, 2008 at 07:27:25PM -0500, Serge E. Hallyn wrote:
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -600,6 +600,7 @@ struct user_struct {
> /* Hash table maintenance information */
> struct hlist_node uidhash_node;
> uid_t uid;
> + struct user_namespace *user_namespace;
Call it "user_ns" please, and file "user_ns.c".
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Devel] [PATCH 1/6] user namespaces: introduce user_struct->user_namespace relationship
[not found] ` <20080726020731.GA5115-QDJVlCTZ4KWTKS93B3g+7KFoa47nwP16@public.gmane.org>
@ 2008-07-26 3:31 ` Serge E. Hallyn
0 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2008-07-26 3:31 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: Linux Containers
Quoting Alexey Dobriyan (adobriyan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
> On Fri, Jul 25, 2008 at 07:27:25PM -0500, Serge E. Hallyn wrote:
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -600,6 +600,7 @@ struct user_struct {
> > /* Hash table maintenance information */
> > struct hlist_node uidhash_node;
> > uid_t uid;
> > + struct user_namespace *user_namespace;
>
> Call it "user_ns" please,
Odd, I had the same thought, but seem to have applied at the wrong
patch (please see patch 2).
Thanks.
> and file "user_ns.c".
I'm ok with that, but that file has existed for quite some time
by now, so I'm not introducing it.
If noone objects, I'll rename the file (as well as
include/linux/user_namespace.h).
thanks,
-serge
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/6] user namespaces: move user_ns from nsproxy into user struct
[not found] ` <20080726002735.GB29874-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-07-28 21:41 ` Eric W. Biederman
[not found] ` <m1k5f5it4i.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2008-07-28 21:41 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Linux Containers
"Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
>>From ec5f54faf5afd16cb6cef40ebaaf3da25989d185 Mon Sep 17 00:00:00 2001
> From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> Date: Thu, 24 Jul 2008 17:52:41 -0500
> Subject: [PATCH 2/6] user namespaces: move user_ns from nsproxy into user struct
>
> When we get the sysfs support needed to support fair user scheduling
> along with user namespaces, then we will need to be able to get the
> user namespace from the user struct.
>
> So we need the user_ns to be a part of struct user. Once we can
> access it from tsk->user, we no longer have a use for
> tsk->nsproxy->user_ns.
Is this true? Even in the general case of supporting setuid and setgid
and everything else that potentially is in the user namespace?
I certainly support the cleanups you have made for the reasons you describe.
I think however that there is there are no technical reasons not to have
nsproxy->user_ns after the changes have been made. I also agree that
there are no technical reasons for keeping nsproxy->user_ns at the moment.
> When a user_namespace is created, the user which created it is
> marked as its 'creator'. The user_namespace pins the creator.
> Each userid in a user_ns pins the user_ns. This keeps refcounting
> nice and simple.
Eric
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/6] user namespaces: add user_ns to super block
[not found] ` <20080726002754.GD29874-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-07-28 21:53 ` Eric W. Biederman
[not found] ` <m13altislf.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2008-07-28 21:53 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Linux Containers
"Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
>>From 420d6e81ce29d7a6fe3ab7b43c1171e105f8b697 Mon Sep 17 00:00:00 2001
> From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> Date: Thu, 24 Jul 2008 18:00:54 -0500
> Subject: [PATCH 4/6] user namespaces: add user_ns to super block
>
> Add a user_ns to the super_block, and set it to the user_ns of
> the process which mounted the fs.
>
> In generic_permission() compare the current user_ns to that
> of the user_ns which mounted the inode's filesystem.
I don't think this is the right approach.
When we had the conversation earlier this was conceptually rejected
as it prevents nfs superblock unification.
We really want to store this in the vfsmount and pass the user namespace down
from there to where we are going to use it if at all possible.
The vfsmount also appears necessary if we are ever going to support multiple
user namespaces per filesystem as the filesystem still need to know which
user namespace to interpret it's data in.
Eric
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/6] user namespaces: add user_ns to super block
[not found] ` <m13altislf.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
@ 2008-07-28 22:47 ` Matt Helsley
[not found] ` <1217285230.25300.19.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
0 siblings, 1 reply; 20+ messages in thread
From: Matt Helsley @ 2008-07-28 22:47 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux Containers
On Mon, 2008-07-28 at 14:53 -0700, Eric W. Biederman wrote:
> "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
>
> >>From 420d6e81ce29d7a6fe3ab7b43c1171e105f8b697 Mon Sep 17 00:00:00 2001
> > From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> > Date: Thu, 24 Jul 2008 18:00:54 -0500
> > Subject: [PATCH 4/6] user namespaces: add user_ns to super block
> >
> > Add a user_ns to the super_block, and set it to the user_ns of
> > the process which mounted the fs.
> >
> > In generic_permission() compare the current user_ns to that
> > of the user_ns which mounted the inode's filesystem.
>
> I don't think this is the right approach.
>
> When we had the conversation earlier this was conceptually rejected
> as it prevents nfs superblock unification.
>
> We really want to store this in the vfsmount and pass the user namespace down
> from there to where we are going to use it if at all possible.
>
> The vfsmount also appears necessary if we are ever going to support multiple
> user namespaces per filesystem as the filesystem still need to know which
> user namespace to interpret it's data in.
Would this require passing the vfsmount to the filesystems themselves,
or would they be within the VFS code only? If not wholly within the VFS
I wonder if Al Viro would object to this. He's resisted past attempts to
pass the vfsmount structs into more filesystem code paths and I'm
guessing that could affect whether or not this approach can be
implemented.
Cheers,
-Matt Helsley
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/6] user namespaces: add user_ns to super block
[not found] ` <1217285230.25300.19.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2008-07-28 23:03 ` Eric W. Biederman
[not found] ` <m1skttehm6.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-07-29 18:05 ` Serge E. Hallyn
1 sibling, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2008-07-28 23:03 UTC (permalink / raw)
To: Matt Helsley; +Cc: Linux Containers
Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
> Would this require passing the vfsmount to the filesystems themselves,
> or would they be within the VFS code only?
The interesting bit is the user_namespace contained in the vfsmount. We
can pass that down. I think semantically it makes sense for a filesystem
mount to only operate in a single mount namespace.
> If not wholly within the VFS
> I wonder if Al Viro would object to this. He's resisted past attempts to
> pass the vfsmount structs into more filesystem code paths and I'm
> guessing that could affect whether or not this approach can be
> implemented.
Dave Hansen raised that concern when we were talking about it earlier. Since
we just care about a property of the mount it isn't a big deal.
Actually thinking about this a little farther it may be simplest to have the
mnt_namespace capture the user_namespace, although that doesn't seem to map
semantically very well with cloning of the filesystem.
This is very much a question of how do we map the uid/gids store in the filesystem
into the uids/gids in the kernel. Which user namespace do they belong in.
Especially in the case of read only mounts we can safely share a filesystem between
user_namespaces with no changes to the filesystem. Which I suspect is the
first case we want to allow as that is a tremendous savings in space if you have
lots of instances of the same distro, and people have been doing it with /usr
for years.
Eric
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/6] user namespaces: move user_ns from nsproxy into user struct
[not found] ` <m1k5f5it4i.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
@ 2008-07-29 17:59 ` Serge E. Hallyn
0 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2008-07-29 17:59 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux Containers
Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
>
> >>From ec5f54faf5afd16cb6cef40ebaaf3da25989d185 Mon Sep 17 00:00:00 2001
> > From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> > Date: Thu, 24 Jul 2008 17:52:41 -0500
> > Subject: [PATCH 2/6] user namespaces: move user_ns from nsproxy into user struct
> >
> > When we get the sysfs support needed to support fair user scheduling
> > along with user namespaces, then we will need to be able to get the
> > user namespace from the user struct.
> >
> > So we need the user_ns to be a part of struct user. Once we can
> > access it from tsk->user, we no longer have a use for
> > tsk->nsproxy->user_ns.
>
> Is this true? Even in the general case of supporting setuid and setgid
> and everything else that potentially is in the user namespace?
Sure. At any time we can get tsk->user->user_ns, and from that we can
get tsk->user->user_ns->creator.
> I certainly support the cleanups you have made for the reasons you describe.
> I think however that there is there are no technical reasons not to have
> nsproxy->user_ns after the changes have been made.
Well I ended up tossing it to clarify my thinking about the refcounting.
With that done, I think we could safely not have nsproxy pin the
user_ns, trusting the tsk->user to pin it instead.
But keeping it in two places just seems needlessly complicated.
> I also agree that
> there are no technical reasons for keeping nsproxy->user_ns at the moment.
What is your preference?
> > When a user_namespace is created, the user which created it is
> > marked as its 'creator'. The user_namespace pins the creator.
> > Each userid in a user_ns pins the user_ns. This keeps refcounting
> > nice and simple.
>
> Eric
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/6] user namespaces: add user_ns to super block
[not found] ` <1217285230.25300.19.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2008-07-28 23:03 ` Eric W. Biederman
@ 2008-07-29 18:05 ` Serge E. Hallyn
[not found] ` <20080729180515.GB365-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
1 sibling, 1 reply; 20+ messages in thread
From: Serge E. Hallyn @ 2008-07-29 18:05 UTC (permalink / raw)
To: Matt Helsley; +Cc: Linux Containers, Eric W. Biederman
Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
>
> On Mon, 2008-07-28 at 14:53 -0700, Eric W. Biederman wrote:
> > "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
> >
> > >>From 420d6e81ce29d7a6fe3ab7b43c1171e105f8b697 Mon Sep 17 00:00:00 2001
> > > From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> > > Date: Thu, 24 Jul 2008 18:00:54 -0500
> > > Subject: [PATCH 4/6] user namespaces: add user_ns to super block
> > >
> > > Add a user_ns to the super_block, and set it to the user_ns of
> > > the process which mounted the fs.
> > >
> > > In generic_permission() compare the current user_ns to that
> > > of the user_ns which mounted the inode's filesystem.
> >
> > I don't think this is the right approach.
> >
> > When we had the conversation earlier this was conceptually rejected
> > as it prevents nfs superblock unification.
> >
> > We really want to store this in the vfsmount and pass the user namespace down
> > from there to where we are going to use it if at all possible.
> >
> > The vfsmount also appears necessary if we are ever going to support multiple
> > user namespaces per filesystem as the filesystem still need to know which
> > user namespace to interpret it's data in.
The filesystem can figure that out based on current's context, no?
With the per-sb user_ns, the default behavior is indeed very limited,
but since you want to move all the user_ns functionality into the
filesystem, the fs can tag vfsmounts based on the "new remount" you
had talked about before.
> Would this require passing the vfsmount to the filesystems themselves,
> or would they be within the VFS code only? If not wholly within the VFS
> I wonder if Al Viro would object to this. He's resisted past attempts to
> pass the vfsmount structs into more filesystem code paths and I'm
> guessing that could affect whether or not this approach can be
> implemented.
Right, that's the main reason we might want to pursue the per-sb
approach. Otherwise I would prefer the per-vfsmount approach.
Eric, if you think the per-vfsmount fight is worth fighting, then by all
means let's do it and see what happens. So in that case ignore patches
3-5 from this set :)
-serge
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/6] user namespaces: add user_ns to super block
[not found] ` <m1skttehm6.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
@ 2008-07-29 18:09 ` Serge E. Hallyn
0 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2008-07-29 18:09 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux Containers
Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
>
> > Would this require passing the vfsmount to the filesystems themselves,
> > or would they be within the VFS code only?
>
> The interesting bit is the user_namespace contained in the vfsmount. We
> can pass that down. I think semantically it makes sense for a filesystem
> mount to only operate in a single mount namespace.
>
> > If not wholly within the VFS
> > I wonder if Al Viro would object to this. He's resisted past attempts to
> > pass the vfsmount structs into more filesystem code paths and I'm
> > guessing that could affect whether or not this approach can be
> > implemented.
>
> Dave Hansen raised that concern when we were talking about it earlier. Since
> we just care about a property of the mount it isn't a big deal.
>
> Actually thinking about this a little farther it may be simplest to have the
> mnt_namespace capture the user_namespace, although that doesn't seem to map
> semantically very well with cloning of the filesystem.
Interesting idea. I'm going to pursue that.
So at a do_new_mount(), mnt->user_ns = current->user_ns. At
do_loopback(), we ask the fs whether the new_mnt->user_ns can be set to
current->user_ns. If not, it keeps the original, meaning that current
will always receive user nobody access to the fs. Otherwise, the
fs is saying that it knows how to properly convert userids from
current->user->user_ns to ones which make sense in the
original_mnt->user_ns.
> This is very much a question of how do we map the uid/gids store in the filesystem
> into the uids/gids in the kernel. Which user namespace do they belong in.
>
> Especially in the case of read only mounts we can safely share a filesystem between
> user_namespaces with no changes to the filesystem. Which I suspect is the
> first case we want to allow as that is a tremendous savings in space if you have
> lots of instances of the same distro, and people have been doing it with /usr
> for years.
>
> Eric
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/6] user namespaces: add user_ns to super block
[not found] ` <20080729180515.GB365-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-07-29 19:22 ` Eric W. Biederman
[not found] ` <m13alscx7e.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2008-07-29 19:22 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Linux Containers
"Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
>
> The filesystem can figure that out based on current's context, no?
>
> With the per-sb user_ns, the default behavior is indeed very limited,
> but since you want to move all the user_ns functionality into the
> filesystem, the fs can tag vfsmounts based on the "new remount" you
> had talked about before.
I guess I want the filesystem to coordinate.
>> Would this require passing the vfsmount to the filesystems themselves,
>> or would they be within the VFS code only? If not wholly within the VFS
>> I wonder if Al Viro would object to this. He's resisted past attempts to
>> pass the vfsmount structs into more filesystem code paths and I'm
>> guessing that could affect whether or not this approach can be
>> implemented.
>
> Right, that's the main reason we might want to pursue the per-sb
> approach. Otherwise I would prefer the per-vfsmount approach.
>
> Eric, if you think the per-vfsmount fight is worth fighting, then by all
> means let's do it and see what happens. So in that case ignore patches
> 3-5 from this set :)
My intuitive sense is that the treating the handling of different
user namespaces in the same filesystem is a trivial case of the
superblock merging that nfs performs. And that we will preserve
existing semantics much better if the user namespace is stored
in the vfsmount. This allows mount propagation and friends to work
without surprises.
The practical limitation I see of storing things outside of the
vfsmount is when do you setup the mapping to handle a new user
namespace?
So yes. I think it is worth the discussion. Let's not
move the vfsmount down, and just move the user namespace pointer
down as that is fundamentally what we care about.
Eric
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/6] user namespaces: add user_ns to super block
[not found] ` <m13alscx7e.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
@ 2008-08-02 0:06 ` Serge E. Hallyn
[not found] ` <20080802000609.GA10211-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 20+ messages in thread
From: Serge E. Hallyn @ 2008-08-02 0:06 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux Containers
Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
>
> >
> > The filesystem can figure that out based on current's context, no?
> >
> > With the per-sb user_ns, the default behavior is indeed very limited,
> > but since you want to move all the user_ns functionality into the
> > filesystem, the fs can tag vfsmounts based on the "new remount" you
> > had talked about before.
>
> I guess I want the filesystem to coordinate.
>
> >> Would this require passing the vfsmount to the filesystems themselves,
> >> or would they be within the VFS code only? If not wholly within the VFS
> >> I wonder if Al Viro would object to this. He's resisted past attempts to
> >> pass the vfsmount structs into more filesystem code paths and I'm
> >> guessing that could affect whether or not this approach can be
> >> implemented.
> >
> > Right, that's the main reason we might want to pursue the per-sb
> > approach. Otherwise I would prefer the per-vfsmount approach.
> >
> > Eric, if you think the per-vfsmount fight is worth fighting, then by all
> > means let's do it and see what happens. So in that case ignore patches
> > 3-5 from this set :)
>
> My intuitive sense is that the treating the handling of different
> user namespaces in the same filesystem is a trivial case of the
> superblock merging that nfs performs. And that we will preserve
> existing semantics much better if the user namespace is stored
> in the vfsmount. This allows mount propagation and friends to work
> without surprises.
>
> The practical limitation I see of storing things outside of the
> vfsmount is when do you setup the mapping to handle a new user
> namespace?
>
> So yes. I think it is worth the discussion. Let's not
> move the vfsmount down, and just move the user namespace pointer
> down as that is fundamentally what we care about.
>
> Eric
Ok I wasn't thinking right. We still can't get to a user_ns from
an inode *.
So playing with this a bit tonight, it seems like the best way
to pass the user_namespace up to the fs is just to define new
super_operations which handle the conversions. Something like
the following.
(This sits on top of the two patches I'm about to send out as
replacements for patches 1 and 2 from my previous posting.)
-serge
From 0156ae52302a70302b1725450fae7f565195410a Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Date: Fri, 1 Aug 2008 18:55:59 -0500
Subject: [PATCH 3/3] user namespaces: enforce user namespaces for file permission
Add a user_ns to the sb. It is always set to the user_ns of the task which
mounted the sb.
Define 3 new super_operations. convert_uid() and convert_gid() take a uid
or gid from an inode on the sb's fs, and attempt to convert them into ids
meaningful in the user namespace passed in, which presumably is current's.
is_capable() checks whether current has the requested capability with respect
to the sb's fs, which is dependent upon the relationship between current's
user_ns and those which the sb knows about.
If the sb isn't user ns aware - which none are right now - then the new
super_operations won't be defined. If in that case current and sb have
different user namespaces, then the userids can't be compared.
If current's and sb's userids can't be compared, then current will get
'user other' (we usually say 'nobody') permissions to the inode.
Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
fs/namei.c | 33 ++++++++++++++++++++++---
fs/super.c | 3 ++
include/linux/fs.h | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 100 insertions(+), 4 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 4ea63ed..e17bc96 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -183,10 +183,32 @@ int generic_permission(struct inode *inode, int mask,
int (*check_acl)(struct inode *inode, int mask))
{
umode_t mode = inode->i_mode;
+ uid_t iuid;
+ gid_t igid;
+ int ret;
+ int good_userns = 1;
+ struct super_block *sb = inode->i_sb;
+
+ ret = s_convert_uid(sb, current->user->user_ns,
+ inode->i_uid, &iuid);
+ if (!ret)
+ good_userns = 0;
+ ret = s_convert_gid(sb, current->user->user_ns,
+ inode->i_gid, &igid);
+ if (!ret)
+ good_userns = 0;
mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
- if (current->fsuid == inode->i_uid)
+ /*
+ * If we're not in the inode's user namespace, we get
+ * user nobody permissions, and we ignore acls
+ * (bc serge doesn't know how to handle acls in this case)
+ */
+ if (!good_userns)
+ goto check;
+
+ if (current->fsuid == iuid)
mode >>= 6;
else {
if (IS_POSIXACL(inode) && (mode & S_IRWXG) && check_acl) {
@@ -197,15 +219,18 @@ int generic_permission(struct inode *inode, int mask,
return error;
}
- if (in_group_p(inode->i_gid))
+ if (in_group_p(igid))
mode >>= 3;
}
+check:
/*
* If the DACs are ok we don't need any capability check.
*/
if ((mask & ~mode) == 0)
return 0;
+ if (!good_userns)
+ return -EACCES;
check_capabilities:
/*
@@ -214,14 +239,14 @@ int generic_permission(struct inode *inode, int mask,
*/
if (!(mask & MAY_EXEC) ||
(inode->i_mode & S_IXUGO) || S_ISDIR(inode->i_mode))
- if (capable(CAP_DAC_OVERRIDE))
+ if (s_is_capable(sb, current->user->user_ns, CAP_DAC_OVERRIDE))
return 0;
/*
* Searching includes executable on directories, else just read.
*/
if (mask == MAY_READ || (S_ISDIR(inode->i_mode) && !(mask & MAY_WRITE)))
- if (capable(CAP_DAC_READ_SEARCH))
+ if (s_is_capable(sb, current->user->user_ns, CAP_DAC_READ_SEARCH))
return 0;
return -EACCES;
diff --git a/fs/super.c b/fs/super.c
index e931ae9..3559637 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -38,6 +38,7 @@
#include <linux/kobject.h>
#include <linux/mutex.h>
#include <linux/file.h>
+#include <linux/user_namespace.h>
#include <asm/uaccess.h>
#include "internal.h"
@@ -93,6 +94,7 @@ static struct super_block *alloc_super(struct file_system_type *type)
s->s_qcop = sb_quotactl_ops;
s->s_op = &default_op;
s->s_time_gran = 1000000000;
+ s->user_ns = get_user_ns(current->user->user_ns);
}
out:
return s;
@@ -109,6 +111,7 @@ static inline void destroy_super(struct super_block *s)
security_sb_free(s);
kfree(s->s_subtype);
kfree(s->s_options);
+ put_user_ns(s->user_ns);
kfree(s);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 580b513..a7cbcea 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1128,6 +1128,11 @@ struct super_block {
* generic_show_options()
*/
char *s_options;
+
+ /*
+ * namespace of the user which mounted the sb
+ */
+ struct user_namespace *user_ns;
};
extern struct timespec current_fs_time(struct super_block *sb);
@@ -1320,6 +1325,9 @@ struct super_operations {
int (*remount_fs) (struct super_block *, int *, char *);
void (*clear_inode) (struct inode *);
void (*umount_begin) (struct super_block *);
+ int (*is_capable) (struct user_namespace *, int);
+ uid_t (*convert_uid)(struct user_namespace *, uid_t, uid_t *);
+ gid_t (*convert_gid)(struct user_namespace *, gid_t, gid_t *);
int (*show_options)(struct seq_file *, struct vfsmount *);
int (*show_stats)(struct seq_file *, struct vfsmount *);
@@ -1330,6 +1338,66 @@ struct super_operations {
};
/*
+ * In the next 3, i'm passing the user_ns in so that I don't need
+ * to know how to dereference struct user her (which would require
+ * #including sched.h).
+ * Note in particular that for instance in s_convert_uid, the uid is the
+ * inode->i_uid, while user_ns is current->user->user_ns.
+ */
+
+/*
+ * s_convert_uid: attempt to translate the inode's stored
+ * uid to a uid meaningful in user_ns passed in
+ * If possible, store the result in reuid and return 1
+ * Otherwise, return 0, meaningful the uid cannot be translated
+ */
+static inline int s_convert_uid(struct super_block *sb,
+ struct user_namespace *user_ns, uid_t uid, uid_t *retuid)
+{
+ if (sb->user_ns == user_ns) {
+ *retuid = uid;
+ return 1;
+ }
+ if (sb->s_op->convert_uid)
+ return sb->s_op->convert_uid(user_ns, uid, retuid);
+ return 0;
+}
+
+/*
+ * s_convert_gid: attempt to translate the inode's stored
+ * gid to a gid meaningful in user_ns passed in
+ * If possible, store the result in regid and return 1
+ * Otherwise, return 0, meaningful the gid cannot be translated
+ */
+static inline int s_convert_gid(struct super_block *sb,
+ struct user_namespace *user_ns, gid_t gid, gid_t *retgid)
+{
+ if (sb->user_ns == user_ns) {
+ *retgid = gid;
+ return 1;
+ }
+ if (sb->s_op->convert_gid)
+ return sb->s_op->convert_gid(user_ns, gid, retgid);
+ return 0;
+}
+
+/*
+ * s_is_capable: check whether current is capable with respect
+ * to the filesystem represented by sb.
+ *
+ * return 0 if false, 1 if true
+ */
+static inline int s_is_capable(struct super_block *sb,
+ struct user_namespace *user_ns, int cap)
+{
+ if (sb->user_ns == user_ns)
+ return capable(cap);
+ if (sb->s_op->is_capable)
+ return sb->s_op->is_capable(user_ns, cap);
+ return 0;
+}
+
+/*
* Inode state bits. Protected by inode_lock.
*
* Three bits determine the dirty state of the inode, I_DIRTY_SYNC,
--
1.5.4.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 4/6] user namespaces: add user_ns to super block
[not found] ` <20080802000609.GA10211-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-08-02 1:49 ` Eric W. Biederman
[not found] ` <m1wsj0i3td.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2008-08-02 1:49 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Linux Containers, Eric W. Biederman
"Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
> Ok I wasn't thinking right. We still can't get to a user_ns from
> an inode *.
>
> So playing with this a bit tonight, it seems like the best way
> to pass the user_namespace up to the fs is just to define new
> super_operations which handle the conversions. Something like
> the following.
I'm pretty certain you want to pass in the entire inode instead
of just inode->i_uid. That would allow you to pass one less argument
and give the filesystems more information to work with.
Eric
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/6] user namespaces: add user_ns to super block
[not found] ` <m1wsj0i3td.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
@ 2008-08-03 0:37 ` Serge E. Hallyn
0 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2008-08-03 0:37 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux Containers
Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
>
> > Ok I wasn't thinking right. We still can't get to a user_ns from
> > an inode *.
> >
> > So playing with this a bit tonight, it seems like the best way
> > to pass the user_namespace up to the fs is just to define new
> > super_operations which handle the conversions. Something like
> > the following.
>
> I'm pretty certain you want to pass in the entire inode instead
> of just inode->i_uid. That would allow you to pass one less argument
> and give the filesystems more information to work with.
>
> Eric
Oh yes of course, the fs will want the whole inode. Will change that.
And I suppose in that case I may as well have just one function
to convert both uid and gid.
Thanks.
-serge
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2008-08-03 0:37 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-26 0:27 [PATCH 0/6] user namespaces: introduction Serge E. Hallyn
[not found] ` <20080726002700.GA29686-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-26 0:27 ` [PATCH 1/6] user namespaces: introduce user_struct->user_namespace relationship Serge E. Hallyn
[not found] ` <20080726002725.GA29874-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-26 2:07 ` [Devel] " Alexey Dobriyan
[not found] ` <20080726020731.GA5115-QDJVlCTZ4KWTKS93B3g+7KFoa47nwP16@public.gmane.org>
2008-07-26 3:31 ` Serge E. Hallyn
2008-07-26 0:27 ` [PATCH 2/6] user namespaces: move user_ns from nsproxy into user struct Serge E. Hallyn
[not found] ` <20080726002735.GB29874-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-28 21:41 ` Eric W. Biederman
[not found] ` <m1k5f5it4i.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-07-29 17:59 ` Serge E. Hallyn
2008-07-26 0:27 ` [PATCH 3/6] user namespaces: rig generic_permission for simple userns check Serge E. Hallyn
2008-07-26 0:27 ` [PATCH 4/6] user namespaces: add user_ns to super block Serge E. Hallyn
[not found] ` <20080726002754.GD29874-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-28 21:53 ` Eric W. Biederman
[not found] ` <m13altislf.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-07-28 22:47 ` Matt Helsley
[not found] ` <1217285230.25300.19.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2008-07-28 23:03 ` Eric W. Biederman
[not found] ` <m1skttehm6.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-07-29 18:09 ` Serge E. Hallyn
2008-07-29 18:05 ` Serge E. Hallyn
[not found] ` <20080729180515.GB365-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-29 19:22 ` Eric W. Biederman
[not found] ` <m13alscx7e.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-08-02 0:06 ` Serge E. Hallyn
[not found] ` <20080802000609.GA10211-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-08-02 1:49 ` Eric W. Biederman
[not found] ` <m1wsj0i3td.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-08-03 0:37 ` Serge E. Hallyn
2008-07-26 0:28 ` [PATCH 5/6] user namespaces: refuse create in other user_ns Serge E. Hallyn
2008-07-26 0:28 ` [PATCH 6/6] user_namespace: move put_user_ns outside lock Serge E. Hallyn
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.