* [RFC][v5] clone_with_pids() system call
@ 2009-09-07 21:13 ` Sukadev Bhattiprolu
0 siblings, 0 replies; 32+ messages in thread
From: Sukadev Bhattiprolu @ 2009-09-07 21:13 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w,
mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
Alexey Dobriyan, Pavel Emelyanov
To support application checkpoint/restart, a task must have the same pid it
had when it was checkpointed. When containers are nested, the tasks within
the containers exist in multiple pid namespaces and hence have multiple pids
to specify during restart.
This patchset implements a new system call, clone_with_pids() that lets a
process specify the pids of the child process.
Patches 1 through 6 are helpers and we believe they are needed for application
restart, regardless of the kernel implementation of application restart.
Patch 8/8 defines a prototype of the new system call.
Changelog[v5]:
- Make 'pid_max' a property of pid_ns (Integrated Serge Hallyn's patch
into this set)
- (Eric Biederman): Avoid the new function, set_pidmap() - added
couple of checks on 'target_pid' in alloc_pidmap() itself.
=== IMPORTANT TODO:
clone() system call has another limitation - all available bits in clone-flags
are in use and any new clone-flag will need a variant of the clone() system
call.
It appears to make sense to try and extend this new system call to address
this limitation as well. The basic requirements of a new clone system call
could then be summarized as:
- do everything clone() does today, and
- give application an ability to choose pids for the child process
in all ancestor pid namespaces, and
- allow more clone_flags
Contstraints:
- system-calls are restricted to 6 parameters and clone() already
takes 5 parameters, any extension to clone() interface would require
one or more copy_from_user().
- does copy_from_user() of a few words have a significant impact on
the total cost of clone() ?
Based on these requirements and constraints, we have been exploring a couple
of system call interfaces and appreciate any iput.
1. =====
#if 64bit
#define CLONE_FLAGS_WORDS 1
#else
#define CLONE_FLAGS_WORDS 2
#endif
struct pid_set {
int num_pids;
pid_t *pids;
};
typedef struct {
unsigned long flags[CLONE_FLAGS_WORDS];
} clone_flags_t;
int clone_extended(clone_flags_t *flags, void *child_stack, int *unused,
int *parent_tid, int *child_tid, struct pid_set *pid_set);
Pros:
- extendible clone_flags (like sigset_t)
Cons:
- copy_from_user() needed on all architectures (we maybe able
to play some tricks with 'clone_flags_t' to avoid the copy
on 64-bit archtitectures till N_CLONE_FLAGS exceeds 64).
- Both applications and kernel must use interfaces equivalent
to sigsetops(3) to test/set/clear clone flags.
2. ======
struct clone_info {
int num_clone_high_words;
int *flags_high;
struct pid_set pid_set;
}
int clone_extended(int flags_low, void *child_stack, void *unused,
int *parent_tid, int *child_tid, struct clone_info *clone_info);
Pros:
- copy_from_user() needed only for new flags and pid_set
Cons:
- splitting the high and low clone-flags is awkward ?
Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
^ permalink raw reply [flat|nested] 32+ messages in thread* [RFC][v5] clone_with_pids() system call @ 2009-09-07 21:13 ` Sukadev Bhattiprolu 0 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:13 UTC (permalink / raw) To: linux-kernel Cc: serue, Oren Laadan, Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa, Containers, sukadev To support application checkpoint/restart, a task must have the same pid it had when it was checkpointed. When containers are nested, the tasks within the containers exist in multiple pid namespaces and hence have multiple pids to specify during restart. This patchset implements a new system call, clone_with_pids() that lets a process specify the pids of the child process. Patches 1 through 6 are helpers and we believe they are needed for application restart, regardless of the kernel implementation of application restart. Patch 8/8 defines a prototype of the new system call. Changelog[v5]: - Make 'pid_max' a property of pid_ns (Integrated Serge Hallyn's patch into this set) - (Eric Biederman): Avoid the new function, set_pidmap() - added couple of checks on 'target_pid' in alloc_pidmap() itself. === IMPORTANT TODO: clone() system call has another limitation - all available bits in clone-flags are in use and any new clone-flag will need a variant of the clone() system call. It appears to make sense to try and extend this new system call to address this limitation as well. The basic requirements of a new clone system call could then be summarized as: - do everything clone() does today, and - give application an ability to choose pids for the child process in all ancestor pid namespaces, and - allow more clone_flags Contstraints: - system-calls are restricted to 6 parameters and clone() already takes 5 parameters, any extension to clone() interface would require one or more copy_from_user(). - does copy_from_user() of a few words have a significant impact on the total cost of clone() ? Based on these requirements and constraints, we have been exploring a couple of system call interfaces and appreciate any iput. 1. ===== #if 64bit #define CLONE_FLAGS_WORDS 1 #else #define CLONE_FLAGS_WORDS 2 #endif struct pid_set { int num_pids; pid_t *pids; }; typedef struct { unsigned long flags[CLONE_FLAGS_WORDS]; } clone_flags_t; int clone_extended(clone_flags_t *flags, void *child_stack, int *unused, int *parent_tid, int *child_tid, struct pid_set *pid_set); Pros: - extendible clone_flags (like sigset_t) Cons: - copy_from_user() needed on all architectures (we maybe able to play some tricks with 'clone_flags_t' to avoid the copy on 64-bit archtitectures till N_CLONE_FLAGS exceeds 64). - Both applications and kernel must use interfaces equivalent to sigsetops(3) to test/set/clear clone flags. 2. ====== struct clone_info { int num_clone_high_words; int *flags_high; struct pid_set pid_set; } int clone_extended(int flags_low, void *child_stack, void *unused, int *parent_tid, int *child_tid, struct clone_info *clone_info); Pros: - copy_from_user() needed only for new flags and pid_set Cons: - splitting the high and low clone-flags is awkward ? Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 1/8]: Factor out code to allocate pidmap page 2009-09-07 21:13 ` Sukadev Bhattiprolu (?) @ 2009-09-07 21:13 ` Sukadev Bhattiprolu -1 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:13 UTC (permalink / raw) To: linux-kernel Cc: serue, Oren Laadan, Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa, Containers, sukadev Subject: [RFC][v5][PATCH 1/8]: Factor out code to allocate pidmap page To simplify alloc_pidmap(), move code to allocate a pid map page to a separate function. Changelog[v3]: - Earlier version of patchset called alloc_pidmap_page() from two places. But now its called from only one place. Even so, moving this code out into a separate function simplifies alloc_pidmap(). Changelog[v2]: - (Matt Helsley, Dave Hansen) Have alloc_pidmap_page() return -ENOMEM on error instead of -1. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: Oren Laadan <orenl@cs.columbia.edu> --- kernel/pid.c | 46 ++++++++++++++++++++++++++++++---------------- 1 files changed, 30 insertions(+), 16 deletions(-) Index: linux-2.6/kernel/pid.c =================================================================== --- linux-2.6.orig/kernel/pid.c 2009-09-07 13:08:50.000000000 -0700 +++ linux-2.6/kernel/pid.c 2009-09-07 13:11:30.000000000 -0700 @@ -122,9 +122,35 @@ atomic_inc(&map->nr_free); } +static int alloc_pidmap_page(struct pidmap *map) +{ + void *page; + + if (likely(map->page)) + return 0; + + page = kzalloc(PAGE_SIZE, GFP_KERNEL); + + /* + * Free the page if someone raced with us installing it: + */ + spin_lock_irq(&pidmap_lock); + if (map->page) + kfree(page); + else + map->page = page; + spin_unlock_irq(&pidmap_lock); + + if (unlikely(!map->page)) + return -ENOMEM; + + return 0; +} + static int alloc_pidmap(struct pid_namespace *pid_ns) { int i, offset, max_scan, pid, last = pid_ns->last_pid; + int rc; struct pidmap *map; pid = last + 1; @@ -134,21 +160,10 @@ map = &pid_ns->pidmap[pid/BITS_PER_PAGE]; max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset; for (i = 0; i <= max_scan; ++i) { - if (unlikely(!map->page)) { - void *page = kzalloc(PAGE_SIZE, GFP_KERNEL); - /* - * Free the page if someone raced with us - * installing it: - */ - spin_lock_irq(&pidmap_lock); - if (map->page) - kfree(page); - else - map->page = page; - spin_unlock_irq(&pidmap_lock); - if (unlikely(!map->page)) - break; - } + rc = alloc_pidmap_page(map); + if (rc) + break; + if (likely(atomic_read(&map->nr_free))) { do { if (!test_and_set_bit(offset, map->page)) { ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 3/8] Make pid_max a pid_ns property 2009-09-07 21:13 ` Sukadev Bhattiprolu (?) (?) @ 2009-09-07 21:14 ` Sukadev Bhattiprolu -1 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:14 UTC (permalink / raw) To: linux-kernel Cc: serue, Oren Laadan, Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa, Containers, sukadev From: Serge Hallyn <serue@us.ibm.com> Subject: [RFC][v5][PATCH 3/8] Make pid_max a pid_ns property Remove the pid_max global, and make it a property of the pid_namespace. When a pid_ns is created, it inherits the parent's pid_ns. Fixing up sysctl (trivial akin to ipc version, but potentially tedious to get right for all CONFIG* combinations) is left for later. Changelog[v2]: - Port to newer kernel - Make pid_max a local variable in alloc_pidmap() to simplify code/patch Signed-off-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> --- include/linux/pid_namespace.h | 1 + kernel/pid.c | 14 +++++++------- kernel/pid_namespace.c | 6 ++++-- kernel/sysctl.c | 4 ++-- 4 files changed, 14 insertions(+), 11 deletions(-) Index: linux-2.6/include/linux/pid_namespace.h =================================================================== --- linux-2.6.orig/include/linux/pid_namespace.h 2009-08-05 13:00:21.000000000 -0700 +++ linux-2.6/include/linux/pid_namespace.h 2009-09-07 13:12:04.000000000 -0700 @@ -30,6 +30,7 @@ #ifdef CONFIG_BSD_PROCESS_ACCT struct bsd_acct_struct *bacct; #endif + int pid_max; }; extern struct pid_namespace init_pid_ns; Index: linux-2.6/kernel/pid.c =================================================================== --- linux-2.6.orig/kernel/pid.c 2009-09-07 13:11:39.000000000 -0700 +++ linux-2.6/kernel/pid.c 2009-09-07 13:12:04.000000000 -0700 @@ -43,8 +43,6 @@ static int pidhash_shift; struct pid init_struct_pid = INIT_STRUCT_PID; -int pid_max = PID_MAX_DEFAULT; - #define RESERVED_PIDS 300 int pid_max_min = RESERVED_PIDS + 1; @@ -78,6 +76,7 @@ .last_pid = 0, .level = 0, .child_reaper = &init_task, + .pid_max = PID_MAX_DEFAULT, }; EXPORT_SYMBOL_GPL(init_pid_ns); @@ -151,6 +150,7 @@ { int i, offset, max_scan, pid, last = pid_ns->last_pid; int rc = -EAGAIN; + int pid_max = pid_ns->pid_max; struct pidmap *map; pid = last + 1; Index: linux-2.6/kernel/pid_namespace.c =================================================================== --- linux-2.6.orig/kernel/pid_namespace.c 2009-08-05 13:00:23.000000000 -0700 +++ linux-2.6/kernel/pid_namespace.c 2009-09-07 13:12:04.000000000 -0700 @@ -87,6 +87,7 @@ kref_init(&ns->kref); ns->level = level; + ns->pid_max = parent_pid_ns->pid_max; ns->parent = get_pid_ns(parent_pid_ns); set_bit(0, ns->pidmap[0].page); Index: linux-2.6/kernel/sysctl.c =================================================================== --- linux-2.6.orig/kernel/sysctl.c 2009-09-07 13:10:39.000000000 -0700 +++ linux-2.6/kernel/sysctl.c 2009-09-07 13:13:23.000000000 -0700 @@ -55,6 +55,7 @@ #include <asm/uaccess.h> #include <asm/processor.h> +#include <linux/pid_namespace.h> #ifdef CONFIG_X86 #include <asm/nmi.h> @@ -78,7 +79,6 @@ extern int core_uses_pid; extern int suid_dumpable; extern char core_pattern[]; -extern int pid_max; extern int min_free_kbytes; extern int pid_max_min, pid_max_max; extern int sysctl_drop_caches; @@ -670,7 +670,7 @@ { .ctl_name = KERN_PIDMAX, .procname = "pid_max", - .data = &pid_max, + .data = &init_pid_ns.pid_max, .maxlen = sizeof (int), .mode = 0644, .proc_handler = &proc_dointvec_minmax, ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 5/8]: Add target_pids parameter to alloc_pid() 2009-09-07 21:13 ` Sukadev Bhattiprolu ` (2 preceding siblings ...) (?) @ 2009-09-07 21:15 ` Sukadev Bhattiprolu -1 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:15 UTC (permalink / raw) To: linux-kernel Cc: serue, Oren Laadan, Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa, Containers, sukadev Subject: [RFC][v5][PATCH 5/8]: Add target_pids parameter to alloc_pid() This parameter is currently NULL, but will be used in a follow-on patch. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: Oren Laadan <orenl@cs.columbia.edu> --- include/linux/pid.h | 2 +- kernel/fork.c | 3 ++- kernel/pid.c | 9 +++++++-- 3 files changed, 10 insertions(+), 4 deletions(-) Index: linux-mmotm/include/linux/pid.h =================================================================== --- linux-mmotm.orig/include/linux/pid.h 2009-09-05 20:58:11.000000000 -0700 +++ linux-mmotm/include/linux/pid.h 2009-09-05 21:19:16.000000000 -0700 @@ -119,7 +119,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *); int next_pidmap(struct pid_namespace *pid_ns, int last); -extern struct pid *alloc_pid(struct pid_namespace *ns); +extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *target_pids); extern void free_pid(struct pid *pid); /* Index: linux-mmotm/kernel/fork.c =================================================================== --- linux-mmotm.orig/kernel/fork.c 2009-09-05 21:15:33.000000000 -0700 +++ linux-mmotm/kernel/fork.c 2009-09-05 21:19:16.000000000 -0700 @@ -983,6 +983,7 @@ int retval; struct task_struct *p; int cgroup_callbacks_done = 0; + pid_t *target_pids = NULL; if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS)) return ERR_PTR(-EINVAL); @@ -1159,7 +1160,7 @@ goto bad_fork_cleanup_io; if (pid != &init_struct_pid) { - pid = alloc_pid(p->nsproxy->pid_ns); + pid = alloc_pid(p->nsproxy->pid_ns, target_pids); if (IS_ERR(pid)) { retval = PTR_ERR(pid); goto bad_fork_cleanup_io; Index: linux-mmotm/kernel/pid.c =================================================================== --- linux-mmotm.orig/kernel/pid.c 2009-09-05 21:19:11.000000000 -0700 +++ linux-mmotm/kernel/pid.c 2009-09-05 21:19:16.000000000 -0700 @@ -268,13 +268,14 @@ call_rcu(&pid->rcu, delayed_put_pid); } -struct pid *alloc_pid(struct pid_namespace *ns) +struct pid *alloc_pid(struct pid_namespace *ns, pid_t *target_pids) { struct pid *pid; enum pid_type type; int i, nr; struct pid_namespace *tmp; struct upid *upid; + int tpid; pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL); if (!pid) { @@ -284,7 +285,11 @@ tmp = ns; for (i = ns->level; i >= 0; i--) { - nr = alloc_pidmap(tmp, 0); + tpid = 0; + if (target_pids) + tpid = target_pids[i]; + + nr = alloc_pidmap(tmp, tpid); if (nr < 0) goto out_free; ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <20090907211302.GA5892-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* [RFC][v5][PATCH 1/8]: Factor out code to allocate pidmap page [not found] ` <20090907211302.GA5892-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2009-09-07 21:13 ` Sukadev Bhattiprolu 2009-09-07 21:14 ` Sukadev Bhattiprolu ` (6 subsequent siblings) 7 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:13 UTC (permalink / raw) To: linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov Subject: [RFC][v5][PATCH 1/8]: Factor out code to allocate pidmap page To simplify alloc_pidmap(), move code to allocate a pid map page to a separate function. Changelog[v3]: - Earlier version of patchset called alloc_pidmap_page() from two places. But now its called from only one place. Even so, moving this code out into a separate function simplifies alloc_pidmap(). Changelog[v2]: - (Matt Helsley, Dave Hansen) Have alloc_pidmap_page() return -ENOMEM on error instead of -1. Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Reviewed-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> --- kernel/pid.c | 46 ++++++++++++++++++++++++++++++---------------- 1 files changed, 30 insertions(+), 16 deletions(-) Index: linux-2.6/kernel/pid.c =================================================================== --- linux-2.6.orig/kernel/pid.c 2009-09-07 13:08:50.000000000 -0700 +++ linux-2.6/kernel/pid.c 2009-09-07 13:11:30.000000000 -0700 @@ -122,9 +122,35 @@ atomic_inc(&map->nr_free); } +static int alloc_pidmap_page(struct pidmap *map) +{ + void *page; + + if (likely(map->page)) + return 0; + + page = kzalloc(PAGE_SIZE, GFP_KERNEL); + + /* + * Free the page if someone raced with us installing it: + */ + spin_lock_irq(&pidmap_lock); + if (map->page) + kfree(page); + else + map->page = page; + spin_unlock_irq(&pidmap_lock); + + if (unlikely(!map->page)) + return -ENOMEM; + + return 0; +} + static int alloc_pidmap(struct pid_namespace *pid_ns) { int i, offset, max_scan, pid, last = pid_ns->last_pid; + int rc; struct pidmap *map; pid = last + 1; @@ -134,21 +160,10 @@ map = &pid_ns->pidmap[pid/BITS_PER_PAGE]; max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset; for (i = 0; i <= max_scan; ++i) { - if (unlikely(!map->page)) { - void *page = kzalloc(PAGE_SIZE, GFP_KERNEL); - /* - * Free the page if someone raced with us - * installing it: - */ - spin_lock_irq(&pidmap_lock); - if (map->page) - kfree(page); - else - map->page = page; - spin_unlock_irq(&pidmap_lock); - if (unlikely(!map->page)) - break; - } + rc = alloc_pidmap_page(map); + if (rc) + break; + if (likely(atomic_read(&map->nr_free))) { do { if (!test_and_set_bit(offset, map->page)) { ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 2/8]: Have alloc_pidmap() return actual error code 2009-09-07 21:13 ` Sukadev Bhattiprolu @ 2009-09-07 21:14 ` Sukadev Bhattiprolu -1 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:14 UTC (permalink / raw) To: linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov Subject: [RFC][v5][PATCH 2/8]: Have alloc_pidmap() return actual error code alloc_pidmap() can fail either because all pid numbers are in use or because memory allocation failed. With support for setting a specific pid number, alloc_pidmap() would also fail if either the given pid number is invalid or in use. Rather than have callers assume -ENOMEM, have alloc_pidmap() return the actual error. Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Reviewed-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> --- kernel/fork.c | 5 +++-- kernel/pid.c | 9 ++++++--- 2 files changed, 9 insertions(+), 5 deletions(-) Index: linux-2.6/kernel/fork.c =================================================================== --- linux-2.6.orig/kernel/fork.c 2009-09-07 13:10:39.000000000 -0700 +++ linux-2.6/kernel/fork.c 2009-09-07 13:11:39.000000000 -0700 @@ -1110,10 +1110,11 @@ goto bad_fork_cleanup_io; if (pid != &init_struct_pid) { - retval = -ENOMEM; pid = alloc_pid(p->nsproxy->pid_ns); - if (!pid) + if (IS_ERR(pid)) { + retval = PTR_ERR(pid); goto bad_fork_cleanup_io; + } if (clone_flags & CLONE_NEWPID) { retval = pid_ns_prepare_proc(p->nsproxy->pid_ns); Index: linux-2.6/kernel/pid.c =================================================================== --- linux-2.6.orig/kernel/pid.c 2009-09-07 13:11:30.000000000 -0700 +++ linux-2.6/kernel/pid.c 2009-09-07 13:11:39.000000000 -0700 @@ -150,7 +150,7 @@ static int alloc_pidmap(struct pid_namespace *pid_ns) { int i, offset, max_scan, pid, last = pid_ns->last_pid; - int rc; + int rc = -EAGAIN; struct pidmap *map; pid = last + 1; @@ -189,12 +189,14 @@ } else { map = &pid_ns->pidmap[0]; offset = RESERVED_PIDS; - if (unlikely(last == offset)) + if (unlikely(last == offset)) { + rc = -EAGAIN; break; + } } pid = mk_pid(pid_ns, map, offset); } - return -1; + return rc; } int next_pidmap(struct pid_namespace *pid_ns, int last) @@ -263,8 +265,10 @@ struct upid *upid; pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL); - if (!pid) + if (!pid) { + pid = ERR_PTR(-ENOMEM); goto out; + } tmp = ns; for (i = ns->level; i >= 0; i--) { @@ -299,7 +303,7 @@ free_pidmap(pid->numbers + i); kmem_cache_free(ns->pid_cachep, pid); - pid = NULL; + pid = ERR_PTR(nr); goto out; } ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 2/8]: Have alloc_pidmap() return actual error code @ 2009-09-07 21:14 ` Sukadev Bhattiprolu 0 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:14 UTC (permalink / raw) To: linux-kernel Cc: serue, Oren Laadan, Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa, Containers, sukadev Subject: [RFC][v5][PATCH 2/8]: Have alloc_pidmap() return actual error code alloc_pidmap() can fail either because all pid numbers are in use or because memory allocation failed. With support for setting a specific pid number, alloc_pidmap() would also fail if either the given pid number is invalid or in use. Rather than have callers assume -ENOMEM, have alloc_pidmap() return the actual error. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: Oren Laadan <orenl@cs.columbia.edu> --- kernel/fork.c | 5 +++-- kernel/pid.c | 9 ++++++--- 2 files changed, 9 insertions(+), 5 deletions(-) Index: linux-2.6/kernel/fork.c =================================================================== --- linux-2.6.orig/kernel/fork.c 2009-09-07 13:10:39.000000000 -0700 +++ linux-2.6/kernel/fork.c 2009-09-07 13:11:39.000000000 -0700 @@ -1110,10 +1110,11 @@ goto bad_fork_cleanup_io; if (pid != &init_struct_pid) { - retval = -ENOMEM; pid = alloc_pid(p->nsproxy->pid_ns); - if (!pid) + if (IS_ERR(pid)) { + retval = PTR_ERR(pid); goto bad_fork_cleanup_io; + } if (clone_flags & CLONE_NEWPID) { retval = pid_ns_prepare_proc(p->nsproxy->pid_ns); Index: linux-2.6/kernel/pid.c =================================================================== --- linux-2.6.orig/kernel/pid.c 2009-09-07 13:11:30.000000000 -0700 +++ linux-2.6/kernel/pid.c 2009-09-07 13:11:39.000000000 -0700 @@ -150,7 +150,7 @@ static int alloc_pidmap(struct pid_namespace *pid_ns) { int i, offset, max_scan, pid, last = pid_ns->last_pid; - int rc; + int rc = -EAGAIN; struct pidmap *map; pid = last + 1; @@ -189,12 +189,14 @@ } else { map = &pid_ns->pidmap[0]; offset = RESERVED_PIDS; - if (unlikely(last == offset)) + if (unlikely(last == offset)) { + rc = -EAGAIN; break; + } } pid = mk_pid(pid_ns, map, offset); } - return -1; + return rc; } int next_pidmap(struct pid_namespace *pid_ns, int last) @@ -263,8 +265,10 @@ struct upid *upid; pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL); - if (!pid) + if (!pid) { + pid = ERR_PTR(-ENOMEM); goto out; + } tmp = ns; for (i = ns->level; i >= 0; i--) { @@ -299,7 +303,7 @@ free_pidmap(pid->numbers + i); kmem_cache_free(ns->pid_cachep, pid); - pid = NULL; + pid = ERR_PTR(nr); goto out; } ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 3/8] Make pid_max a pid_ns property [not found] ` <20090907211302.GA5892-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2009-09-07 21:13 ` [RFC][v5][PATCH 1/8]: Factor out code to allocate pidmap page Sukadev Bhattiprolu 2009-09-07 21:14 ` Sukadev Bhattiprolu @ 2009-09-07 21:14 ` Sukadev Bhattiprolu 2009-09-07 21:15 ` Sukadev Bhattiprolu ` (4 subsequent siblings) 7 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:14 UTC (permalink / raw) To: linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov From: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Subject: [RFC][v5][PATCH 3/8] Make pid_max a pid_ns property Remove the pid_max global, and make it a property of the pid_namespace. When a pid_ns is created, it inherits the parent's pid_ns. Fixing up sysctl (trivial akin to ipc version, but potentially tedious to get right for all CONFIG* combinations) is left for later. Changelog[v2]: - Port to newer kernel - Make pid_max a local variable in alloc_pidmap() to simplify code/patch Signed-off-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> --- include/linux/pid_namespace.h | 1 + kernel/pid.c | 14 +++++++------- kernel/pid_namespace.c | 6 ++++-- kernel/sysctl.c | 4 ++-- 4 files changed, 14 insertions(+), 11 deletions(-) Index: linux-2.6/include/linux/pid_namespace.h =================================================================== --- linux-2.6.orig/include/linux/pid_namespace.h 2009-08-05 13:00:21.000000000 -0700 +++ linux-2.6/include/linux/pid_namespace.h 2009-09-07 13:12:04.000000000 -0700 @@ -30,6 +30,7 @@ #ifdef CONFIG_BSD_PROCESS_ACCT struct bsd_acct_struct *bacct; #endif + int pid_max; }; extern struct pid_namespace init_pid_ns; Index: linux-2.6/kernel/pid.c =================================================================== --- linux-2.6.orig/kernel/pid.c 2009-09-07 13:11:39.000000000 -0700 +++ linux-2.6/kernel/pid.c 2009-09-07 13:12:04.000000000 -0700 @@ -43,8 +43,6 @@ static int pidhash_shift; struct pid init_struct_pid = INIT_STRUCT_PID; -int pid_max = PID_MAX_DEFAULT; - #define RESERVED_PIDS 300 int pid_max_min = RESERVED_PIDS + 1; @@ -78,6 +76,7 @@ .last_pid = 0, .level = 0, .child_reaper = &init_task, + .pid_max = PID_MAX_DEFAULT, }; EXPORT_SYMBOL_GPL(init_pid_ns); @@ -151,6 +150,7 @@ { int i, offset, max_scan, pid, last = pid_ns->last_pid; int rc = -EAGAIN; + int pid_max = pid_ns->pid_max; struct pidmap *map; pid = last + 1; Index: linux-2.6/kernel/pid_namespace.c =================================================================== --- linux-2.6.orig/kernel/pid_namespace.c 2009-08-05 13:00:23.000000000 -0700 +++ linux-2.6/kernel/pid_namespace.c 2009-09-07 13:12:04.000000000 -0700 @@ -87,6 +87,7 @@ kref_init(&ns->kref); ns->level = level; + ns->pid_max = parent_pid_ns->pid_max; ns->parent = get_pid_ns(parent_pid_ns); set_bit(0, ns->pidmap[0].page); Index: linux-2.6/kernel/sysctl.c =================================================================== --- linux-2.6.orig/kernel/sysctl.c 2009-09-07 13:10:39.000000000 -0700 +++ linux-2.6/kernel/sysctl.c 2009-09-07 13:13:23.000000000 -0700 @@ -55,6 +55,7 @@ #include <asm/uaccess.h> #include <asm/processor.h> +#include <linux/pid_namespace.h> #ifdef CONFIG_X86 #include <asm/nmi.h> @@ -78,7 +79,6 @@ extern int core_uses_pid; extern int suid_dumpable; extern char core_pattern[]; -extern int pid_max; extern int min_free_kbytes; extern int pid_max_min, pid_max_max; extern int sysctl_drop_caches; @@ -670,7 +670,7 @@ { .ctl_name = KERN_PIDMAX, .procname = "pid_max", - .data = &pid_max, + .data = &init_pid_ns.pid_max, .maxlen = sizeof (int), .mode = 0644, .proc_handler = &proc_dointvec_minmax, ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 4/8]: Add target_pid parameter to alloc_pidmap() 2009-09-07 21:13 ` Sukadev Bhattiprolu @ 2009-09-07 21:15 ` Sukadev Bhattiprolu -1 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:15 UTC (permalink / raw) To: linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov Subject: [RFC][v5][PATCH 4/8]: Add target_pid parameter to alloc_pidmap() With support for setting a specific pid number for a process, alloc_pidmap() will need a 'target_pid' parameter. Changelog[v3]: - (Eric Biederman): Avoid set_pidmap() function. Added couple of checks for target_pid in alloc_pidmap() itself. Changelog[v2]: - (Serge Hallyn) Check for 'pid < 0' in set_pidmap().(Code actually checks for 'pid <= 0' for completeness). Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Index: linux-mmotm/kernel/pid.c =================================================================== --- linux-mmotm.orig/kernel/pid.c 2009-09-05 21:17:36.000000000 -0700 +++ linux-mmotm/kernel/pid.c 2009-09-05 21:19:11.000000000 -0700 @@ -146,16 +146,22 @@ return 0; } -static int alloc_pidmap(struct pid_namespace *pid_ns) +static int alloc_pidmap(struct pid_namespace *pid_ns, int target_pid) { int i, offset, max_scan, pid, last = pid_ns->last_pid; int rc = -EAGAIN; int pid_max = pid_ns->pid_max; struct pidmap *map; - pid = last + 1; - if (pid >= pid_max) - pid = RESERVED_PIDS; + if (target_pid) { + pid = target_pid; + if (pid < 0 || pid >= pid_max) + return -EINVAL; + } else { + pid = last + 1; + if (pid >= pid_max) + pid = RESERVED_PIDS; + } offset = pid & BITS_PER_PAGE_MASK; map = &pid_ns->pidmap[pid/BITS_PER_PAGE]; max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset; @@ -168,9 +174,14 @@ do { if (!test_and_set_bit(offset, map->page)) { atomic_dec(&map->nr_free); - pid_ns->last_pid = pid; + if (!target_pid) + pid_ns->last_pid = pid; return pid; + } else if (target_pid) { + rc = -EBUSY; + break; } + offset = find_next_offset(map, offset); pid = mk_pid(pid_ns, map, offset); /* @@ -196,6 +207,7 @@ } pid = mk_pid(pid_ns, map, offset); } + return rc; } @@ -272,7 +284,7 @@ tmp = ns; for (i = ns->level; i >= 0; i--) { - nr = alloc_pidmap(tmp); + nr = alloc_pidmap(tmp, 0); if (nr < 0) goto out_free; ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 4/8]: Add target_pid parameter to alloc_pidmap() @ 2009-09-07 21:15 ` Sukadev Bhattiprolu 0 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:15 UTC (permalink / raw) To: linux-kernel Cc: serue, Oren Laadan, Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa, Containers, sukadev Subject: [RFC][v5][PATCH 4/8]: Add target_pid parameter to alloc_pidmap() With support for setting a specific pid number for a process, alloc_pidmap() will need a 'target_pid' parameter. Changelog[v3]: - (Eric Biederman): Avoid set_pidmap() function. Added couple of checks for target_pid in alloc_pidmap() itself. Changelog[v2]: - (Serge Hallyn) Check for 'pid < 0' in set_pidmap().(Code actually checks for 'pid <= 0' for completeness). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Index: linux-mmotm/kernel/pid.c =================================================================== --- linux-mmotm.orig/kernel/pid.c 2009-09-05 21:17:36.000000000 -0700 +++ linux-mmotm/kernel/pid.c 2009-09-05 21:19:11.000000000 -0700 @@ -146,16 +146,22 @@ return 0; } -static int alloc_pidmap(struct pid_namespace *pid_ns) +static int alloc_pidmap(struct pid_namespace *pid_ns, int target_pid) { int i, offset, max_scan, pid, last = pid_ns->last_pid; int rc = -EAGAIN; int pid_max = pid_ns->pid_max; struct pidmap *map; - pid = last + 1; - if (pid >= pid_max) - pid = RESERVED_PIDS; + if (target_pid) { + pid = target_pid; + if (pid < 0 || pid >= pid_max) + return -EINVAL; + } else { + pid = last + 1; + if (pid >= pid_max) + pid = RESERVED_PIDS; + } offset = pid & BITS_PER_PAGE_MASK; map = &pid_ns->pidmap[pid/BITS_PER_PAGE]; max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset; @@ -168,9 +174,14 @@ do { if (!test_and_set_bit(offset, map->page)) { atomic_dec(&map->nr_free); - pid_ns->last_pid = pid; + if (!target_pid) + pid_ns->last_pid = pid; return pid; + } else if (target_pid) { + rc = -EBUSY; + break; } + offset = find_next_offset(map, offset); pid = mk_pid(pid_ns, map, offset); /* @@ -196,6 +207,7 @@ } pid = mk_pid(pid_ns, map, offset); } + return rc; } @@ -272,7 +284,7 @@ tmp = ns; for (i = ns->level; i >= 0; i--) { - nr = alloc_pidmap(tmp); + nr = alloc_pidmap(tmp, 0); if (nr < 0) goto out_free; ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <20090907211506.GD6685-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][v5][PATCH 4/8]: Add target_pid parameter to alloc_pidmap() [not found] ` <20090907211506.GD6685-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2009-09-08 14:16 ` Serge E. Hallyn 0 siblings, 0 replies; 32+ messages in thread From: Serge E. Hallyn @ 2009-09-08 14:16 UTC (permalink / raw) To: Sukadev Bhattiprolu Cc: Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov Quoting Sukadev Bhattiprolu (sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org): > > > Subject: [RFC][v5][PATCH 4/8]: Add target_pid parameter to alloc_pidmap() > > With support for setting a specific pid number for a process, > alloc_pidmap() will need a 'target_pid' parameter. > > Changelog[v3]: > - (Eric Biederman): Avoid set_pidmap() function. Added couple of > checks for target_pid in alloc_pidmap() itself. > Changelog[v2]: > - (Serge Hallyn) Check for 'pid < 0' in set_pidmap().(Code > actually checks for 'pid <= 0' for completeness). > > Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> > > Index: linux-mmotm/kernel/pid.c > =================================================================== > --- linux-mmotm.orig/kernel/pid.c 2009-09-05 21:17:36.000000000 -0700 > +++ linux-mmotm/kernel/pid.c 2009-09-05 21:19:11.000000000 -0700 > @@ -146,16 +146,22 @@ > return 0; > } > > -static int alloc_pidmap(struct pid_namespace *pid_ns) > +static int alloc_pidmap(struct pid_namespace *pid_ns, int target_pid) > { > int i, offset, max_scan, pid, last = pid_ns->last_pid; > int rc = -EAGAIN; > int pid_max = pid_ns->pid_max; > struct pidmap *map; > > - pid = last + 1; > - if (pid >= pid_max) > - pid = RESERVED_PIDS; > + if (target_pid) { > + pid = target_pid; > + if (pid < 0 || pid >= pid_max) > + return -EINVAL; > + } else { > + pid = last + 1; > + if (pid >= pid_max) > + pid = RESERVED_PIDS; > + } > offset = pid & BITS_PER_PAGE_MASK; > map = &pid_ns->pidmap[pid/BITS_PER_PAGE]; > max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset; > @@ -168,9 +174,14 @@ > do { > if (!test_and_set_bit(offset, map->page)) { > atomic_dec(&map->nr_free); > - pid_ns->last_pid = pid; > + if (!target_pid) > + pid_ns->last_pid = pid; > return pid; > + } else if (target_pid) { > + rc = -EBUSY; > + break; > } > + > offset = find_next_offset(map, offset); > pid = mk_pid(pid_ns, map, offset); > /* > @@ -196,6 +207,7 @@ > } > pid = mk_pid(pid_ns, map, offset); > } > + > return rc; > } > > @@ -272,7 +284,7 @@ > > tmp = ns; > for (i = ns->level; i >= 0; i--) { > - nr = alloc_pidmap(tmp); > + nr = alloc_pidmap(tmp, 0); > if (nr < 0) > goto out_free; ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC][v5][PATCH 4/8]: Add target_pid parameter to alloc_pidmap() 2009-09-07 21:15 ` Sukadev Bhattiprolu (?) (?) @ 2009-09-08 14:16 ` Serge E. Hallyn -1 siblings, 0 replies; 32+ messages in thread From: Serge E. Hallyn @ 2009-09-08 14:16 UTC (permalink / raw) To: Sukadev Bhattiprolu Cc: linux-kernel, Oren Laadan, Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa, Containers, sukadev Quoting Sukadev Bhattiprolu (sukadev@linux.vnet.ibm.com): > > > Subject: [RFC][v5][PATCH 4/8]: Add target_pid parameter to alloc_pidmap() > > With support for setting a specific pid number for a process, > alloc_pidmap() will need a 'target_pid' parameter. > > Changelog[v3]: > - (Eric Biederman): Avoid set_pidmap() function. Added couple of > checks for target_pid in alloc_pidmap() itself. > Changelog[v2]: > - (Serge Hallyn) Check for 'pid < 0' in set_pidmap().(Code > actually checks for 'pid <= 0' for completeness). > > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> > > Index: linux-mmotm/kernel/pid.c > =================================================================== > --- linux-mmotm.orig/kernel/pid.c 2009-09-05 21:17:36.000000000 -0700 > +++ linux-mmotm/kernel/pid.c 2009-09-05 21:19:11.000000000 -0700 > @@ -146,16 +146,22 @@ > return 0; > } > > -static int alloc_pidmap(struct pid_namespace *pid_ns) > +static int alloc_pidmap(struct pid_namespace *pid_ns, int target_pid) > { > int i, offset, max_scan, pid, last = pid_ns->last_pid; > int rc = -EAGAIN; > int pid_max = pid_ns->pid_max; > struct pidmap *map; > > - pid = last + 1; > - if (pid >= pid_max) > - pid = RESERVED_PIDS; > + if (target_pid) { > + pid = target_pid; > + if (pid < 0 || pid >= pid_max) > + return -EINVAL; > + } else { > + pid = last + 1; > + if (pid >= pid_max) > + pid = RESERVED_PIDS; > + } > offset = pid & BITS_PER_PAGE_MASK; > map = &pid_ns->pidmap[pid/BITS_PER_PAGE]; > max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset; > @@ -168,9 +174,14 @@ > do { > if (!test_and_set_bit(offset, map->page)) { > atomic_dec(&map->nr_free); > - pid_ns->last_pid = pid; > + if (!target_pid) > + pid_ns->last_pid = pid; > return pid; > + } else if (target_pid) { > + rc = -EBUSY; > + break; > } > + > offset = find_next_offset(map, offset); > pid = mk_pid(pid_ns, map, offset); > /* > @@ -196,6 +207,7 @@ > } > pid = mk_pid(pid_ns, map, offset); > } > + > return rc; > } > > @@ -272,7 +284,7 @@ > > tmp = ns; > for (i = ns->level; i >= 0; i--) { > - nr = alloc_pidmap(tmp); > + nr = alloc_pidmap(tmp, 0); > if (nr < 0) > goto out_free; ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 5/8]: Add target_pids parameter to alloc_pid() [not found] ` <20090907211302.GA5892-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ` (3 preceding siblings ...) 2009-09-07 21:15 ` Sukadev Bhattiprolu @ 2009-09-07 21:15 ` Sukadev Bhattiprolu 2009-09-07 21:15 ` Sukadev Bhattiprolu ` (2 subsequent siblings) 7 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:15 UTC (permalink / raw) To: linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov Subject: [RFC][v5][PATCH 5/8]: Add target_pids parameter to alloc_pid() This parameter is currently NULL, but will be used in a follow-on patch. Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Reviewed-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> --- include/linux/pid.h | 2 +- kernel/fork.c | 3 ++- kernel/pid.c | 9 +++++++-- 3 files changed, 10 insertions(+), 4 deletions(-) Index: linux-mmotm/include/linux/pid.h =================================================================== --- linux-mmotm.orig/include/linux/pid.h 2009-09-05 20:58:11.000000000 -0700 +++ linux-mmotm/include/linux/pid.h 2009-09-05 21:19:16.000000000 -0700 @@ -119,7 +119,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *); int next_pidmap(struct pid_namespace *pid_ns, int last); -extern struct pid *alloc_pid(struct pid_namespace *ns); +extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *target_pids); extern void free_pid(struct pid *pid); /* Index: linux-mmotm/kernel/fork.c =================================================================== --- linux-mmotm.orig/kernel/fork.c 2009-09-05 21:15:33.000000000 -0700 +++ linux-mmotm/kernel/fork.c 2009-09-05 21:19:16.000000000 -0700 @@ -983,6 +983,7 @@ int retval; struct task_struct *p; int cgroup_callbacks_done = 0; + pid_t *target_pids = NULL; if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS)) return ERR_PTR(-EINVAL); @@ -1159,7 +1160,7 @@ goto bad_fork_cleanup_io; if (pid != &init_struct_pid) { - pid = alloc_pid(p->nsproxy->pid_ns); + pid = alloc_pid(p->nsproxy->pid_ns, target_pids); if (IS_ERR(pid)) { retval = PTR_ERR(pid); goto bad_fork_cleanup_io; Index: linux-mmotm/kernel/pid.c =================================================================== --- linux-mmotm.orig/kernel/pid.c 2009-09-05 21:19:11.000000000 -0700 +++ linux-mmotm/kernel/pid.c 2009-09-05 21:19:16.000000000 -0700 @@ -268,13 +268,14 @@ call_rcu(&pid->rcu, delayed_put_pid); } -struct pid *alloc_pid(struct pid_namespace *ns) +struct pid *alloc_pid(struct pid_namespace *ns, pid_t *target_pids) { struct pid *pid; enum pid_type type; int i, nr; struct pid_namespace *tmp; struct upid *upid; + int tpid; pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL); if (!pid) { @@ -284,7 +285,11 @@ tmp = ns; for (i = ns->level; i >= 0; i--) { - nr = alloc_pidmap(tmp, 0); + tpid = 0; + if (target_pids) + tpid = target_pids[i]; + + nr = alloc_pidmap(tmp, tpid); if (nr < 0) goto out_free; ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 6/8]: Add target_pids parameter to copy_process() 2009-09-07 21:13 ` Sukadev Bhattiprolu @ 2009-09-07 21:15 ` Sukadev Bhattiprolu -1 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:15 UTC (permalink / raw) To: linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov Subject: [RFC][v5][PATCH 6/8]: Add target_pids parameter to copy_process() Add a 'target_pids' parameter to copy_process(). The new parameter will be used in a follow-on patch when clone_with_pids() is implemented. Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Reviewed-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> --- kernel/fork.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) Index: linux-mmotm/kernel/fork.c =================================================================== --- linux-mmotm.orig/kernel/fork.c 2009-09-05 21:19:16.000000000 -0700 +++ linux-mmotm/kernel/fork.c 2009-09-05 21:19:58.000000000 -0700 @@ -978,12 +978,12 @@ unsigned long stack_size, int __user *child_tidptr, struct pid *pid, + pid_t *target_pids, int trace) { int retval; struct task_struct *p; int cgroup_callbacks_done = 0; - pid_t *target_pids = NULL; if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS)) return ERR_PTR(-EINVAL); @@ -1365,7 +1365,7 @@ struct pt_regs regs; task = copy_process(CLONE_VM, 0, idle_regs(®s), 0, NULL, - &init_struct_pid, 0); + &init_struct_pid, NULL, 0); if (!IS_ERR(task)) init_idle(task, cpu); @@ -1388,6 +1388,7 @@ struct task_struct *p; int trace = 0; long nr; + pid_t *target_pids = NULL; /* * Do some preliminary argument and permissions checking before we @@ -1428,7 +1429,7 @@ trace = tracehook_prepare_clone(clone_flags); p = copy_process(clone_flags, stack_start, regs, stack_size, - child_tidptr, NULL, trace); + child_tidptr, NULL, target_pids, trace); /* * Do this prior waking up the new thread - the thread pointer * might get invalid after that point, if the thread exits quickly. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 6/8]: Add target_pids parameter to copy_process() @ 2009-09-07 21:15 ` Sukadev Bhattiprolu 0 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:15 UTC (permalink / raw) To: linux-kernel Cc: serue, Oren Laadan, Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa, Containers, sukadev Subject: [RFC][v5][PATCH 6/8]: Add target_pids parameter to copy_process() Add a 'target_pids' parameter to copy_process(). The new parameter will be used in a follow-on patch when clone_with_pids() is implemented. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: Oren Laadan <orenl@cs.columbia.edu> --- kernel/fork.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) Index: linux-mmotm/kernel/fork.c =================================================================== --- linux-mmotm.orig/kernel/fork.c 2009-09-05 21:19:16.000000000 -0700 +++ linux-mmotm/kernel/fork.c 2009-09-05 21:19:58.000000000 -0700 @@ -978,12 +978,12 @@ unsigned long stack_size, int __user *child_tidptr, struct pid *pid, + pid_t *target_pids, int trace) { int retval; struct task_struct *p; int cgroup_callbacks_done = 0; - pid_t *target_pids = NULL; if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS)) return ERR_PTR(-EINVAL); @@ -1365,7 +1365,7 @@ struct pt_regs regs; task = copy_process(CLONE_VM, 0, idle_regs(®s), 0, NULL, - &init_struct_pid, 0); + &init_struct_pid, NULL, 0); if (!IS_ERR(task)) init_idle(task, cpu); @@ -1388,6 +1388,7 @@ struct task_struct *p; int trace = 0; long nr; + pid_t *target_pids = NULL; /* * Do some preliminary argument and permissions checking before we @@ -1428,7 +1429,7 @@ trace = tracehook_prepare_clone(clone_flags); p = copy_process(clone_flags, stack_start, regs, stack_size, - child_tidptr, NULL, trace); + child_tidptr, NULL, target_pids, trace); /* * Do this prior waking up the new thread - the thread pointer * might get invalid after that point, if the thread exits quickly. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 7/8]: Define do_fork_with_pids() [not found] ` <20090907211302.GA5892-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ` (5 preceding siblings ...) 2009-09-07 21:15 ` Sukadev Bhattiprolu @ 2009-09-07 21:16 ` Sukadev Bhattiprolu 2009-09-07 21:17 ` [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall Sukadev Bhattiprolu 7 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:16 UTC (permalink / raw) To: linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov Subject: [RFC][v5][PATCH 7/8]: Define do_fork_with_pids() do_fork_with_pids() is same as do_fork(), except that it takes an additional, 'pid_set', parameter. This parameter, currently unused, specifies the set of target pids of the process in each of its pid namespaces. Changelog[v4]: - Rename 'struct target_pid_set' to 'struct pid_set' since it may be useful in other contexts. Changelog[v3]: - Fix "long-line" warning from checkpatch.pl Changelog[v2]: - To facilitate moving architecture-inpdendent code to kernel/fork.c pass in 'struct target_pid_set __user *' to do_fork_with_pids() rather than 'pid_t *' (next patch moves the arch-independent code to kernel/fork.c) Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Reviewed-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> --- include/linux/sched.h | 3 +++ include/linux/types.h | 5 +++++ kernel/fork.c | 16 ++++++++++++++-- 3 files changed, 22 insertions(+), 2 deletions(-) Index: linux-2.6/include/linux/sched.h =================================================================== --- linux-2.6.orig/include/linux/sched.h 2009-09-07 13:10:39.000000000 -0700 +++ linux-2.6/include/linux/sched.h 2009-09-07 13:13:33.000000000 -0700 @@ -2054,6 +2054,9 @@ extern int do_execve(char *, char __user * __user *, char __user * __user *, struct pt_regs *); extern long do_fork(unsigned long, unsigned long, struct pt_regs *, unsigned long, int __user *, int __user *); +extern long do_fork_with_pids(unsigned long, unsigned long, struct pt_regs *, + unsigned long, int __user *, int __user *, + struct pid_set __user *pid_set); struct task_struct *fork_idle(int); extern void set_task_comm(struct task_struct *tsk, char *from); Index: linux-2.6/include/linux/types.h =================================================================== --- linux-2.6.orig/include/linux/types.h 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/include/linux/types.h 2009-09-07 13:13:33.000000000 -0700 @@ -204,6 +204,11 @@ char f_fpack[6]; }; +struct pid_set { + int num_pids; + pid_t *pids; +}; + #endif /* __KERNEL__ */ #endif /* __ASSEMBLY__ */ #endif /* _LINUX_TYPES_H */ Index: linux-2.6/kernel/fork.c =================================================================== --- linux-2.6.orig/kernel/fork.c 2009-09-07 13:13:33.000000000 -0700 +++ linux-2.6/kernel/fork.c 2009-09-07 13:13:33.000000000 -0700 @@ -1332,12 +1332,13 @@ * It copies the process, and if successful kick-starts * it and waits for it to finish using the VM if required. */ -long do_fork(unsigned long clone_flags, +long do_fork_with_pids(unsigned long clone_flags, unsigned long stack_start, struct pt_regs *regs, unsigned long stack_size, int __user *parent_tidptr, - int __user *child_tidptr) + int __user *child_tidptr, + struct pid_set __user *pid_setp) { struct task_struct *p; int trace = 0; @@ -1440,6 +1441,17 @@ return nr; } +long do_fork(unsigned long clone_flags, + unsigned long stack_start, + struct pt_regs *regs, + unsigned long stack_size, + int __user *parent_tidptr, + int __user *child_tidptr) +{ + return do_fork_with_pids(clone_flags, stack_start, regs, stack_size, + parent_tidptr, child_tidptr, NULL); +} + #ifndef ARCH_MIN_MMSTRUCT_ALIGN #define ARCH_MIN_MMSTRUCT_ALIGN 0 #endif ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall [not found] ` <20090907211302.GA5892-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ` (6 preceding siblings ...) 2009-09-07 21:16 ` [RFC][v5][PATCH 7/8]: Define do_fork_with_pids() Sukadev Bhattiprolu @ 2009-09-07 21:17 ` Sukadev Bhattiprolu 7 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:17 UTC (permalink / raw) To: linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov Subject: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall Container restart requires that a task have the same pid it had when it was checkpointed. When containers are nested the tasks within the containers exist in multiple pid namespaces and hence have multiple pids to specify during restart. clone_with_pids(), intended for use during restart, is the same as clone(), except that it takes a 'target_pid_set' paramter. This parameter lets caller choose specific pid numbers for the child process, in the process's active and ancestor pid namespaces. (Descendant pid namespaces in general don't matter since processes don't have pids in them anyway, but see comments in copy_target_pids() regarding CLONE_NEWPID). Unlike clone(), clone_with_pids() needs CAP_SYS_ADMIN, at least for now, to prevent unprivileged processes from misusing this interface. Call clone_with_pids as follows: pid_t pids[] = { 0, 77, 99 }; struct pid_set pid_set; pid_set.num_pids = sizeof(pids) / sizeof(int); pid_set.pids = &pids; syscall(__NR_clone_with_pids, flags, stack, NULL, NULL, NULL, &pid_set); If a target-pid is 0, the kernel continues to assign a pid for the process in that namespace. In the above example, pids[0] is 0, meaning the kernel will assign next available pid to the process in init_pid_ns. But kernel will assign pid 77 in the child pid namespace 1 and pid 99 in pid namespace 2. If either 77 or 99 are taken, the system call fails with -EBUSY. If 'pid_set.num_pids' exceeds the current nesting level of pid namespaces, the system call fails with -EINVAL. Its mostly an exploratory patch seeking feedback on the interface. NOTE: Compared to clone(), clone_with_pids() needs to pass in two more pieces of information: - number of pids in the set - user buffer containing the list of pids. But since clone() already takes 5 parameters, use a 'struct pid_set'. TODO: - Gently tested. - May need additional sanity checks in do_fork_with_pids(). Changelog[v4]: - (Oren Laadan) rename 'struct target_pid_set' to 'struct pid_set' Changelog[v3]: - (Oren Laadan) Allow CLONE_NEWPID flag (by allocating an extra pid in the target_pids[] list and setting it 0. See copy_target_pids()). - (Oren Laadan) Specified target pids should apply only to youngest pid-namespaces (see copy_target_pids()) - (Matt Helsley) Update patch description. Changelog[v2]: - Remove unnecessary printk and add a note to callers of copy_target_pids() to free target_pids. - (Serge Hallyn) Mention CAP_SYS_ADMIN restriction in patch description. - (Oren Laadan) Add checks for 'num_pids < 0' (return -EINVAL) and 'num_pids == 0' (fall back to normal clone()). - Move arch-independent code (sanity checks and copy-in of target-pids) into kernel/fork.c and simplify sys_clone_with_pids() Changelog[v1]: - Fixed some compile errors (had fixed these errors earlier in my git tree but had not refreshed patches before emailing them) Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> --- arch/x86/include/asm/syscalls.h | 2 + arch/x86/include/asm/unistd_32.h | 1 + arch/x86/kernel/entry_32.S | 1 + arch/x86/kernel/process_32.c | 21 +++++++ arch/x86/kernel/syscall_table_32.S | 1 + kernel/fork.c | 108 +++++++++++++++++++++++++++++++++++- 6 files changed, 133 insertions(+), 1 deletions(-) Index: linux-2.6/arch/x86/include/asm/syscalls.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/syscalls.h 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/arch/x86/include/asm/syscalls.h 2009-09-07 13:13:42.000000000 -0700 @@ -40,6 +40,8 @@ /* kernel/process_32.c */ int sys_clone(struct pt_regs *); +int sys_clone_with_pids(struct pt_regs *); +int sys_vfork(struct pt_regs *); int sys_execve(struct pt_regs *); /* kernel/signal.c */ Index: linux-2.6/arch/x86/include/asm/unistd_32.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/unistd_32.h 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/arch/x86/include/asm/unistd_32.h 2009-09-07 13:15:05.000000000 -0700 @@ -342,6 +342,7 @@ #define __NR_pwritev 334 #define __NR_rt_tgsigqueueinfo 335 #define __NR_perf_counter_open 336 +#define __NR_clone_with_pids 337 #ifdef __KERNEL__ Index: linux-2.6/arch/x86/kernel/entry_32.S =================================================================== --- linux-2.6.orig/arch/x86/kernel/entry_32.S 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/arch/x86/kernel/entry_32.S 2009-09-07 13:13:42.000000000 -0700 @@ -718,6 +718,7 @@ PTREGSCALL(iopl) PTREGSCALL(fork) PTREGSCALL(clone) +PTREGSCALL(clone_with_pids) PTREGSCALL(vfork) PTREGSCALL(execve) PTREGSCALL(sigaltstack) Index: linux-2.6/arch/x86/kernel/process_32.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/process_32.c 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/arch/x86/kernel/process_32.c 2009-09-07 13:13:42.000000000 -0700 @@ -443,6 +443,27 @@ return do_fork(clone_flags, newsp, regs, 0, parent_tidptr, child_tidptr); } +int sys_clone_with_pids(struct pt_regs *regs) +{ + unsigned long clone_flags; + unsigned long newsp; + int __user *parent_tidptr; + int __user *child_tidptr; + void __user *upid_setp; + + clone_flags = regs->bx; + newsp = regs->cx; + parent_tidptr = (int __user *)regs->dx; + child_tidptr = (int __user *)regs->di; + upid_setp = (void __user *)regs->bp; + + if (!newsp) + newsp = regs->sp; + + return do_fork_with_pids(clone_flags, newsp, regs, 0, parent_tidptr, + child_tidptr, upid_setp); +} + /* * sys_execve() executes a new program. */ Index: linux-2.6/arch/x86/kernel/syscall_table_32.S =================================================================== --- linux-2.6.orig/arch/x86/kernel/syscall_table_32.S 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/arch/x86/kernel/syscall_table_32.S 2009-09-07 13:23:20.000000000 -0700 @@ -336,3 +336,4 @@ .long sys_pwritev .long sys_rt_tgsigqueueinfo /* 335 */ .long sys_perf_counter_open + .long ptregs_clone_with_pids Index: linux-2.6/kernel/fork.c =================================================================== --- linux-2.6.orig/kernel/fork.c 2009-09-07 13:13:33.000000000 -0700 +++ linux-2.6/kernel/fork.c 2009-09-07 13:13:42.000000000 -0700 @@ -1327,6 +1327,97 @@ } /* + * If user specified any 'target-pids' in @upid_setp, copy them from + * user and return a pointer to a local copy of the list of pids. The + * caller must free the list, when they are done using it. + * + * If user did not specify any target pids, return NULL (caller should + * treat this like normal clone). + * + * On any errors, return the error code + */ +static pid_t *copy_target_pids(void __user *upid_setp) +{ + int j; + int rc; + int size; + int unum_pids; /* # of pids specified by user */ + int knum_pids; /* # of pids needed in kernel */ + pid_t *target_pids; + struct pid_set pid_set; + + if (!upid_setp) + return NULL; + + rc = copy_from_user(&pid_set, upid_setp, sizeof(pid_set)); + if (rc) + return ERR_PTR(-EFAULT); + + unum_pids = pid_set.num_pids; + knum_pids = task_pid(current)->level + 1; + + if (!unum_pids) + return NULL; + + if (unum_pids < 0 || unum_pids > knum_pids) + return ERR_PTR(-EINVAL); + + /* + * To keep alloc_pid() simple, allocate an extra pid_t in target_pids[] + * and set it to 0. This last entry in target_pids[] corresponds to the + * (yet-to-be-created) descendant pid-namespace if CLONE_NEWPID was + * specified. If CLONE_NEWPID was not specified, this last entry will + * simply be ignored. + */ + target_pids = kzalloc((knum_pids + 1) * sizeof(pid_t), GFP_KERNEL); + if (!target_pids) + return ERR_PTR(-ENOMEM); + + /* + * A process running in a level 2 pid namespace has three pid namespaces + * and hence three pid numbers. If this process is checkpointed, + * information about these three namespaces are saved. We refer to these + * namespaces as 'known namespaces'. + * + * If this checkpointed process is however restarted in a level 3 pid + * namespace, the restarted process has an extra ancestor pid namespace + * (i.e 'unknown namespace') and 'knum_pids' exceeds 'unum_pids'. + * + * During restart, the process requests specific pids for its 'known + * namespaces' and lets kernel assign pids to its 'unknown namespaces'. + * + * Since the requested-pids correspond to 'known namespaces' and since + * 'known-namespaces' are younger than (i.e descendants of) 'unknown- + * namespaces', copy requested pids to the back-end of target_pids[] + * (i.e before the last entry for CLONE_NEWPID mentioned above). + * Any entries in target_pids[] not corresponding to a requested pid + * will be set to zero and kernel assigns a pid in those namespaces. + * + * NOTE: The order of pids in target_pids[] is oldest pid namespace to + * youngest (target_pids[0] corresponds to init_pid_ns). i.e. + * the order is: + * + * - pids for 'unknown-namespaces' (if any) + * - pids for 'known-namespaces' (requested pids) + * - 0 in the last entry (for CLONE_NEWPID). + */ + j = knum_pids - unum_pids; + size = unum_pids * sizeof(pid_t); + + rc = copy_from_user(&target_pids[j], pid_set.pids, size); + if (rc) { + rc = -EFAULT; + goto out_free; + } + + return target_pids; + +out_free: + kfree(target_pids); + return ERR_PTR(rc); +} + +/* * Ok, this is the main fork-routine. * * It copies the process, and if successful kick-starts @@ -1343,7 +1434,7 @@ struct task_struct *p; int trace = 0; long nr; - pid_t *target_pids = NULL; + pid_t *target_pids; /* * Do some preliminary argument and permissions checking before we @@ -1377,6 +1468,17 @@ } } + target_pids = copy_target_pids(pid_setp); + + if (target_pids) { + if (IS_ERR(target_pids)) + return PTR_ERR(target_pids); + + nr = -EPERM; + if (!capable(CAP_SYS_ADMIN)) + goto out_free; + } + /* * When called from kernel_thread, don't do user tracing stuff. */ @@ -1438,6 +1540,10 @@ } else { nr = PTR_ERR(p); } + +out_free: + kfree(target_pids); + return nr; } ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 7/8]: Define do_fork_with_pids() 2009-09-07 21:13 ` Sukadev Bhattiprolu ` (4 preceding siblings ...) (?) @ 2009-09-07 21:16 ` Sukadev Bhattiprolu -1 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:16 UTC (permalink / raw) To: linux-kernel Cc: serue, Oren Laadan, Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa, Containers, sukadev Subject: [RFC][v5][PATCH 7/8]: Define do_fork_with_pids() do_fork_with_pids() is same as do_fork(), except that it takes an additional, 'pid_set', parameter. This parameter, currently unused, specifies the set of target pids of the process in each of its pid namespaces. Changelog[v4]: - Rename 'struct target_pid_set' to 'struct pid_set' since it may be useful in other contexts. Changelog[v3]: - Fix "long-line" warning from checkpatch.pl Changelog[v2]: - To facilitate moving architecture-inpdendent code to kernel/fork.c pass in 'struct target_pid_set __user *' to do_fork_with_pids() rather than 'pid_t *' (next patch moves the arch-independent code to kernel/fork.c) Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: Oren Laadan <orenl@cs.columbia.edu> --- include/linux/sched.h | 3 +++ include/linux/types.h | 5 +++++ kernel/fork.c | 16 ++++++++++++++-- 3 files changed, 22 insertions(+), 2 deletions(-) Index: linux-2.6/include/linux/sched.h =================================================================== --- linux-2.6.orig/include/linux/sched.h 2009-09-07 13:10:39.000000000 -0700 +++ linux-2.6/include/linux/sched.h 2009-09-07 13:13:33.000000000 -0700 @@ -2054,6 +2054,9 @@ extern int do_execve(char *, char __user * __user *, char __user * __user *, struct pt_regs *); extern long do_fork(unsigned long, unsigned long, struct pt_regs *, unsigned long, int __user *, int __user *); +extern long do_fork_with_pids(unsigned long, unsigned long, struct pt_regs *, + unsigned long, int __user *, int __user *, + struct pid_set __user *pid_set); struct task_struct *fork_idle(int); extern void set_task_comm(struct task_struct *tsk, char *from); Index: linux-2.6/include/linux/types.h =================================================================== --- linux-2.6.orig/include/linux/types.h 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/include/linux/types.h 2009-09-07 13:13:33.000000000 -0700 @@ -204,6 +204,11 @@ char f_fpack[6]; }; +struct pid_set { + int num_pids; + pid_t *pids; +}; + #endif /* __KERNEL__ */ #endif /* __ASSEMBLY__ */ #endif /* _LINUX_TYPES_H */ Index: linux-2.6/kernel/fork.c =================================================================== --- linux-2.6.orig/kernel/fork.c 2009-09-07 13:13:33.000000000 -0700 +++ linux-2.6/kernel/fork.c 2009-09-07 13:13:33.000000000 -0700 @@ -1332,12 +1332,13 @@ * It copies the process, and if successful kick-starts * it and waits for it to finish using the VM if required. */ -long do_fork(unsigned long clone_flags, +long do_fork_with_pids(unsigned long clone_flags, unsigned long stack_start, struct pt_regs *regs, unsigned long stack_size, int __user *parent_tidptr, - int __user *child_tidptr) + int __user *child_tidptr, + struct pid_set __user *pid_setp) { struct task_struct *p; int trace = 0; @@ -1440,6 +1441,17 @@ return nr; } +long do_fork(unsigned long clone_flags, + unsigned long stack_start, + struct pt_regs *regs, + unsigned long stack_size, + int __user *parent_tidptr, + int __user *child_tidptr) +{ + return do_fork_with_pids(clone_flags, stack_start, regs, stack_size, + parent_tidptr, child_tidptr, NULL); +} + #ifndef ARCH_MIN_MMSTRUCT_ALIGN #define ARCH_MIN_MMSTRUCT_ALIGN 0 #endif ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall 2009-09-07 21:13 ` Sukadev Bhattiprolu ` (5 preceding siblings ...) (?) @ 2009-09-07 21:17 ` Sukadev Bhattiprolu 2009-09-08 18:19 ` Nathan Lynch [not found] ` <20090907211700.GH6685-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> -1 siblings, 2 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-07 21:17 UTC (permalink / raw) To: linux-kernel Cc: serue, Oren Laadan, Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa, Containers, sukadev Subject: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall Container restart requires that a task have the same pid it had when it was checkpointed. When containers are nested the tasks within the containers exist in multiple pid namespaces and hence have multiple pids to specify during restart. clone_with_pids(), intended for use during restart, is the same as clone(), except that it takes a 'target_pid_set' paramter. This parameter lets caller choose specific pid numbers for the child process, in the process's active and ancestor pid namespaces. (Descendant pid namespaces in general don't matter since processes don't have pids in them anyway, but see comments in copy_target_pids() regarding CLONE_NEWPID). Unlike clone(), clone_with_pids() needs CAP_SYS_ADMIN, at least for now, to prevent unprivileged processes from misusing this interface. Call clone_with_pids as follows: pid_t pids[] = { 0, 77, 99 }; struct pid_set pid_set; pid_set.num_pids = sizeof(pids) / sizeof(int); pid_set.pids = &pids; syscall(__NR_clone_with_pids, flags, stack, NULL, NULL, NULL, &pid_set); If a target-pid is 0, the kernel continues to assign a pid for the process in that namespace. In the above example, pids[0] is 0, meaning the kernel will assign next available pid to the process in init_pid_ns. But kernel will assign pid 77 in the child pid namespace 1 and pid 99 in pid namespace 2. If either 77 or 99 are taken, the system call fails with -EBUSY. If 'pid_set.num_pids' exceeds the current nesting level of pid namespaces, the system call fails with -EINVAL. Its mostly an exploratory patch seeking feedback on the interface. NOTE: Compared to clone(), clone_with_pids() needs to pass in two more pieces of information: - number of pids in the set - user buffer containing the list of pids. But since clone() already takes 5 parameters, use a 'struct pid_set'. TODO: - Gently tested. - May need additional sanity checks in do_fork_with_pids(). Changelog[v4]: - (Oren Laadan) rename 'struct target_pid_set' to 'struct pid_set' Changelog[v3]: - (Oren Laadan) Allow CLONE_NEWPID flag (by allocating an extra pid in the target_pids[] list and setting it 0. See copy_target_pids()). - (Oren Laadan) Specified target pids should apply only to youngest pid-namespaces (see copy_target_pids()) - (Matt Helsley) Update patch description. Changelog[v2]: - Remove unnecessary printk and add a note to callers of copy_target_pids() to free target_pids. - (Serge Hallyn) Mention CAP_SYS_ADMIN restriction in patch description. - (Oren Laadan) Add checks for 'num_pids < 0' (return -EINVAL) and 'num_pids == 0' (fall back to normal clone()). - Move arch-independent code (sanity checks and copy-in of target-pids) into kernel/fork.c and simplify sys_clone_with_pids() Changelog[v1]: - Fixed some compile errors (had fixed these errors earlier in my git tree but had not refreshed patches before emailing them) Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> --- arch/x86/include/asm/syscalls.h | 2 + arch/x86/include/asm/unistd_32.h | 1 + arch/x86/kernel/entry_32.S | 1 + arch/x86/kernel/process_32.c | 21 +++++++ arch/x86/kernel/syscall_table_32.S | 1 + kernel/fork.c | 108 +++++++++++++++++++++++++++++++++++- 6 files changed, 133 insertions(+), 1 deletions(-) Index: linux-2.6/arch/x86/include/asm/syscalls.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/syscalls.h 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/arch/x86/include/asm/syscalls.h 2009-09-07 13:13:42.000000000 -0700 @@ -40,6 +40,8 @@ /* kernel/process_32.c */ int sys_clone(struct pt_regs *); +int sys_clone_with_pids(struct pt_regs *); +int sys_vfork(struct pt_regs *); int sys_execve(struct pt_regs *); /* kernel/signal.c */ Index: linux-2.6/arch/x86/include/asm/unistd_32.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/unistd_32.h 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/arch/x86/include/asm/unistd_32.h 2009-09-07 13:15:05.000000000 -0700 @@ -342,6 +342,7 @@ #define __NR_pwritev 334 #define __NR_rt_tgsigqueueinfo 335 #define __NR_perf_counter_open 336 +#define __NR_clone_with_pids 337 #ifdef __KERNEL__ Index: linux-2.6/arch/x86/kernel/entry_32.S =================================================================== --- linux-2.6.orig/arch/x86/kernel/entry_32.S 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/arch/x86/kernel/entry_32.S 2009-09-07 13:13:42.000000000 -0700 @@ -718,6 +718,7 @@ PTREGSCALL(iopl) PTREGSCALL(fork) PTREGSCALL(clone) +PTREGSCALL(clone_with_pids) PTREGSCALL(vfork) PTREGSCALL(execve) PTREGSCALL(sigaltstack) Index: linux-2.6/arch/x86/kernel/process_32.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/process_32.c 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/arch/x86/kernel/process_32.c 2009-09-07 13:13:42.000000000 -0700 @@ -443,6 +443,27 @@ return do_fork(clone_flags, newsp, regs, 0, parent_tidptr, child_tidptr); } +int sys_clone_with_pids(struct pt_regs *regs) +{ + unsigned long clone_flags; + unsigned long newsp; + int __user *parent_tidptr; + int __user *child_tidptr; + void __user *upid_setp; + + clone_flags = regs->bx; + newsp = regs->cx; + parent_tidptr = (int __user *)regs->dx; + child_tidptr = (int __user *)regs->di; + upid_setp = (void __user *)regs->bp; + + if (!newsp) + newsp = regs->sp; + + return do_fork_with_pids(clone_flags, newsp, regs, 0, parent_tidptr, + child_tidptr, upid_setp); +} + /* * sys_execve() executes a new program. */ Index: linux-2.6/arch/x86/kernel/syscall_table_32.S =================================================================== --- linux-2.6.orig/arch/x86/kernel/syscall_table_32.S 2009-09-05 19:16:37.000000000 -0700 +++ linux-2.6/arch/x86/kernel/syscall_table_32.S 2009-09-07 13:23:20.000000000 -0700 @@ -336,3 +336,4 @@ .long sys_pwritev .long sys_rt_tgsigqueueinfo /* 335 */ .long sys_perf_counter_open + .long ptregs_clone_with_pids Index: linux-2.6/kernel/fork.c =================================================================== --- linux-2.6.orig/kernel/fork.c 2009-09-07 13:13:33.000000000 -0700 +++ linux-2.6/kernel/fork.c 2009-09-07 13:13:42.000000000 -0700 @@ -1327,6 +1327,97 @@ } /* + * If user specified any 'target-pids' in @upid_setp, copy them from + * user and return a pointer to a local copy of the list of pids. The + * caller must free the list, when they are done using it. + * + * If user did not specify any target pids, return NULL (caller should + * treat this like normal clone). + * + * On any errors, return the error code + */ +static pid_t *copy_target_pids(void __user *upid_setp) +{ + int j; + int rc; + int size; + int unum_pids; /* # of pids specified by user */ + int knum_pids; /* # of pids needed in kernel */ + pid_t *target_pids; + struct pid_set pid_set; + + if (!upid_setp) + return NULL; + + rc = copy_from_user(&pid_set, upid_setp, sizeof(pid_set)); + if (rc) + return ERR_PTR(-EFAULT); + + unum_pids = pid_set.num_pids; + knum_pids = task_pid(current)->level + 1; + + if (!unum_pids) + return NULL; + + if (unum_pids < 0 || unum_pids > knum_pids) + return ERR_PTR(-EINVAL); + + /* + * To keep alloc_pid() simple, allocate an extra pid_t in target_pids[] + * and set it to 0. This last entry in target_pids[] corresponds to the + * (yet-to-be-created) descendant pid-namespace if CLONE_NEWPID was + * specified. If CLONE_NEWPID was not specified, this last entry will + * simply be ignored. + */ + target_pids = kzalloc((knum_pids + 1) * sizeof(pid_t), GFP_KERNEL); + if (!target_pids) + return ERR_PTR(-ENOMEM); + + /* + * A process running in a level 2 pid namespace has three pid namespaces + * and hence three pid numbers. If this process is checkpointed, + * information about these three namespaces are saved. We refer to these + * namespaces as 'known namespaces'. + * + * If this checkpointed process is however restarted in a level 3 pid + * namespace, the restarted process has an extra ancestor pid namespace + * (i.e 'unknown namespace') and 'knum_pids' exceeds 'unum_pids'. + * + * During restart, the process requests specific pids for its 'known + * namespaces' and lets kernel assign pids to its 'unknown namespaces'. + * + * Since the requested-pids correspond to 'known namespaces' and since + * 'known-namespaces' are younger than (i.e descendants of) 'unknown- + * namespaces', copy requested pids to the back-end of target_pids[] + * (i.e before the last entry for CLONE_NEWPID mentioned above). + * Any entries in target_pids[] not corresponding to a requested pid + * will be set to zero and kernel assigns a pid in those namespaces. + * + * NOTE: The order of pids in target_pids[] is oldest pid namespace to + * youngest (target_pids[0] corresponds to init_pid_ns). i.e. + * the order is: + * + * - pids for 'unknown-namespaces' (if any) + * - pids for 'known-namespaces' (requested pids) + * - 0 in the last entry (for CLONE_NEWPID). + */ + j = knum_pids - unum_pids; + size = unum_pids * sizeof(pid_t); + + rc = copy_from_user(&target_pids[j], pid_set.pids, size); + if (rc) { + rc = -EFAULT; + goto out_free; + } + + return target_pids; + +out_free: + kfree(target_pids); + return ERR_PTR(rc); +} + +/* * Ok, this is the main fork-routine. * * It copies the process, and if successful kick-starts @@ -1343,7 +1434,7 @@ struct task_struct *p; int trace = 0; long nr; - pid_t *target_pids = NULL; + pid_t *target_pids; /* * Do some preliminary argument and permissions checking before we @@ -1377,6 +1468,17 @@ } } + target_pids = copy_target_pids(pid_setp); + + if (target_pids) { + if (IS_ERR(target_pids)) + return PTR_ERR(target_pids); + + nr = -EPERM; + if (!capable(CAP_SYS_ADMIN)) + goto out_free; + } + /* * When called from kernel_thread, don't do user tracing stuff. */ @@ -1438,6 +1540,10 @@ } else { nr = PTR_ERR(p); } + +out_free: + kfree(target_pids); + return nr; } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall 2009-09-07 21:17 ` [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall Sukadev Bhattiprolu @ 2009-09-08 18:19 ` Nathan Lynch [not found] ` <m3zl95l2oz.fsf-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org> [not found] ` <20090907211700.GH6685-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 32+ messages in thread From: Nathan Lynch @ 2009-09-08 18:19 UTC (permalink / raw) To: Sukadev Bhattiprolu Cc: linux-kernel, Containers, Eric W. Biederman, hpa, mingo, torvalds, Alexey Dobriyan, Pavel Emelyanov Suka, I ran across an issue with this on ppc64. > +static pid_t *copy_target_pids(void __user *upid_setp) > +{ > + int j; > + int rc; > + int size; > + int unum_pids; /* # of pids specified by user */ > + int knum_pids; /* # of pids needed in kernel */ > + pid_t *target_pids; > + struct pid_set pid_set; > + > + if (!upid_setp) > + return NULL; > + > + rc = copy_from_user(&pid_set, upid_setp, sizeof(pid_set)); This doesn't work on a 64-bit kernel when the process is 32-bit and uses the definition of struct pid_set provided in types.h: +struct pid_set { + int num_pids; + pid_t *pids; +}; Shouldn't the pids field be u64 or some other type of fixed size? ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <m3zl95l2oz.fsf-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>]
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall 2009-09-08 18:19 ` Nathan Lynch @ 2009-09-09 12:19 ` Arnd Bergmann 0 siblings, 0 replies; 32+ messages in thread From: Arnd Bergmann @ 2009-09-09 12:19 UTC (permalink / raw) To: Nathan Lynch Cc: Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI, Sukadev Bhattiprolu, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov On Tuesday 08 September 2009, Nathan Lynch wrote: > This doesn't work on a 64-bit kernel when the process is 32-bit and uses > the definition of struct pid_set provided in types.h: > > +struct pid_set { > + int num_pids; > + pid_t *pids; > +}; > > Shouldn't the pids field be u64 or some other type of fixed size? This is a complex problem. The structure above would need a conversion for the pointer size that you can avoid by using a u64, but that introduces another problem: struct pid_set { int num_pids; u64 pidp; }; Has implicit padding between the two members on all 64 bit architectures, but not on i386, so you would still need a conversion (not for s390, power, mips, sparc or parisc though, only for x86). I can see two solutions for this: 1. use separate system call arguments for num_pids and pidp. This avoids the data structure and saves one copy_from_user call, at the cost of adding another argument to the syscall. syscalls with more than 6 arguments are somewhat problematic as well. 2. use a single pointer, with variable length data structures: struct pid_set { int num_pids; pid_t pids[0]; }; Since pid_t is always an int, you have no problem with padding or incompatible types, but rely on a data structure definition that is not in C89 (not sure about C99). Arnd <>< ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall @ 2009-09-09 12:19 ` Arnd Bergmann 0 siblings, 0 replies; 32+ messages in thread From: Arnd Bergmann @ 2009-09-09 12:19 UTC (permalink / raw) To: Nathan Lynch Cc: Sukadev Bhattiprolu, linux-kernel, Containers, Eric W. Biederman, hpa, mingo, torvalds, Alexey Dobriyan, Pavel Emelyanov On Tuesday 08 September 2009, Nathan Lynch wrote: > This doesn't work on a 64-bit kernel when the process is 32-bit and uses > the definition of struct pid_set provided in types.h: > > +struct pid_set { > + int num_pids; > + pid_t *pids; > +}; > > Shouldn't the pids field be u64 or some other type of fixed size? This is a complex problem. The structure above would need a conversion for the pointer size that you can avoid by using a u64, but that introduces another problem: struct pid_set { int num_pids; u64 pidp; }; Has implicit padding between the two members on all 64 bit architectures, but not on i386, so you would still need a conversion (not for s390, power, mips, sparc or parisc though, only for x86). I can see two solutions for this: 1. use separate system call arguments for num_pids and pidp. This avoids the data structure and saves one copy_from_user call, at the cost of adding another argument to the syscall. syscalls with more than 6 arguments are somewhat problematic as well. 2. use a single pointer, with variable length data structures: struct pid_set { int num_pids; pid_t pids[0]; }; Since pid_t is always an int, you have no problem with padding or incompatible types, but rely on a data structure definition that is not in C89 (not sure about C99). Arnd <>< ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <200909091419.50496.arnd-r2nGTMty4D4@public.gmane.org>]
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall 2009-09-09 12:19 ` Arnd Bergmann @ 2009-09-09 15:51 ` H. Peter Anvin -1 siblings, 0 replies; 32+ messages in thread From: H. Peter Anvin @ 2009-09-09 15:51 UTC (permalink / raw) To: Arnd Bergmann Cc: Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Nathan Lynch, Eric W. Biederman, mingo-X9Un+BFzKDI, Sukadev Bhattiprolu, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov On 09/09/2009 05:19 AM, Arnd Bergmann wrote: > > This is a complex problem. The structure above would need a conversion > for the pointer size that you can avoid by using a u64, but that introduces > another problem: > > 2. use a single pointer, with variable length data structures: > > struct pid_set { > int num_pids; > pid_t pids[0]; > }; > > Since pid_t is always an int, you have no problem with padding or > incompatible types, but rely on a data structure definition that > is not in C89 (not sure about C99). > C90 has these data structures, but you have to give the array a nonzero length: struct pid_set { int num_pids; pid_t pids[1]; }; In C99, this is spelt: struct pid_set { int num_pids; pid_t pids[]; }; -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall @ 2009-09-09 15:51 ` H. Peter Anvin 0 siblings, 0 replies; 32+ messages in thread From: H. Peter Anvin @ 2009-09-09 15:51 UTC (permalink / raw) To: Arnd Bergmann Cc: Nathan Lynch, Sukadev Bhattiprolu, linux-kernel, Containers, Eric W. Biederman, mingo, torvalds, Alexey Dobriyan, Pavel Emelyanov On 09/09/2009 05:19 AM, Arnd Bergmann wrote: > > This is a complex problem. The structure above would need a conversion > for the pointer size that you can avoid by using a u64, but that introduces > another problem: > > 2. use a single pointer, with variable length data structures: > > struct pid_set { > int num_pids; > pid_t pids[0]; > }; > > Since pid_t is always an int, you have no problem with padding or > incompatible types, but rely on a data structure definition that > is not in C89 (not sure about C99). > C90 has these data structures, but you have to give the array a nonzero length: struct pid_set { int num_pids; pid_t pids[1]; }; In C99, this is spelt: struct pid_set { int num_pids; pid_t pids[]; }; -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <4AA7CF1F.3020408-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>]
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall [not found] ` <4AA7CF1F.3020408-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> @ 2009-09-09 18:03 ` Sukadev Bhattiprolu 0 siblings, 0 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-09 18:03 UTC (permalink / raw) To: H. Peter Anvin Cc: Arnd Bergmann, Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Nathan Lynch, Eric W. Biederman, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov H. Peter Anvin [hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org] wrote: | On 09/09/2009 05:19 AM, Arnd Bergmann wrote: | > | > This is a complex problem. The structure above would need a conversion | > for the pointer size that you can avoid by using a u64, but that introduces | > another problem: | > | > 2. use a single pointer, with variable length data structures: | > | > struct pid_set { | > int num_pids; | > pid_t pids[0]; | > }; | > | > Since pid_t is always an int, you have no problem with padding or | > incompatible types, but rely on a data structure definition that | > is not in C89 (not sure about C99). C90 or C99 below should work. Is it ok to use a data structure that is not in C89 ? BTW, would it work if we defined struct pid_set { u64 pids; int num_pids; } where ->pids can be still be a pointer ? The data structure would have the same size on all architectures. Thanks for the input Sukadev ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall 2009-09-09 15:51 ` H. Peter Anvin (?) (?) @ 2009-09-09 18:03 ` Sukadev Bhattiprolu 2009-09-09 18:01 ` H. Peter Anvin ` (2 more replies) -1 siblings, 3 replies; 32+ messages in thread From: Sukadev Bhattiprolu @ 2009-09-09 18:03 UTC (permalink / raw) To: H. Peter Anvin Cc: Arnd Bergmann, Nathan Lynch, linux-kernel, Containers, Eric W. Biederman, mingo, torvalds, Alexey Dobriyan, Pavel Emelyanov H. Peter Anvin [hpa@zytor.com] wrote: | On 09/09/2009 05:19 AM, Arnd Bergmann wrote: | > | > This is a complex problem. The structure above would need a conversion | > for the pointer size that you can avoid by using a u64, but that introduces | > another problem: | > | > 2. use a single pointer, with variable length data structures: | > | > struct pid_set { | > int num_pids; | > pid_t pids[0]; | > }; | > | > Since pid_t is always an int, you have no problem with padding or | > incompatible types, but rely on a data structure definition that | > is not in C89 (not sure about C99). C90 or C99 below should work. Is it ok to use a data structure that is not in C89 ? BTW, would it work if we defined struct pid_set { u64 pids; int num_pids; } where ->pids can be still be a pointer ? The data structure would have the same size on all architectures. Thanks for the input Sukadev ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall 2009-09-09 18:03 ` Sukadev Bhattiprolu @ 2009-09-09 18:01 ` H. Peter Anvin 2009-09-09 18:34 ` Linus Torvalds [not found] ` <20090909180303.GA21048-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2 siblings, 0 replies; 32+ messages in thread From: H. Peter Anvin @ 2009-09-09 18:01 UTC (permalink / raw) To: Sukadev Bhattiprolu Cc: Arnd Bergmann, Nathan Lynch, linux-kernel, Containers, Eric W. Biederman, mingo, torvalds, Alexey Dobriyan, Pavel Emelyanov On 09/09/2009 11:03 AM, Sukadev Bhattiprolu wrote: > > C90 or C99 below should work. Is it ok to use a data structure that is > not in C89 ? > C89 is the same as C90 (C89 refers to the ANSI standard, C90 to the ISO standard, but they're functionally identical.) > BTW, would it work if we defined > > struct pid_set { > u64 pids; > int num_pids; > } > > where ->pids can be still be a pointer ? The data structure would > have the same size on all architectures. > It would rather suck in terms of usability, though. -hpa ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall 2009-09-09 18:03 ` Sukadev Bhattiprolu 2009-09-09 18:01 ` H. Peter Anvin @ 2009-09-09 18:34 ` Linus Torvalds [not found] ` <20090909180303.GA21048-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2 siblings, 0 replies; 32+ messages in thread From: Linus Torvalds @ 2009-09-09 18:34 UTC (permalink / raw) To: Sukadev Bhattiprolu Cc: H. Peter Anvin, Arnd Bergmann, Nathan Lynch, linux-kernel, Containers, Eric W. Biederman, mingo, Alexey Dobriyan, Pavel Emelyanov On Wed, 9 Sep 2009, Sukadev Bhattiprolu wrote: > > BTW, would it work if we defined > > struct pid_set { > u64 pids; > int num_pids; > } > > where ->pids can be still be a pointer ? The data structure would > have the same size on all architectures. I don't think that's all that great. Just go with the C90 version, we already have that thing in the kernel, and struct pid_set { int num_pids; pid_t pids[]; }; looks simple and straightforward. And it even makes your example simpler, doesn't it? Ie now it's just struct pid_set pids = { 3, { 0, 97, 99 } }; and gcc should do the right thing. (Of course, in any real case it would be dynamically allocated, but whatever). Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <20090909180303.GA21048-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall [not found] ` <20090909180303.GA21048-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2009-09-09 18:01 ` H. Peter Anvin 2009-09-09 18:34 ` Linus Torvalds 1 sibling, 0 replies; 32+ messages in thread From: H. Peter Anvin @ 2009-09-09 18:01 UTC (permalink / raw) To: Sukadev Bhattiprolu Cc: Arnd Bergmann, Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Nathan Lynch, Eric W. Biederman, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Pavel Emelyanov On 09/09/2009 11:03 AM, Sukadev Bhattiprolu wrote: > > C90 or C99 below should work. Is it ok to use a data structure that is > not in C89 ? > C89 is the same as C90 (C89 refers to the ANSI standard, C90 to the ISO standard, but they're functionally identical.) > BTW, would it work if we defined > > struct pid_set { > u64 pids; > int num_pids; > } > > where ->pids can be still be a pointer ? The data structure would > have the same size on all architectures. > It would rather suck in terms of usability, though. -hpa ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall [not found] ` <20090909180303.GA21048-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2009-09-09 18:01 ` H. Peter Anvin @ 2009-09-09 18:34 ` Linus Torvalds 1 sibling, 0 replies; 32+ messages in thread From: Linus Torvalds @ 2009-09-09 18:34 UTC (permalink / raw) To: Sukadev Bhattiprolu Cc: Arnd Bergmann, Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Nathan Lynch, Eric W. Biederman, H. Peter Anvin, mingo-X9Un+BFzKDI, Alexey Dobriyan, Pavel Emelyanov On Wed, 9 Sep 2009, Sukadev Bhattiprolu wrote: > > BTW, would it work if we defined > > struct pid_set { > u64 pids; > int num_pids; > } > > where ->pids can be still be a pointer ? The data structure would > have the same size on all architectures. I don't think that's all that great. Just go with the C90 version, we already have that thing in the kernel, and struct pid_set { int num_pids; pid_t pids[]; }; looks simple and straightforward. And it even makes your example simpler, doesn't it? Ie now it's just struct pid_set pids = { 3, { 0, 97, 99 } }; and gcc should do the right thing. (Of course, in any real case it would be dynamically allocated, but whatever). Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <20090907211700.GH6685-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall [not found] ` <20090907211700.GH6685-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2009-09-08 18:19 ` Nathan Lynch 0 siblings, 0 replies; 32+ messages in thread From: Nathan Lynch @ 2009-09-08 18:19 UTC (permalink / raw) To: Sukadev Bhattiprolu Cc: Containers, Pavel-FOgKQjlUJ6BQetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan, Emelyanov Suka, I ran across an issue with this on ppc64. > +static pid_t *copy_target_pids(void __user *upid_setp) > +{ > + int j; > + int rc; > + int size; > + int unum_pids; /* # of pids specified by user */ > + int knum_pids; /* # of pids needed in kernel */ > + pid_t *target_pids; > + struct pid_set pid_set; > + > + if (!upid_setp) > + return NULL; > + > + rc = copy_from_user(&pid_set, upid_setp, sizeof(pid_set)); This doesn't work on a 64-bit kernel when the process is 32-bit and uses the definition of struct pid_set provided in types.h: +struct pid_set { + int num_pids; + pid_t *pids; +}; Shouldn't the pids field be u64 or some other type of fixed size? ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2009-09-09 18:35 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-09-07 21:13 [RFC][v5] clone_with_pids() system call Sukadev Bhattiprolu
2009-09-07 21:13 ` Sukadev Bhattiprolu
2009-09-07 21:13 ` [RFC][v5][PATCH 1/8]: Factor out code to allocate pidmap page Sukadev Bhattiprolu
2009-09-07 21:14 ` [RFC][v5][PATCH 3/8] Make pid_max a pid_ns property Sukadev Bhattiprolu
2009-09-07 21:15 ` [RFC][v5][PATCH 5/8]: Add target_pids parameter to alloc_pid() Sukadev Bhattiprolu
[not found] ` <20090907211302.GA5892-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-09-07 21:13 ` [RFC][v5][PATCH 1/8]: Factor out code to allocate pidmap page Sukadev Bhattiprolu
2009-09-07 21:14 ` [RFC][v5][PATCH 2/8]: Have alloc_pidmap() return actual error code Sukadev Bhattiprolu
2009-09-07 21:14 ` Sukadev Bhattiprolu
2009-09-07 21:14 ` [RFC][v5][PATCH 3/8] Make pid_max a pid_ns property Sukadev Bhattiprolu
2009-09-07 21:15 ` [RFC][v5][PATCH 4/8]: Add target_pid parameter to alloc_pidmap() Sukadev Bhattiprolu
2009-09-07 21:15 ` Sukadev Bhattiprolu
[not found] ` <20090907211506.GD6685-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-09-08 14:16 ` Serge E. Hallyn
2009-09-08 14:16 ` Serge E. Hallyn
2009-09-07 21:15 ` [RFC][v5][PATCH 5/8]: Add target_pids parameter to alloc_pid() Sukadev Bhattiprolu
2009-09-07 21:15 ` [RFC][v5][PATCH 6/8]: Add target_pids parameter to copy_process() Sukadev Bhattiprolu
2009-09-07 21:15 ` Sukadev Bhattiprolu
2009-09-07 21:16 ` [RFC][v5][PATCH 7/8]: Define do_fork_with_pids() Sukadev Bhattiprolu
2009-09-07 21:17 ` [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall Sukadev Bhattiprolu
2009-09-07 21:16 ` [RFC][v5][PATCH 7/8]: Define do_fork_with_pids() Sukadev Bhattiprolu
2009-09-07 21:17 ` [RFC][v5][PATCH 8/8]: Define clone_with_pids() syscall Sukadev Bhattiprolu
2009-09-08 18:19 ` Nathan Lynch
[not found] ` <m3zl95l2oz.fsf-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>
2009-09-09 12:19 ` Arnd Bergmann
2009-09-09 12:19 ` Arnd Bergmann
[not found] ` <200909091419.50496.arnd-r2nGTMty4D4@public.gmane.org>
2009-09-09 15:51 ` H. Peter Anvin
2009-09-09 15:51 ` H. Peter Anvin
[not found] ` <4AA7CF1F.3020408-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2009-09-09 18:03 ` Sukadev Bhattiprolu
2009-09-09 18:03 ` Sukadev Bhattiprolu
2009-09-09 18:01 ` H. Peter Anvin
2009-09-09 18:34 ` Linus Torvalds
[not found] ` <20090909180303.GA21048-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-09-09 18:01 ` H. Peter Anvin
2009-09-09 18:34 ` Linus Torvalds
[not found] ` <20090907211700.GH6685-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-09-08 18:19 ` Nathan Lynch
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.