* [RFC][PATCH 0/4] Object creation with a specified id
@ 2008-04-04 14:51 Nadia.Derbey
From: Nadia.Derbey @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, orenl
Hi,
When a previously checkpointed process is restarted, it should keep using
some of its ids (such as its process id or its SysV IPC ids).
This patch set provides a feature that helps restore that saved state:
it makes it possible to create an object with a predefined id.
A first implementation was proposed 2 months ago. It consisted of
changing an object's id after it had been created.
Here is a second implementation based on Oren Laadan's idea: Oren suggested
forcing an object's id during its creation, rather than (1) creating it and
then (2) changing its id.
A new file is created in procfs: /proc/self/next_id.
When an id value is written to this file, a structure pointed to by the
calling task's task_struct is filled with that id.
Then, when an object supporting this feature is created, the id present in
that structure is used instead of the default one.
The syntax is one of:
. echo "LONG XX" > /proc/self/next_id
  the next object to be created will have its id set to XX
. echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
  the next object to be created will have its ids set to X0, ... X<n-1>
  This is particularly useful for processes, which may have several ids if
  they belong to nested namespaces.
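As an illustration of the two syntaxes above, here is a minimal userspace sketch that builds the string to be written to /proc/self/next_id. The helper name format_next_id() is hypothetical and is not part of this patch set; it only assumes the "LONG XX" and "LONG<n> X0 ... X<n-1>" formats described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper (not part of the patch set): builds the
 * command string for /proc/self/next_id.
 *   n == 1  ->  "LONG X0"
 *   n  > 1  ->  "LONG<n> X0 ... X<n-1>"
 * Returns the string length, or -1 if the buffer is too small.
 */
static int format_next_id(char *buf, size_t size, const long *ids, int n)
{
	int len, i;

	assert(buf != NULL && ids != NULL && n >= 1);

	if (n == 1)
		len = snprintf(buf, size, "LONG");
	else
		len = snprintf(buf, size, "LONG%d", n);
	if (len < 0 || (size_t)len >= size)
		return -1;

	/* Append each id, separated by a single space. */
	for (i = 0; i < n; i++) {
		int ret = snprintf(buf + len, size - len, " %ld", ids[i]);

		if (ret < 0 || (size_t)ret >= size - len)
			return -1;
		len += ret;
	}
	return len;
}
```

A caller would then write the resulting buffer to /proc/self/next_id in a single write(), since partial writes are rejected by the interface.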
The objects covered here are ipc objects and processes.
Today, the ids are specified as long, but having a type string specified in
the next_id file makes it possible to cover more types in the future, if
needed.
The patches are against 2.6.25-rc8-mm1, in the following order:
[PATCH 1/4] adds the procfs facility for next object to be created, this
object being associated to a single id.
[PATCH 2/4] enhances the procfs facility for objects associated to multiple
ids (like processes).
[PATCH 3/4] makes use of the specified id (if any) to allocate the new IPC
object (changes the ipc_addid() path).
[PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) for a newly
allocated process (changes the alloc_pid()/alloc_pidmap() paths).
Any comments and/or suggestions are welcome.
Regards,
Nadia
--
--
* [RFC][PATCH 1/4] Provide a new procfs interface to set next id
From: Nadia.Derbey @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Nadia Derbey

[-- Attachment #1: proc_set_next_id.patch --]
[-- Type: text/plain, Size: 8855 bytes --]

[PATCH 01/04]

This patch proposes the procfs facilities needed to feed the id for the
next object to be allocated.

if an
echo "LONG XX" > /proc/self/next_id
is issued, next object to be created will have XX as its id.

This applies to objects that need a single id, such as ipc objects.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 fs/exec.c              |    3 +
 fs/proc/base.c         |   73 +++++++++++++++++++++++++++++++++++++++++
 include/linux/sched.h  |    3 +
 include/linux/sysids.h |   24 +++++++++++++
 kernel/Makefile        |    2 -
 kernel/exit.c          |    4 ++
 kernel/fork.c          |    2 +
 kernel/nextid.c        |   86 +++++++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 196 insertions(+), 1 deletion(-)

Index: linux-2.6.25-rc8-mm1/include/linux/sysids.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.25-rc8-mm1/include/linux/sysids.h	2008-04-04 13:53:04.000000000 +0200
@@ -0,0 +1,24 @@
+/*
+ * include/linux/sysids.h
+ *
+ * Definitions to support object creation with predefined id.
+ *
+ */
+
+#ifndef _LINUX_SYSIDS_H
+#define _LINUX_SYSIDS_H
+
+struct sys_id {
+	long id;
+};
+
+extern ssize_t get_nextid(struct task_struct *, char *, size_t);
+extern int set_nextid(struct task_struct *, char *);
+extern int reset_nextid(struct task_struct *);
+
+static inline void exit_nextid(struct task_struct *tsk)
+{
+	reset_nextid(tsk);
+}
+
+#endif /* _LINUX_SYSIDS_H */
Index: linux-2.6.25-rc8-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/sched.h	2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/sched.h	2008-04-04 13:55:10.000000000 +0200
@@ -88,6 +88,7 @@ struct sched_param {
 #include <linux/task_io_accounting.h>
 #include <linux/kobject.h>
 #include <linux/latencytop.h>
+#include <linux/sysids.h>
 
 #include <asm/processor.h>
 
@@ -1278,6 +1279,8 @@ struct task_struct {
 	int latency_record_count;
 	struct latency_record latency_record[LT_SAVECOUNT];
 #endif
+	/* Id to assign to the next resource to be created */
+	struct sys_id *next_id;
 };
 
 /*
Index: linux-2.6.25-rc8-mm1/fs/proc/base.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/fs/proc/base.c	2008-04-04 13:11:35.000000000 +0200
+++ linux-2.6.25-rc8-mm1/fs/proc/base.c	2008-04-04 13:57:18.000000000 +0200
@@ -1138,6 +1138,77 @@ static const struct file_operations proc
 #endif
 
 
+static ssize_t next_id_read(struct file *file, char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	struct task_struct *task;
+	char *page;
+	ssize_t length;
+
+	task = get_proc_task(file->f_path.dentry->d_inode);
+	if (!task)
+		return -ESRCH;
+
+	if (count >= PAGE_SIZE)
+		count = PAGE_SIZE - 1;
+
+	length = -ENOMEM;
+	page = (char *) __get_free_page(GFP_TEMPORARY);
+	if (!page)
+		goto out;
+
+	length = get_nextid(task, (char *) page, count);
+	if (length >= 0)
+		length = simple_read_from_buffer(buf, count, ppos,
+						(char *)page, length);
+	free_page((unsigned long) page);
+
+out:
+	put_task_struct(task);
+	return length;
+}
+
+static ssize_t next_id_write(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	struct inode *inode = file->f_path.dentry->d_inode;
+	char *page;
+	ssize_t length;
+
+	if (pid_task(proc_pid(inode), PIDTYPE_PID) != current)
+		return -EPERM;
+
+	if (count >= PAGE_SIZE)
+		count = PAGE_SIZE - 1;
+
+	if (*ppos != 0) {
+		/* No partial writes. */
+		return -EINVAL;
+	}
+	page = (char *)__get_free_page(GFP_TEMPORARY);
+	if (!page)
+		return -ENOMEM;
+	length = -EFAULT;
+	if (copy_from_user(page, buf, count))
+		goto out_free_page;
+
+	page[count] = '\0';
+
+	length = set_nextid(current, page);
+	if (!length)
+		length = count;
+
+out_free_page:
+	free_page((unsigned long) page);
+	return length;
+}
+
+static const struct file_operations proc_next_id_operations = {
+	.read		= next_id_read,
+	.write		= next_id_write,
+};
+
+
 #ifdef CONFIG_SCHED_DEBUG
 /*
  * Print out various scheduling related per-task fields:
@@ -2453,6 +2524,7 @@ static const struct pid_entry tgid_base_
 #ifdef CONFIG_TASK_IO_ACCOUNTING
 	INF("io", S_IRUGO, pid_io_accounting),
 #endif
+	REG("next_id", S_IRUGO|S_IWUSR, next_id),
 };
 
 static int proc_tgid_base_readdir(struct file * filp,
@@ -2779,6 +2851,7 @@ static const struct pid_entry tid_base_s
 #ifdef CONFIG_FAULT_INJECTION
 	REG("make-it-fail", S_IRUGO|S_IWUSR, fault_inject),
 #endif
+	REG("next_id", S_IRUGO|S_IWUSR, next_id),
 };
 
 static int proc_tid_base_readdir(struct file * filp,
Index: linux-2.6.25-rc8-mm1/kernel/Makefile
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/Makefile	2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/Makefile	2008-04-04 13:58:22.000000000 +0200
@@ -9,7 +9,7 @@ obj-y = sched.o fork.o exec_domain.o
 	    rcupdate.o extable.o params.o posix-timers.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
 	    hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
-	    notifier.o ksysfs.o pm_qos_params.o
+	    notifier.o ksysfs.o pm_qos_params.o nextid.o
 
 obj-$(CONFIG_SYSCTL_SYSCALL_CHECK) += sysctl_check.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
Index: linux-2.6.25-rc8-mm1/kernel/nextid.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.25-rc8-mm1/kernel/nextid.c	2008-04-04 13:59:59.000000000 +0200
@@ -0,0 +1,86 @@
+/*
+ * linux/kernel/nextid.c
+ *
+ *
+ * Provide the get_nextid() / set_nextid() routines
+ * (called from fs/proc/base.c).
+ * They allow to specify the id for the next resource to be allocated,
+ * instead of letting the allocator set it for us.
+ */
+
+#include <linux/sched.h>
+#include <linux/ctype.h>
+
+
+
+ssize_t get_nextid(struct task_struct *task, char *buffer, size_t size)
+{
+	struct sys_id *sid;
+
+	sid = task->next_id;
+	if (!sid)
+		return snprintf(buffer, size, "UNSET\n");
+
+	return snprintf(buffer, size, "LONG %ld\n", sid->id);
+}
+
+static int set_single_id(struct task_struct *task, char *buffer)
+{
+	struct sys_id *sid;
+	long next_id;
+	char *end;
+
+	next_id = simple_strtol(buffer, &end, 0);
+	if (end == buffer || (end && !isspace(*end)))
+		return -EINVAL;
+
+	sid = task->next_id;
+	if (!sid) {
+		sid = kzalloc(sizeof(*sid), GFP_KERNEL);
+		if (!sid)
+			return -ENOMEM;
+		task->next_id = sid;
+	}
+
+	sid->id = next_id;
+
+	return 0;
+}
+
+int reset_nextid(struct task_struct *task)
+{
+	struct sys_id *sid;
+
+	sid = task->next_id;
+	if (!sid)
+		return 0;
+
+	task->next_id = NULL;
+	kfree(sid);
+	return 0;
+}
+
+#define LONG_STR "LONG"
+#define RESET_STR "RESET"
+
+/*
+ * Parses a line written to /proc/self/next_id.
+ * this line has the following format:
+ * LONG id --> a single id is specified
+ */
+int set_nextid(struct task_struct *task, char *buffer)
+{
+	char *token, *out = buffer;
+
+	if (!out)
+		return -EINVAL;
+
+	token = strsep(&out, " ");
+
+	if (!strcmp(token, LONG_STR))
+		return set_single_id(task, out);
+	else if (!strncmp(token, RESET_STR, strlen(RESET_STR)))
+		return reset_nextid(task);
+	else
+		return -EINVAL;
+}
Index: linux-2.6.25-rc8-mm1/kernel/fork.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/fork.c	2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/fork.c	2008-04-04 14:00:35.000000000 +0200
@@ -1167,6 +1167,8 @@ static struct task_struct *copy_process(
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
+	p->next_id = NULL;
+
 	/* Perform scheduler related setup. Assign this task to a CPU. */
 	sched_fork(p, clone_flags);
 
Index: linux-2.6.25-rc8-mm1/kernel/exit.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/exit.c	2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/exit.c	2008-04-04 14:01:22.000000000 +0200
@@ -987,6 +987,10 @@ NORET_TYPE void do_exit(long code)
 
 	proc_exit_connector(tsk);
 	exit_notify(tsk, group_dead);
+
+	if (unlikely(tsk->next_id))
+		exit_nextid(tsk);
+
 #ifdef CONFIG_NUMA
 	mpol_free(tsk->mempolicy);
 	tsk->mempolicy = NULL;
Index: linux-2.6.25-rc8-mm1/fs/exec.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/fs/exec.c	2008-04-04 13:11:34.000000000 +0200
+++ linux-2.6.25-rc8-mm1/fs/exec.c	2008-04-04 14:02:09.000000000 +0200
@@ -1024,6 +1024,9 @@ int flush_old_exec(struct linux_binprm *
 	flush_signal_handlers(current, 0);
 	flush_old_files(current->files);
 
+	if (unlikely(current->next_id))
+		reset_nextid(current);
+
 	return 0;
 
 mmap_failed:

--
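For readers who want to experiment with the write-side grammar of patch 1/4 outside the kernel, the following is a rough userspace approximation of what set_nextid() accepts ("LONG <id>" or "RESET"). The names parse_next_id() and enum next_id_cmd are hypothetical, and the sketch uses strtol() in place of the kernel's simple_strtol(). Unlike set_single_id() in the patch, it also accepts an id that ends the string with no trailing whitespace, so it should be read as an assumption-laden sketch rather than a faithful copy.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for the userspace sketch. */
enum next_id_cmd { CMD_INVALID, CMD_SET, CMD_RESET };

/*
 * Userspace approximation of the single-id grammar:
 *   "LONG <id>"  -> CMD_SET, *id holds the parsed value
 *   "RESET..."   -> CMD_RESET (prefix match, as in the patch)
 * Anything else  -> CMD_INVALID
 */
static enum next_id_cmd parse_next_id(const char *line, long *id)
{
	char *end;

	assert(line != NULL && id != NULL);

	if (strncmp(line, "RESET", 5) == 0)
		return CMD_RESET;

	/* The type keyword must be followed by exactly one space. */
	if (strncmp(line, "LONG ", 5) != 0)
		return CMD_INVALID;

	line += 5;
	/* Base 0: decimal, octal (0...) and hex (0x...) are accepted. */
	*id = strtol(line, &end, 0);
	if (end == line || (*end != '\0' && !isspace((unsigned char)*end)))
		return CMD_INVALID;

	return CMD_SET;
}
```

Because the base is 0, a value such as "0x10" is read as 16, which matches the behaviour one would expect from simple_strtol(buffer, &end, 0) in the patch.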
* [RFC][PATCH 2/4] Provide a new procfs interface to set next upid nr(s)
From: Nadia.Derbey @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Nadia Derbey

[-- Attachment #1: proc_set_next_ids.patch --]
[-- Type: text/plain, Size: 6815 bytes --]

[PATCH 02/04]

This patch proposes the procfs facilities needed to feed the id(s) for the
next task to be forked.

Say n is the number of pids to be provided through procfs:
if an
echo "LONG<n> X0 X1 ... X<n-1>" > /proc/self/next_id
is issued, the next task to be forked will have its upid nrs set as follows
(say it is forked in a pid ns of level L):

level          upid nr
L ----------> X0
..
L - i ------> Xi
..
L - n + 1 --> X<n-1>

Then, for levels L-n down to level 0, the pids will be left to the kernel's
choice.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/sysids.h |   27 ++++++++
 kernel/nextid.c        |  150 ++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 155 insertions(+), 22 deletions(-)

Index: linux-2.6.25-rc8-mm1/include/linux/sysids.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/sysids.h	2008-04-04 13:53:04.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/sysids.h	2008-04-04 14:18:04.000000000 +0200
@@ -8,8 +8,33 @@
 #ifndef _LINUX_SYSIDS_H
 #define _LINUX_SYSIDS_H
 
+
+#define NIDS_SMALL 32
+#define NIDS_PER_BLOCK ((unsigned int)(PAGE_SIZE / sizeof(long)))
+
+/* access the ids "array" with this macro */
+#define ID_AT(pi, i) \
+	((pi)->blocks[(i) / NIDS_PER_BLOCK][(i) % NIDS_PER_BLOCK])
+
+
+/*
+ * List of ids for the next object to be created. This presently applies to
+ * next process to be created.
+ * The next process to be created is associated to a set of upid nrs: one for
+ * each pid namespace level that process belongs to.
+ * upid nrs from level 0 up to level <npids - 1> will be automatically
+ * allocated.
+ * upid nr for level nids will be set to blocks[0][0]
+ * upid nr for level <nids + i> will be set to ID_AT(ids, i);
+ *
+ * If a single id is needed, nids is set to 1 and small_block[0] is set to
+ * that id.
+ */
 struct sys_id {
-	long id;
+	int nids;
+	long small_block[NIDS_SMALL];
+	int nblocks;
+	long *blocks[0];
 };
 
 extern ssize_t get_nextid(struct task_struct *, char *, size_t);
Index: linux-2.6.25-rc8-mm1/kernel/nextid.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/nextid.c	2008-04-04 13:59:59.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/nextid.c	2008-04-04 14:28:13.000000000 +0200
@@ -13,38 +13,138 @@
 
 
 
+static struct sys_id *id_blocks_alloc(int nids)
+{
+	struct sys_id *ids;
+	int nblocks;
+	int i;
+
+	nblocks = (nids + NIDS_PER_BLOCK - 1) / NIDS_PER_BLOCK;
+	BUG_ON(nblocks < 1);
+
+	ids = kmalloc(sizeof(*ids) + nblocks * sizeof(long *), GFP_KERNEL);
+	if (!ids)
+		return NULL;
+	ids->nids = nids;
+	ids->nblocks = nblocks;
+
+	if (nids <= NIDS_SMALL)
+		ids->blocks[0] = ids->small_block;
+	else {
+		for (i = 0; i < nblocks; i++) {
+			long *b;
+			b = (void *)__get_free_page(GFP_KERNEL);
+			if (!b)
+				goto out_undo_partial_alloc;
+			ids->blocks[i] = b;
+		}
+	}
+	return ids;
+
+out_undo_partial_alloc:
+	while (--i >= 0)
+		free_page((unsigned long)ids->blocks[i]);
+
+	kfree(ids);
+	return NULL;
+}
+
+static void id_blocks_free(struct sys_id *ids)
+{
+	if (ids == NULL)
+		return;
+
+	if (ids->blocks[0] != ids->small_block) {
+		int i;
+		for (i = 0; i < ids->nblocks; i++)
+			free_page((unsigned long)ids->blocks[i]);
+	}
+	kfree(ids);
+	return;
+}
+
 ssize_t get_nextid(struct task_struct *task, char *buffer, size_t size)
 {
+	ssize_t count = 0;
 	struct sys_id *sid;
+	char *bufptr = buffer;
+	int i;
 
 	sid = task->next_id;
-	if (!sid)
+	if (!sid || !sid->nids)
 		return snprintf(buffer, size, "UNSET\n");
 
-	return snprintf(buffer, size, "LONG %ld\n", sid->id);
+	count = sprintf(bufptr, "LONGS (%d) ", sid->nids);
+
+	for (i = 0; i < sid->nids - 1; i++)
+		count += sprintf(&bufptr[count], "%ld ", ID_AT(sid, i));
+
+	count += sprintf(&bufptr[count], "%ld\n", ID_AT(sid, i));
+
+	return count;
 }
 
-static int set_single_id(struct task_struct *task, char *buffer)
+static int fill_nextid_list(struct task_struct *task, int nids, char *buffer)
 {
-	struct sys_id *sid;
-	long next_id;
+	char *token, *buff = buffer;
 	char *end;
+	struct sys_id *sid;
+	struct sys_id *old_list = task->next_id;
+	int i;
 
-	next_id = simple_strtol(buffer, &end, 0);
-	if (end == buffer || (end && !isspace(*end)))
-		return -EINVAL;
+	sid = id_blocks_alloc(nids);
+	if (!sid)
+		return -ENOMEM;
 
-	sid = task->next_id;
-	if (!sid) {
-		sid = kzalloc(sizeof(*sid), GFP_KERNEL);
-		if (!sid)
-			return -ENOMEM;
-		task->next_id = sid;
+	i = 0;
+	while ((token = strsep(&buff, " ")) != NULL && i < nids) {
+		long id;
+
+		if (!*token)
+			goto out_free;
+		id = simple_strtol(token, &end, 0);
+		if (end == token || (*end && !isspace(*end)))
+			goto out_free;
+		ID_AT(sid, i) = id;
+		i++;
 	}
 
-	sid->id = next_id;
+	if (i != nids)
+		/* Not enough pids compared to npids */
+		goto out_free;
+
+	if (old_list)
+		id_blocks_free(old_list);
+
+	task->next_id = sid;
 
 	return 0;
+
+out_free:
+	id_blocks_free(sid);
+	return -EINVAL;
+}
+
+/*
+ * Parses a line with the following format:
+ * <x> <id0> ... <idx-1>
+ * and sets <id0> to <idx-1> as the sequence of ids to be used for the next
+ * object to be created by the task.
+ * This applies to processes that need 1 id per namespace level.
+ * Any trailing character on the line is skipped.
+ */
+static int set_multiple_ids(struct task_struct *task, char *nb, char *buffer)
+{
+	int nids;
+	char *end;
+
+	nids = simple_strtol(nb, &end, 0);
+	if (*end)
+		return -EINVAL;
+
+	if (nids <= 0)
+		return -EINVAL;
+
+	return fill_nextid_list(task, nids, buffer);
 }
 
 int reset_nextid(struct task_struct *task)
@@ -55,8 +155,8 @@ int reset_nextid(struct task_struct *tas
 	if (!sid)
 		return 0;
 
+	id_blocks_free(sid);
 	task->next_id = NULL;
-	kfree(sid);
 	return 0;
 }
 
@@ -65,12 +165,14 @@ int reset_nextid(struct task_struct *tas
 
 /*
  * Parses a line written to /proc/self/next_id.
- * this line has the following format:
+ * this line has one of the following formats:
  * LONG id --> a single id is specified
+ * LONG<x> id0 ... id<x-1> --> a sequence of ids is specified
  */
 int set_nextid(struct task_struct *task, char *buffer)
 {
 	char *token, *out = buffer;
+	size_t sz;
 
 	if (!out)
 		return -EINVAL;
@@ -78,9 +180,15 @@ int set_nextid(struct task_struct *task,
 	token = strsep(&out, " ");
 
 	if (!strcmp(token, LONG_STR))
-		return set_single_id(task, out);
-	else if (!strncmp(token, RESET_STR, strlen(RESET_STR)))
+		return fill_nextid_list(task, 1, out);
+
+	sz = strlen(LONG_STR);
+
+	if (!strncmp(token, LONG_STR, sz))
+		return set_multiple_ids(task, token + sz, out);
+
+	if (!strncmp(token, RESET_STR, strlen(RESET_STR)))
 		return reset_nextid(task);
-	else
-		return -EINVAL;
+
+	return -EINVAL;
 }

--
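The level-to-id mapping described in the changelog of patch 2/4 (level L gets X0, level L - i gets Xi, levels below L - n + 1 are left to the kernel) can be sketched in plain C as below. The function requested_upid_nr() is hypothetical and exists only to illustrate the mapping; the real assignment is done in the alloc_pid() path by patch 4/4, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical illustration of the changelog's mapping: a task is forked
 * in a pid namespace of level `ns_level`, and `n` ids X0..X<n-1> were
 * supplied. Level (ns_level - i) is given ids[i].
 *
 * Returns 1 and stores the requested nr in *nr if `level` is covered by
 * the supplied ids, or 0 if the kernel is left to choose for that level.
 */
static int requested_upid_nr(int ns_level, const long *ids, int n,
			     int level, long *nr)
{
	int i;

	assert(ids != NULL && nr != NULL && n >= 1);
	assert(level >= 0 && level <= ns_level);

	i = ns_level - level;	/* distance from the innermost namespace */
	if (i >= n)
		return 0;	/* levels L-n down to 0: kernel's choice */

	*nr = ids[i];
	return 1;
}
```

With L = 4 and three ids supplied, levels 4, 3 and 2 receive X0, X1 and X2, while levels 1 and 0 fall back to normal allocation.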
* [RFC][PATCH 2/4] Provide a new procfs interface to set next upid nr(s) 2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey ` (2 preceding siblings ...) 2008-04-04 14:51 ` [RFC][PATCH 2/4] Provide a new procfs interface to set next upid nr(s) Nadia.Derbey-6ktuUTfB/bM @ 2008-04-04 14:51 ` Nadia.Derbey 2008-04-04 14:51 ` [RFC][PATCH 3/4] IPC: use the target ID specified in procfs Nadia.Derbey ` (5 subsequent siblings) 9 siblings, 0 replies; 31+ messages in thread From: Nadia.Derbey @ 2008-04-04 14:51 UTC (permalink / raw) To: linux-kernel; +Cc: containers, orenl, Nadia Derbey [-- Attachment #1: proc_set_next_ids.patch --] [-- Type: text/plain, Size: 6795 bytes --] [PATCH 02/04] This patch proposes the procfs facilities needed to feed the id(s) for the next task to be forked. say n is the number of pids to be provided through procfs: if an echo "LONG<n> X0 X1 ... X<n-1>" > /proc/self/next_pids is issued, the next task to be forked will have its upid nrs set as follows (say it is forked in a pid ns of level L): level upid nr L ----------> X0 .. L - i ------> Xi .. L - n + 1 --> X<n-1> Then, for levels L-n down to level 0, the pids will be left to the kernel choice. 
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/sysids.h |   27 ++++++++
 kernel/nextid.c        |  150 ++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 155 insertions(+), 22 deletions(-)

Index: linux-2.6.25-rc8-mm1/include/linux/sysids.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/sysids.h	2008-04-04 13:53:04.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/sysids.h	2008-04-04 14:18:04.000000000 +0200
@@ -8,8 +8,33 @@
 #ifndef _LINUX_SYSIDS_H
 #define _LINUX_SYSIDS_H
 
+
+#define NIDS_SMALL 32
+#define NIDS_PER_BLOCK ((unsigned int)(PAGE_SIZE / sizeof(long)))
+
+/* access the ids "array" with this macro */
+#define ID_AT(pi, i) \
+	((pi)->blocks[(i) / NIDS_PER_BLOCK][(i) % NIDS_PER_BLOCK])
+
+
+/*
+ * List of ids for the next object to be created. This presently applies to
+ * next process to be created.
+ * The next process to be created is associated to a set of upid nrs: one for
+ * each pid namespace level that process belongs to.
+ * upid nrs from level 0 up to level <npids - 1> will be automatically
+ * allocated.
+ * upid nr for level nids will be set to blocks[0][0]
+ * upid nr for level <nids + i> will be set to ID_AT(ids, i);
+ *
+ * If a single id is needed, nids is set to 1 and small_block[0] is set to
+ * that id.
+ */
 struct sys_id {
-	long id;
+	int nids;
+	long small_block[NIDS_SMALL];
+	int nblocks;
+	long *blocks[0];
 };
 
 extern ssize_t get_nextid(struct task_struct *, char *, size_t);
Index: linux-2.6.25-rc8-mm1/kernel/nextid.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/nextid.c	2008-04-04 13:59:59.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/nextid.c	2008-04-04 14:28:13.000000000 +0200
@@ -13,38 +13,138 @@
 
+static struct sys_id *id_blocks_alloc(int nids)
+{
+	struct sys_id *ids;
+	int nblocks;
+	int i;
+
+	nblocks = (nids + NIDS_PER_BLOCK - 1) / NIDS_PER_BLOCK;
+	BUG_ON(nblocks < 1);
+
+	ids = kmalloc(sizeof(*ids) + nblocks * sizeof(long *), GFP_KERNEL);
+	if (!ids)
+		return NULL;
+	ids->nids = nids;
+	ids->nblocks = nblocks;
+
+	if (nids <= NIDS_SMALL)
+		ids->blocks[0] = ids->small_block;
+	else {
+		for (i = 0; i < nblocks; i++) {
+			long *b;
+			b = (void *)__get_free_page(GFP_KERNEL);
+			if (!b)
+				goto out_undo_partial_alloc;
+			ids->blocks[i] = b;
+		}
+	}
+	return ids;
+
+out_undo_partial_alloc:
+	while (--i >= 0)
+		free_page((unsigned long)ids->blocks[i]);
+
+	kfree(ids);
+	return NULL;
+}
+
+static void id_blocks_free(struct sys_id *ids)
+{
+	if (ids == NULL)
+		return;
+
+	if (ids->blocks[0] != ids->small_block) {
+		int i;
+		for (i = 0; i < ids->nblocks; i++)
+			free_page((unsigned long)ids->blocks[i]);
+	}
+	kfree(ids);
+	return;
+}
+
 ssize_t get_nextid(struct task_struct *task, char *buffer, size_t size)
 {
+	ssize_t count = 0;
 	struct sys_id *sid;
+	char *bufptr = buffer;
+	int i;
 
 	sid = task->next_id;
-	if (!sid)
+	if (!sid || !sid->nids)
 		return snprintf(buffer, size, "UNSET\n");
 
-	return snprintf(buffer, size, "LONG %ld\n", sid->id);
+	count = sprintf(bufptr, "LONGS (%d) ", sid->nids);
+
+	for (i = 0; i < sid->nids - 1; i++)
+		count += sprintf(&bufptr[count], "%ld ", ID_AT(sid, i));
+
+	count += sprintf(&bufptr[count], "%ld\n", ID_AT(sid, i));
+
+	return count;
 }
 
-static int set_single_id(struct task_struct *task, char *buffer)
+static int fill_nextid_list(struct task_struct *task, int nids, char *buffer)
 {
-	struct sys_id *sid;
-	long next_id;
+	char *token, *buff = buffer;
 	char *end;
+	struct sys_id *sid;
+	struct sys_id *old_list = task->next_id;
+	int i;
 
-	next_id = simple_strtol(buffer, &end, 0);
-	if (end == buffer || (end && !isspace(*end)))
-		return -EINVAL;
+	sid = id_blocks_alloc(nids);
+	if (!sid)
+		return -ENOMEM;
 
-	sid = task->next_id;
-	if (!sid) {
-		sid = kzalloc(sizeof(*sid), GFP_KERNEL);
-		if (!sid)
-			return -ENOMEM;
-		task->next_id = sid;
+	i = 0;
+	while ((token = strsep(&buff, " ")) != NULL && i < nids) {
+		long id;
+
+		if (!*token)
+			goto out_free;
+		id = simple_strtol(token, &end, 0);
+		if (end == token || (*end && !isspace(*end)))
+			goto out_free;
+		ID_AT(sid, i) = id;
+		i++;
 	}
 
-	sid->id = next_id;
+	if (i != nids)
+		/* Not enough pids compared to npids */
+		goto out_free;
+
+	if (old_list)
+		id_blocks_free(old_list);
 
+	task->next_id = sid;
 	return 0;
+
+out_free:
+	id_blocks_free(sid);
+	return -EINVAL;
+}
+
+/*
+ * Parses a line with the following format:
+ * <x> <id0> ... <idx-1>
+ * and sets <id0> to <idx-1> as the sequence of ids to be used for the next
+ * object to be created by the task.
+ * This applies to processes that need 1 id per namespace level.
+ * Any trailing character on the line is skipped.
+ */
+static int set_multiple_ids(struct task_struct *task, char *nb, char *buffer)
+{
+	int nids;
+	char *end;
+
+	nids = simple_strtol(nb, &end, 0);
+	if (*end)
+		return -EINVAL;
+
+	if (nids <= 0)
+		return -EINVAL;
+
+	return fill_nextid_list(task, nids, buffer);
 }
 
 int reset_nextid(struct task_struct *task)
@@ -55,8 +155,8 @@ int reset_nextid(struct task_struct *tas
 	if (!sid)
 		return 0;
 
+	id_blocks_free(sid);
 	task->next_id = NULL;
-	kfree(sid);
 	return 0;
 }
 
@@ -65,12 +165,14 @@ int reset_nextid(struct task_struct *tas
 
 /*
  * Parses a line written to /proc/self/next_id.
- * this line has the following format:
+ * this line has one of the following formats:
  * LONG id --> a single id is specified
+ * LONG<x> id0 ... id<x-1> --> a sequence of ids is specified
  */
 int set_nextid(struct task_struct *task, char *buffer)
 {
 	char *token, *out = buffer;
+	size_t sz;
 
 	if (!out)
 		return -EINVAL;
@@ -78,9 +180,15 @@ int set_nextid(struct task_struct *task,
 	token = strsep(&out, " ");
 
 	if (!strcmp(token, LONG_STR))
-		return set_single_id(task, out);
-	else if (!strncmp(token, RESET_STR, strlen(RESET_STR)))
+		return fill_nextid_list(task, 1, out);
+
+	sz = strlen(LONG_STR);
+
+	if (!strncmp(token, LONG_STR, sz))
+		return set_multiple_ids(task, token + sz, out);
+
+	if (!strncmp(token, RESET_STR, strlen(RESET_STR)))
 		return reset_nextid(task);
-	else
-		return -EINVAL;
+
+	return -EINVAL;
 }
 
--
^ permalink raw reply [flat|nested] 31+ messages in thread
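For readers following the block layout in patch 2/4, here is a minimal user-space sketch of the same two-level addressing. It is not the kernel code: malloc()/free() stand in for kmalloc() and __get_free_page(), BLOCK_BYTES is an assumed stand-in for PAGE_SIZE, and a C99 flexible array member replaces `blocks[0]`. The names are taken from the patch so the two can be compared side by side.

```c
#include <stdlib.h>

#define NIDS_SMALL      32
#define BLOCK_BYTES     4096	/* assumption: stands in for PAGE_SIZE */
#define NIDS_PER_BLOCK  ((unsigned int)(BLOCK_BYTES / sizeof(long)))

/* Same two-level addressing as the patch's ID_AT(). */
#define ID_AT(pi, i) \
	((pi)->blocks[(i) / NIDS_PER_BLOCK][(i) % NIDS_PER_BLOCK])

struct sys_id {
	int nids;
	long small_block[NIDS_SMALL];
	int nblocks;
	long *blocks[];		/* C99 flexible array member */
};

static void id_blocks_free(struct sys_id *ids)
{
	int i;

	if (ids == NULL)
		return;
	/* Blocks are separate allocations only in the large case. */
	if (ids->blocks[0] != ids->small_block)
		for (i = 0; i < ids->nblocks; i++)
			free(ids->blocks[i]);
	free(ids);
}

static struct sys_id *id_blocks_alloc(int nids)
{
	struct sys_id *ids;
	int nblocks, i;

	if (nids < 1)
		return NULL;	/* the patch has BUG_ON(nblocks < 1) instead */
	nblocks = (nids + NIDS_PER_BLOCK - 1) / NIDS_PER_BLOCK;

	ids = malloc(sizeof(*ids) + nblocks * sizeof(long *));
	if (ids == NULL)
		return NULL;
	ids->nids = nids;
	ids->nblocks = nblocks;

	if (nids <= NIDS_SMALL) {
		/* Short lists live inside the structure itself. */
		ids->blocks[0] = ids->small_block;
		return ids;
	}
	for (i = 0; i < nblocks; i++) {
		ids->blocks[i] = malloc(BLOCK_BYTES);
		if (ids->blocks[i] == NULL) {
			/* Undo the partial allocation. */
			while (--i >= 0)
				free(ids->blocks[i]);
			free(ids);
			return NULL;
		}
	}
	return ids;
}
```

With this layout a list of up to NIDS_SMALL ids costs a single allocation, and ID_AT() works unchanged in both cases because blocks[0] simply points at small_block for short lists.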
* [RFC][PATCH 3/4] IPC: use the target ID specified in procfs
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
` (3 preceding siblings ...)
2008-04-04 14:51 ` Nadia.Derbey
@ 2008-04-04 14:51 ` Nadia.Derbey
2008-04-04 14:51 ` Nadia.Derbey-6ktuUTfB/bM
` (4 subsequent siblings)
9 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, orenl, Nadia Derbey
[-- Attachment #1: ipc_use_next_id.patch --]
[-- Type: text/plain, Size: 3347 bytes --]

[PATCH 03/04]

This patch makes use of the target id specified by a previous write into
/proc/self/next_id as the id to use to allocate the next IPC object.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/sysids.h |    7 +++++++
 ipc/util.c             |   40 ++++++++++++++++++++++++++++++++--------
 kernel/nextid.c        |    2 +-
 3 files changed, 40 insertions(+), 9 deletions(-)

Index: linux-2.6.25-rc8-mm1/include/linux/sysids.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/sysids.h	2008-04-04 14:18:04.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/sysids.h	2008-04-04 14:37:45.000000000 +0200
@@ -37,9 +37,16 @@ struct sys_id {
 	long *blocks[0];
 };
 
+#define next_ipcid(tsk) ((tsk)->next_id				\
+				? ((tsk)->next_id->nids			\
+					? ID_AT((tsk)->next_id, 0)	\
+					: -1)				\
+				: -1)
+
 extern ssize_t get_nextid(struct task_struct *, char *, size_t);
 extern int set_nextid(struct task_struct *, char *);
 extern int reset_nextid(struct task_struct *);
+extern void id_blocks_free(struct sys_id *);
 
 static inline void exit_nextid(struct task_struct *tsk)
 {
Index: linux-2.6.25-rc8-mm1/kernel/nextid.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/nextid.c	2008-04-04 14:28:13.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/nextid.c	2008-04-04 14:38:38.000000000 +0200
@@ -49,7 +49,7 @@ out_undo_partial_alloc:
 	return NULL;
 }
 
-static void id_blocks_free(struct sys_id *ids)
+void id_blocks_free(struct sys_id *ids)
 {
 	if (ids == NULL)
 		return;
Index: linux-2.6.25-rc8-mm1/ipc/util.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/ipc/util.c	2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/ipc/util.c	2008-04-04 14:41:53.000000000 +0200
@@ -260,6 +260,7 @@ int ipc_get_maxid(struct ipc_ids *ids)
 int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
 {
 	int id, err;
+	int next_id;
 
 	if (size > IPCMNI)
 		size = IPCMNI;
@@ -267,20 +268,43 @@ int ipc_addid(struct ipc_ids* ids, struc
 	if (ids->in_use >= size)
 		return -ENOSPC;
 
-	err = idr_get_new(&ids->ipcs_idr, new, &id);
-	if (err)
-		return err;
+	next_id = next_ipcid(current);
+	if (next_id >= 0) {
+		/* There is a target id specified, try to use it */
+		int new_lid = next_id % SEQ_MULTIPLIER;
+
+		if (next_id !=
+		    (new_lid + (next_id / SEQ_MULTIPLIER) * SEQ_MULTIPLIER))
+			return -EINVAL;
+
+		err = idr_get_new_above(&ids->ipcs_idr, new, new_lid, &id);
+		if (err)
+			return err;
+		if (id != new_lid) {
+			idr_remove(&ids->ipcs_idr, id);
+			return -EBUSY;
+		}
+
+		new->id = next_id;
+		new->seq = next_id / SEQ_MULTIPLIER;
+		id_blocks_free(current->next_id);
+		current->next_id = NULL;
+	} else {
+		err = idr_get_new(&ids->ipcs_idr, new, &id);
+		if (err)
+			return err;
+
+		new->seq = ids->seq++;
+		if (ids->seq > ids->seq_max)
+			ids->seq = 0;
+		new->id = ipc_buildid(id, new->seq);
+	}
 
 	ids->in_use++;
 
 	new->cuid = new->uid = current->euid;
 	new->gid = new->cgid = current->egid;
 
-	new->seq = ids->seq++;
-	if(ids->seq > ids->seq_max)
-		ids->seq = 0;
-
-	new->id = ipc_buildid(id, new->seq);
 	spin_lock_init(&new->lock);
 	new->deleted = 0;
 	rcu_read_lock();
--
^ permalink raw reply [flat|nested] 31+ messages in thread
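The id arithmetic in the ipc_addid() hunk can be tried outside the kernel. The sketch below assumes the mainline values of that period (IPCMNI of 32768, with SEQ_MULTIPLIER defined as IPCMNI) and mirrors ipc_buildid(); ipc_splitid() is a hypothetical helper name introduced only for this illustration.

```c
#define IPCMNI          32768		/* assumption: mainline value at the time */
#define SEQ_MULTIPLIER  (IPCMNI)

/* Mirrors ipc_buildid(): user-visible id = seq * SEQ_MULTIPLIER + slot. */
static int ipc_buildid(int lid, int seq)
{
	return SEQ_MULTIPLIER * seq + lid;
}

/*
 * Hypothetical helper: split a requested id into the idr slot ("lid")
 * and the sequence number, as the patch does inline.
 * Returns 0 on success, -1 if no target id is set (negative value).
 */
static int ipc_splitid(int next_id, int *lid, int *seq)
{
	if (next_id < 0)
		return -1;
	*lid = next_id % SEQ_MULTIPLIER;
	*seq = next_id / SEQ_MULTIPLIER;
	return 0;
}
```

For example, a requested id of 65538 splits into slot 2 and sequence number 2. Since C integer division and remainder always satisfy `(a / b) * b + a % b == a`, rebuilding the id from the two halves gives back the input, so the `next_id != ...` comparison in the hunk does not look like it can ever fail; the check that actually rejects a request is the `id != new_lid` test after idr_get_new_above().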
* [RFC][PATCH 4/4] PID: use the target ID specified in procfs
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
` (6 preceding siblings ...)
2008-04-04 14:51 ` [RFC][PATCH 4/4] PID: " Nadia.Derbey-6ktuUTfB/bM
@ 2008-04-04 14:51 ` Nadia.Derbey
[not found] ` <20080404145129.637145000-6ktuUTfB/bM@public.gmane.org>
2008-04-15 3:06 ` Nick Andrew
9 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, orenl, Nadia Derbey
[-- Attachment #1: upidnr_use_next_id.patch --]
[-- Type: text/plain, Size: 6386 bytes --]

[PATCH 04/04]

This patch makes use of the target ids specified by a previous write to
/proc/self/next_id as the ids to use to allocate the next upid nrs.
Upid nrs for upper levels that are not specified in the next_id file are
left to the kernel's choice.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/pid.h |    2
 kernel/fork.c       |    3 -
 kernel/pid.c        |  141 +++++++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 126 insertions(+), 20 deletions(-)

Index: linux-2.6.25-rc8-mm1/include/linux/pid.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/pid.h	2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/pid.h	2008-04-04 14:54:09.000000000 +0200
@@ -121,7 +121,7 @@ extern struct pid *find_get_pid(int nr);
 extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
 int next_pidmap(struct pid_namespace *pid_ns, int last);
 
-extern struct pid *alloc_pid(struct pid_namespace *ns);
+extern struct pid *alloc_pid(struct pid_namespace *ns, int *retval);
 extern void free_pid(struct pid *pid);
 
 /*
Index: linux-2.6.25-rc8-mm1/kernel/fork.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/fork.c	2008-04-04 14:00:35.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/fork.c	2008-04-04 14:54:43.000000000 +0200
@@ -1200,8 +1200,7 @@ static struct task_struct *copy_process(
 		goto bad_fork_cleanup_io;
 
 	if (pid != &init_struct_pid) {
-		retval = -ENOMEM;
-		pid = alloc_pid(task_active_pid_ns(p));
+		pid = alloc_pid(task_active_pid_ns(p), &retval);
 		if (!pid)
 			goto bad_fork_cleanup_io;
 
Index: linux-2.6.25-rc8-mm1/kernel/pid.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/pid.c	2008-04-04 13:11:39.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/pid.c	2008-04-04 14:59:24.000000000 +0200
@@ -122,6 +122,26 @@ static void free_pidmap(struct upid *upi
 	atomic_inc(&map->nr_free);
 }
 
+static inline int alloc_pidmap_page(struct pidmap *map)
+{
+	if (unlikely(!map->page)) {
+		void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);
+		/*
+		 * Free the page if someone raced with us
+		 * installing it:
+		 */
+		spin_lock_irq(&pidmap_lock);
+		if (map->page)
+			kfree(page);
+		else
+			map->page = page;
+		spin_unlock_irq(&pidmap_lock);
+		if (unlikely(!map->page))
+			return -1;
+	}
+	return 0;
+}
+
 static int alloc_pidmap(struct pid_namespace *pid_ns)
 {
 	int i, offset, max_scan, pid, last = pid_ns->last_pid;
@@ -134,21 +154,8 @@ static int alloc_pidmap(struct pid_names
 	map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
 	max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset;
 	for (i = 0; i <= max_scan; ++i) {
-		if (unlikely(!map->page)) {
-			void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);
-			/*
-			 * Free the page if someone raced with us
-			 * installing it:
-			 */
-			spin_lock_irq(&pidmap_lock);
-			if (map->page)
-				kfree(page);
-			else
-				map->page = page;
-			spin_unlock_irq(&pidmap_lock);
-			if (unlikely(!map->page))
-				break;
-		}
+		if (unlikely(alloc_pidmap_page(map)))
+			break;
 		if (likely(atomic_read(&map->nr_free))) {
 			do {
 				if (!test_and_set_bit(offset, map->page)) {
@@ -182,6 +189,35 @@ static int alloc_pidmap(struct pid_names
 	return -1;
 }
 
+/*
+ * Return a predefined pid value if successful (ID_AT(pid_l, level)),
+ * -errno else
+ */
+static int alloc_fixed_pidmap(struct pid_namespace *pid_ns,
+			struct sys_id *pid_l, int level)
+{
+	int offset, pid;
+	struct pidmap *map;
+
+	pid = ID_AT(pid_l, level);
+	if (pid < RESERVED_PIDS || pid >= pid_max)
+		return -EINVAL;
+
+	map = &pid_ns->pidmap[pid / BITS_PER_PAGE];
+
+	if (unlikely(alloc_pidmap_page(map)))
+		return -ENOMEM;
+
+	offset = pid & BITS_PER_PAGE_MASK;
+	if (test_and_set_bit(offset, map->page))
+		return -EBUSY;
+
+	atomic_dec(&map->nr_free);
+	pid_ns->last_pid = max(pid_ns->last_pid, pid);
+
+	return pid;
+}
+
 int next_pidmap(struct pid_namespace *pid_ns, int last)
 {
 	int offset;
@@ -243,20 +279,91 @@ void free_pid(struct pid *pid)
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
 
-struct pid *alloc_pid(struct pid_namespace *ns)
+/*
+ * Called by alloc_pid() to use a list of predefined ids for the calling
+ * process' upper ns levels.
+ * Returns next pid ns to visit if successful (may be NULL if walked through
+ * the entire pid ns hierarchy).
+ * i is filled with next level to be visited (useful for the error cases).
+ */
+static struct pid_namespace *set_predefined_pids(struct pid_namespace *ns,
+						struct pid *pid,
+						struct sys_id *pid_l,
+						int *next_level)
+{
+	struct pid_namespace *tmp;
+	int rel_level, i, nr;
+
+	rel_level = pid_l->nids - 1;
+	if (rel_level > ns->level)
+		return ERR_PTR(-EINVAL);
+
+	tmp = ns;
+
+	/*
+	 * Use the predefined upid nrs for levels ns->level down to
+	 * ns->level - rel_level
+	 */
+	for (i = ns->level ; rel_level >= 0; i--, rel_level--) {
+		nr = alloc_fixed_pidmap(tmp, pid_l, rel_level);
+		if (nr < 0) {
+			tmp = ERR_PTR(nr);
+			goto out;
+		}
+
+		pid->numbers[i].nr = nr;
+		pid->numbers[i].ns = tmp;
+		tmp = tmp->parent;
+	}
+
+	id_blocks_free(pid_l);
+out:
+	*next_level = i;
+	return tmp;
+}
+
+struct pid *alloc_pid(struct pid_namespace *ns, int *retval)
 {
 	struct pid *pid;
 	enum pid_type type;
 	int i, nr;
 	struct pid_namespace *tmp;
 	struct upid *upid;
+	struct sys_id *pid_l;
 
+	*retval = -ENOMEM;
 	pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
 	if (!pid)
 		goto out;
 
 	tmp = ns;
-	for (i = ns->level; i >= 0; i--) {
+	i = ns->level;
+
+	/*
+	 * If there is a list of upid nrs specified, use it instead of letting
+	 * the kernel chose the upid nrs for us.
+	 */
+	pid_l = current->next_id;
+	if (pid_l && pid_l->nids) {
+		/*
+		 * returns the next ns to be visited in the following loop
+		 * (or NULL if we are done).
+		 * i is filled in with the next level to be visited. We need
+		 * it to undo things in the error cases.
+		 */
+		tmp = set_predefined_pids(ns, pid, pid_l, &i);
+		if (IS_ERR(tmp)) {
+			*retval = PTR_ERR(tmp);
+			goto out_free;
+		}
+		current->next_id = NULL;
+	}
+
+	*retval = -ENOMEM;
+	/*
+	 * Let the lower levels upid nrs be automatically allocated
+	 */
+	for ( ; i >= 0; i--) {
 		nr = alloc_pidmap(tmp);
 		if (nr < 0)
 			goto out_free;
--
^ permalink raw reply [flat|nested] 31+ messages in thread
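The mapping between the id list and the namespace levels in patch 4/4 is easy to get backwards, so here is a small user-space sketch of it. Everything in it is a simplification made for illustration: struct ns_sketch is a hypothetical stand-in for a pid namespace (just a bitmap), test_and_set() is a non-atomic replacement for test_and_set_bit(), PID_MAX assumes the default pid_max of 32768, RESERVED_PIDS assumes the kernel's value of 300, and namespaces are passed as an array indexed by level rather than walked through ->parent.

```c
#include <errno.h>

#define PID_MAX        32768	/* assumption: the default pid_max */
#define RESERVED_PIDS  300	/* assumption: lower bound of the range check */
#define BITS_PER_WORD  (8 * sizeof(unsigned long))

/* Hypothetical stand-in for one pid namespace: just its pid bitmap. */
struct ns_sketch {
	unsigned long map[PID_MAX / BITS_PER_WORD];
};

/* Non-atomic stand-in for test_and_set_bit(); returns the old bit value. */
static int test_and_set(unsigned long *map, int bit)
{
	unsigned long mask = 1UL << (bit % BITS_PER_WORD);
	unsigned long *word = &map[bit / BITS_PER_WORD];
	int was_set = (*word & mask) != 0;

	*word |= mask;
	return was_set;
}

/* Claim one caller-chosen pid, following alloc_fixed_pidmap(). */
static int claim_fixed_pid(struct ns_sketch *ns, long pid)
{
	if (pid < RESERVED_PIDS || pid >= PID_MAX)
		return -EINVAL;
	if (test_and_set(ns->map, (int)pid))
		return -EBUSY;
	return (int)pid;
}

/*
 * nss[] is indexed by namespace level (0 = initial namespace) and `level`
 * is the level of the namespace the new task is created in.
 * ids[nids - 1] is claimed at `level`, ids[nids - 2] one level up, and so
 * on. On success *next_level is the first level left for automatic
 * allocation (-1 if every level was covered).
 */
static int set_predefined(struct ns_sketch *nss, int level,
			  const long *ids, int nids,
			  int *nrs, int *next_level)
{
	int rel = nids - 1;
	int i;

	if (nids < 1 || rel > level)
		return -EINVAL;

	for (i = level; rel >= 0; i--, rel--) {
		int nr = claim_fixed_pid(&nss[i], ids[rel]);

		if (nr < 0)
			return nr;	/* earlier claims are not undone here */
		nrs[i] = nr;
	}
	*next_level = i;
	return 0;
}
```

With a task created at level 2 and the list {1000, 2000}, pid 2000 is claimed in the level-2 namespace and 1000 in its parent at level 1, leaving level 0 to the regular allocator. Unlike the patch, the sketch leaves already-claimed bits set when a later level fails; the kernel version relies on the out_free path of alloc_pid() for that cleanup.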
* Re: [RFC][PATCH 0/4] Object creation with a specified id
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
` (8 preceding siblings ...)
[not found] ` <20080404145129.637145000-6ktuUTfB/bM@public.gmane.org>
@ 2008-04-15 3:06 ` Nick Andrew
2008-04-15 10:30 ` Nadia Derbey
` (2 more replies)
9 siblings, 3 replies; 31+ messages in thread
From: Nick Andrew @ 2008-04-15 3:06 UTC (permalink / raw)
To: Nadia.Derbey; +Cc: linux-kernel, containers, orenl

On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey@bull.net wrote:
> . echo "LONG XX" > /proc/self/next_id
> next object to be created will have an id set to XX
> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
> next object to be created will have its ids set to XX0, ... X<n-1>
> This is particularly useful for processes that may have several ids if
> they belong to nested namespaces.

How do you handle race conditions, i.e. you specify the ID for the
next object to be created, and then some other thread goes and creates
an object before your thread creates one?

Nick.
--
PGP Key ID = 0x418487E7 http://www.nick-andrew.net/
PGP Key fingerprint = B3ED 6894 8E49 1770 C24A 67E3 6266 6EB9 4184 87E7
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
2008-04-15 3:06 ` Nick Andrew
@ 2008-04-15 10:30 ` Nadia Derbey
[not found] ` <480483C2.3030509-6ktuUTfB/bM@public.gmane.org>
[not found] ` <20080415030623.GA8171-ZRFfYzONFVA@public.gmane.org>
2008-04-18 5:46 ` Nadia Derbey
2 siblings, 1 reply; 31+ messages in thread
From: Nadia Derbey @ 2008-04-15 10:30 UTC (permalink / raw)
To: Nick Andrew; +Cc: linux-kernel, containers, orenl

Nick Andrew wrote:
> On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey@bull.net wrote:
>
>> . echo "LONG XX" > /proc/self/next_id
>> next object to be created will have an id set to XX
>> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
>> next object to be created will have its ids set to XX0, ... X<n-1>
>> This is particularly useful for processes that may have several ids if
>> they belong to nested namespaces.
>
>
> How do you handle race conditions, i.e. you specify the ID for the
> next object to be created, and then some other thread goes and creates
> an object before your thread creates one?
>
> Nick.

Sorry for not answering earlier, I just saw your e-mail!

It's true that the way I've done things, the "create_with_id" doesn't
take into account multi-threaded apps, since "self" is related to the
thread group leader.

May be using something like /proc/self/task/<my_tid>/next_id would be
better, but I have to think more about it...

Regards,
Nadia
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
@ 2008-04-15 18:52 ` Oren Laadan
0 siblings, 0 replies; 31+ messages in thread
From: Oren Laadan @ 2008-04-15 18:52 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Nick Andrew, linux-kernel, containers

Nadia Derbey wrote:
> Nick Andrew wrote:
>> On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey@bull.net wrote:
>>
>>> . echo "LONG XX" > /proc/self/next_id
>>> next object to be created will have an id set to XX
>>> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
>>> next object to be created will have its ids set to XX0, ... X<n-1>
>>> This is particularly useful for processes that may have several
>>> ids if
>>> they belong to nested namespaces.
>>
>>
>> How do you handle race conditions, i.e. you specify the ID for the
>> next object to be created, and then some other thread goes and creates
>> an object before your thread creates one?
>>
>> Nick.
>
>
> Sorry for not answering earlier, I just saw your e-mail!

[I too managed to miss that message].

>
> It's true that the way I've done things, the "create_with_id" doesn't
> take into account multi-threaded apps, since "self" is related to the
> thread group leader.
>
> May be using something like /proc/self/task/<my_tid>/next_id would be
> better, but I have to think more about it...

That /proc/self links to /proc/TGID slipped my mind. Definitely must
be done on a per-thread basis (and /proc/<TGID>/task/<PID>/next_id
will do the trick).

Oren.

>
> Regards,
> Nadia
^ permalink raw reply [flat|nested] 31+ messages in thread
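To make the per-thread variant discussed above concrete, here is a hedged user-space sketch of how a restart tool might build the file name and the line to write. The next_id file is only what this RFC proposes, so nothing here should be read as an interface available on a given kernel; next_id_path() and format_next_id() are hypothetical helper names, and the "LONG" / "LONG<n>" syntax is the one quoted from the cover letter.

```c
#include <stdio.h>

/*
 * Build "/proc/<tgid>/task/<tid>/next_id", the per-thread location
 * suggested in the thread. Returns 0, or -1 if buf is too small.
 */
static int next_id_path(char *buf, size_t len, long tgid, long tid)
{
	int n = snprintf(buf, len, "/proc/%ld/task/%ld/next_id", tgid, tid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Format the line to write: "LONG X" for a single id, or
 * "LONG<n> X0 ... X<n-1>" for several. Returns the string length,
 * or -1 on bad input or if buf is too small.
 */
static int format_next_id(char *buf, size_t len, const long *ids, int nids)
{
	int off, i;

	if (nids < 1)
		return -1;
	if (nids == 1)
		off = snprintf(buf, len, "LONG");
	else
		off = snprintf(buf, len, "LONG%d", nids);
	if (off < 0 || (size_t)off >= len)
		return -1;

	for (i = 0; i < nids; i++) {
		int w = snprintf(buf + off, len - off, " %ld", ids[i]);

		if (w < 0 || (size_t)w >= len - off)
			return -1;
		off += w;
	}
	return off;
}
```

A caller would pass its own process id as tgid and its thread id as tid, write the formatted line to that path, and then create the object from the same thread, so that another thread of the group cannot consume the id first.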
[parent not found: <20080415030623.GA8171-ZRFfYzONFVA@public.gmane.org>]
* Re: [RFC][PATCH 0/4] Object creation with a specified id [not found] ` <20080415030623.GA8171-ZRFfYzONFVA@public.gmane.org> @ 2008-04-15 10:30 ` Nadia Derbey 2008-04-18 5:46 ` Nadia Derbey 1 sibling, 0 replies; 31+ messages in thread From: Nadia Derbey @ 2008-04-15 10:30 UTC (permalink / raw) To: Nick Andrew Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Nick Andrew wrote: > On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote: > >> . echo "LONG XX" > /proc/self/next_id >> next object to be created will have an id set to XX >> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id >> next object to be created will have its ids set to XX0, ... X<n-1> >> This is particularly useful for processes that may have several ids if >> they belong to nested namespaces. > > > How do you handle race conditions, i.e. you specify the ID for the > next object to be created, and then some other thread goes and creates > an object before your thread creates one? > > Nick. Sorry for not answering earlier, I just saw your e-mail! It's true that the way I've done things, the "create_with_id" doesn't take into account multi-threaded apps, since "self" is related to the thread group leader. May be using something like /proc/self/task/<my_tid>/next_id would be better, but I have to think more about it... Regards, Nadia ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <20080415030623.GA8171-ZRFfYzONFVA@public.gmane.org>
2008-04-15 10:30 ` Nadia Derbey
@ 2008-04-18 5:46 ` Nadia Derbey
1 sibling, 0 replies; 31+ messages in thread
From: Nadia Derbey @ 2008-04-18 5:46 UTC (permalink / raw)
To: Nick Andrew
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA

Nick Andrew wrote:
> On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>
>> . echo "LONG XX" > /proc/self/next_id
>> next object to be created will have an id set to XX
>> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
>> next object to be created will have its ids set to X0, ... X<n-1>
>> This is particularly useful for processes that may have several ids if
>> they belong to nested namespaces.
>
> How do you handle race conditions, i.e. you specify the ID for the
> next object to be created, and then some other thread goes and creates
> an object before your thread creates one?
>
> Nick.

OK, race problem between threads is fixed.
Thanks for finding the issue!

The new patch series is coming next.

Regards,
Nadia
* [RFC][PATCH 0/4] Object creation with a specified id
@ 2008-04-04 14:51 Nadia.Derbey-6ktuUTfB/bM
0 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Hi,
When restarting a process that has been previously checkpointed, that process
should keep on using some of its ids (such as its process id, or sysV ipc ids).
This patch provides a feature that can help ensure this saved state reuse:
it makes it possible to create an object with a pre-defined id.
A first implementation was proposed 2 months ago. It consisted of
changing an object's id after it had been created.
Here is a second implementation based on Oren Laadan's idea: Oren's suggestion
was to force an object's id during its creation, rather than 1. create it,
2. change its id.
A new file is created in procfs: /proc/self/next_id.
When this file is filled with an id value, a structure pointed to by the
calling task struct is filled with that id.
Then, when an object supporting this feature is created, the id present in
that new structure is used, instead of the default one.
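The mechanism described above can be pictured with a small user-space model. This is only a toy sketch in Python, not the kernel code from the patches: the Task class, set_next_id() and create_object() are hypothetical names, and consuming the preset id on the first creation is an assumption drawn from the "next object to be created" wording.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Task:
    """Toy stand-in for a task; next_id models the per-task structure."""
    next_id: Optional[int] = None


# Stand-in for the default id allocator (hands out 1, 2, 3, ...).
_default_ids = count(1)


def set_next_id(task: Task, value: int) -> None:
    """Model a write of an id value to the task's next_id file."""
    task.next_id = value


def create_object(task: Task) -> int:
    """Return the id of a new object: the preset one if any, else a default."""
    if task.next_id is not None:
        # Assumption: the preset id applies to the next creation only.
        chosen, task.next_id = task.next_id, None
        return chosen
    return next(_default_ids)
```

Because the preset value lives in the task's own structure, a creation issued by another task falls through to the default allocator and does not consume it.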
The syntax is one of:
. echo "LONG XX" > /proc/self/next_id
next object to be created will have an id set to XX
. echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
next object to be created will have its ids set to X0, ... X<n-1>
This is particularly useful for processes that may have several ids if
they belong to nested namespaces.
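For illustration, the two forms above can be produced by a small helper. This is a hedged sketch in Python: format_next_id() is a hypothetical name, and using the plain "LONG" form for a single id (rather than "LONG1") is an assumption based on the two examples given.

```python
from typing import Sequence


def format_next_id(ids: Sequence[int]) -> str:
    """Build the string to write to next_id for one or several ids."""
    if not ids:
        raise ValueError("at least one id is required")
    if len(ids) == 1:
        # Single id: "LONG XX"
        return f"LONG {ids[0]}"
    # Several ids: "LONG<n> X0 ... X<n-1>"
    values = " ".join(str(i) for i in ids)
    return f"LONG{len(ids)} {values}"
```

For a process living in three nested pid namespaces, format_next_id([5, 6, 7]) yields "LONG3 5 6 7", which is the string the echo command in the second form would write.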
The objects covered here are ipc objects and processes.
Today, the ids are specified as long, but having a type string specified in
the next_id file makes it possible to cover more types in the future, if
needed.
The patches are against 2.6.25-rc3-mm1, in the following order:
[PATCH 1/4] adds the procfs facility for next object to be created, this
object being associated to a single id.
[PATCH 2/4] enhances the procfs facility for objects associated to multiple
ids (like processes).
[PATCH 3/4] makes use of the specified id (if any) to allocate the new IPC
object (changes the ipc_addid() path).
[PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) for a newly
allocated process (changes the alloc_pid()/alloc_pidmap() paths).
Any comment and/or suggestions are welcome.
Regards,
Nadia
--
--
* [RFC][PATCH 0/4] Object creation with a specified id
@ 2008-03-10 13:50 Nadia.Derbey-6ktuUTfB/bM
[not found] ` <20080310135054.312992000-6ktuUTfB/bM@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-03-10 13:50 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: xemul-GEFAQzZX7r8dnm+yROfE0A
A couple of weeks ago, a discussion has started after Pierre's proposal for
a new syscall to change an ipc id (see thread
http://lkml.org/lkml/2008/1/29/209).
Oren's suggestion was to force an object's id during its creation, rather
than 1. create it, 2. change its id.
So here is an implementation of what Oren has suggested.
2 new files are defined under /proc/self:
. next_ipcid --> next id to use for ipc object creation
. next_pids --> next upid nr(s) to use for next task to be forked
(see patch #2 for more details).
When one of these files (or both of them) is filled, a structure pointed to
by the calling task struct is filled with these ids.
Then, when the object is created, the id(s) present in that structure are
used, instead of the default ones.
The patches are against 2.6.25-rc3-mm1, in the following order:
[PATCH 1/4] adds the procfs facility for next ipc to be created.
[PATCH 2/4] adds the procfs facility for next task to be forked.
[PATCH 3/4] makes use of the specified id (if any) to allocate the new IPC
object (changes the ipc_addid() path).
[PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) for a newly
allocated process (changes the alloc_pid()/alloc_pidmap() paths).
Any comment and/or suggestions are welcome.
Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
Regards,
Nadia
--
--
[parent not found: <20080310135054.312992000-6ktuUTfB/bM@public.gmane.org>]
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <20080310135054.312992000-6ktuUTfB/bM@public.gmane.org>
@ 2008-03-13 23:16 ` Oren Laadan
[not found] ` <47D9B5B7.6060803-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Oren Laadan @ 2008-03-13 23:16 UTC (permalink / raw)
To: Nadia.Derbey-6ktuUTfB/bM
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, xemul-GEFAQzZX7r8dnm+yROfE0A

Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
> A couple of weeks ago, a discussion has started after Pierre's proposal for
> a new syscall to change an ipc id (see thread
> http://lkml.org/lkml/2008/1/29/209).
>
> Oren's suggestion was to force an object's id during its creation, rather
> than 1. create it, 2. change its id.
>
> So here is an implementation of what Oren has suggested.
>
> 2 new files are defined under /proc/self:
> . next_ipcid --> next id to use for ipc object creation
> . next_pids --> next upid nr(s) to use for next task to be forked
> (see patch #2 for more details).

Generally looks good. One meta-comment, though:

I wonder why you use separate files for separate resources, and why you'd
want to write multiple identifiers in one go; it seems to complicate the
code and interface with minimal gain.
In practice, a process will only do either one or the other, so a single
file is enough (e.g. "next_id").
Also, writing a single value at a time followed by the syscall is enough;
it's definitely not a performance issue to have multiple calls.
We assume the user/caller knows what she's doing, so no need to classify
the identifier (that is, tell the kernel it's a pid, or an ipc id) ahead
of time. The caller simply writes a value and then calls the relevant
syscall, or otherwise the results may not be what she expected...
If such context is expected to be required (although I don't see any at
the moment), we can require that the user write "TYPE VALUE" pair to
the "next_id" file.

> When one of these files (or both of them) is filled, a structure pointed to
> by the calling task struct is filled with these ids.
>
> Then, when the object is created, the id(s) present in that structure are
> used, instead of the default ones.
>
> The patches are against 2.6.25-rc3-mm1, in the following order:
>
> [PATCH 1/4] adds the procfs facility for next ipc to be created.
> [PATCH 2/4] adds the procfs facility for next task to be forked.
> [PATCH 3/4] makes use of the specified id (if any) to allocate the new IPC
> object (changes the ipc_addid() path).
> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) for a newly
> allocated process (changes the alloc_pid()/alloc_pidmap() paths).
>
> Any comment and/or suggestions are welcome.
>
> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>
> Regards,
> Nadia
>
> --
>
> --
[parent not found: <47D9B5B7.6060803-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>]
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47D9B5B7.6060803-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-03-14 6:21 ` Nadia Derbey
[not found] ` <47DA195B.8070704-6ktuUTfB/bM@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Nadia Derbey @ 2008-03-14 6:21 UTC (permalink / raw)
To: Oren Laadan
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, xemul-GEFAQzZX7r8dnm+yROfE0A

Oren Laadan wrote:
> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>
>> A couple of weeks ago, a discussion has started after Pierre's proposal for
>> a new syscall to change an ipc id (see thread
>> http://lkml.org/lkml/2008/1/29/209).
>>
>> Oren's suggestion was to force an object's id during its creation, rather
>> than 1. create it, 2. change its id.
>>
>> So here is an implementation of what Oren has suggested.
>>
>> 2 new files are defined under /proc/self:
>> . next_ipcid --> next id to use for ipc object creation
>> . next_pids --> next upid nr(s) to use for next task to be forked
>> (see patch #2 for more details).
>
> Generally looks good. One meta-comment, though:
>
> I wonder why you use separate files for separate resources,

That would be needed in a situation where we don't care about next,
say, ipc id to be created but we need a predefined pid. But I must admit
I don't see any practical application to it.

> and why you'd
> want to write multiple identifiers in one go;

I used multiple identifiers only for the pid values: this is because
when a new pid value is allocated for a process that belongs to nested
namespaces, the lower level upid nr values are allocated in a single
shot. (see alloc_pid()).

> it seems to complicate the
> code and interface with minimal gain.
> In practice, a process will only do either one or the other, so a single
> file is enough (e.g. "next_id").
> Also, writing a single value at a time followed by the syscall is enough;
> it's definitely not a performance issue to have multiple calls.
> We assume the user/caller knows what she's doing, so no need to classify
> the identifier (that is, tell the kernel it's a pid, or an ipc id) ahead
> of time. The caller simply writes a value and then calls the relevant
> syscall, or otherwise the results may not be what she expected...
> If such context is expected to be required (although I don't see any at
> the moment), we can require that the user write "TYPE VALUE" pair to
> the "next_id" file.

That's exactly what I wanted to avoid by creating 1 file per object.
Now, it's true that in a restart context where I guess that things will
be done synchronously, we could have a single next_id file.

>> When one of these files (or both of them) is filled, a structure pointed to
>> by the calling task struct is filled with these ids.
>>
>> Then, when the object is created, the id(s) present in that structure are
>> used, instead of the default ones.
>>
>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>
>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>> [PATCH 3/4] makes use of the specified id (if any) to allocate the new IPC
>> object (changes the ipc_addid() path).
>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) for a newly
>> allocated process (changes the alloc_pid()/alloc_pidmap() paths).
>>
>> Any comment and/or suggestions are welcome.
>>
>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>
>> Regards,
>> Nadia
>>
>> --
>>
>> --

Regards,
Nadia
[parent not found: <47DA195B.8070704-6ktuUTfB/bM@public.gmane.org>]
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DA195B.8070704-6ktuUTfB/bM@public.gmane.org>
@ 2008-03-14 15:50 ` Oren Laadan
[not found] ` <47DA9EB5.8040704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Oren Laadan @ 2008-03-14 15:50 UTC (permalink / raw)
To: Nadia Derbey
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, xemul-GEFAQzZX7r8dnm+yROfE0A

Nadia Derbey wrote:
> Oren Laadan wrote:
>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>
>>> A couple of weeks ago, a discussion has started after Pierre's proposal for
>>> a new syscall to change an ipc id (see thread
>>> http://lkml.org/lkml/2008/1/29/209).
>>>
>>> Oren's suggestion was to force an object's id during its creation, rather
>>> than 1. create it, 2. change its id.
>>>
>>> So here is an implementation of what Oren has suggested.
>>>
>>> 2 new files are defined under /proc/self:
>>> . next_ipcid --> next id to use for ipc object creation
>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>> (see patch #2 for more details).
>>
>> Generally looks good. One meta-comment, though:
>>
>> I wonder why you use separate files for separate resources,
>
> That would be needed in a situation where we don't care about next,
> say, ipc id to be created but we need a predefined pid. But I must admit
> I don't see any practical application to it.

exactly; why set the next-ipc value so far in advance ? I think it's
better (and less confusing) if we require that setting the next-id value
be done right before the respective syscall.

>> and why you'd
>> want to write multiple identifiers in one go;
>
> I used multiple identifiers only for the pid values: this is because
> when a new pid value is allocated for a process that belongs to nested
> namespaces, the lower level upid nr values are allocated in a single
> shot. (see alloc_pid()).
>
>> it seems to complicate the
>> code and interface with minimal gain.
>> In practice, a process will only do either one or the other, so a single
>> file is enough (e.g. "next_id").
>> Also, writing a single value at a time followed by the syscall is enough;
>> it's definitely not a performance issue to have multiple calls.
>> We assume the user/caller knows what she's doing, so no need to classify
>> the identifier (that is, tell the kernel it's a pid, or an ipc id) ahead
>> of time. The caller simply writes a value and then calls the relevant
>> syscall, or otherwise the results may not be what she expected...
>> If such context is expected to be required (although I don't see any at
>> the moment), we can require that the user write "TYPE VALUE" pair to
>> the "next_id" file.
>
> That's exactly what I wanted to avoid by creating 1 file per object.
> Now, it's true that in a restart context where I guess that things will
> be done synchronously, we could have a single next_id file.
>
>>> When one of these files (or both of them) is filled, a structure pointed to
>>> by the calling task struct is filled with these ids.
>>>
>>> Then, when the object is created, the id(s) present in that structure are
>>> used, instead of the default ones.
>>>
>>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>>
>>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>>> [PATCH 3/4] makes use of the specified id (if any) to allocate the new IPC
>>> object (changes the ipc_addid() path).
>>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) for a newly
>>> allocated process (changes the alloc_pid()/alloc_pidmap() paths).
>>>
>>> Any comment and/or suggestions are welcome.
>>>
>>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>>
>>> Regards,
>>> Nadia
>>>
>>> --
>>>
>>> --
>
> Regards,
> Nadia
[parent not found: <47DA9EB5.8040704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>]
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DA9EB5.8040704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-03-14 15:56 ` Pavel Emelyanov
[not found] ` <47DAA041.9090009-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2008-03-14 16:11 ` Nadia Derbey
1 sibling, 1 reply; 31+ messages in thread
From: Pavel Emelyanov @ 2008-03-14 15:56 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Oren Laadan wrote:
> Nadia Derbey wrote:
>> Oren Laadan wrote:
>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>
>>>> A couple of weeks ago, a discussion has started after Pierre's proposal for
>>>> a new syscall to change an ipc id (see thread
>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>
>>>> Oren's suggestion was to force an object's id during its creation, rather
>>>> than 1. create it, 2. change its id.
>>>>
>>>> So here is an implementation of what Oren has suggested.
>>>>
>>>> 2 new files are defined under /proc/self:
>>>> . next_ipcid --> next id to use for ipc object creation
>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>> (see patch #2 for more details).
>>>
>>> Generally looks good. One meta-comment, though:
>>>
>>> I wonder why you use separate files for separate resources,
>>
>> That would be needed in a situation where we don't care about next,
>> say, ipc id to be created but we need a predefined pid. But I must admit
>> I don't see any practical application to it.
>
> exactly; why set the next-ipc value so far in advance ? I think it's
> better (and less confusing) if we require that setting the next-id value
> be done right before the respective syscall.

And race with some other syscall caller? This will only work if the next-ipc-id
and the next-pid are on a task_struct. Are they (at least supposed to be such)?
[parent not found: <47DAA041.9090009-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>]
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DAA041.9090009-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2008-03-14 16:02 ` Oren Laadan
[not found] ` <47DAA1A6.6010509-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-14 16:11 ` Nadia Derbey
1 sibling, 1 reply; 31+ messages in thread
From: Oren Laadan @ 2008-03-14 16:02 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Pavel Emelyanov wrote:
> Oren Laadan wrote:
>> Nadia Derbey wrote:
>>> Oren Laadan wrote:
>>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>>
>>>>> A couple of weeks ago, a discussion has started after Pierre's proposal for
>>>>> a new syscall to change an ipc id (see thread
>>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>>
>>>>> Oren's suggestion was to force an object's id during its creation, rather
>>>>> than 1. create it, 2. change its id.
>>>>>
>>>>> So here is an implementation of what Oren has suggested.
>>>>>
>>>>> 2 new files are defined under /proc/self:
>>>>> . next_ipcid --> next id to use for ipc object creation
>>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>>> (see patch #2 for more details).
>>>>
>>>> Generally looks good. One meta-comment, though:
>>>>
>>>> I wonder why you use separate files for separate resources,
>>>
>>> That would be needed in a situation where we don't care about next,
>>> say, ipc id to be created but we need a predefined pid. But I must admit
>>> I don't see any practical application to it.
>>
>> exactly; why set the next-ipc value so far in advance ? I think it's
>> better (and less confusing) if we require that setting the next-id value
>> be done right before the respective syscall.
>
> And race with some other syscall caller? This will only work if the next-ipc-id
> and the next-pid are on a task_struct. Are they (at least supposed to be such)?

yes.
that's the first detail I looked for in the patch :)
[parent not found: <47DAA1A6.6010509-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>]
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DAA1A6.6010509-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-03-14 16:08 ` Pavel Emelyanov
0 siblings, 0 replies; 31+ messages in thread
From: Pavel Emelyanov @ 2008-03-14 16:08 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Oren Laadan wrote:
> Pavel Emelyanov wrote:
>> Oren Laadan wrote:
>>> Nadia Derbey wrote:
>>>> Oren Laadan wrote:
>>>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>>>
>>>>>> A couple of weeks ago, a discussion has started after Pierre's proposal for
>>>>>> a new syscall to change an ipc id (see thread
>>>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>>>
>>>>>> Oren's suggestion was to force an object's id during its creation, rather
>>>>>> than 1. create it, 2. change its id.
>>>>>>
>>>>>> So here is an implementation of what Oren has suggested.
>>>>>>
>>>>>> 2 new files are defined under /proc/self:
>>>>>> . next_ipcid --> next id to use for ipc object creation
>>>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>>>> (see patch #2 for more details).
>>>>>
>>>>> Generally looks good. One meta-comment, though:
>>>>>
>>>>> I wonder why you use separate files for separate resources,
>>>>
>>>> That would be needed in a situation where we don't care about next,
>>>> say, ipc id to be created but we need a predefined pid. But I must admit
>>>> I don't see any practical application to it.
>>>
>>> exactly; why set the next-ipc value so far in advance ? I think it's
>>> better (and less confusing) if we require that setting the next-id value
>>> be done right before the respective syscall.
>>
>> And race with some other syscall caller? This will only work if the next-ipc-id
>> and the next-pid are on a task_struct. Are they (at least supposed to be such)?
>
> yes.
> that's the first detail I looked for in the patch :)

OK :) I just remembered some talks about using last_pid for pid allocations
and just wanted to be sure.
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DAA041.9090009-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2008-03-14 16:02 ` Oren Laadan
@ 2008-03-14 16:11 ` Nadia Derbey
1 sibling, 0 replies; 31+ messages in thread
From: Nadia Derbey @ 2008-03-14 16:11 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Pavel Emelyanov wrote:
> Oren Laadan wrote:
>
>> Nadia Derbey wrote:
>>
>>> Oren Laadan wrote:
>>>
>>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>>
>>>>> A couple of weeks ago, a discussion has started after Pierre's proposal for
>>>>> a new syscall to change an ipc id (see thread
>>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>>
>>>>> Oren's suggestion was to force an object's id during its creation, rather
>>>>> than 1. create it, 2. change its id.
>>>>>
>>>>> So here is an implementation of what Oren has suggested.
>>>>>
>>>>> 2 new files are defined under /proc/self:
>>>>> . next_ipcid --> next id to use for ipc object creation
>>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>>> (see patch #2 for more details).
>>>>
>>>> Generally looks good. One meta-comment, though:
>>>>
>>>> I wonder why you use separate files for separate resources,
>>>
>>> That would be needed in a situation where we don't care about next,
>>> say, ipc id to be created but we need a predefined pid. But I must admit
>>> I don't see any practical application to it.
>>
>> exactly; why set the next-ipc value so far in advance ? I think it's
>> better (and less confusing) if we require that setting the next-id value
>> be done right before the respective syscall.
>
> And race with some other syscall caller? This will only work if the next-ipc-id
> and the next-pid are on a task_struct. Are they (at least supposed to be such)?

Yes they are.

Regards,
Nadia
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DA9EB5.8040704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-14 15:56 ` Pavel Emelyanov
@ 2008-03-14 16:11 ` Nadia Derbey
[not found] ` <47DAA3AA.4050906-6ktuUTfB/bM@public.gmane.org>
1 sibling, 1 reply; 31+ messages in thread
From: Nadia Derbey @ 2008-03-14 16:11 UTC (permalink / raw)
To: Oren Laadan
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, xemul-GEFAQzZX7r8dnm+yROfE0A

Oren Laadan wrote:
> Nadia Derbey wrote:
>
>> Oren Laadan wrote:
>>
>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>
>>>> A couple of weeks ago, a discussion has started after Pierre's proposal for
>>>> a new syscall to change an ipc id (see thread
>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>
>>>> Oren's suggestion was to force an object's id during its creation, rather
>>>> than 1. create it, 2. change its id.
>>>>
>>>> So here is an implementation of what Oren has suggested.
>>>>
>>>> 2 new files are defined under /proc/self:
>>>> . next_ipcid --> next id to use for ipc object creation
>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>> (see patch #2 for more details).
>>>
>>> Generally looks good. One meta-comment, though:
>>>
>>> I wonder why you use separate files for separate resources,
>>
>> That would be needed in a situation where we don't care about next,
>> say, ipc id to be created but we need a predefined pid. But I must
>> admit I don't see any practical application to it.
>
> exactly; why set the next-ipc value so far in advance ? I think it's
> better (and less confusing) if we require that setting the next-id value
> be done right before the respective syscall.

Ok, but this "requirement" should be widely agreed upon ;-)
What I mean here is that the solution with 1 file per "object type" can
easily be extended imho:
I don't know how the restart is supposed to work, but we can imagine
feeding all these files with all the object ids just before restart and
let the process pick up the objects ids as it needs them.
Of course, this would require to enhance the files formats, as well as
the way things are stored in the task_struct.

Hope what I'm saying is not too stupid ;-) ?

Regards,
Nadia

>>> and why you'd
>>> want to write multiple identifiers in one go;
>>
>> I used multiple identifiers only for the pid values: this is because
>> when a new pid value is allocated for a process that belongs to nested
>> namespaces, the lower level upid nr values are allocated in a single
>> shot. (see alloc_pid()).
>>
>>> it seems to complicate the
>>> code and interface with minimal gain.
>>> In practice, a process will only do either one or the other, so a single
>>> file is enough (e.g. "next_id").
>>> Also, writing a single value at a time followed by the syscall is enough;
>>> it's definitely not a performance issue to have multiple calls.
>>> We assume the user/caller knows what she's doing, so no need to classify
>>> the identifier (that is, tell the kernel it's a pid, or an ipc id) ahead
>>> of time. The caller simply writes a value and then calls the relevant
>>> syscall, or otherwise the results may not be what she expected...
>>> If such context is expected to be required (although I don't see any at
>>> the moment), we can require that the user write "TYPE VALUE" pair to
>>> the "next_id" file.
>>
>> That's exactly what I wanted to avoid by creating 1 file per object.
>> Now, it's true that in a restart context where I guess that things
>> will be done synchronously, we could have a single next_id file.
>>
>>>> When one of these files (or both of them) is filled, a structure pointed to
>>>> by the calling task struct is filled with these ids.
>>>>
>>>> Then, when the object is created, the id(s) present in that structure are
>>>> used, instead of the default ones.
>>>>
>>>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>>>
>>>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>>>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>>>> [PATCH 3/4] makes use of the specified id (if any) to allocate the new IPC
>>>> object (changes the ipc_addid() path).
>>>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) for a newly
>>>> allocated process (changes the alloc_pid()/alloc_pidmap() paths).
>>>>
>>>> Any comment and/or suggestions are welcome.
>>>>
>>>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>>>
>>>> Regards,
>>>> Nadia
>>>>
>>>> --
>>>>
>>>> --
>>
>> Regards,
>> Nadia
[parent not found: <47DAA3AA.4050906-6ktuUTfB/bM@public.gmane.org>]
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DAA3AA.4050906-6ktuUTfB/bM@public.gmane.org>
@ 2008-03-14 16:45 ` Oren Laadan
[not found] ` <47DAABAB.7000706-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Oren Laadan @ 2008-03-14 16:45 UTC (permalink / raw)
To: Nadia Derbey
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, xemul-GEFAQzZX7r8dnm+yROfE0A

Nadia Derbey wrote:
> Oren Laadan wrote:
>>
>> Nadia Derbey wrote:
>>
>>> Oren Laadan wrote:
>>>
>>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>>
>>>>> A couple of weeks ago, a discussion started after Pierre's
>>>>> proposal for a new syscall to change an ipc id (see thread
>>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>>
>>>>> Oren's suggestion was to force an object's id during its creation,
>>>>> rather than 1. create it, 2. change its id.
>>>>>
>>>>> So here is an implementation of what Oren has suggested.
>>>>>
>>>>> 2 new files are defined under /proc/self:
>>>>> . next_ipcid --> next id to use for ipc object creation
>>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>>>   (see patch #2 for more details).
>>>>
>>>> Generally looks good. One meta-comment, though:
>>>>
>>>> I wonder why you use separate files for separate resources,
>>>
>>> That would be needed in a situation where we don't care about next,
>>> say, ipc id to be created but we need a predefined pid. But I must
>>> admit I don't see any practical application to it.
>>
>> exactly; why set the next-ipc value so far in advance ? I think it's
>> better (and less confusing) if we require that setting the next-id value
>> be done right before the respective syscall.
>
> Ok, but this "requirement" should be widely agreed upon ;-)

A discussion on the overall checkpoint/restart policy is certainly due
(and increasingly noted recently).

> What I mean here is that the solution with 1 file per "object type" can
> easily be extended imho:

I'm aiming at simplicity and a minimal (but not restrictive) API for user
space. I argue that we never really need more than one predetermined value
at a time (eg see below), and the cost of setting such a value is so small
that there is no real benefit in setting more than one at a time (either
via multiple files or via an array of values). If in fact you wanted more
than one type at a time, you could still make it happen with a single
file without adding many user-visible files in /proc/<pid>.

So far, I can't think of any such identifier that we'd like to pre-set
that does not fit into a "long" type; simply because the kernel does not
use such identifiers in the first place (pid, ipc, pty#, vc# .. etc). To
be on the safe side, we can require that the format be "long VAL", just
in case (and later you could have other formats).

The only exception, perhaps, is if a TCP connection is rebuilt with a,
say, connect() syscall, and some information needs to be "predetermined"
so we'll need to extend the format. That can be done with another type
eg. "tcp ....." or a separate file (per your view), _then_, not now.
(As a side note, I don't suggest that this is how TCP will be restored).

In any event, the bottom line is that a single file, with a single
value at a time (possibly annotated with a type), is the simplest, and
isn't restrictive, for our purposes. Looking one step ahead, simplicity
and minimal commitment to user space is important in trying to push this
to the mainline kernel...

> I don't know how the restart is supposed to work, but we can imagine
> feeding all these files with all the object ids just before restart and

Building on my own experience with zap I envision the restart operation
of a given task occurring in the context of that task. (I assume this is
how restart will work). Therefore, it makes much sense that before every
syscall that requires a pre-determined resource identifier (eg. clone,
ipc, pty allocation), the task will place the desired value in "next_id"
(and that will only be meaningful during restart) and invoke the said
syscall. Voila.

Note that the restart will "rebuild" the container's state (and the task
state) as it reads in the data from some source. It is likely that not
all data will be available when the first said syscall is about to be
invoked, so you may not be able to feed everything ahead of time.

> let the process pick up the objects ids as it needs them.
> Of course, this would require to enhance the files formats, as well as
> the way things are stored in the task_struct.
>
> Hope what I'm saying is not too stupid ;-) ?
>
> Regards,
> Nadia
>
>>>> and why you'd
>>>> want to write multiple identifiers in one go;
>>>
>>> I used multiple identifiers only for the pid values: this is because
>>> when a new pid value is allocated for a process that belongs to
>>> nested namespaces, the lower level upid nr values are allocated in a
>>> single shot. (see alloc_pid()).
>>>
>>>> it seems to complicate the
>>>> code and interface with minimal gain.
>>>> In practice, a process will only do either one or the other, so a
>>>> single file is enough (e.g. "next_id").
>>>> Also, writing a single value at a time followed by the syscall is
>>>> enough; it's definitely not a performance issue to have multiple calls.
>>>> We assume the user/caller knows what she's doing, so no need to
>>>> classify the identifier (that is, tell the kernel it's a pid, or an
>>>> ipc id) ahead of time. The caller simply writes a value and then calls
>>>> the relevant syscall, or otherwise the results may not be what she
>>>> expected...
>>>> If such context is expected to be required (although I don't see any at
>>>> the moment), we can require that the user write "TYPE VALUE" pair to
>>>> the "next_id" file.
>>>
>>> That's exactly what I wanted to avoid by creating 1 file per object.
>>> Now, it's true that in a restart context where I guess that things
>>> will be done synchronously, we could have a single next_id file.
>>>
>>>>> When one of these files (or both of them) is filled, a structure
>>>>> pointed to by the calling task struct is filled with these ids.
>>>>>
>>>>> Then, when the object is created, the id(s) present in that
>>>>> structure are used, instead of the default ones.
>>>>>
>>>>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>>>>
>>>>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>>>>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>>>>> [PATCH 3/4] makes use of the specified id (if any) to allocate the
>>>>>             new IPC object (changes the ipc_addid() path).
>>>>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s)
>>>>>             for a newly allocated process (changes the
>>>>>             alloc_pid()/alloc_pidmap() paths).
>>>>>
>>>>> Any comment and/or suggestions are welcome.
>>>>>
>>>>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>>>>
>>>>> Regards,
>>>>> Nadia
>>>>>
>>>>> --
>>>
>>> Regards,
>>> Nadia

^ permalink raw reply [flat|nested] 31+ messages in thread
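The usage pattern described in the message above (place one value in "next_id", then immediately invoke the syscall that should consume it) can be sketched from the user-space side. This is only an illustration under assumptions: the helper names format_next_id() and set_next_id() are made up here, the "long VAL" string follows the format suggested in the message, and the file path is passed in by the caller because /proc/self/next_id only exists on a kernel carrying this patch set.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build the "long VAL" request string suggested in the thread.
 * Hypothetical helper: returns the string length, or -1 if buf is
 * too small to hold it. */
static int format_next_id(char *buf, size_t size, long id)
{
    int n = snprintf(buf, size, "long %ld", id);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Write one preset id to `path` (assumed to be /proc/self/next_id on a
 * patched kernel). Returns 0 on success, -1 on any failure, including
 * the file not existing because the kernel lacks the patches. */
static int set_next_id(const char *path, long id)
{
    char buf[64];
    int len = format_next_id(buf, sizeof(buf), id);
    if (len < 0)
        return -1;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t written = write(fd, buf, (size_t)len);
    int rc = (written == (ssize_t)len) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

During restart, a task would call set_next_id("/proc/self/next_id", saved_id) and then issue the matching syscall (for example msgget() or fork()) right away, so that the preset applies to that call only. If set_next_id() fails, the caller should treat the restart step as failed rather than create the object with a default id.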
[parent not found: <47DAABAB.7000706-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>]
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DAABAB.7000706-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-03-16 3:43 ` Serge E. Hallyn
[not found] ` <20080316034320.GA19793-6s5zFf/epYLPQpwDFJZrxFMas7LaWZ9n@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Serge E. Hallyn @ 2008-03-16 3:43 UTC (permalink / raw)
To: Oren Laadan
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, xemul-GEFAQzZX7r8dnm+yROfE0A

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> So far, I can't think of any such identifier that we'd like to pre-set
> that does not fit into a "long" type;

As Nadia has mentioned, if we have checkpointed a container which has
another pid namespace underneath itself, then we will need to restart
some tasks with two predetermined pids. So we'll need two (or more)
longs for the tasks in deeper namespaces.

> simply because the kernel does not
> use such identifiers in the first place (pid, ipc, pty#, vc# .. etc). To
> be on the safe side, we can require that the format be "long VAL", just
> in case (and later you could have other formats).
>
> The only exception, perhaps, is if a TCP connection is rebuilt with a,
> say, connect() syscall, and some information needs to be "predetermined"
> so we'll need to extend the format. That can be done with another type
> eg. "tcp ....." or a separate file (per your view), _then_, not now.
> (As a side note, I don't suggest that this is how TCP will be restored).
>
> In any event, the bottom line is that a single file, with a single
> value at a time (possibly annotated with a type), is the simplest, and
> isn't restrictive, for our purposes. Looking one step ahead, simplicity
> and minimal commitment to user space is important in trying to push this
> to the mainline kernel...
>
> > I don't know how the restart is supposed to work, but we can imagine
> > feeding all these files with all the object ids just before restart and
>
> Building on my own experience with zap I envision the restart operation
> of a given task occurring in the context of that task.

Could be, but not necessarily the case. Eric has mentioned using elf
files for restart, and that's one way to go, but whether one central
restart task sets up all the children or the children set themselves up
is yet another design point we haven't decided. I would think that
with a centralized restart it would be easier to ensure, for instance,
that shared anon pages would be properly set up and shared, but since
you advocate each-task-starts-itself I trust zap must handle that.

> (I assume this is
> how restart will work). Therefore, it makes much sense that before every
> syscall that requires a pre-determined resource identifier (eg. clone,
> ipc, pty allocation), the task will place the desired value in "next_id"
> (and that will only be meaningful during restart) and invoke the said
> syscall. Voila.
>
> Note that the restart will "rebuild" the container's state (and the task
> state) as it reads in the data from some source. It is likely that not
> all data will be available when the first said syscall is about to be
> invoked, so you may not be able to feed everything ahead of time.
>
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <20080316034320.GA19793-6s5zFf/epYLPQpwDFJZrxFMas7LaWZ9n@public.gmane.org>]
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <20080316034320.GA19793-6s5zFf/epYLPQpwDFJZrxFMas7LaWZ9n@public.gmane.org>
@ 2008-03-16 19:08 ` Oren Laadan
[not found] ` <47DD703C.4030809-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Oren Laadan @ 2008-03-16 19:08 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, xemul-GEFAQzZX7r8dnm+yROfE0A

Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>> So far, I can't think of any such identifier that we'd like to pre-set
>> that does not fit into a "long" type;
>
> As Nadia has mentioned, if we have checkpointed a container which has
> another pid namespace underneath itself, then we will need to restart
> some tasks with two predetermined pids. So we'll need two (or more)
> longs for the tasks in deeper namespaces.

I see. So more than a single "long" type is probably needed. I'd still
prefer that the "scope" of a preset identifier through "next_id" should
be the subsequent syscall; so if you need multiple values for the next
syscall you use it, but you don't support leftovers for the next syscall
to use. The typing system can be something like "long VAL" and then for
array "long* VAL VAL VAL ...", for instance.

>> simply because the kernel does not
>> use such identifiers in the first place (pid, ipc, pty#, vc# .. etc). To
>> be on the safe side, we can require that the format be "long VAL", just
>> in case (and later you could have other formats).
>>
>> The only exception, perhaps, is if a TCP connection is rebuilt with a,
>> say, connect() syscall, and some information needs to be "predetermined"
>> so we'll need to extend the format. That can be done with another type
>> eg. "tcp ....." or a separate file (per your view), _then_, not now.
>> (As a side note, I don't suggest that this is how TCP will be restored).
>>
>> In any event, the bottom line is that a single file, with a single
>> value at a time (possibly annotated with a type), is the simplest, and
>> isn't restrictive, for our purposes. Looking one step ahead, simplicity
>> and minimal commitment to user space is important in trying to push this
>> to the mainline kernel...
>>
>>> I don't know how the restart is supposed to work, but we can imagine
>>> feeding all these files with all the object ids just before restart and
>> Building on my own experience with zap I envision the restart operation
>> of a given task occurring in the context of that task.
>
> Could be, but not necessarily the case. Eric has mentioned using elf
> files for restart, and that's one way to go, but whether one central

I'm not familiar with the details of this.

> restart task sets up all the children or the children set themselves up
> is yet another design point we haven't decided. I would think that
> with a centralized restart it would be easier to ensure, for instance,
> that shared anon pages would be properly set up and shared, but since
> you advocate each-task-starts-itself I trust zap must handle that.

The main reason I think a task should set itself up is that most of
the setup requires that new resources be allocated, and the kernel is
already centered around this approach that a task allocates for itself,
not for another task. For instance, if you need to restore a VMA, you
simply call mmap(), a new file, you call open() etc.

Shared anon pages are one example of shared resources that may be used
by multiple processes. Zap's approach is to have the "first" user (in
the sense of the first time the resource is seen during checkpoint) do
the actual restore, and place it in a global table, and then subsequent
tasks will find it in the table and "map" it into their view.

Decentralizing also allows multiple tasks to restart concurrently.

Are we ready to start concrete discussion on the architecture for the
checkpoint/restart ? (and if so .. time to change the subject line).

^ permalink raw reply [flat|nested] 31+ messages in thread
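The typed format floated in the message above ("long VAL" for one id, "long* VAL VAL ..." for several) and the rule that a preset only applies to the very next syscall could look roughly like the following. Everything here is a hypothetical sketch: struct next_id, parse_next_id(), take_next_id(), and the cap of 8 ids are invented for illustration, and the actual patches may parse and store the values differently.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PRESET_IDS 8    /* arbitrary bound chosen for this sketch */

struct next_id {
    int  count;                   /* 0 means "no preset pending" */
    long ids[MAX_PRESET_IDS];
};

/* Parse "long VAL" or "long* VAL VAL ..." into *out.
 * Returns 0 on success; on malformed input returns -1 and leaves
 * *out untouched. */
static int parse_next_id(const char *text, struct next_id *out)
{
    struct next_id tmp = { 0 };
    int is_array;
    const char *p;

    if (strncmp(text, "long* ", 6) == 0) {
        is_array = 1;
        p = text + 6;
    } else if (strncmp(text, "long ", 5) == 0) {
        is_array = 0;
        p = text + 5;
    } else {
        return -1;                /* unknown type tag */
    }

    for (;;) {
        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);
        if (end == p)
            break;                /* no further number to read */
        if (errno == ERANGE || tmp.count == MAX_PRESET_IDS)
            return -1;
        tmp.ids[tmp.count++] = v;
        p = end;
    }

    while (*p == ' ' || *p == '\n')
        p++;                      /* tolerate trailing blanks/newline */
    if (*p != '\0' || tmp.count == 0)
        return -1;                /* trailing junk, or no value at all */
    if (!is_array && tmp.count != 1)
        return -1;                /* plain "long" carries exactly one id */

    *out = tmp;
    return 0;
}

/* Hand the pending ids to the syscall that is about to run and clear
 * them, so that nothing is left over for a later syscall. */
static struct next_id take_next_id(struct next_id *pending)
{
    struct next_id taken = *pending;
    pending->count = 0;
    return taken;
}
```

For a task that lives two pid namespaces deep, the restart code would write something like "long* 1234 7" before fork(), and the allocation path would call take_next_id() once, whether or not it ends up using every value.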
[parent not found: <47DD703C.4030809-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>]
* Re: [RFC][PATCH 0/4] Object creation with a specified id [not found] ` <47DD703C.4030809-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> @ 2008-03-17 14:44 ` Serge E. Hallyn 0 siblings, 0 replies; 31+ messages in thread From: Serge E. Hallyn @ 2008-03-17 14:44 UTC (permalink / raw) To: Oren Laadan Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, xemul-GEFAQzZX7r8dnm+yROfE0A Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org): > > > Serge E. Hallyn wrote: >> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org): >>> >>> Nadia Derbey wrote: >>>> Oren Laadan wrote: >>>>> >>>>> Nadia Derbey wrote: >>>>> >>>>>> Oren Laadan wrote: >>>>>> >>>>>>> >>>>>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote: >>>>>>> >>>>>>>> A couple of weeks ago, a discussion has started after Pierre's >>>>>>>> proposal for >>>>>>>> a new syscall to change an ipc id (see thread >>>>>>>> http://lkml.org/lkml/2008/1/29/209). >>>>>>>> >>>>>>>> >>>>>>>> Oren's suggestion was to force an object's id during its creation, >>>>>>>> rather >>>>>>>> than 1. create it, 2. change its id. >>>>>>>> >>>>>>>> So here is an implementation of what Oren has suggested. >>>>>>>> >>>>>>>> 2 new files are defined under /proc/self: >>>>>>>> . next_ipcid --> next id to use for ipc object creation >>>>>>>> . next_pids --> next upid nr(s) to use for next task to be forked >>>>>>>> (see patch #2 for more details). >>>>>>> >>>>>>> >>>>>>> Generally looks good. One meta-comment, though: >>>>>>> >>>>>>> I wonder why you use separate files for separate resources, >>>>>> >>>>>> That would be needed in a situation wheere we don't care about next, >>>>>> say, ipc id to be created but we need a predefined pid. But I must >>>>>> admit I don't see any pratical application to it. >>>>> >>>>> exactly; why set the next-ipc value so far in advance ? I think it's >>>>> better (and less confusing) if we require that setting the next-id >>>>> value >>>>> be done right before the respective syscall. 
>>>> Ok, but this "requirement" should be widely agreed upon ;-) >>> A discussion on the overall checkpoint/restart policy is certainly due >>> (and increasingly noted recently). >>> >>>> What I mean here is that the solution with 1 file per "object type" can >>>> easily be extended imho: >>> I'm aiming at simplicity and minimal (but not restrictive) API for user >>> space. I argue that we never really need more than one predetermined >>> value >>> at a time (eg see below), and the cost of setting such value is so small >>> that there is no real benefit in setting more than one at a time (either >>> via multiple files or via an array of values). If in fact you wanted more >>> than one type at a time, you could still make it happen with a single >>> file without adding many user-visible files in /proc/<pid>. >>> >>> So far, I can't think of any such identifier that we'd like to pre-set >>> that does not fit into a "long" type; >> As Nadia has mentioned, if we have checkpointed a container which has >> another pid namespace underneath itself, then we will need to restart >> some tasks with two predetermined pids. So we'll need two (or more) >> longs for the tasks in deeper namespaces. > > I see. So more than a single "long" type is probably needed. I'd still > prefer that the "scope" of a preset identifier through "next_id" should > be the subsequent syscall; > so if you need multiple values for the next > syscall you use it, but you don't support leftovers for the next syscall > to use. Agreed. > The typing system can be something like "long VAL" and then for > array "long* VAL VAL VAL ...", for instance. > >>> simply because the kernel does not >>> use such identifiers in the first place (pid, ipc, pty#, vc# .. etc). To >>> be on the safe side, we can require that the format be "long VAL", just >>> in case (and later you could have other formats). 
>>> >>> The only exception, perhaps, is if a TCP connection is rebuilt with a, >>> say, connect() syscall, and some information needs to be "predetermined" >>> so we'll need to extend the format. That can be done with another type >>> eg. "tcp ....." or a separate file (per your view), _then_, not now. >>> (As a side note, I don't suggest that this is how TCP will be restored). >>> >>> In any event, the bottom line is that a single file, with a single >>> value at a time (possibly annotated with a type), is the simplest, and >>> isn't restrictive, for our purposes. Looking one step ahead, simplicity >>> and minimal commitment to user space is important in trying to push this >>> to the mainline kernel... >>> >>>> I don't know how the restart is supposed to work, but we can imagine >>>> feeding all these files with all the object ids just before restart and >>> Building on my own experience with zap I envision the restart operation >>> of a given task occurring in the context of that task. >> Could be, but not necessarily the case. Eric has mentioned using elf >> files for restart, and that's one way to go, but whether one central > > I'm not familiar with the details of this. Well he wasn't specific and I'm not sure what his details were, I just pictured it the way crack and other userspace c/r systems have worked, where the checkpoint creates and ELF which you execute to restart the task(set). >> restart task sets up all the children or the children set themselves up >> is yet another design point we haven't decided. I would think that >> with a centralized restart it would be easier to assure for instance >> that shared anon pages would be properly set up and shared, but since >> you advocate each-task-starts-itself I trust zap must handle that. 
>
> The main reason I think a task should set itself up is that most of
> the setup requires that new resources be allocated, and the kernel is
> already centered around this approach that a task allocates for itself,
> not for another task. For instance, if you need to restore a VMA, you
> simply call mmap(); for a new file, you call open(), etc.

Agreed, it does seem cleaner, and if we go with the "sys_create_id()"
approach then clearly that's where we're aiming.

> Shared anon pages are one example of shared resources that may be used
> by multiple processes. Zap's approach is to have the "first" user (in
> the sense of the first time the resource is seen during checkpoint) do
> the actual restore, and place it in a global table, and then subsequent
> tasks will find it in the table and "map" it into their view.

Makes sense.

> Decentralizing also allows multiple tasks to restart concurrently.

Yes, but we lose that if we force create_with_pid() to be implemented by
setting /proc/sys/whatever/pid_min and max :)

> Are we ready to start concrete discussion on the architecture for the
> checkpoint/restart? (and if so .. time to change the subject line).

Good news on this topic - unofficial word is that the containers
mini-summit at OLS has been approved. They don't yet know whether it
will be Monday or Tuesday, but hopefully this is enough information
early enough for anyone needing to make/change travel plans.

thanks,
-serge

>>> (I assume this is
>>> how restart will work). Therefore, it makes much sense that before every
>>> syscall that requires a pre-determined resource identifier (e.g. clone,
>>> ipc, pty allocation), the task will place the desired value in "next_id"
>>> (and that will only be meaningful during restart) and invoke the said
>>> syscall. Voila.
>>>
>>> Note that the restart will "rebuild" the container's state (and the task
>>> state) as it reads in the data from some source. It is likely that not
>>> all data will be available when the first said syscall is about to be
>>> invoked, so you may not be able to feed everything ahead of time.
>>>
>>>> let the process pick up the object ids as it needs them.
>>>> Of course, this would require enhancing the file formats, as well as
>>>> the way things are stored in the task_struct.
>>>>
>>>> Hope what I'm saying is not too stupid ;-) ?
>>>>
>>>> Regards,
>>>> Nadia
>>>>
>>>>>>> and why you'd
>>>>>>> want to write multiple identifiers in one go;
>>>>>>
>>>>>> I used multiple identifiers only for the pid values: this is because
>>>>>> when a new pid value is allocated for a process that belongs to nested
>>>>>> namespaces, the lower level upid nr values are allocated in a single
>>>>>> shot (see alloc_pid()).
>>>>>>
>>>>>>> it seems to complicate the
>>>>>>> code and interface with minimal gain.
>>>>>>> In practice, a process will only do either one or the other, so a
>>>>>>> single file is enough (e.g. "next_id").
>>>>>>> Also, writing a single value at a time followed by the syscall is
>>>>>>> enough; it's definitely not a performance issue to have multiple calls.
>>>>>>> We assume the user/caller knows what she's doing, so no need to
>>>>>>> classify the identifier (that is, tell the kernel it's a pid, or an
>>>>>>> ipc id) ahead of time. The caller simply writes a value and then calls
>>>>>>> the relevant syscall, or otherwise the results may not be what she
>>>>>>> expected...
>>>>>>> If such context is expected to be required (although I don't see any
>>>>>>> at the moment), we can require that the user write a "TYPE VALUE" pair
>>>>>>> to the "next_id" file.
>>>>>>
>>>>>> That's exactly what I wanted to avoid by creating 1 file per object.
>>>>>> Now, it's true that in a restart context where I guess that things
>>>>>> will be done synchronously, we could have a single next_id file.
>>>>>>
>>>>>>>> When one of these files (or both of them) is filled, a structure
>>>>>>>> pointed to by the calling task struct is filled with these ids.
>>>>>>>>
>>>>>>>> Then, when the object is created, the id(s) present in that
>>>>>>>> structure are used, instead of the default ones.
>>>>>>>>
>>>>>>>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>>>>>>>
>>>>>>>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>>>>>>>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>>>>>>>> [PATCH 3/4] makes use of the specified id (if any) to allocate the
>>>>>>>> new IPC object (changes the ipc_addid() path).
>>>>>>>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s)
>>>>>>>> for a newly allocated process (changes the
>>>>>>>> alloc_pid()/alloc_pidmap() paths).
>>>>>>>>
>>>>>>>> Any comment and/or suggestions are welcome.
>>>>>>>>
>>>>>>>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Nadia
>>>>>>
>>>>>> Regards,
>>>>>> Nadia
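The write-then-syscall sequence discussed in the quoted messages (place the desired value in "next_id", then invoke clone or an ipc syscall) could be sketched from user space roughly as follows. This is only an illustration under stated assumptions: /proc/self/next_id exists only on a kernel carrying this RFC series, the helper names format_next_id() and set_next_id() are hypothetical, and the trailing newline simply mirrors what the echo commands in the cover letter would send.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: format ids using the syntax from the cover letter,
 * "LONG XX" for one id or "LONG<n> X0 ... X<n-1>" for several.
 * Returns the string length, or -1 if n < 1 or the buffer is too small.
 */
int format_next_id(char *buf, size_t len, const long *ids, int n)
{
    int off, i, r;

    assert(buf != NULL && ids != NULL);   /* caller bug, not a runtime error */
    if (n < 1 || len == 0)
        return -1;

    off = (n == 1) ? snprintf(buf, len, "LONG")
                   : snprintf(buf, len, "LONG%d", n);
    for (i = 0; i < n; i++) {
        if (off < 0 || (size_t)off >= len)
            return -1;                    /* previous write was truncated */
        r = snprintf(buf + off, len - (size_t)off, " %ld", ids[i]);
        if (r < 0)
            return -1;
        off += r;
    }
    return ((size_t)off < len) ? off : -1;
}

/*
 * Hypothetical helper: write the formatted request to `path`
 * (assumed to be /proc/self/next_id on a patched kernel).
 * Returns 0 on success, -1 on any failure.
 */
int set_next_id(const char *path, const long *ids, int n)
{
    char buf[256];
    ssize_t written;
    int fd, len;

    len = format_next_id(buf, sizeof(buf), ids, n);
    if (len < 0)
        return -1;
    buf[len] = '\n';                      /* assumption: mirror echo's newline */

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    written = write(fd, buf, (size_t)len + 1);
    close(fd);
    return (written == (ssize_t)len + 1) ? 0 : -1;
}
```

A restarting task would call set_next_id("/proc/self/next_id", ids, n) immediately before the fork() or msgget()/semget()/shmget() call whose id it wants to control, one value (or one set of nested-namespace pid values) per syscall, as suggested in the thread. On an unpatched kernel the open() simply fails and the helper returns -1.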
Thread overview: 31+ messages
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
2008-04-04 14:51 ` [RFC][PATCH 1/4] Provide a new procfs interface to set next id Nadia.Derbey
2008-04-04 14:51 ` Nadia.Derbey
2008-04-04 14:51 ` [RFC][PATCH 2/4] Provide a new procfs interface to set next upid nr(s) Nadia.Derbey
2008-04-04 14:51 ` Nadia.Derbey
2008-04-04 14:51 ` [RFC][PATCH 3/4] IPC: use the target ID specified in procfs Nadia.Derbey
2008-04-04 14:51 ` Nadia.Derbey
2008-04-04 14:51 ` [RFC][PATCH 4/4] PID: " Nadia.Derbey
2008-04-04 14:51 ` Nadia.Derbey
[not found] ` <20080404145129.637145000-6ktuUTfB/bM@public.gmane.org>
2008-04-15 3:06 ` [RFC][PATCH 0/4] Object creation with a specified id Nick Andrew
2008-04-15 3:06 ` Nick Andrew
2008-04-15 10:30 ` Nadia Derbey
[not found] ` <480483C2.3030509-6ktuUTfB/bM@public.gmane.org>
2008-04-15 18:52 ` Oren Laadan
2008-04-15 18:52 ` Oren Laadan
[not found] ` <20080415030623.GA8171-ZRFfYzONFVA@public.gmane.org>
2008-04-15 10:30 ` Nadia Derbey
2008-04-18 5:46 ` Nadia Derbey
2008-04-18 5:46 ` Nadia Derbey
-- strict thread matches above, loose matches on Subject: below --
2008-04-04 14:51 Nadia.Derbey
2008-03-10 13:50 Nadia.Derbey
[not found] ` <20080310135054.312992000-6ktuUTfB/bM@public.gmane.org>
2008-03-13 23:16 ` Oren Laadan
[not found] ` <47D9B5B7.6060803-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-14 6:21 ` Nadia Derbey
[not found] ` <47DA195B.8070704-6ktuUTfB/bM@public.gmane.org>
2008-03-14 15:50 ` Oren Laadan
[not found] ` <47DA9EB5.8040704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-14 15:56 ` Pavel Emelyanov
[not found] ` <47DAA041.9090009-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2008-03-14 16:02 ` Oren Laadan
[not found] ` <47DAA1A6.6010509-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-14 16:08 ` Pavel Emelyanov
2008-03-14 16:11 ` Nadia Derbey
2008-03-14 16:11 ` Nadia Derbey
[not found] ` <47DAA3AA.4050906-6ktuUTfB/bM@public.gmane.org>
2008-03-14 16:45 ` Oren Laadan
[not found] ` <47DAABAB.7000706-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-16 3:43 ` Serge E. Hallyn
[not found] ` <20080316034320.GA19793-6s5zFf/epYLPQpwDFJZrxFMas7LaWZ9n@public.gmane.org>
2008-03-16 19:08 ` Oren Laadan
[not found] ` <47DD703C.4030809-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-17 14:44 ` Serge E. Hallyn