From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
To: Ying Han <yinghan@google.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>,
linux-api@vger.kernel.org, containers@lists.linux-foundation.org,
hpa@zytor.com, linux-kernel@vger.kernel.org,
Dave Hansen <dave@linux.vnet.ibm.com>,
linux-mm@kvack.org, viro@zeniv.linux.org.uk, mingo@elte.hu,
mpm@selenic.com, Andrew Morton <akpm@linux-foundation.org>,
xemul@openvz.org, torvalds@linux-foundation.org,
tglx@linutronix.de, Alexey Dobriyan <adobriyan@gmail.com>
Subject: Re: How much of a mess does OpenVZ make? ;) Was: What can OpenVZ do?
Date: Thu, 12 Mar 2009 22:34:58 -0700
Message-ID: <20090313053458.GA28833@us.ibm.com>
In-Reply-To: <604427e00903122129y37ad791aq5fe7ef2552415da9@mail.gmail.com>
Ying Han [yinghan@google.com] wrote:
| Hi Serge:
| I made a patch based on Oren's tree recently which implements a new
| syscall clone_with_pid. I tested it with checkpoint/restart of a process
| tree and it works as expected.
Yes, I think we had a version of clone() with pid a while ago.
But it would be easier to review if you broke it up into smaller
patches and removed the unnecessary diffs in this patch, like...
| This patch has a hack in it: I made a copy of libc's clone and
| modified it to pass one more argument (the pid number). I will
| try to clean up the code and do more testing.
|
| New syscall clone_with_pid
| Implement a new syscall which clones a thread with a preselected pid number.
|
| clone_with_pid(child_func, child_stack + CHILD_STACK - 16,
| CLONE_WITH_PID|SIGCHLD, pid, NULL);
|
| Signed-off-by: Ying Han <yinghan@google.com>
|
| diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
| index 87803da..b5a1b03 100644
| --- a/arch/x86/include/asm/syscalls.h
| +++ b/arch/x86/include/asm/syscalls.h
| @@ -26,6 +26,7 @@ asmlinkage int sys_fork(struct pt_regs);
| asmlinkage int sys_clone(struct pt_regs);
| asmlinkage int sys_vfork(struct pt_regs);
| asmlinkage int sys_execve(struct pt_regs);
| +asmlinkage int sys_clone_with_pid(struct pt_regs);
|
| /* kernel/signal_32.c */
| asmlinkage int sys_sigsuspend(int, int, old_sigset_t);
| diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32
| index a5f9e09..f10ca0e 100644
| --- a/arch/x86/include/asm/unistd_32.h
| +++ b/arch/x86/include/asm/unistd_32.h
| @@ -340,6 +340,7 @@
| #define __NR_inotify_init1 332
| #define __NR_checkpoint 333
| #define __NR_restart 334
| +#define __NR_clone_with_pid 335
|
| #ifdef __KERNEL__
|
| diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
| index 0a1302f..88ae634 100644
| --- a/arch/x86/kernel/process_32.c
| +++ b/arch/x86/kernel/process_32.c
| @@ -8,7 +8,6 @@
| /*
| * This file handles the architecture-dependent parts of process handling..
| */
| -
these
| #include <stdarg.h>
|
| #include <linux/cpu.h>
| @@ -652,6 +651,28 @@ asmlinkage int sys_clone(struct pt_regs regs)
| return do_fork(clone_flags, newsp, &regs, 0, parent_tidptr, child_tidptr);
| }
|
| +/**
| + * sys_clone_with_pid - clone a thread with pre-select pid number.
| + */
| +asmlinkage int sys_clone_with_pid(struct pt_regs regs)
| +{
| + unsigned long clone_flags;
| + unsigned long newsp;
| + int __user *parent_tidptr, *child_tidptr;
| + pid_t pid_nr;
| +
| + clone_flags = regs.bx;
| + newsp = regs.cx;
| + parent_tidptr = (int __user *)regs.dx;
| + child_tidptr = (int __user *)regs.di;
| + pid_nr = regs.bp;
| +
| + if (!newsp)
| + newsp = regs.sp;
| + return do_fork(clone_flags, newsp, &regs, pid_nr, parent_tidptr,
| + child_tidptr);
| +}
| +
| /*
| * This is trivial, and on the face of it looks like it
| * could equally well be done in user mode.
| diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_tabl
| index 5543136..5191117 100644
| --- a/arch/x86/kernel/syscall_table_32.S
| +++ b/arch/x86/kernel/syscall_table_32.S
| @@ -334,3 +334,4 @@ ENTRY(sys_call_table)
| .long sys_inotify_init1
| .long sys_checkpoint
| .long sys_restart
| + .long sys_clone_with_pid
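As a side note on the x86-32 plumbing above: the i386 system-call convention passes arguments in ebx, ecx, edx, esi, edi and ebp, in that order. sys_clone_with_pid() reads the flags, the new stack pointer and the two TID pointers from the same registers sys_clone() uses, and takes the extra pid_nr from ebp (regs.bp), i.e. the sixth argument. The user-space sketch below models only that decoding step; struct fake_regs, struct clone_with_pid_args and decode_clone_with_pid() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for the i386 struct pt_regs fields used here.
 * Syscall argument registers, in order: bx, cx, dx, si, di, bp. */
struct fake_regs {
	unsigned long bx, cx, dx, si, di, bp, sp;
};

/* Decoded arguments of the proposed clone_with_pid() call (hypothetical). */
struct clone_with_pid_args {
	unsigned long clone_flags;
	unsigned long newsp;
	uintptr_t parent_tidptr;
	uintptr_t child_tidptr;
	pid_t pid_nr;
};

/* Mirrors the register reads in sys_clone_with_pid(): identical to
 * sys_clone(), plus pid_nr taken from bp, the sixth argument register.
 * (si is not read here; on i386 clone it carries the CLONE_SETTLS
 * descriptor, which is consumed elsewhere.) */
static struct clone_with_pid_args decode_clone_with_pid(const struct fake_regs *regs)
{
	struct clone_with_pid_args a;

	a.clone_flags = regs->bx;
	a.newsp = regs->cx;
	a.parent_tidptr = (uintptr_t)regs->dx;
	a.child_tidptr = (uintptr_t)regs->di;
	a.pid_nr = (pid_t)regs->bp;

	/* A zero stack pointer means "reuse the caller's stack", as in sys_clone(). */
	if (!a.newsp)
		a.newsp = regs->sp;
	return a;
}
```

On a kernel carrying this patch, a caller would therefore put the pid in the sixth slot, after the TLS and child-TID arguments. A plain syscall() call is not enough, though: the child starts running on the new stack, which is why a copy of libc's assembly clone() wrapper is needed to reach child_func. Note that 335 is only the number this patch assigns in unistd_32.h, not an allocated ABI.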
| diff --git a/arch/x86/mm/checkpoint.c b/arch/x86/mm/checkpoint.c
| index 50bde9a..a4aee65 100644
| --- a/arch/x86/mm/checkpoint.c
| +++ b/arch/x86/mm/checkpoint.c
| @@ -7,7 +7,6 @@
| * License. See the file COPYING in the main directory of the Linux
| * distribution for more details.
| */
| -
| #include <asm/desc.h>
| #include <asm/i387.h>
|
| diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
| index 64155de..b7de611 100644
| --- a/checkpoint/checkpoint.c
| +++ b/checkpoint/checkpoint.c
| @@ -8,6 +8,7 @@
| * distribution for more details.
| */
|
| +#define DEBUG
| #include <linux/version.h>
| #include <linux/sched.h>
| #include <linux/ptrace.h>
| @@ -564,3 +565,4 @@ int do_checkpoint(struct cr_ctx *ctx, pid_t pid)
| out:
| return ret;
| }
| +
| diff --git a/checkpoint/ckpt_file.c b/checkpoint/ckpt_file.c
| index e3097ac..a8c5ad5 100644
| --- a/checkpoint/ckpt_file.c
| +++ b/checkpoint/ckpt_file.c
| @@ -7,7 +7,7 @@
| * License. See the file COPYING in the main directory of the Linux
| * distribution for more details.
| */
| -
| +#define DEBUG
| #include <linux/kernel.h>
| #include <linux/sched.h>
| #include <linux/file.h>
| diff --git a/checkpoint/ckpt_mem.c b/checkpoint/ckpt_mem.c
| index 4925ff2..ca5840b 100644
| --- a/checkpoint/ckpt_mem.c
| +++ b/checkpoint/ckpt_mem.c
| @@ -7,7 +7,7 @@
| * License. See the file COPYING in the main directory of the Linux
| * distribution for more details.
| */
| -
| +#define DEBUG
| #include <linux/kernel.h>
| #include <linux/sched.h>
| #include <linux/slab.h>
| diff --git a/checkpoint/restart.c b/checkpoint/restart.c
| index 7ec4de4..30e43c2 100644
| --- a/checkpoint/restart.c
| +++ b/checkpoint/restart.c
| @@ -8,6 +8,7 @@
| * distribution for more details.
| */
|
| +#define DEBUG
| #include <linux/version.h>
| #include <linux/sched.h>
| #include <linux/wait.h>
| @@ -242,7 +243,7 @@ static int cr_read_task_struct(struct cr_ctx *ctx)
| memcpy(t->comm, buf, min(hh->task_comm_len, TASK_COMM_LEN));
| }
| kfree(buf);
| -
| + pr_debug("read task %s\n", t->comm);
| /* FIXME: restore remaining relevant task_struct fields */
| out:
| cr_hbuf_put(ctx, sizeof(*hh));
| diff --git a/checkpoint/rstr_file.c b/checkpoint/rstr_file.c
| index f44b081..755e40e 100644
| --- a/checkpoint/rstr_file.c
| +++ b/checkpoint/rstr_file.c
| @@ -7,7 +7,7 @@
| * License. See the file COPYING in the main directory of the Linux
| * distribution for more details.
| */
| -
| +#define DEBUG
| #include <linux/kernel.h>
| #include <linux/sched.h>
| #include <linux/fs.h>
| diff --git a/checkpoint/rstr_mem.c b/checkpoint/rstr_mem.c
| index 4d5ce1a..8330468 100644
| --- a/checkpoint/rstr_mem.c
| +++ b/checkpoint/rstr_mem.c
| @@ -7,7 +7,7 @@
| * License. See the file COPYING in the main directory of the Linux
| * distribution for more details.
| */
| -
| +#define DEBUG
| #include <linux/kernel.h>
| #include <linux/sched.h>
| #include <linux/fcntl.h>
| diff --git a/checkpoint/sys.c b/checkpoint/sys.c
| index f26b0c6..d1a5394 100644
| --- a/checkpoint/sys.c
| +++ b/checkpoint/sys.c
| @@ -7,7 +7,7 @@
| * License. See the file COPYING in the main directory of the Linux
| * distribution for more details.
| */
| -
| +#define DEBUG
| #include <linux/sched.h>
| #include <linux/nsproxy.h>
| #include <linux/kernel.h>
| @@ -263,7 +263,6 @@ asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned
| return PTR_ERR(ctx);
|
| ret = do_checkpoint(ctx, pid);
| -
| if (!ret)
| ret = ctx->crid;
|
| @@ -304,3 +303,4 @@ asmlinkage long sys_restart(int crid, int fd, unsigned lon
| cr_ctx_put(ctx);
| return ret;
| }
| +
| diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
| index 217cf6e..bc2c202 100644
| --- a/include/linux/checkpoint.h
| +++ b/include/linux/checkpoint.h
| @@ -114,7 +114,6 @@ extern int cr_write_files(struct cr_ctx *ctx, struct task_
| extern int do_restart(struct cr_ctx *ctx, pid_t pid);
| extern int cr_read_mm(struct cr_ctx *ctx);
| extern int cr_read_files(struct cr_ctx *ctx);
| -
| #ifdef pr_fmt
| #undef pr_fmt
| #endif
| diff --git a/include/linux/pid.h b/include/linux/pid.h
| index d7e98ff..86e2f61 100644
| --- a/include/linux/pid.h
| +++ b/include/linux/pid.h
| @@ -119,7 +119,7 @@ extern struct pid *find_get_pid(int nr);
| extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
| int next_pidmap(struct pid_namespace *pid_ns, int last);
|
| -extern struct pid *alloc_pid(struct pid_namespace *ns);
| +extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t pid_nr);
| extern void free_pid(struct pid *pid);
|
| /*
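The hunk that changes alloc_pid() itself is not quoted here, so the semantics of the new pid_nr argument are an assumption: 0 keeps the old next-free behaviour, and a nonzero value claims exactly that number or fails. The toy bitmap allocator below (toy_pidmap, toy_alloc_pid() and TOY_PID_MAX are hypothetical, only loosely modelled on the kernel's pidmap) illustrates that contract in user space.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define TOY_PID_MAX 64  /* hypothetical, tiny for illustration */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-namespace pid bitmap; 0 is never handed out. */
struct toy_pidmap {
	unsigned long bits[(TOY_PID_MAX + BITS_PER_WORD - 1) / BITS_PER_WORD];
	int last_pid;
};

/* Sets bit nr and reports whether it was already set. */
static int toy_test_and_set(struct toy_pidmap *map, int nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);
	unsigned long *word = &map->bits[nr / BITS_PER_WORD];
	int was_set = (*word & mask) != 0;

	*word |= mask;
	return was_set;
}

/* Returns the allocated number or a negative errno.
 * pid_nr == 0: next free number after last_pid (old behaviour).
 * pid_nr  > 0: exactly that number, or -EBUSY if it is taken. */
static int toy_alloc_pid(struct toy_pidmap *map, int pid_nr)
{
	if (pid_nr < 0 || pid_nr >= TOY_PID_MAX)
		return -EINVAL;
	if (pid_nr > 0)
		return toy_test_and_set(map, pid_nr) ? -EBUSY : pid_nr;

	for (int i = 1; i <= TOY_PID_MAX; i++) {
		int nr = (map->last_pid + i) % TOY_PID_MAX;

		if (nr == 0)
			continue;  /* 0 is never a valid pid */
		if (!toy_test_and_set(map, nr)) {
			map->last_pid = nr;
			return nr;
		}
	}
	return -EAGAIN;  /* bitmap full */
}
```

A real implementation also has to decide which pid namespace level the requested number applies to, because a task in a nested pid namespace has a separate number in each ancestor namespace.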
| diff --git a/include/linux/sched.h b/include/linux/sched.h
| index 0150e90..7fb4e28 100644
| --- a/include/linux/sched.h
| +++ b/include/linux/sched.h
| @@ -28,6 +28,7 @@
| #define CLONE_NEWPID 0x20000000 /* New pid namespace */
| #define CLONE_NEWNET 0x40000000 /* New network namespace */
| #define CLONE_IO 0x80000000 /* Clone io context */
| +#define CLONE_WITH_PID 0x00001000 /* Clone with pre-select PID */
|
| /*
| * Scheduling policies
| diff --git a/kernel/exit.c b/kernel/exit.c
| index 2d8be7e..4baf651 100644
| --- a/kernel/exit.c
| +++ b/kernel/exit.c
| @@ -3,7 +3,7 @@
| *
| * Copyright (C) 1991, 1992 Linus Torvalds
| */
| -
| +#define DEBUG
| #include <linux/mm.h>
| #include <linux/slab.h>
| #include <linux/interrupt.h>
| @@ -1676,6 +1676,7 @@ static long do_wait(enum pid_type type, struct pid *pid,
| DECLARE_WAITQUEUE(wait, current);
| struct task_struct *tsk;
| int retval;
| + int level;
and this (level is not used).
|
| trace_sched_process_wait(pid);
|
| @@ -1708,7 +1709,6 @@ repeat:
| retval = tsk_result;
| goto end;
| }
| -
| if (options & __WNOTHREAD)
| break;
| tsk = next_thread(tsk);
| @@ -1817,7 +1817,6 @@ asmlinkage long sys_wait4(pid_t upid, int __user *stat_a
| type = PIDTYPE_PID;
| pid = find_get_pid(upid);
| }
| -
| ret = do_wait(type, pid, options | WEXITED, NULL, stat_addr, ru);
| put_pid(pid);
|
| diff --git a/kernel/fork.c b/kernel/fork.c
| index 085ce56..262ae1e 100644
| --- a/kernel/fork.c
| +++ b/kernel/fork.c
| @@ -10,7 +10,7 @@
| * Fork is rather simple, once you get the hang of it, but the memory
| * management can be a bitch. See 'mm/memory.c': 'copy_page_range()'
| */
| -
| +#define DEBUG
| #include <linux/slab.h>
| #include <linux/init.h>
| #include <linux/unistd.h>
| @@ -959,10 +959,19 @@ static struct task_struct *copy_process(unsigned long cl
| int retval;
| struct task_struct *p;
| int cgroup_callbacks_done = 0;
| + pid_t clone_pid = stack_size;
|
| if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
| return ERR_PTR(-EINVAL);
|
| + /* We only allow the clone_with_pid when a new pid namespace is
| + * created. FIXME: how to restrict it.
Not sure why CLONE_NEWPID is required to set pid_nr. In fact, with CLONE_NEWPID,
pid_nr should by definition be 1. Also, what happens if a container has
more than one process, where the second process has a pid_nr > 2?
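To make the question concrete: a child created with CLONE_NEWPID becomes init of the new namespace, so 1 is the only number that makes sense there, while the other tasks of a restarted container have to be cloned into an already existing namespace without CLONE_NEWPID. The helper below is a hypothetical check (check_clone_with_pid() and the SKETCH_* names are not from the patch; the flag values are copied from its sched.h hunk) that accepts both cases instead of requiring CLONE_NEWPID.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Flag values as defined in the patch's include/linux/sched.h hunk. */
#define SKETCH_CLONE_NEWPID   0x20000000UL
#define SKETCH_CLONE_WITH_PID 0x00001000UL

/* Hypothetical argument check reflecting the review comments:
 * - without CLONE_WITH_PID, a nonzero pid_nr is rejected;
 * - with CLONE_WITH_PID, pid_nr must be positive;
 * - with CLONE_NEWPID as well, the child is the new namespace's init,
 *   so only pid_nr == 1 is accepted. */
static int check_clone_with_pid(unsigned long clone_flags, pid_t pid_nr)
{
	if (!(clone_flags & SKETCH_CLONE_WITH_PID))
		return pid_nr == 0 ? 0 : -EINVAL;
	if (pid_nr <= 0)
		return -EINVAL;
	if ((clone_flags & SKETCH_CLONE_NEWPID) && pid_nr != 1)
		return -EINVAL;
	return 0;
}
```

It deliberately says nothing about who may request a specific pid at all, which the FIXME in the patch also leaves open.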