From: Oren Laadan <orenl@cs.columbia.edu>
To: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@osdl.org>,
containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-api@vger.kernel.org, Serge Hallyn <serue@us.ibm.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Pavel Emelyanov <xemul@openvz.org>
Subject: Re: [RFC v16][PATCH 23/43] c/r: restart multiple processes
Date: Wed, 27 May 2009 17:38:46 -0400 [thread overview]
Message-ID: <4A1DB2E6.8000102@cs.columbia.edu> (raw)
In-Reply-To: <20090527193758.GC31930@x200.localdomain>
Alexey Dobriyan wrote:
> On Wed, May 27, 2009 at 01:32:49PM -0400, Oren Laadan wrote:
>> Restarting of multiple processes expects all restarting tasks to call
>> sys_restart(). Once inside the system call, each task will restart
>> itself at the same order that they were saved. The internals of the
>> syscall will take care of in-kernel synchronization bewteen tasks.
>>
>> This patch does _not_ create the task tree in the kernel. Instead it
>> assumes that all tasks are created in some way and then invoke the
>> restart syscall. You can use the userspace mktree.c program to do
>> that.
>>
>> The init task (*) has a special role: it allocates the restart context
>> (ctx), and coordinates the operation. In particular, it first waits
>> until all participating tasks enter the kernel, and provides them the
>> common restart context. Once everyone in ready, it begins to restart
>> itself.
>>
>> In contrast, the other tasks enter the kernel, locate the init task (*)
>> and grab its restart context, and then wait for their turn to restore.
>>
>> When a task (init or not) completes its restart, it hands the control
>> over to the next in line, by waking that task.
>>
>> An array of pids (the one saved during the checkpoint) is used to
>> synchronize the operation. The first task in the array is the init
>> task (*). The restart context (ctx) maintain a "current position" in
>> the array, which indicates which task is currently active. Once the
>> currently active task completes its own restart, it increments that
>> position and wakes up the next task.
>>
>> Restart assumes that userspace provides meaningful data, otherwise
>> it's garbage-in-garbage-out. In this case, the syscall may block
>> indefinitely, but in TASK_INTERRUPTIBLE, so the user can ctrl-c or
>> otherwise kill the stray restarting tasks.
>>
>> In terms of security, restart runs as the user the invokes it, so it
>> will not allow a user to do more than is otherwise permitted by the
>> usual system semantics and policy.
>>
>> Currently we ignore threads and zombies
>
> Let's discuss threads and zombies.
>
> 1. Will zombie end up in a image?
Zombies will be mentioned in the hierarchy description, and will
have very little state saved (e.g. exit status, parent).
> 2. If yes, how it will be restored. Will it be forked, call restart(2)
> and then somehow zombified inside kernel?
(not part of this patchset, but soon will be added to ckpt-v16-dev)
Zombie will be restarted as a normal process, will restore bare
minimum needed, and will call do_exit(). It will have to ensure
that there are no side effects on (=signals to) parent/children.
> 3. How thread group will be restored, will every thread be CLONE_THREAD'ed?
> What to do with exited thread group leaders, will they be forked, then
> CLONE_THREAD thread group?
First, user space creates the entire tree hierarchy, including
zombies. Then each task calls sys_restart(). Inside, they are
coordinated to restore their state one after the other. So that
eventually, the to-be-zombies, be it a thread-group-leader or not,
will call do_exit() and zombify themselves.
Take a look at mktree.c (part of the user tools). It's already done
there using CLONE_THREAD. The reason I wrote that it isn't supported
well is because I think that in full-container mode the link count
won't work correctly. Other than that, threads should work as long
as you don't play with "partial" sharing (e.g. only CLONE_FS).
Oren.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-05-27 21:38 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-05-27 17:32 [RFC v16][PATCH 00/43] Kernel based checkpoint/restart Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 01/43] c/r: extend arch_setup_additional_pages() Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 02/43] c/r: make file_pos_read/write() public Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 03/43] c/r: create syscalls: sys_checkpoint, sys_restart Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 04/43] c/r: documentation Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 05/43] c/r: basic infrastructure for checkpoint/restart Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 06/43] c/r: x86_32 support " Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 07/43] c/r: infrastructure for shared objects Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 08/43] c/r: introduce '->checkpoint()' method in 'struct file_operations' Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 09/43] c/r: dump open file descriptors Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 10/43] c/r: restore " Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 11/43] c/r: add generic '->checkpoint' f_op to ext fses Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 12/43] c/r: add generic '->checkpoint()' f_op to simple devices Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 14/43] c/r: dump memory address space (private memory) Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 17/43] c/r: dump anonymous- and file-mapped- shared memory Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 19/43] c/r: external checkpoint of a task other than ourself Oren Laadan
2009-05-27 21:19 ` Alexey Dobriyan
2009-05-27 22:32 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.0905271831030.7284-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2009-05-28 16:33 ` Alexey Dobriyan
2009-05-27 17:32 ` [RFC v16][PATCH 20/43] c/r: export functionality used in next patch for restart-blocks Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 21/43] c/r: restart-blocks Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 22/43] c/r: checkpoint multiple processes Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 23/43] c/r: restart " Oren Laadan
[not found] ` <1243445589-32388-24-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-05-27 19:37 ` Alexey Dobriyan
2009-05-27 21:38 ` Oren Laadan [this message]
2009-05-27 17:32 ` [RFC v16][PATCH 24/43] c/r: detect resource leaks for whole-container checkpoint Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 25/43] tee: don't return 0 when another task drains/fills a pipe Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 26/43] splice: added support for pipe-to-pipe splice() Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 27/43] c/r: support for open pipes Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 28/43] c/r: make ckpt_may_checkpoint_task() check each namespace individually Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 29/43] c/r: support for UTS namespace Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 32/43] c/r (ipc): allow allocation of a desired ipc identifier Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 33/43] c/r (ipc): helpers to save and restore kern_ipc_perm structures Oren Laadan
2009-05-27 17:33 ` [RFC v16][PATCH 35/43] c/r (ipc): export interface from ipc/shm.c to delete ipc shm Oren Laadan
2009-05-27 17:33 ` [RFC v16][PATCH 36/43] c/r: support share-memory sysv-ipc Oren Laadan
2009-05-27 17:33 ` [RFC v16][PATCH 37/43] c/r (ipc): make 'struct msg_msgseg' visible in ipc/util.h Oren Laadan
[not found] ` <1243445589-32388-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-05-27 17:32 ` [RFC v16][PATCH 13/43] c/r: introduce method '->checkpoint()' in struct vm_operations_struct Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 15/43] c/r: restore memory address space (private memory) Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 16/43] c/r: export shmem_getpage() to support shared memory Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 18/43] c/r: restore anonymous- and file-mapped- " Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 30/43] c/r: stub implementation for IPC namespace Oren Laadan
2009-05-27 17:32 ` [RFC v16][PATCH 31/43] deferqueue: generic queue to defer work Oren Laadan
2009-05-27 17:33 ` [RFC v16][PATCH 34/43] c/r: save and restore ipc namespace basics Oren Laadan
2009-05-27 17:33 ` [RFC v16][PATCH 38/43] c/r: support message-queues sysv-ipc Oren Laadan
2009-05-27 17:33 ` [RFC v16][PATCH 39/43] c/r (ipc): export interface from ipc/sem.c to cleanup ipc sem Oren Laadan
2009-05-27 17:33 ` [RFC v16][PATCH 40/43] c/r: support semaphore sysv-ipc Oren Laadan
2009-05-27 17:33 ` [RFC v16][PATCH 41/43] c/r: (s390): expose a constant for the number of words (CRs) Oren Laadan
2009-05-27 18:39 ` Alexey Dobriyan
2009-05-27 17:33 ` [RFC v16][PATCH 42/43] c/r: add CKPT_COPY() macro Oren Laadan
2009-05-27 17:33 ` [RFC v16][PATCH 43/43] c/r: define s390-specific checkpoint-restart code Oren Laadan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A1DB2E6.8000102@cs.columbia.edu \
--to=orenl@cs.columbia.edu \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=containers@lists.linux-foundation.org \
--cc=dave@linux.vnet.ibm.com \
--cc=hpa@zytor.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@elte.hu \
--cc=serue@us.ibm.com \
--cc=torvalds@osdl.org \
--cc=viro@zeniv.linux.org.uk \
--cc=xemul@openvz.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).