From: Oren Laadan <orenl@cs.columbia.edu>
To: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@osdl.org>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-api@vger.kernel.org, Serge Hallyn <serue@us.ibm.com>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Pavel Emelyanov <xemul@openvz.org>
Subject: Re: [RFC v16][PATCH 23/43] c/r: restart multiple processes
Date: Wed, 27 May 2009 17:38:46 -0400
Message-ID: <4A1DB2E6.8000102@cs.columbia.edu>
In-Reply-To: <20090527193758.GC31930@x200.localdomain>



Alexey Dobriyan wrote:
> On Wed, May 27, 2009 at 01:32:49PM -0400, Oren Laadan wrote:
>> Restarting of multiple processes expects all restarting tasks to call
>> sys_restart(). Once inside the system call, each task will restart
>> itself in the same order in which the tasks were saved. The internals of
>> the syscall will take care of in-kernel synchronization between tasks.
>>
>> This patch does _not_ create the task tree in the kernel. Instead it
>> assumes that all tasks are created in some way and then invoke the
>> restart syscall. You can use the userspace mktree.c program to do
>> that.
>>
>> The init task (*) has a special role: it allocates the restart context
>> (ctx), and coordinates the operation. In particular, it first waits
>> until all participating tasks enter the kernel, and provides them the
>> common restart context. Once everyone is ready, it begins to restart
>> itself.
>>
>> In contrast, the other tasks enter the kernel, locate the init task (*)
>> and grab its restart context, and then wait for their turn to restore.
>>
>> When a task (init or not) completes its restart, it hands the control
>> over to the next in line, by waking that task.
>>
>> An array of pids (the one saved during the checkpoint) is used to
>> synchronize the operation. The first task in the array is the init
>> task (*). The restart context (ctx) maintains a "current position" in
>> the array, which indicates which task is currently active. Once the
>> currently active task completes its own restart, it increments that
>> position and wakes up the next task.
>>
>> Restart assumes that userspace provides meaningful data, otherwise
>> it's garbage-in-garbage-out. In this case, the syscall may block
>> indefinitely, but in TASK_INTERRUPTIBLE, so the user can ctrl-c or
>> otherwise kill the stray restarting tasks.
>>
>> In terms of security, restart runs as the user that invokes it, so it
>> will not allow a user to do more than is otherwise permitted by the
>> usual system semantics and policy.
>>
>> Currently we ignore threads and zombies
> 
> Let's discuss threads and zombies.
> 
> 1. Will a zombie end up in the image?

Zombies will be mentioned in the hierarchy description, and will
have very little state saved (e.g. exit status, parent).

> 2. If yes, how will it be restored? Will it be forked, call restart(2)
>    and then somehow be zombified inside the kernel?

(This is not part of this patchset, but will soon be added to ckpt-v16-dev.)
A zombie will be restarted as a normal process, will restore the bare
minimum needed, and will then call do_exit(). It will have to ensure
that there are no side effects on (i.e. no signals to) its parent or
children.
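
A userspace analogy for the zombie case, as a rough sketch only. This is
not the ckpt-v16-dev code: it assumes the only saved state is an exit
status, uses _exit() as a stand-in for the in-kernel do_exit(), and the
helper names (spawn_future_zombie, wait_until_zombie, reap_exit_status)
are invented for illustration. It does nothing to suppress signals to
the parent, which the real restart path would have to handle.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a child as a normal process, "restore"
 * only its saved exit status, and exit at once. Until the parent reaps
 * it, the child stays around as a zombie.
 * Returns the child's pid, or -1 if fork() failed.
 */
static pid_t spawn_future_zombie(int saved_exit_status)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(saved_exit_status);   /* userspace stand-in for do_exit() */
    return pid;
}

/*
 * Block until the child has exited but leave it unreaped (WNOWAIT),
 * so that it is still a zombie when this returns. Returns 0 on success.
 */
static int wait_until_zombie(pid_t pid)
{
    siginfo_t info;

    return waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
}

/* Reap the zombie and return its exit status, or -1 on error. */
static int reap_exit_status(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling spawn_future_zombie(42), then wait_until_zombie() on the
returned pid, leaves a zombie whose exit status a later
reap_exit_status() call collects.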

> 3. How will a thread group be restored? Will every thread be CLONE_THREAD'ed?
>    What about exited thread group leaders: will they be forked, and then
>    CLONE_THREAD the thread group?

First, user space creates the entire tree hierarchy, including
zombies. Then each task calls sys_restart(). Inside, they are
coordinated to restore their state one after the other. Eventually,
the to-be-zombies, whether thread group leaders or not, will call
do_exit() and zombify themselves.
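
The one-after-the-other coordination can be illustrated in userspace
with threads. This is only a sketch of the idea from the patch
description (an array of saved pids plus a "current position"), not
the kernel implementation: struct restart_ctx, restart_task() and
run_restart() are hypothetical names, a mutex and condition variable
stand in for the in-kernel wait/wake, and "restoring" a task is
reduced to logging its pid. As in the description, a pid that is
missing from the array makes the waiters block forever.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TASKS 16

/* Hypothetical userspace stand-in for the shared restart context. */
struct restart_ctx {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int pids[MAX_TASKS];   /* order saved at checkpoint; pids[0] is "init" */
    int nr_pids;
    int arrived;           /* tasks that have entered the "syscall" */
    int active;            /* current position in pids[] */
    int order[MAX_TASKS];  /* log of the order in which tasks restored */
    int nr_done;
};

struct task_arg {
    struct restart_ctx *ctx;
    int pid;
};

static void *restart_task(void *p)
{
    struct task_arg *arg = p;
    struct restart_ctx *ctx = arg->ctx;

    pthread_mutex_lock(&ctx->lock);
    ctx->arrived++;
    pthread_cond_broadcast(&ctx->changed);

    /* The init task first waits until every participant has entered. */
    if (arg->pid == ctx->pids[0])
        while (ctx->arrived < ctx->nr_pids)
            pthread_cond_wait(&ctx->changed, &ctx->lock);

    /* Every task then waits for its turn in the saved order. */
    while (ctx->pids[ctx->active] != arg->pid)
        pthread_cond_wait(&ctx->changed, &ctx->lock);

    ctx->order[ctx->nr_done++] = arg->pid;  /* "restore" own state */
    ctx->active++;                          /* hand over to the next task */
    pthread_cond_broadcast(&ctx->changed);
    pthread_mutex_unlock(&ctx->lock);
    return NULL;
}

/*
 * Start one thread per entry of start_order (an arbitrary arrival order)
 * and copy the order in which they actually restored to restored_order.
 * Returns 0 on success, -1 if n is out of range.
 */
static int run_restart(const int *saved_pids, const int *start_order,
                       int n, int *restored_order)
{
    struct restart_ctx ctx = { .nr_pids = n };
    pthread_t threads[MAX_TASKS];
    struct task_arg args[MAX_TASKS];
    int i;

    if (n < 1 || n > MAX_TASKS)
        return -1;
    pthread_mutex_init(&ctx.lock, NULL);
    pthread_cond_init(&ctx.changed, NULL);
    memcpy(ctx.pids, saved_pids, (size_t)n * sizeof(int));

    for (i = 0; i < n; i++) {
        args[i].ctx = &ctx;
        args[i].pid = start_order[i];
        /* The sketch has no recovery path: a missing task would stall the rest. */
        if (pthread_create(&threads[i], NULL, restart_task, &args[i]) != 0)
            abort();
    }
    for (i = 0; i < n; i++)
        pthread_join(threads[i], NULL);

    memcpy(restored_order, ctx.order, (size_t)n * sizeof(int));
    pthread_cond_destroy(&ctx.changed);
    pthread_mutex_destroy(&ctx.lock);
    return 0;
}
```

Whatever order the threads are started in, restored_order comes back
equal to saved_pids, because only the task at the current position may
proceed and only it advances the position. Build with -pthread.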

Take a look at mktree.c (part of the user tools). It's already done
there using CLONE_THREAD. The reason I wrote that it isn't supported
well is that I think that in full-container mode the link count
won't work correctly. Other than that, threads should work as long
as you don't play with "partial" sharing (e.g. only CLONE_FS).

Oren.


