From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: Sukadev Bhattiprolu
<sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Dave Hansen
<dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: Re: [RFC v14-rc2][PATCH 15/29] Restart multiple processes
Date: Tue, 07 Apr 2009 01:31:18 -0400 [thread overview]
Message-ID: <49DAE526.6010900@cs.columbia.edu> (raw)
In-Reply-To: <20090407033315.GJ12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Sukadev Bhattiprolu wrote:
> Couple of nits and couple of not-so minor comments
>
> Oren Laadan [orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org] wrote:
> | From 7162fef93ee3d9fd30a457dd7b0c7ad0200d5bcb Mon Sep 17 00:00:00 2001
> | From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> | Date: Mon, 30 Mar 2009 15:06:13 -0400
> | Subject: [PATCH 15/29] Restart multiple processes
> |
> | Restarting of multiple processes expects all restarting tasks to call
> | sys_restart(). Once inside the system call, each task will restart
> | itself at the same order that they were saved. The internals of the
> | syscall will take care of in-kernel synchronization bewteen tasks.
> |
[...]
> |
> | struct cr_ctx {
> | int crid; /* unique checkpoint id */
> | @@ -31,8 +34,7 @@ struct cr_ctx {
> | void *hbuf; /* temporary buffer for headers */
> | int hpos; /* position in headers buffer */
> |
> | - struct task_struct **tasks_arr; /* array of all tasks in container */
> | - int tasks_nr; /* size of tasks array */
> | + atomic_t refcount;
> |
> | struct cr_objhash *objhash; /* hash for shared objects */
> |
> | @@ -40,6 +42,19 @@ struct cr_ctx {
> | struct list_head pgarr_pool; /* pool of empty page arrays chain */
> |
> | struct path fs_mnt; /* container root (FIXME) */
> | +
> | + /* [multi-process checkpoint] */
> | + struct task_struct **tasks_arr; /* array of all tasks [checkpoint] */
> | + int tasks_nr; /* size of tasks array */
> | +
> | + /* [multi-process restart] */
> | + struct cr_hdr_pids *pids_arr; /* array of all pids [restart] */
> | + int pids_nr; /* size of pids array */
>
> Nit: Since we already have a pid_nr() that refers to something different,
> can we call this 'nr_pids' (and nr_tasks above) like mm_context->nr_threads ?
> Of course, there is no convention, so its easy to argue the other way.
Ok.
>
> Secondly, isn't pids_nr same as tasks_nr ? If so do we need both ?
As the comment says: one is used exclusively for checkpoint and the
other exclusively for restart.
So we don't strictly need both. I thought that for readability of it's
useful to have @pids_nr (ok, @nr_pids ...) when dealing with a @pids_arr,
and a @tasks_nr (ok .. @nr_tasks ...) when dealing with @tasks_arr.
>
> Or is this intended to address the issue of multiple pid_nr values that a
> task in a nested container can have ? If so, pids_nr is > tasks_nr and that
> brings up two comments :-)
Ugh. This topic is TBD.
>
> First, mktree.c and cr_next_task() are using 'ctx->pids_nr' to determine how
> many tasks to start. If we are talking about nested containers, pids_nr
> will be greater than tasks_nr so, mktree and cr_next_task() should be
> use 'ctx->tasks_nr' to determine how many tasks to create. Also if
> checkpointing a nested container we should view the multiple nested pid
> values a process as an attribute of the task and maybe save them in
> cr_write_task() rather than in cr_write_tree().
Lol .. who's talking about nested containers ? ;)
(seriously: I'm not considering that now; my gut feeling is that it may
be useful to do pid_ns in userspace, like task creation - and in that
case it makes sense to keep it in cr_write_tree(). then again, I have
not looked at it in depth).
>
> My second comment is more an orthogonal question. Suppose init_pid_ns = level
> 0 and we have a container that is nested at level 3. If we checkpoint just
> this container, we would want to be able to restore this container at any level
> 0 right ?
True. Do you see any limitation in the current code that prevents this ?
>
> | + int pids_pos; /* position pids array */
> | + pid_t pids_active; /* pid of (next) active task */
>
> Do we need both pids_pos and pids_active in the ctx ? Can pids_active
> just be a local variable in cr_next_task() and cr_wait_task() ?
> IOW, isn't this always true
>
> pids_arr[pids_pos] == pids_active
Ok.
Oren.
>
> | + atomic_t tasks_count; /* sync of tasks: used to coordinate */
>
> Name is a bit confusing with 'tasks_nr', but the comment helps and I can't
> think of a better name.
>
> | + struct completion complete; /* container root and other tasks on */
> | + wait_queue_head_t waitq; /* start, end, and restart ordering */
> | };
>
> Sukadev
>
next prev parent reply other threads:[~2009-04-07 5:31 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-31 5:28 [RFC v14-rc2][PATCH 00/29] Kernel based checkpoint/restart Oren Laadan
[not found] ` <1238477349-11029-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 01/29] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 02/29] Checkpoint/restart: initial documentation Oren Laadan
[not found] ` <1238477349-11029-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:22 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 03/29] Make file_pos_read/write() public Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 04/29] General infrastructure for checkpoint restart Oren Laadan
[not found] ` <1238477349-11029-5-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:24 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 05/29] x86 support for checkpoint/restart Oren Laadan
[not found] ` <1238477349-11029-6-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:25 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 06/29] Dump memory address space Oren Laadan
[not found] ` <1238477349-11029-7-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:26 ` Sukadev Bhattiprolu
[not found] ` <20090407032636.GD12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 4:57 ` Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 07/29] Restore " Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 08/29] Infrastructure for shared objects Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 09/29] Dump open file descriptors Oren Laadan
[not found] ` <1238477349-11029-10-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:28 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 10/29] actually use f_op in checkpoint code Oren Laadan
[not found] ` <1238477349-11029-11-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-03-31 18:31 ` Oren Laadan
2009-04-01 18:54 ` Serge E. Hallyn
2009-04-07 3:29 ` Sukadev Bhattiprolu
[not found] ` <20090407032912.GF12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 5:36 ` Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 11/29] add generic checkpoint f_op to ext fses Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 12/29] Restore open file descriptors Oren Laadan
[not found] ` <1238477349-11029-13-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:29 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 13/29] External checkpoint of a task other than ourself Oren Laadan
[not found] ` <1238477349-11029-14-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:30 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 14/29] Checkpoint multiple processes Oren Laadan
[not found] ` <1238477349-11029-15-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:31 ` Sukadev Bhattiprolu
[not found] ` <20090407033111.GI12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 5:12 ` Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 15/29] Restart " Oren Laadan
[not found] ` <1238477349-11029-16-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:33 ` Sukadev Bhattiprolu
[not found] ` <20090407033315.GJ12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 5:31 ` Oren Laadan [this message]
[not found] ` <49DAE526.6010900-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 16:29 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 16/29] A new file type (CR_FD_OBJREF) for a file descriptor already setup Oren Laadan
[not found] ` <1238477349-11029-17-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 13:59 ` Serge E. Hallyn
[not found] ` <20090401135952.GA16973-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-01 14:13 ` Oren Laadan
2009-04-01 18:36 ` Serge E. Hallyn
2009-04-03 15:46 ` Dan Smith
[not found] ` <87y6uhyc3j.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2009-04-03 16:25 ` Oren Laadan
[not found] ` <49D63865.1030807-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-03 16:30 ` Dan Smith
2009-04-03 16:54 ` Dave Hansen
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 17/29] Checkpoint open pipes Oren Laadan
[not found] ` <1238477349-11029-18-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 19:47 ` Serge E. Hallyn
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 18/29] Restore " Oren Laadan
[not found] ` <1238477349-11029-19-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 20:34 ` Serge E. Hallyn
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 19/29] Record 'struct file' object instead of the file name for VMAs Oren Laadan
[not found] ` <1238477349-11029-20-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 21:45 ` Serge E. Hallyn
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 20/29] Prepare to support shared memory Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 21/29] Dump anonymous- and file-mapped- " Oren Laadan
[not found] ` <1238477349-11029-22-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 23:06 ` Serge E. Hallyn
[not found] ` <20090401230657.GB27725-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-01 23:18 ` Oren Laadan
[not found] ` <49D3F636.1070303-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 23:32 ` Serge E. Hallyn
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 22/29] Restore " Oren Laadan
[not found] ` <1238477349-11029-23-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-02 16:59 ` Serge E. Hallyn
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 23/29] s390: Expose a constant for the number of words representing the CRs Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 24/29] c/r: Add CR_COPY() macro (v4) Oren Laadan
[not found] ` <1238477349-11029-25-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 23:20 ` Serge E. Hallyn
[not found] ` <20090401232013.GA31361-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-02 19:00 ` Dan Smith
[not found] ` <87vdpmnan2.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2009-04-02 19:06 ` Serge E. Hallyn
[not found] ` <20090402190612.GA24390-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-02 20:22 ` Dan Smith
[not found] ` <87r60an6us.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2009-04-05 20:25 ` Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 25/29] s390: define s390-specific checkpoint-restart code (v7) Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 26/29] powerpc: provide APIs for validating and updating DABR Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 27/29] powerpc: checkpoint/restart implementation Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 28/29] powerpc: wire up checkpoint and restart syscalls Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 29/29] powerpc: enable checkpoint support in Kconfig Oren Laadan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49DAE526.6010900@cs.columbia.edu \
--to=orenl-eqauephvms7envbuuze7ea@public.gmane.org \
--cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
--cc=sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.