From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: Sukadev Bhattiprolu
	<sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Dave Hansen
	<dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: Re: [RFC v14-rc2][PATCH 15/29] Restart multiple processes
Date: Tue, 07 Apr 2009 01:31:18 -0400
Message-ID: <49DAE526.6010900@cs.columbia.edu>
In-Reply-To: <20090407033315.GJ12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>



Sukadev Bhattiprolu wrote:
> Couple of nits and couple of not-so minor comments 
> 
> Oren Laadan [orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org] wrote:
> | From 7162fef93ee3d9fd30a457dd7b0c7ad0200d5bcb Mon Sep 17 00:00:00 2001
> | From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> | Date: Mon, 30 Mar 2009 15:06:13 -0400
> | Subject: [PATCH 15/29] Restart multiple processes
> | 
> | Restarting of multiple processes expects all restarting tasks to call
> | sys_restart(). Once inside the system call, each task will restart
> | itself in the same order that they were saved. The internals of the
> | syscall will take care of in-kernel synchronization between tasks.
> | 

[...]

> |  
> |  struct cr_ctx {
> |  	int crid;		/* unique checkpoint id */
> | @@ -31,8 +34,7 @@ struct cr_ctx {
> |  	void *hbuf;		/* temporary buffer for headers */
> |  	int hpos;		/* position in headers buffer */
> |  
> | -	struct task_struct **tasks_arr;	/* array of all tasks in container */
> | -	int tasks_nr;			/* size of tasks array */
> | +	atomic_t refcount;
> |  
> |  	struct cr_objhash *objhash;	/* hash for shared objects */
> |  
> | @@ -40,6 +42,19 @@ struct cr_ctx {
> |  	struct list_head pgarr_pool;	/* pool of empty page arrays chain */
> |  
> |  	struct path fs_mnt;	/* container root (FIXME) */
> | +
> | +	/* [multi-process checkpoint] */
> | +	struct task_struct **tasks_arr; /* array of all tasks [checkpoint] */
> | +	int tasks_nr;                   /* size of tasks array */
> | +
> | +	/* [multi-process restart] */
> | +	struct cr_hdr_pids *pids_arr;	/* array of all pids [restart] */
> | +	int pids_nr;			/* size of pids array */
> 
> Nit: Since we already have a pid_nr() that refers to something different,
> can we call this 'nr_pids' (and nr_tasks above), like mm_context->nr_threads ?
> Of course, there is no convention, so it's easy to argue the other way.

Ok.

> 
> Secondly, isn't pids_nr same as tasks_nr ? If so do we need both ?

As the comment says: one is used exclusively for checkpoint and the
other exclusively for restart.
So we don't strictly need both. I thought that for readability it's
useful to have @pids_nr (ok, @nr_pids ...) when dealing with @pids_arr,
and @tasks_nr (ok, @nr_tasks ...) when dealing with @tasks_arr.

> 
> Or is this intended to address the issue of multiple pid_nr values that a
> task in a nested container can have ? If so, pids_nr is > tasks_nr and that
> brings up two comments :-)

Ugh. This topic is TBD.

> 
> First, mktree.c and cr_next_task() are using 'ctx->pids_nr' to determine how
> many tasks to start. If we are talking about nested containers, pids_nr
> will be greater than tasks_nr, so mktree and cr_next_task() should
> use 'ctx->tasks_nr' to determine how many tasks to create. Also, if
> checkpointing a nested container, we should view the multiple nested pid
> values of a process as an attribute of the task and maybe save them in
> cr_write_task() rather than in cr_write_tree().

Lol .. who's talking about nested containers ?   ;)

(Seriously: I'm not considering that now. My gut feeling is that it may
be useful to do pid_ns in userspace, like task creation, and in that
case it makes sense to keep it in cr_write_tree(). Then again, I have
not looked at it in depth.)

> 
> My second comment is more an orthogonal question. Suppose init_pid_ns = level
>  0 and we have a container that is nested at level 3.  If we checkpoint just
> this container, we would want to be able to restore this container at any level
> 0 right ?

True. Do you see any limitation in the current code that prevents this ?

> 
> | +	int pids_pos;			/* position pids array */
> | +	pid_t pids_active;		/* pid of (next) active task */
> 
> Do we need both pids_pos and pids_active in the ctx ? Can pids_active
> just be a local variable in cr_next_task() and cr_wait_task() ?
> IOW, isn't this always true
> 
> 	pids_arr[pids_pos] == pids_active

Ok.
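
For illustration only, here is a userspace sketch of what dropping
@pids_active could look like: the active pid is derived from
pids_arr[pids_pos] whenever it is needed, so the two can never disagree.
The names restart_ctx, pid_entry, active_pid() and next_task() are
hypothetical stand-ins and not the code in the patch; the real
cr_next_task() and struct cr_hdr_pids are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for one entry of the pids array. */
struct pid_entry {
	pid_t vpid;
};

/* Hypothetical subset of the restart context: no pids_active field. */
struct restart_ctx {
	struct pid_entry *pids_arr;	/* array of all pids [restart] */
	int nr_pids;			/* size of pids array */
	int pids_pos;			/* position in pids array */
};

/* Derive the active pid from the position; -1 once all tasks are done. */
static pid_t active_pid(const struct restart_ctx *ctx)
{
	assert(ctx->pids_arr != NULL);
	if (ctx->pids_pos < 0 || ctx->pids_pos >= ctx->nr_pids)
		return -1;
	return ctx->pids_arr[ctx->pids_pos].vpid;
}

/* Advance to the next task in saved order and return its pid. */
static pid_t next_task(struct restart_ctx *ctx)
{
	if (ctx->pids_pos < ctx->nr_pids)
		ctx->pids_pos++;
	return active_pid(ctx);
}
```

A waiter would then compare its own pid against active_pid(ctx) instead
of reading a cached field. Locking and wakeups are deliberately left out
of this sketch.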

Oren.

> 
> | +	atomic_t tasks_count;		/* sync of tasks: used to coordinate */
> 
> Name is a bit confusing with 'tasks_nr', but the comment helps and I can't
> think of a better name.
> 
> | +	struct completion complete;	/* container root and other tasks on */
> | +	wait_queue_head_t waitq;	/* start, end, and restart ordering */
> |  };
> 
> Sukadev
> 

