Linux Container Development
From: Nathan Lynch <ntl@pobox.com>
To: Oren Laadan <orenl@cs.columbia.edu>
Cc: containers@lists.osdl.org, linuxppc-dev@ozlabs.org
Subject: Re: [PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation
Date: Mon, 16 Mar 2009 13:37:45 -0500
Message-ID: <20090316133745.4f636979@thinkcentre.lan>
In-Reply-To: <49B9D37A.1070503@cs.columbia.edu>

Oren Laadan <orenl@cs.columbia.edu> wrote:
> 
> Nathan Lynch wrote:
> > Nathan Lynch <ntl@pobox.com> wrote:
> >> Oren Laadan wrote:
> >>> Nathan Lynch wrote:
> >>>> What doesn't work:
> >>>> * restarting a 32-bit task from a 64-bit task and vice versa
> >>> Is there a test to bail if we attempt to checkpoint such tasks ?
> >> No, but I'll add one if it looks too hard to fix for the next round.
> > 
> > Unfortunately, adding a check for this is hard.
> > 
> > The "point of no return" in the restart path is cr_read_mm, which tears
> > down current's address space.  cr_read_mm runs way before cr_read_cpu,
> > which is the only restart method I've implemented for powerpc so far.
> > So, checking for this condition in cr_read_cpu is too late if I want
> > restart(2) to return an error and leave the caller's memory map
> > intact.  (And I do want this: restart should be as robust as execve.)
> 
> In the case of restarting a container, I think it's ok if a restarting
> task dies in an "ugly" way -- this will be observed and handled by the
> initiating task outside the container, which will gracefully report to
> the caller/user.

How would task exit be observed?  Are all tasks in a restarted
container guaranteed to be children (in the sense that wait(2) would
work) of the initiating task?


> Even if you close this hole, any other failure later on during
> restart, even a failure to allocate kernel memory due to memory pressure,
> will have the undesired effect that you are trying to avoid.

Kernel memory allocation failure is not the kind of problem I'm trying
to address.  I am trying to address the case of restarting a checkpoint
image that needs features that are not present, where the set of
features used by the checkpoint image can be compared against the set
of features the platform provides.
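
To make that comparison concrete, here is a minimal sketch. It is not
taken from any posted patch: the CKPT_FEAT_* bits and both function
names are hypothetical, and it assumes the image header could carry a
bitmask of the features the checkpointed task used.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical feature bits; these names are illustrative only. */
#define CKPT_FEAT_64BIT   (1u << 0)  /* task ran in 64-bit mode */
#define CKPT_FEAT_FPU     (1u << 1)  /* task used the FPU */
#define CKPT_FEAT_ALTIVEC (1u << 2)  /* task used Altivec/VMX */
#define CKPT_FEAT_SPE     (1u << 3)  /* task used SPE */

/* Features the image needs that the platform does not provide. */
static uint32_t missing_features(uint32_t image_feats, uint32_t platform_feats)
{
	return image_feats & ~platform_feats;
}

/*
 * Return 0 if the image can be restarted here, -EINVAL otherwise.
 * The check touches no task state, so it can run before anything
 * destructive happens.
 */
static int check_image_features(uint32_t image_feats, uint32_t platform_feats)
{
	return missing_features(image_feats, platform_feats) ? -EINVAL : 0;
}
```

A check of this shape is only useful if it runs before the point of no
return, i.e. before the caller's address space is torn down, so that
restart(2) can still return an error to an intact caller.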


> That said, any difference in the architecture that may cause restart to
> fail is probably best placed in cr_write_head_arch.

I think I explained in my earlier mail why the current implementation's
cr_write_head_arch doesn't help in this case:

> > Well okay then, cr_read_head_arch seems to be the right place in the
> > restart sequence for the architecture code to handle this.  However,
> > cr_write_head_arch (which produces the buffer that cr_read_head_arch
> > consumes) is not provided a reference to the task to be checkpointed,
> > nor can it assume that it's operating on current.  I need a reference
> > to a task before I can determine whether it's running in 32- or 64-bit
> > mode, or using the FPU, Altivec, SPE, whatever.
> > 
> > In any case, mixing 32- and 64-bit tasks across restart is something I
> > eventually want to support, not reject.  But the problem I've outlined
> > applies to FPU state and vector extensions (VMX, SPE), as well as
> > sanity-checking debug register (DABR) contents.  We'll need to be able
> > to error out gracefully from restart when a checkpoint image specifies a
> > feature unsupported by the current kernel or hardware.  But I don't see
> > how to do it with the current architecture.  Am I missing something?
> > 
> 
> More specifically, I envision restart to work like this:
> 
> 1) user invokes user-land utility (e.g. "cr --restart ...")
> 2) 'cr' will create a new container
> 3) 'cr' will start a child in that container
> 4) child will create rest of tree (in kernel or in user space - tbd)
> 5) each task in that tree will restore itself
> 6) 'cr' monitors this process
> 7) if all goes well - 'cr' reports ok.
> 8) if something goes bad, 'cr' notices and notifies caller/user

Again, how would 'cr' obtain exit status for these tasks, and how would
it distinguish failure from normal operation?
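
For reference, this is roughly what observing exit status looks like
for a direct child in user space. It is a sketch only: the enum, the
helper names and the three sample child functions are hypothetical and
have nothing to do with the actual 'cr' utility.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum child_outcome { CHILD_OK, CHILD_FAILED, CHILD_KILLED, CHILD_UNKNOWN };

/* Classify a status word as filled in by waitpid(2). */
static enum child_outcome classify_status(int status)
{
	if (WIFEXITED(status))
		return WEXITSTATUS(status) == 0 ? CHILD_OK : CHILD_FAILED;
	if (WIFSIGNALED(status))
		return CHILD_KILLED;
	return CHILD_UNKNOWN;
}

/* Fork a child that runs fn(), wait for it, and report how it ended. */
static enum child_outcome run_and_wait(int (*fn)(void))
{
	int status;
	pid_t pid, ret;

	pid = fork();
	if (pid < 0)
		return CHILD_UNKNOWN;
	if (pid == 0)
		_exit(fn());	/* child: exit code is fn()'s return value */

	/* parent: retry if a signal interrupts the wait */
	do {
		ret = waitpid(pid, &status, 0);
	} while (ret < 0 && errno == EINTR);

	return ret == pid ? classify_status(status) : CHILD_UNKNOWN;
}

/* Sample children, for illustration only. */
static int child_succeeds(void) { return 0; }
static int child_fails(void)    { return 1; }
static int child_crashes(void)  { raise(SIGKILL); return 0; }
```

Note that waitpid(2) only reports on the caller's own children. A task
created further down the tree is not waitable from the top, and a
status word by itself does not say whether a task died during restart
or exited later in the course of normal operation.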
