Linux Container Development
 help / color / mirror / Atom feed
From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org
Cc: Joseph Ruscio <jruscio-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>,
	Linux Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: [RFC][PATCH 2/2] CR: handle a single task with private memory maps
Date: Tue, 05 Aug 2008 12:20:55 -0400	[thread overview]
Message-ID: <48987DE7.3060408@cs.columbia.edu> (raw)
In-Reply-To: <20080805091955.GA5027@localdomain>



Louis Rilling wrote:
> On Mon, Aug 04, 2008 at 08:51:37PM -0700, Joseph Ruscio wrote:
>> As somewhat of a tangent to this discussion, I've been giving some  
>> thought to the general strategy we talked about during the summit. The  
>> checkpointing solution we built at Evergrid sits completely in userspace 
>> and is soley focused on checkpointing parallel codes (e.g. MPI). That 
>> approach required us to virtualize a whole slew of resources (e.g. PIDs) 
>> that will be far better supported in the kernel through this effort. On 
>> the other hand, there isn't anything inherent to checkpointing the memory 
>> in a process that requires it to be in a kernel. During a restart, you 
>> can map and load the memory from the checkpoint file in userspace as 
>> easily as in the kernel. Since the cost of checkpointing HPC codes is 
> 
> Hmm, for unusual mappings this may be not so easy to reproduce from
> userspace if binaries are statically linked. I agree that with
> dynamically linked applications, LD_PRELOAD allows one to record the
> actual memory mappings and restore them at restart.

I second that: unusual mapping can be hard to reproduce.

Besides, several important optimization are difficult to do in user-space,
if at all possible:

* detecting sharing (unless the application itself gives the OS an advice -
more on this below); In the kernel, this is detected easily using the inode
that represents a shared memory region in SHMFS

* detecting (and restoring) COW sharing: process A forks process B, so at
least initially the private memory of both is the same via COW; this can be
optimized to save the memory of only one instead of both, and restore this
COW relationship on restart.

* reducing checkpoint downtime using the COW technique that I described at
the summit: when processes are frozen, mark all dirty pages COW and keep a
reference, and write-back the contents only after the container is unfrozen.

Eh... and, yes, live migration :)

> 
>> fairly dominated by checkpointing their large memory footprints, memory 
>> checkpointing is an area of ongoing research with many different 
>> solutions.
>>
>> It might be desirable for the checkpointing implementation to be modular 
>> enough that a userspace application or library could select to handle 
>> certain resources on their own. Memory is the primary one that comes to 
>> mind.
> 
> I definitely agree with you about this flexibility. Actually in
> Kerrighed, during the next 3 years, we are going to study an API for
> collaborative checkpoint/restart between kernel and userspace, in order to
> allow such HPC apps to checkpoint huge memory efficiently (eg. when reaching
> states where saving small parts is enough), or to rebuild their data from
> partial/older states.
> I hope that this study will bring useful ideas that could be applied to
> containers as well.

Indeed it would add flexibility if an interface exists. One example is for
network connections in the case of a distributed MPI application, or if a
specific (otherwise unsupported for CR) device is involved.

As for memory, a clever way to hint the system about what parts of memory
are important, is to use something like an madvice() with a new flag, to
mark areas of interest/dis-interest.  Throw in a mechanism to notify tasks
(who request to be notified) of an upcoming checkpoint, end of successful
checkpoint, and completion of a successful restart - and you've got it all.

Oren.

> 
> Thanks,
> 
> Louis
> 

  reply	other threads:[~2008-08-05 16:20 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-30  3:27 [RFC][PATCH 2/2] CR: handle a single task with private memory maps Oren Laadan
     [not found] ` <Pine.LNX.4.64.0807292325290.9868-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-07-30  4:51   ` KOSAKI Motohiro
     [not found]     ` <20080730132257.9DF2.KOSAKI.MOTOHIRO-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2008-07-30 18:22       ` Oren Laadan
2008-07-30 20:58   ` Dave Hansen
2008-07-30 22:07   ` Serge E. Hallyn
     [not found]     ` <20080730220752.GA3518-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-30 22:20       ` Oren Laadan
     [not found]         ` <4890E930.9090204-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 13:57           ` Louis Rilling
     [not found]             ` <20080731135703.GC22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 15:09               ` Oren Laadan
     [not found]                 ` <4891D5C2.8090000-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 15:58                   ` Louis Rilling
     [not found]                     ` <20080731155856.GH22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 16:28                       ` Oren Laadan
     [not found]                         ` <4891E849.1050701-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 17:50                           ` Louis Rilling
     [not found]                             ` <20080731175058.GI22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 19:12                               ` Oren Laadan
     [not found]                                 ` <48920EA0.1060608-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-01 10:26                                   ` Louis Rilling
     [not found]                                     ` <20080801102600.GJ22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-08-01 14:15                                       ` Oren Laadan
     [not found]                                         ` <48931A7E.1040302-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-01 18:00                                           ` Louis Rilling
     [not found]                                             ` <20080801180038.GL22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-08-01 18:51                                               ` Oren Laadan
     [not found]                                                 ` <48935B4D.7070302-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-04 10:16                                                   ` Louis Rilling
2008-08-05  2:37                                                     ` Oren Laadan
     [not found]                                                       ` <4897BCE0.1080508-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-05  3:51                                                         ` Joseph Ruscio
     [not found]                                                           ` <1FA56146-7C30-4C36-982D-A50AA8BC8392-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-05  9:19                                                             ` Louis Rilling
2008-08-05 16:20                                                               ` Oren Laadan [this message]
     [not found]                                                                 ` <48987DE7.3060408-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-06 15:41                                                                   ` Joseph Ruscio
     [not found]                                                                     ` <3A99F254-E9B3-484B-85B0-29023ADA04C4-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-07  9:25                                                                       ` Louis Rilling
2008-08-05 16:23                                                             ` Dave Hansen
2008-08-06 16:15                                                               ` Joseph Ruscio
     [not found]                                                                 ` <FE4D936E-06F1-45D2-8E7C-85D87149BDC0-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-07  9:29                                                                   ` Louis Rilling
2008-08-08 17:20                                                               ` Joseph Ruscio
     [not found]                                                                 ` <03CE5BD3-E84A-4617-93BC-722ECB846C63-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-08 17:24                                                                   ` Dave Hansen
2008-08-05  9:32                                                         ` Louis Rilling
2008-07-31 21:25           ` Serge E. Hallyn
     [not found] ` <20080730161535.GB22403@hawkmoon.kerlabs.com>
     [not found]   ` <20080730161535.GB22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-30 18:27     ` Oren Laadan
     [not found]       ` <4890B2A8.8010808-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 14:08         ` Louis Rilling
     [not found]           ` <20080731140844.GE22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 14:44             ` Oren Laadan
  -- strict thread matches above, loose matches on Subject: below --
2008-07-30 16:52 Serge E. Hallyn
     [not found] ` <20080730165249.GA23802-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-30 17:40   ` Dave Hansen
2008-07-31 13:59     ` Louis Rilling
     [not found]       ` <20080731135910.GD22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 14:14         ` Serge E. Hallyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48987DE7.3060408@cs.columbia.edu \
    --to=orenl-eqauephvms7envbuuze7ea@public.gmane.org \
    --cc=Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=jruscio-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox