From: Oren Laadan <orenl@cs.columbia.edu>
To: Tejun Heo <tj@kernel.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>,
	Kapil Arya <kapil@ccs.neu.edu>, Gene Cooperman <gene@ccs.neu.edu>,
	linux-kernel@vger.kernel.org, xemul@sw.ru,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Linux Containers <containers@lists.osdl.org>
Subject: Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch
Date: Sat, 20 Nov 2010 13:05:15 -0500 (EST)
Message-ID: <4CE69B9B.7020302@cs.columbia.edu>
In-Reply-To: <4CE683E1.6010500@kernel.org>

Hi,

Based on a discussion with Gene, I'd like to clarify some key points and
the differences between the kernel and userspace approaches (specifically
linux-cr and dmtcp). I've split this long post into three parts:

part I: perspective on the types and scopes of c/r under discussion
part II: linux-cr design and objectives
part III: comparison of the kernel and userspace approaches

[now relax, grab (another) cup of coffee and read on...]

PART I:  ==PERSPECTIVE==

A rough classification of c/r categories:

* container-c/r: an important use-case, e.g. c/r and migration of
  application containers such as a VPS (virtual private server), a VDI
  (desktop) or another self-contained application (e.g. an Oracle server).
  Here _all_ the relevant processes are included in the checkpoint.

* standalone-c/r: a set of processes is checkpointed, but not the
  entire environment, and those processes are later restarted in a
  different "eco-system".

* distributed-c/r: several sets of processes, each running on a
  different host (each set may itself be a separate container there).

In container-c/r, the main challenge is to be _reliable_ in the sense
that a restart from a successful checkpoint should always succeed.

In standalone-c/r, the main challenge is that an application resumes
execution after restart in a possibly _different_ eco-system. Some
applications don't care (e.g. 'bc'). Other applications do care, to
varying degrees; for these we need "glue" to pacify the application.

There are generally three types of "glue":

(1) Modify the application or selected libraries to be c/r-aware, and
  notify them when restart completes (e.g. the CoCheck MPI library).
(2) Add a userspace helper that runs post-restart to do the necessary
  trickery (e.g. send a SIGWINCH to 'screen'; mount the proper filesystem
  on the new host after migration; reconnect a socket to a peer).
(3) Use interposition on selected library calls and add wrapper code
  that glues in what's missing (e.g. dbus or nscd calls, to reconnect
  an application to those services).

IMPORTANT: the gluing method is _orthogonal_ to how the c/r is done!
We are strictly discussing the core c/r functionality.

(next part: linux-cr philosophy...)

Thanks,

Oren.
