public inbox for linux-kernel@vger.kernel.org
From: Cedric Le Goater <clg@fr.ibm.com>
To: Ray Bryant <raybry@mpdtxmail.amd.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>,
	linux-kernel@vger.kernel.org,
	Hubertus Franke <frankeh@watson.ibm.com>,
	Dave Hansen <haveblue@us.ibm.com>
Subject: Re: [RFC] [PATCH 00/13] Introduce task_pid api
Date: Tue, 15 Nov 2005 23:55:46 +0100
Message-ID: <437A6772.6020700@fr.ibm.com>
In-Reply-To: <200511151430.35504.raybry@mpdtxmail.amd.com>

Ray Bryant wrote:

> Personally, I think that these assumptions are incorrect for a 
> checkpoint/restart facility.   I think that:
> 
> (1)  It is really only possible to checkpoint/restart a cooperative process.

What do you mean by cooperative? That the code should be modified to
cooperate with a checkpoint/restart tool? Or do you have something else in
mind?

> For this to work with uncooperative processes you have to figure out (for 
> example) how to save and restore the file system state.

Files are definitely very difficult to handle efficiently, but there are
ways to deal with them. One way is not to deal with them at all and let the
application organize its data in such a way that we don't have to
checkpoint the file system. Shared storage is one solution; copying files
from a checkpointed node to another node is another. It can be very
inefficient, but it works.

But, I agree with you, we don't want to checkpoint a filesystem.

> (e. g. how do you get the file position set correctly for an open file in
> the restored program instance?)

Well, if the file is available, lseek() should do the job. In fact, pipes
are more difficult to handle than regular files.
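
A minimal user-space sketch of the lseek() approach, assuming the file is
still reachable under the same path at restart. The struct file_ckpt record
and the file_checkpoint(), file_restore() and file_ckpt_demo() helpers are
hypothetical names made up for this illustration; they are not part of the
task_pid patch set or of any existing checkpoint tool.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical checkpoint record for one open regular file. */
struct file_ckpt {
    const char *path;  /* path the file was opened with */
    int flags;         /* open(2) flags to reuse at restart */
    off_t offset;      /* file position at checkpoint time */
};

/* Record the current position of fd. Returns 0 on success, -1 on error. */
int file_checkpoint(int fd, const char *path, int flags, struct file_ckpt *out)
{
    off_t pos;

    assert(path != NULL && out != NULL);
    pos = lseek(fd, 0, SEEK_CUR);   /* query the offset without moving it */
    if (pos == (off_t)-1)
        return -1;                  /* e.g. ESPIPE: fd is a pipe or socket */
    out->path = path;
    out->flags = flags;
    out->offset = pos;
    return 0;
}

/* Reopen the file and seek back. Returns the new fd, or -1 on error. */
int file_restore(const struct file_ckpt *ck)
{
    int fd;

    assert(ck != NULL && ck->path != NULL);
    /* Never create, truncate or insist on exclusivity at restart. */
    fd = open(ck->path, ck->flags & ~(O_CREAT | O_TRUNC | O_EXCL));
    if (fd < 0)
        return -1;
    if (lseek(fd, ck->offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Round trip on a scratch file. Returns 0 if the offset survives. */
int file_ckpt_demo(const char *path)
{
    struct file_ckpt ck;
    char c = 0;
    int ok;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, "0123456789", 10) != 10 ||
        lseek(fd, 4, SEEK_SET) != 4 ||
        file_checkpoint(fd, path, O_RDWR, &ck) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);                      /* the "checkpointed" instance goes away */

    fd = file_restore(&ck);         /* the "restarted" instance reopens */
    ok = fd >= 0 && read(fd, &c, 1) == 1 && c == '4';
    if (fd >= 0)
        close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

file_checkpoint() fails with -1 on a pipe because lseek() is not supported
there, which is one way to see why pipes need separate handling.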

> And this doesn't even consider what to do with open network connections.

Network connections are indeed very tricky. The network code is very
complex and very large, with plenty of protocols to handle, but it can be
done for TCP/IP by blocking the traffic, checkpointing the data and
checkpointing the PCBs (protocol control blocks). But tell me more, what
are the main issues for you?

Private interconnects are a challenge.

> Similarly, what does one do about the content of System V shared memory 
> regions or the contents of System V semaphores? 

Well, they have to be constrained in a known set of processes, or a
container, to make sure we are not checkpointing a moving target.

> I'm sure there are many more such problems we can come up with a careful
> study of the Linux/Unix API.

Oh yes, the UNIX API is very large but in checkpoint/restart we care more
about implementation. This can be tricky.

> (Note that "cooperation" in this context can also mean "willing to run inside 
> of a container of some kind that supports checkpoint/restart".)

Indeed!

We need an isolation mechanism to make sure we control the boundaries of an
application. We don't want any leaks when we initiate a checkpoint.

> So you can probably only checkpoint the process at certain points in its 
> lifetime, points which the application should be willing to identify in some 
> way. 

We do need to reach a quiescence point. A SIGSTOP is enough, or a
container-wide schedule(), a la software suspend. But no more.

> And I would argue that at such points in time, you can require that 
> the current register state doesn't include the results of a system call such 
> as getpid(), couldn't you?

Well, if that register holds a virtualized pid, this is no longer an
issue, is it?
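
To illustrate why a virtualized pid sidesteps the stale-register problem,
here is a small user-space sketch of a per-container translation table.
Everything in it (struct vpid_table, vpid_bind(), vpid_to_real(),
vpid_demo(), the VPID_MAX limit and the pid values) is hypothetical and
only meant to show the idea; it is not the task_pid API proposed in the
patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define VPID_MAX 64  /* arbitrary table size for this sketch */

/*
 * Hypothetical per-container table. The index is the virtual pid the
 * application sees; the value is the real pid, or 0 if the slot is unused.
 */
struct vpid_table {
    pid_t real[VPID_MAX];
};

/* Bind a virtual pid to a real pid, at fork time or at restart time. */
int vpid_bind(struct vpid_table *t, pid_t vpid, pid_t real)
{
    assert(t != NULL);
    if (vpid <= 0 || vpid >= VPID_MAX || real <= 0)
        return -1;
    t->real[vpid] = real;
    return 0;
}

/* Translate a virtual pid to the current real pid; 0 if unknown. */
pid_t vpid_to_real(const struct vpid_table *t, pid_t vpid)
{
    assert(t != NULL);
    if (vpid <= 0 || vpid >= VPID_MAX)
        return 0;
    return t->real[vpid];
}

/* Returns 0 if a saved virtual pid stays usable across a "restart". */
int vpid_demo(void)
{
    struct vpid_table t = { { 0 } };
    pid_t saved = 7;  /* value the application kept from a getpid()-like call */

    /* Before the checkpoint, virtual pid 7 is backed by real pid 4321. */
    if (vpid_bind(&t, saved, 4321) != 0 || vpid_to_real(&t, saved) != 4321)
        return -1;

    /* After restart the task gets a new real pid; only the table changes. */
    if (vpid_bind(&t, saved, 9876) != 0)
        return -1;

    /* The value held by the application is untouched and still resolves. */
    return vpid_to_real(&t, saved) == 9876 ? 0 : -1;
}
```

The application never sees 4321 or 9876, so a value left in a register
across the checkpoint does not need to be fixed up.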

> (2)  Checkpoint/Restart really only makes sense for a long running, resource 
> intensive job.   (e. g. for a job that is doing a lot of work and hence, for 
> which, recovery is hard -- perhaps as hard as re-running the entire job).

The HPC industry is indeed an obvious target.

However, we have successfully checkpointed desktop applications like
openoffice, thunderbird, mozilla, emacs, etc. We are still working on ptys
in order to migrate terminals. We think it can also be useful in that area
and others.

> By their very nature, there are probably only a few such jobs running on the 
> system.    If there are lots of such jobs on the system, then re-running each 
> one can't be that hard, can it?

Hmm, I didn't get your point here. Can you elaborate?

> So, I guess my question is wrt the task_pid API is the following:   Given that 
> there are a lot of other problems to solve before transparent checkpointing 
> of uncooperative processes is possible, why should this partial solution be 
> accepted into the main line kernel and "set in stone" so to speak?

Well, let's say that we want to present this one step at a time and try to
have each step bring some interesting value to the Linux kernel.

Process aggregation is the first big step; other projects have shown
interest in this area, PAGG for instance. Isolation is another. The
virtualization step could be thought of as dedicated to checkpoint/restart,
but we're pretty sure it should help some projects like vserver that need
to virtualize some ancestor pid. On that subject, having a way to manage
cluster-wide pids could be useful to HPC batch managers.

> Don't get me wrong, I would love for there to be a commonly accepted 
> checkpoint/restart API.    But I don't think that this can be done 
> transparently at the kernel level and without some cooperation from the 
> target task.

Well, we've already migrated some pretty ugly applications, such as
database engines, without modifying them :)

C.

