From: "Ray Bryant" <raybry@mpdtxmail.amd.com>
To: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: linux-kernel@vger.kernel.org,
"Hubertus Franke" <frankeh@watson.ibm.com>,
"Dave Hansen" <haveblue@us.ibm.com>
Subject: Re: [RFC] [PATCH 00/13] Introduce task_pid api
Date: Tue, 15 Nov 2005 15:30:34 -0500
Message-ID: <200511151430.35504.raybry@mpdtxmail.amd.com>
In-Reply-To: <20051115194127.GB17287@sergelap.austin.ibm.com>
On Tuesday 15 November 2005 13:41, Serge E. Hallyn wrote:
> Quoting Ray Bryant (raybry@mpdtxmail.amd.com):
> > On Monday 14 November 2005 15:23, Serge E. Hallyn wrote:
> > > --
> > >
> > > I'm part of a project implementing checkpoint/restart processes.
> > > After a process or group of processes is checkpointed, killed, and
> > > restarted, the changing of pids could confuse them. There are many
> > > other such issues, but we wanted to start with pids.
> >
> > I've read through the rest of this thread, but it seems to me that the
> > real problems are in the basic assumptions you are making that are
> > driving the rest of this effort and perhaps we should be examining those
> > assumptions instead of your patch.
>
> Ok.
>
> > For example, from what I've read (particularly Hubertus's post that the
> > pid could be in a register), I'm inferring that what you want to do is to
> > be able to checkpoint/restart an arbitrary process at an arbitrary time
> > and without any special support for checkpoint/restart in that process.
>
> Yes.
>
> > Also (c. f. Dave Hansen's post on the number of Xen virtual machines
> > required), you appear to think that the number of processes on the
> > system for which checkpoint/restart should be enabled is large (more or
> > less the same as the number of processes on the system).
>
> Right.
>
> > Am I reading this correctly?
>
> As far as I can see, yes.
>
> -serge
Personally, I think that these assumptions are incorrect for a
checkpoint/restart facility. I think that:
(1) It is really only possible to checkpoint/restart a cooperative process.
For this to work with uncooperative processes you have to figure out (for
example) how to save and restore the file system state (e.g., how do you
get the file position set correctly for an open file in the restored program
instance?). And this doesn't even consider what to do with open network
connections.
Similarly, what does one do about the contents of System V shared memory
regions or the contents of System V semaphores? I'm sure there are many
more such problems we can come up with through a careful study of the
Linux/Unix API. (Note that "cooperation" in this context can also mean
"willing to run inside of a container of some kind that supports
checkpoint/restart".)
So you can probably only checkpoint the process at certain points in its
lifetime, points which the application should be willing to identify in some
way. And I would argue that at such points in time, you can require that
the current register state doesn't include the results of a system call such
as getpid(), couldn't you?
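To make the file-position and getpid() points concrete, here is a minimal
user-space sketch of what a cooperative task might do at a checkpoint-safe
point. It is only an illustration under stated assumptions: struct ckpt_file
and the ckpt_*() helpers are hypothetical names invented for this example,
not part of any existing kernel or libc interface, and the sketch covers
regular files only (no sockets, no System V IPC).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record a cooperative task fills in at a checkpoint-safe point. */
struct ckpt_file {
	char  path[256];	/* where to reopen the file on restart */
	int   flags;		/* open(2) flags to reuse on restart */
	off_t offset;		/* file position at checkpoint time */
};

/* Record the current position of fd. Returns 0 on success, -1 on error. */
int ckpt_save_file(int fd, const char *path, int flags, struct ckpt_file *cp)
{
	off_t pos;
	int n;

	assert(path != NULL && cp != NULL);

	pos = lseek(fd, 0, SEEK_CUR);	/* query the offset without moving it */
	if (pos == (off_t)-1)
		return -1;

	n = snprintf(cp->path, sizeof(cp->path), "%s", path);
	if (n < 0 || (size_t)n >= sizeof(cp->path))
		return -1;		/* path does not fit in the record */

	cp->flags = flags;
	cp->offset = pos;
	return 0;
}

/* Reopen the file and seek to the saved position. Returns a new fd or -1. */
int ckpt_restore_file(const struct ckpt_file *cp)
{
	int fd;

	assert(cp != NULL);

	/* Never create, truncate, or exclusively open on restart. */
	fd = open(cp->path, cp->flags & ~(O_CREAT | O_TRUNC | O_EXCL));
	if (fd < 0)
		return -1;

	if (lseek(fd, cp->offset, SEEK_SET) == (off_t)-1) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Ask the kernel every time instead of caching the pid across a checkpoint. */
pid_t ckpt_current_pid(void)
{
	return getpid();
}
```

The design choice here is that the task itself knows which files matter and
when its state is consistent, so nothing has to be inferred from outside.
A task that always calls something like ckpt_current_pid() rather than
holding a pid in a variable has no stale value to fix up after a restart.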
(2) Checkpoint/Restart really only makes sense for a long-running,
resource-intensive job (e.g., a job that is doing a lot of work and for
which recovery is therefore hard -- perhaps as hard as re-running the
entire job). By their very nature, there are probably only a few such jobs
running on the system. If there are lots of such jobs on the system, then
re-running each one can't be that hard, can it?
So I guess my question wrt the task_pid API is the following: Given that
there are a lot of other problems to solve before transparent checkpointing
of uncooperative processes is possible, why should this partial solution be
accepted into the mainline kernel and "set in stone", so to speak?
Don't get me wrong, I would love for there to be a commonly accepted
checkpoint/restart API. But I don't think that this can be done
transparently at the kernel level and without some cooperation from the
target task.
--
Ray Bryant
AMD Performance Labs Austin, Tx
512-602-0038 (o) 512-507-7807 (c)