From: Nathan Lynch <ntl@pobox.com>
To: Grant Likely <grant.likely@secretlab.ca>
Cc: Oren Laadan <orenl@cs.columbia.edu>,
	ksummit-2010-discuss@lists.linux-foundation.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch
Date: Thu, 11 Nov 2010 00:27:42 -0600
Message-ID: <1289456863.4603.94.camel@tp-t61>
In-Reply-To: <AANLkTimOG-iFw-yg8rgNHJOEn49_v=0ZaDu_XK7KRRs1@mail.gmail.com>

On Mon, 2010-11-08 at 11:55 -0500, Grant Likely wrote:
> On Tue, Nov 2, 2010 at 3:30 PM, Oren Laadan <orenl@cs.columbia.edu> wrote:
> > Hi,
> >
> > Following the discussion yesterday, here is a linux-cr diff
> > that is limited to changes to existing code.
> >
> > The diff doesn't include the eclone() patches. I also tried to strip
> > off the new c/r code (either code in new files, or new code within
> > #ifdef CONFIG_CHECKPOINT in existing files).
> >
> > I left a few such snippets in, e.g. c/r syscalls templates and
> > declaration of c/r specific methods in, e.g. file_operations.
> >
> > The remaining changes in this patch include new freezer state
> > ("CHECKPOINTING"), mostly refactoring of existing code, and a bit
> > of new helpers.
> >
> > Disclaimer: don't try to compile (or apply) - this is only intended
> > to give a ballpark of how the c/r patches change existing code.
> [...]
> >  159 files changed, 2031 insertions(+), 587 deletions(-)
> 
> FWIW...
> 
> This patch has far reaching changes which quite frankly scare me;
> primarily because c/r changes many long-held assumptions about how
> Linux processes work.  It needs to track a large amount of state with
> lots of corner cases, and the Linux process model is already quite
> complex.  I know this is a fluffy hand-waving critique, but without
> being convinced of a strong general-purpose use-case, it is hard to
> get excited about a solution that touches large amounts of common
> code.

For the most part the c/r patch set is "merely" adding code and not
changing the way existing code works -- I'm pretty sure we haven't had
to alter anything hairy like locking or object lifetime rules.  Maybe
I've had my head in this code for too long, but I'm not seeing how
assumptions about the process model are changed significantly.  The
process-related APIs like fork, clone, exec, wait, and exit all work as
they have before, and if you're not actively using C/R you'd never know
the capability is there.

As for the lack of a general-purpose use-case... well, it's not terribly
unusual for Linux to sustain significant changes to satisfy what some
may consider a niche need.  Things like NUMA support, CPU and memory
hotplug - these were not "generally" useful features when they were
introduced.  So I don't think we're trying to break new ground in that
respect.


> c/r of desktop processes doesn't seem interesting other than as a test
> case, but I can possibly be convinced about HPC, embedded, industrial,
> or telecom use-cases, but for custom/specific-purpose applications the
> question must be asked if a fully user space or joint user/kernel
> method would better solve the problem.

This is in fact a joint approach -- the process tree is recreated in
user space at restart (not to mention that the user is responsible for
providing the restarted job a coherent view of the filesystem).

In any case, with HPC, C/R isn't necessarily just about fault tolerance;
it's for load balancing and migration too.  So the checkpoint operation
needs to be as fast and efficient as possible, and ideally the image
should be readable/writable as a stream, e.g. over a socket.  User space
really isn't up to this - for example, a user space implementation
generally cannot know which user pages are safe to omit from the image
(at least not without faulting them all in).
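
To illustrate the point about faulting pages in, here is a minimal
sketch in Python. It is not taken from the c/r patch set. The function
name scan_nonzero_pages is hypothetical, and "page is all zeroes" is
used only as a stand-in for "safe to omit from the image". It assumes a
Unix-like system, since it uses the resource module to read the
process's minor-fault counter.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE


def scan_nonzero_pages(region, page_size=PAGE):
    """Return (indices of pages holding non-zero bytes, minor faults during the scan).

    Hypothetical helper: a naive user-space checkpointer deciding which
    pages it could skip by looking at their contents.
    """
    zero_page = bytes(page_size)
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    dirty = []
    for index in range(len(region) // page_size):
        start = index * page_size
        # Reading the slice touches the page, so a page that was never
        # used has to be faulted in just to learn that it is empty.
        if region[start:start + page_size] != zero_page:
            dirty.append(index)
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    return dirty, after - before


# Anonymous 64-page mapping; only two pages are ever written.
region = mmap.mmap(-1, 64 * PAGE)
region[3 * PAGE] = 1
region[7 * PAGE + 10] = 0xFF

dirty, faults = scan_nonzero_pages(region)
print("pages with data:", dirty)
print("minor faults taken while scanning:", faults)
```

The scan finds the two written pages, but only by reading all 64. The
fault count it reports depends on the kernel and its configuration, so
no particular number should be expected.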

Users who need C/R on Linux today are resorting to LD_PRELOAD hacks and
moribund out-of-tree kernel patches, and I'm afraid they're going to
keep doing that until Linux provides a better alternative built-in.


