public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Oren Laadan <orenl@cs.columbia.edu>
Cc: ksummit-2010-discuss@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch
Date: Tue, 02 Nov 2010 22:35:21 +0100	[thread overview]
Message-ID: <4CD08419.5050803@kernel.org> (raw)
In-Reply-To: <Pine.LNX.4.64.1011021530470.12128@takamine.ncl.cs.columbia.edu>

(cc'ing lkml too)
Hello,

On 11/02/2010 08:30 PM, Oren Laadan wrote:
> Following the discussion yesterday, here is a linux-cr diff that
> that is limited to changes to existing code.
> 
> The diff doesn't include the eclone() patches. I also tried to strip
> off the new c/r code (either code in new files, or new code within
> #ifdef CONFIG_CHECKPOINT in existing files).
> 
> I left a few such snippets in, e.g. c/r syscalls templates and 
> declaration of c/r specific methods in, e.g. file_operations.
> 
> The remaining changes in this patch include new freezer state
> ("CHECKPOINTING"), mostly refactoring of exsiting code, and a bit
> of new helpers.
> 
> Disclaimer: don't try to compile (or apply) - this is only intended
> to give a ballpark of how the c/r patches change existing code.

The patch size itself isn't too big but I still think it's one scary
patch mostly because the breadth of the code checkpointing needs to
modify and I suspect that probably is the biggest concern regarding
checkpoint-restart from implementation point of view.

FWIW, I'm not quite convinced checkpoint-restart can be something
which can be generally useful.  In controlled environments where the
target application behavior can be relatively well defined and
contained (including actions necessary to rollback in case something
goes bonkers), it would work and can be quite useful, but I'm afraid
the states which need to be saved and restored aren't defined well
enough to be generally applicable.  Not only is it a difficult
problem, it actually is impossible to define common set of states to
be saved and restored - it depends on each application.

As such, I have difficult time believing it can be something generally
useful.  IOW, I think talking about its usage in complex environments
like common desktops is mostly handwaving.  What about X sessions,
network connections, states established in other applications via dbus
or whatnot?  Which files need to be snapshotted together?  What about
shared mmaps?  These questions are not difficult to answer in generic
way, they are impossible.

There is a very distinctive difference between system wide
suspend/hibernation and process checkpointing.  Most programs are
already written with the conditions in mind which can be caused by
system level suspend/hibernation.  Most programs don't expect to be
scheduled and run in any definite amount of time.  There usually
are provisions for loss or failure of resources which are out of the
local system.  There are corner cases which are affected and those
programs contain code to respond to suspend/hibernation.  Please note
that this is about userland application behavior but not
implementation detail in the kernel.  It is a much more fundamental
property.

So, although checkpoint-restart can be very useful for certain
circumstances, I don't believe there can be a general implementation.
It inevitably needs to put somewhat strict restrictions on what the
applications being checkpointed are allowed to do.  And after my
train of thought reaches there, I fail to see what the advantages of
in-kernel implementation would be compared to something like the
following.

  http://dmtcp.sourceforge.net/

Sure, in-kernel implementation would be able to fake it better, but I
don't think it's anything major.  The coverage would be slightly
better but breaking the illusion wouldn't take much.  Just push it a
bit further and it will break all the same.  In addition, to be
useful, it would need userland framework or set of workarounds which
are aware of and can manipulate userland states anyway.  For workloads
for which checkpointing would be most beneficial (HPC for example), I
think something like the above would do just fine and it would make
much more sense to add small features to make userland checkpointing
work better than doing the whole thing in the kernel.

I think in-kernel checkpointing is in awkward place in terms of
tradeoff between its benefits and the added complexities to implement
it.  If you give up coverage slightly, userland checkpointing is
there.  If you need reliable coverage, proper virtualization isn't too
far away.  As such, FWIW, I fail to see enough justification for the
added complexity.  I'll be happy to be proven wrong tho.  :-)

Thank you.

-- 
tejun

       reply	other threads:[~2010-11-02 21:32 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <Pine.LNX.4.64.1011021530470.12128@takamine.ncl.cs.columbia.edu>
2010-11-02 21:35 ` Tejun Heo [this message]
2010-11-02 21:47   ` [Ksummit-2010-discuss] checkpoint-restart: naked patch Christoph Hellwig
2010-11-04  1:47     ` Nathan Lynch
2010-11-04  7:36       ` Tejun Heo
2010-11-04 16:04         ` Gene Cooperman
2010-11-04 20:45         ` Nathan Lynch
2010-11-06  6:48           ` Matt Helsley
2010-11-04  4:34     ` Oren Laadan
2010-11-04 14:25       ` Christoph Hellwig
2010-11-04  3:40   ` Kapil Arya
2010-11-04  8:05     ` Tejun Heo
2010-11-04 16:44       ` Gene Cooperman
2010-11-05  9:28         ` Tejun Heo
2010-11-05 23:18           ` Oren Laadan
2010-11-06 10:13             ` Tejun Heo
2010-11-06  0:36           ` Kapil Arya
2010-11-06 22:55             ` Oren Laadan
2010-11-07 19:42               ` Gene Cooperman
2010-11-07 21:30                 ` Oren Laadan
2010-11-07 23:05                   ` Gene Cooperman
2010-11-08  3:55                     ` Oren Laadan
2010-11-08 16:26                       ` Gene Cooperman
2010-11-08 18:14                         ` Oren Laadan
2010-11-08 18:37                           ` Gene Cooperman
2010-11-08 19:34                             ` Oren Laadan
2010-11-08 19:05                         ` Dan Smith
2010-11-17 11:14                           ` Tejun Heo
2010-11-17 15:33                             ` Dan Smith
2010-11-17 15:40                               ` Tejun Heo
2010-11-17 17:04                                 ` Alexey Dobriyan
2010-11-17 10:45             ` Tejun Heo
2010-11-17 12:12               ` Tejun Heo
2010-11-06  5:32           ` Matt Helsley
2010-11-06 15:01             ` Oren Laadan
2010-11-06 20:40             ` Gene Cooperman
2010-11-06 22:41               ` Oren Laadan
2010-11-07 18:49                 ` Gene Cooperman
2010-11-07 21:59                   ` Oren Laadan
2010-11-17 11:57                     ` Tejun Heo
2010-11-17 15:39                       ` Serge E. Hallyn
2010-11-17 15:46                         ` Tejun Heo
2010-11-18  9:13                           ` Pavel Emelyanov
2010-11-18  9:48                             ` Tejun Heo
2010-11-18 20:13                               ` Jose R. Santos
2010-11-19  3:54                               ` Serge Hallyn
2010-11-18 19:53                           ` Oren Laadan
2010-11-19  4:10                           ` Serge Hallyn
2010-11-19 14:04                             ` Tejun Heo
2010-11-19 14:36                               ` Kirill Korotaev
2010-11-19 15:33                                 ` Tejun Heo
2010-11-19 16:00                                   ` Alexey Dobriyan
2010-11-19 16:01                                     ` Alexey Dobriyan
2010-11-19 16:10                                       ` Tejun Heo
2010-11-19 16:25                                         ` Alexey Dobriyan
2010-11-19 16:06                                     ` Tejun Heo
2010-11-19 16:16                                       ` Alexey Dobriyan
2010-11-19 16:19                                         ` Tejun Heo
2010-11-19 16:27                                           ` Alexey Dobriyan
2010-11-19 16:32                                             ` Tejun Heo
2010-11-19 16:38                                               ` Alexey Dobriyan
2010-11-19 16:50                                                 ` Tejun Heo
2010-11-19 16:55                                                   ` Alexey Dobriyan
2010-11-20 17:58                                   ` Oren Laadan
2010-11-20 18:05                               ` Oren Laadan
2010-11-20 18:08                               ` Oren Laadan
2010-11-20 18:11                               ` Oren Laadan
2010-11-20 18:15                                 ` Oren Laadan
2010-11-20 19:33                                   ` Tejun Heo
2010-11-21  8:18                                     ` Gene Cooperman
2010-11-21  8:21                                       ` Gene Cooperman
2010-11-22 18:02                                         ` Sukadev Bhattiprolu
2010-11-23 17:53                                         ` Oren Laadan
2010-11-24  3:50                                           ` Kapil Arya
2010-11-25 16:04                                             ` Oren Laadan
2010-11-29  4:09                                               ` Gene Cooperman
2010-11-21 22:41                                       ` Grant Likely
2010-11-22 17:34                                       ` Oren Laadan
2010-11-22 17:18                                     ` Oren Laadan
2010-11-17 22:17                       ` Matt Helsley
2010-11-18 10:06                         ` Tejun Heo
2010-11-18 20:25                         ` Oren Laadan
2010-11-07 21:44               ` Oren Laadan
2010-11-07 23:31                 ` Gene Cooperman
2010-11-05 22:24       ` Oren Laadan
2010-11-04  4:03   ` Oren Laadan
2010-11-04  9:43     ` Tejun Heo
2010-11-04 12:48       ` Luck, Tony
2010-11-04 13:06         ` Tejun Heo
2010-11-06 10:12       ` Matt Helsley
2010-11-06 11:03         ` Tejun Heo
2010-11-07 22:59         ` Davide Libenzi
2010-11-08  2:32           ` david
2010-11-18 20:41             ` Oren Laadan
2010-11-05  3:55     ` Kapil Arya
2010-11-05 11:57       ` Luck, Tony
2010-11-05 17:17         ` Gene Cooperman
2010-11-06  1:16           ` Matt Helsley
2010-11-06  4:06             ` Oren Laadan
2010-11-06  5:18               ` Matt Helsley
2010-11-06 21:00           ` Oren Laadan
2010-11-05 17:31       ` Sukadev Bhattiprolu
2010-11-06 21:05       ` Oren Laadan
2010-11-08 16:55 ` Grant Likely
2010-11-08 21:01   ` Nathan Lynch
2010-11-11  6:27   ` Nathan Lynch
2010-11-17  5:29   ` Anton Blanchard
2010-11-17 11:08     ` Tejun Heo
2010-11-18  9:53     ` Alan Cox
2010-11-18 12:27       ` Alexey Dobriyan
2010-11-19  6:33     ` Gene Cooperman
2010-11-21 23:20     ` Grant Likely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CD08419.5050803@kernel.org \
    --to=tj@kernel.org \
    --cc=ksummit-2010-discuss@lists.linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=orenl@cs.columbia.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox