public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Anton Blanchard <anton@au1.ibm.com>
Cc: Grant Likely <grant.likely@secretlab.ca>,
	Oren Laadan <orenl@cs.columbia.edu>,
	ksummit-2010-discuss@lists.linux-foundation.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	akpm@linux-foundation.org
Subject: Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch
Date: Wed, 17 Nov 2010 12:08:37 +0100	[thread overview]
Message-ID: <4CE3B7B5.2020507@kernel.org> (raw)
In-Reply-To: <20101117162922.0f874a8e@kryten>

Hello,

On 11/17/2010 06:29 AM, Anton Blanchard wrote:
> It seems like there are a number of questions around the utility of
> C/R so I'd like to take a step back from the technical discussion
> around implementation and hopefully convince you, Tejun (and anyone
> else interested) that C/R is something we want to solve in Linux.

I'm not arguing CR isn't that useful.  My argument was that it's a
solution for a fairly niche problems and that the implementation isn't
transparent at all for general use cases.

> Here at IBM we are working on the next generation of HPC systems. One
> example of this will be the NCSA Bluewaters supercomputer:

And, yeah, I agree that it is a very useful thing for HPC.

> You could argue that we should just add C/R capability to every HPC
> application and library people care about or rework them to be
> fault tolerant in software. Unfortunately I don't see either as being
> viable. There are so many applications, libraries and even programming
> languages in use for HPC that it would be a losing battle. If we
> did go down this route we would also be unable to leverage C/R for
> anything else. I can understand the concern around finding a general
> purpose case, but I do believe many other solid uses for C/R outside of
> HPC will emerge. For example, there was interest from the embedded guys
> during the KS discussion and I can easily imagine using C/R to bring up
> firefox faster on a TV.

Thanks for pointing out the use cases although for the last one it
would be much wiser to just use webkit.

> The problems found in HPC often turn into more general problems down
> the track. I think back to the heated discussions we had around SMP back
> in the early 2000s when we had 32 core POWER4s and SGI had similar sized
> machines. Now a 24 core machine fits in 1U and can be purchased for
> under $5k. NUMA support, CPU affinity and multi queue scheduling are
> other areas that initially had a very small user base but have since
> become important features for many users.

Sure, the pointy edges can discover general problems of future early.
At the same time, they also encounter problems which no one else would
care about ever, so in itself it isn't much of an argument.  I'm no
analyst and it is very difficult to foretell the future but comparing
CR to SMP and NUMA doesn't seem too valid to me.  In-kernel CR is
sandwiched between userland CR and virtualization.  Its problem space
is shrinking, not expanding.

Having a generally accepted standard CR implementation would certainly
be very nice and I understand that CR would be a much better fit for
HPC than virtualization, but I fail to see why it should be
implemented in kernel when userland implementation which doesn't
extend the kernel in any way already achieves most of what HPC
workload requires.  In this already sizeable thread, the only benefits
presented seem to be the ability to cover some more corner cases and
remote use cases in slightly more transparent manner.  Those are very
weak arguments for something as intrusive and backwards (in that it
dumps kernel states in binary blobs unrestrained by ABI) as in-kernel
CR and, as such, I don't really see the in-kernel CR surviving as a
mainline feature.

So, I think the best recourse would be identifying the specific
features which would help userland CR and improve them.  The in-kernel
CR people have been working on the problem for a long time now and
gotta know which parts are tricky and how to solve them.  In fact, I
don't think the work would be that widely different.  It would be
harder but those changes would benefit other use cases too instead of
only useful for in-kernel CR.

Thanks.

-- 
tejun

  reply	other threads:[~2010-11-17 11:08 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <Pine.LNX.4.64.1011021530470.12128@takamine.ncl.cs.columbia.edu>
2010-11-02 21:35 ` [Ksummit-2010-discuss] checkpoint-restart: naked patch Tejun Heo
2010-11-02 21:47   ` Christoph Hellwig
2010-11-04  1:47     ` Nathan Lynch
2010-11-04  7:36       ` Tejun Heo
2010-11-04 16:04         ` Gene Cooperman
2010-11-04 20:45         ` Nathan Lynch
2010-11-06  6:48           ` Matt Helsley
2010-11-04  4:34     ` Oren Laadan
2010-11-04 14:25       ` Christoph Hellwig
2010-11-04  3:40   ` Kapil Arya
2010-11-04  8:05     ` Tejun Heo
2010-11-04 16:44       ` Gene Cooperman
2010-11-05  9:28         ` Tejun Heo
2010-11-05 23:18           ` Oren Laadan
2010-11-06 10:13             ` Tejun Heo
2010-11-06  0:36           ` Kapil Arya
2010-11-06 22:55             ` Oren Laadan
2010-11-07 19:42               ` Gene Cooperman
2010-11-07 21:30                 ` Oren Laadan
2010-11-07 23:05                   ` Gene Cooperman
2010-11-08  3:55                     ` Oren Laadan
2010-11-08 16:26                       ` Gene Cooperman
2010-11-08 18:14                         ` Oren Laadan
2010-11-08 18:37                           ` Gene Cooperman
2010-11-08 19:34                             ` Oren Laadan
2010-11-08 19:05                         ` Dan Smith
2010-11-17 11:14                           ` Tejun Heo
2010-11-17 15:33                             ` Dan Smith
2010-11-17 15:40                               ` Tejun Heo
2010-11-17 17:04                                 ` Alexey Dobriyan
2010-11-17 10:45             ` Tejun Heo
2010-11-17 12:12               ` Tejun Heo
2010-11-06  5:32           ` Matt Helsley
2010-11-06 15:01             ` Oren Laadan
2010-11-06 20:40             ` Gene Cooperman
2010-11-06 22:41               ` Oren Laadan
2010-11-07 18:49                 ` Gene Cooperman
2010-11-07 21:59                   ` Oren Laadan
2010-11-17 11:57                     ` Tejun Heo
2010-11-17 15:39                       ` Serge E. Hallyn
2010-11-17 15:46                         ` Tejun Heo
2010-11-18  9:13                           ` Pavel Emelyanov
2010-11-18  9:48                             ` Tejun Heo
2010-11-18 20:13                               ` Jose R. Santos
2010-11-19  3:54                               ` Serge Hallyn
2010-11-18 19:53                           ` Oren Laadan
2010-11-19  4:10                           ` Serge Hallyn
2010-11-19 14:04                             ` Tejun Heo
2010-11-19 14:36                               ` Kirill Korotaev
2010-11-19 15:33                                 ` Tejun Heo
2010-11-19 16:00                                   ` Alexey Dobriyan
2010-11-19 16:01                                     ` Alexey Dobriyan
2010-11-19 16:10                                       ` Tejun Heo
2010-11-19 16:25                                         ` Alexey Dobriyan
2010-11-19 16:06                                     ` Tejun Heo
2010-11-19 16:16                                       ` Alexey Dobriyan
2010-11-19 16:19                                         ` Tejun Heo
2010-11-19 16:27                                           ` Alexey Dobriyan
2010-11-19 16:32                                             ` Tejun Heo
2010-11-19 16:38                                               ` Alexey Dobriyan
2010-11-19 16:50                                                 ` Tejun Heo
2010-11-19 16:55                                                   ` Alexey Dobriyan
2010-11-20 17:58                                   ` Oren Laadan
2010-11-20 18:05                               ` Oren Laadan
2010-11-20 18:08                               ` Oren Laadan
2010-11-20 18:11                               ` Oren Laadan
2010-11-20 18:15                                 ` Oren Laadan
2010-11-20 19:33                                   ` Tejun Heo
2010-11-21  8:18                                     ` Gene Cooperman
2010-11-21  8:21                                       ` Gene Cooperman
2010-11-22 18:02                                         ` Sukadev Bhattiprolu
2010-11-23 17:53                                         ` Oren Laadan
2010-11-24  3:50                                           ` Kapil Arya
2010-11-25 16:04                                             ` Oren Laadan
2010-11-29  4:09                                               ` Gene Cooperman
2010-11-21 22:41                                       ` Grant Likely
2010-11-22 17:34                                       ` Oren Laadan
2010-11-22 17:18                                     ` Oren Laadan
2010-11-17 22:17                       ` Matt Helsley
2010-11-18 10:06                         ` Tejun Heo
2010-11-18 20:25                         ` Oren Laadan
2010-11-07 21:44               ` Oren Laadan
2010-11-07 23:31                 ` Gene Cooperman
2010-11-05 22:24       ` Oren Laadan
2010-11-04  4:03   ` Oren Laadan
2010-11-04  9:43     ` Tejun Heo
2010-11-04 12:48       ` Luck, Tony
2010-11-04 13:06         ` Tejun Heo
2010-11-06 10:12       ` Matt Helsley
2010-11-06 11:03         ` Tejun Heo
2010-11-07 22:59         ` Davide Libenzi
2010-11-08  2:32           ` david
2010-11-18 20:41             ` Oren Laadan
2010-11-05  3:55     ` Kapil Arya
2010-11-05 11:57       ` Luck, Tony
2010-11-05 17:17         ` Gene Cooperman
2010-11-06  1:16           ` Matt Helsley
2010-11-06  4:06             ` Oren Laadan
2010-11-06  5:18               ` Matt Helsley
2010-11-06 21:00           ` Oren Laadan
2010-11-05 17:31       ` Sukadev Bhattiprolu
2010-11-06 21:05       ` Oren Laadan
2010-11-08 16:55 ` Grant Likely
2010-11-08 21:01   ` Nathan Lynch
2010-11-11  6:27   ` Nathan Lynch
2010-11-17  5:29   ` Anton Blanchard
2010-11-17 11:08     ` Tejun Heo [this message]
2010-11-18  9:53     ` Alan Cox
2010-11-18 12:27       ` Alexey Dobriyan
2010-11-19  6:33     ` Gene Cooperman
2010-11-21 23:20     ` Grant Likely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CE3B7B5.2020507@kernel.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=anton@au1.ibm.com \
    --cc=grant.likely@secretlab.ca \
    --cc=hch@lst.de \
    --cc=ksummit-2010-discuss@lists.linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=orenl@cs.columbia.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox