From: Anton Blanchard <anton@au1.ibm.com>
To: Grant Likely <grant.likely@secretlab.ca>
Cc: Oren Laadan <orenl@cs.columbia.edu>,
ksummit-2010-discuss@lists.linux-foundation.org,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Christoph Hellwig <hch@lst.de>,
akpm@linux-foundation.org, tj@kernel.org
Subject: Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch
Date: Wed, 17 Nov 2010 16:29:22 +1100
Message-ID: <20101117162922.0f874a8e@kryten>
In-Reply-To: <AANLkTimOG-iFw-yg8rgNHJOEn49_v=0ZaDu_XK7KRRs1@mail.gmail.com>

Hi Grant,

> This patch has far reaching changes which quite frankly scare me;
> primarily because c/r changes many long-held assumptions about how
> Linux processes work. It needs to track a large amount of state with
> lots of corner cases, and the Linux process model is already quite
> complex. I know this is a fluffy hand-waving critique, but without
> being convinced of a strong general-purpose use-case, it is hard to
> get excited about a solution that touches large amounts of common
> code.
>
> c/r of desktop processes doesn't seem interesting other than as a test
> case, but I can possibly be convinced about HPC, embedded, industrial,
> or telecom use-cases, but for custom/specific-purpose applications the
> question must be asked if a fully user space or joint user/kernel
> method would better solve the problem.

It seems like there are a number of questions around the utility of
C/R so I'd like to take a step back from the technical discussion
around implementation and hopefully convince you, Tejun (and anyone
else interested) that C/R is something we want to solve in Linux.

Here at IBM we are working on the next generation of HPC systems. One
example of this will be the NCSA Blue Waters supercomputer:

http://www.ncsa.illinois.edu/BlueWaters/

The aim is not to build yet another Linpack special, but a supercomputer
that achieves more than 1 petaflop sustained on a wide range of
applications. There is also a strong focus on improving the
productivity and reliability of the cluster.

There are two usage scenarios for C/R in this environment:

1. Resource management. Any large HPC cluster should be 100% busy and
as such you will often fill in the gaps with low-priority jobs which
may need to be preempted. These low-priority jobs need to give up their
resources (memory, interconnect resources, etc.) whenever something
important comes in.

2. Fault tolerance. Failures are a fact of life for any decent-sized
cluster. As the cluster gets larger these failures become very common.
Speaking from an industry perspective, MTBFs on the order of several
hours for large commodity clusters are not surprising. We at
IBM improve on that with hardware and system design, but there is only
so much you can do. The failures also happen at the Linux kernel level
so even if we had 100% reliable systems we would still have issues.

Now this is the pointy end of HPC, but similar issues are happening in
the meat of the HPC market. One area where we are seeing a lot of C/R
interest is the EDA space. As ICs become more and more complex the
amount of cluster compute power it takes to route, check, create masks
etc. grows so large that system reliability becomes an issue. Some tool
vendors write their own application C/R, but there are a multitude of
in-house applications that have no C/R capability today.

You could argue that we should just add C/R capability to every HPC
application and library people care about or rework them to be
fault tolerant in software. Unfortunately I don't see either as being
viable. There are so many applications, libraries and even programming
languages in use for HPC that it would be a losing battle. If we
did go down this route we would also be unable to leverage C/R for
anything else. I can understand the concern around finding a general
purpose case, but I do believe many other solid uses for C/R outside of
HPC will emerge. For example, there was interest from the embedded guys
during the KS discussion and I can easily imagine using C/R to bring up
Firefox faster on a TV.

The problems found in HPC often turn into more general problems down
the track. I think back to the heated discussions we had around SMP back
in the early 2000s when we had 32-core POWER4s and SGI had similar-sized
machines. Now a 24-core machine fits in 1U and can be purchased for
under $5k. NUMA support, CPU affinity and multi-queue scheduling are
other areas that initially had a very small user base but have since
become important features for many users.

Anton