From: ebiederm@xmission.com (Eric W. Biederman)
To: Larry McVoy <lm@bitmover.com>
Cc: "Martin J. Bligh" <mbligh@aracnet.com>,
William Lee Irwin III <wli@holomorphy.com>,
Alan Cox <alan@lxorguk.ukuu.org.uk>,
"Brown, Len" <len.brown@intel.com>,
Giuliano Pochini <pochini@shiny.it>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Scaling noise
Date: 07 Sep 2003 15:18:19 -0600
Message-ID: <m11xusnvqc.fsf@ebiederm.dsl.xmission.com>
In-Reply-To: <20030904010653.GD5227@work.bitmover.com>
Larry McVoy <lm@bitmover.com> writes:
> Here's a thought. Maybe the next kernel summit needs to have a CC cluster
> BOF or whatever. I'd be happy to show up, describe what it is that I see
> and have you all try and poke holes in it. If the net result was that you
> walked away with the same picture in your head that I have that would be
> cool. Heck, I'll sponser it and buy beer and food if you like.
Larry, CC clusters are an idiotic development target.
The development target should be non-coherent clusters.
1) NUMA machines are smaller, more expensive, and less available than
their non-cache-coherent counterparts.
2) If you can solve the communications problems for a non-cache-coherent
counterpart, the solution will also work on a NUMA
machine.
3) People on a NUMA machine can always punt and over-share. On a
non-cache-coherent cluster, when people punt they don't share. Not
sharing increases scalability and usually performance.
4) Small startup companies can do non-coherent clusters, and can
scale up. You have to be a substantial company to build a NUMA
machine.
5) NUMA machines are slow. There is not a single NUMA machine in the
top 10 of the top500 supercomputers list. Likely this has more to
do with system sizes supported by the manufacturers than inherent
process inferiority, but it makes a difference.
SSI is good and it helps. But that is not the primary management
problem on a large system. The larger you get, the more the imperfection
of your materials tends to be the dominant factor in
management problems.
For example, I routinely reproduce cases where the BIOS does not work
around hardware bugs in a single boot that the motherboard vendors
cannot even reproduce.
Another example is Google, which has given up entirely on machines
always working, and has built its software to be robust about error
detection and recovery.
And the SSI solutions are evolving. But the problems are hard.
How do you build a distributed filesystem that scales?
How do you do process migration across machines?
How do you checkpoint a distributed job?
How do you properly build a cluster job scheduler?
How do you handle simultaneous similar actions by a group of nodes?
How do you usefully predict, detect, and isolate hardware failures so
as not to cripple the cluster?
etc.
Eric