From: ebiederm@xmission.com (Eric W. Biederman)
To: Sandy Harris <pashley@storm.ca>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: McVoy's Clusters (was Re: latest linus-2.5 BK broken)
Date: 20 Jun 2002 23:16:32 -0600
Message-ID: <m1znxprz2n.fsf@frodo.biederman.org>
In-Reply-To: <3D11F7B9.27C74922@storm.ca>

Sandy Harris <pashley@storm.ca> writes:

> [ I removed half a dozen cc's on this, and am just sending to the
>   list. Do people actually want the cc's?]
> 
> Larry McVoy wrote:
> 
> > > Checkpointing buys three things.  The ability to preempt jobs, the
> > > ability to migrate processes,

 
> For large multi-processor systems, it isn't clear that those matter
> much. 

For systems that are built because no single machine can run your
compute-intensive application fast enough, they matter quite a bit.
 
> What combination of resources and loads do you think preemption
> and migration are need for?

Good answers have already been given.
The problem domain I am looking at is compute clusters.  The
solutions are useful elsewhere, but in compute clusters they are
extremely valuable.

> > > and the ability to recover from failed nodes, (assuming the 
> > > failed hardware didn't corrupt your jobs checkpoint).
> 
> That matters, but it isn't entirely clear that it needs to be done
> in the kernel. 

I agree, glibc would be fine, but it must be below the level of
the application.  Generally it is a pretty onerous task to checkpoint
a random program.  For proof, attempt to checkpoint your X desktop;
the infrastructure is there to do it.

Every application must be capable of being checkpointed for the cluster
batch scheduler to take advantage of it.

Example case.
[Preemption]
You start job 1, a compute-intensive application that runs for 4 days
on 100 cpus.  Your job is low priority.

In comes job 2, a high-priority job that runs for 4 hours and needs 256
cpus.

Job 1 is preempted.  With checkpoint support it can be saved and
restarted later.  Without checkpoint support it is simply killed.
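
To make the preemption case concrete, here is a rough sketch of the
decision a batch scheduler faces.  It is not any real scheduler's code:
the Job class, the admit() function, the priority numbers and the
256-cpu cluster size are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: int
    priority: int          # higher number = more important (assumed convention)
    state: str = "queued"  # queued | running | checkpointed | killed


def admit(cluster_cpus, running, new_job, can_checkpoint):
    """Try to start new_job, preempting lower-priority jobs if needed.

    Returns True if new_job was started.  `running` is updated in place.
    """
    free = cluster_cpus - sum(j.cpus for j in running)

    # First decide which jobs would have to go, lowest priority first,
    # without touching anything yet.
    victims = []
    for job in sorted(running, key=lambda j: j.priority):
        if free >= new_job.cpus or job.priority >= new_job.priority:
            break
        victims.append(job)
        free += job.cpus

    if free < new_job.cpus:
        return False  # does not fit even after preemption; leave all jobs alone

    # With checkpoint support the victim's work is saved for a later
    # restart; without it the work done so far is simply lost.
    for job in victims:
        job.state = "checkpointed" if can_checkpoint else "killed"
        running.remove(job)

    new_job.state = "running"
    running.append(new_job)
    return True
```

With job 1 running on 100 cpus of a hypothetical 256-cpu cluster,
admitting the 256-cpu job 2 leaves job 1 "checkpointed" when
can_checkpoint is true and "killed" when it is not.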

[Migration]
Migration is needed to move jobs off failing hardware, or to get
low-priority jobs out of the way onto less capable nodes that are going
unused.

It also lets you restart a job that failed on other hardware.
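
As a rough illustration of the placement side of migration, the sketch
below picks healthy, idle nodes for a checkpointed job, preferring the
less capable ones.  The Node class, its fields and the
migration_targets() function are hypothetical names invented here, and
the actual restore from a checkpoint is left out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpus: int
    healthy: bool = True
    busy: bool = False


def migration_targets(nodes, cpus_needed):
    """Return healthy, idle nodes that together supply cpus_needed cpus.

    Returns an empty list if the job cannot be placed.
    """
    chosen, total = [], 0
    # Try the smallest nodes first so the more capable ones stay free.
    for node in sorted(nodes, key=lambda n: n.cpus):
        if not node.healthy or node.busy:
            continue  # skip failing hardware and nodes already in use
        chosen.append(node)
        total += node.cpus
        if total >= cpus_needed:
            return chosen
    return []
```

A job that was checkpointed on a failing node would then be restarted
on whatever nodes this returns, or stay queued if the list is empty.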

Eric
