* State of linux checkpointing?
From: Neal D. Becker @ 2004-04-28 17:15 UTC (permalink / raw)
To: linux-kernel
I wonder if there is a checkpointing tool that will work with 2.6 kernels?
I only need relatively basic checkpointing. No sockets or fancy stuff.
* Re: State of linux checkpointing?
From: Jeff Garzik @ 2004-04-28 20:23 UTC (permalink / raw)
To: Neal D. Becker; +Cc: linux-kernel
Neal D. Becker wrote:
> I wonder if there is a checkpointing tool that will work with 2.6 kernels?
>
> I only need relatively basic checkpointing. No sockets or fancy stuff.
You only need checkpointing when your application programmers are lazy
and don't care about data integrity. :)
Jeff
* Re: State of linux checkpointing?
From: Tim Connors @ 2004-04-28 23:17 UTC (permalink / raw)
To: linux-kernel
Jeff Garzik <jgarzik@pobox.com> said on Wed, 28 Apr 2004 16:23:00 -0400:
> Neal D. Becker wrote:
> > I wonder if there is a checkpointing tool that will work with 2.6 kernels?
> >
> > I only need relatively basic checkpointing. No sockets or fancy stuff.
>
> You only need checkpointing when your application programmers are lazy
> and don't care about data integrity. :)
Or you are running some kind of cluster where you want the
applications to be checkpointed transparently without the application
knowing the details of how or when they will be swapped out (but this
will need sockets anyway, so won't happen anytime soon).
'Tis a pain that the alpha cluster here can suspend long running jobs
for a pile of smaller jobs, and then resume, but the linux cluster can
do no such fanciness (yes, we do manual checkpointing, but it's prone
to bugs - and finding such a bug after 30 days of compute time really
sucks balls).
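For readers unfamiliar with the term, the "manual checkpointing" mentioned above usually means the application saves its own state at intervals and reloads it on restart. What follows is a minimal sketch in Python, not the code used on either cluster: the function names, the pickle file format, and the toy summing job are all assumptions made for illustration. The checkpoint is written to a temporary file and renamed into place, so a crash in the middle of a write should leave the previous checkpoint intact.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write `state` to `path` via a temp file plus rename (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # rename over the old checkpoint
    except BaseException:
        # Clean up the partial temp file; the old checkpoint is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run_job(path, total_steps, checkpoint_every=1000):
    """Toy resumable job (an assumption): sums the integers 0..total_steps-1."""
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)  # final state, so a rerun is a no-op
    return state["acc"]
```

Every piece of state the job needs has to be put into the saved dictionary by hand, which is where bugs in this approach tend to come from; transparent checkpointing avoids that by capturing the whole process.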
--
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
Beware of Programmers who carry screwdrivers.
* Re: State of linux checkpointing?
From: Neal Becker @ 2004-04-29 1:24 UTC (permalink / raw)
To: linux-kernel
Tim Connors wrote:
> Jeff Garzik <jgarzik@pobox.com> said on Wed, 28 Apr 2004 16:23:00 -0400:
>> Neal D. Becker wrote:
>> > I wonder if there is a checkpointing tool that will work with 2.6 kernels?
>> >
>> > I only need relatively basic checkpointing. No sockets or fancy stuff.
>>
>> You only need checkpointing when your application programmers are lazy
>> and don't care about data integrity. :)
>
> Or you are running some kind of cluster where you want the
> applications to be checkpointed transparently without the application
> knowing the details of how or when they will be swapped out (but this
> will need sockets anyway, so won't happen anytime soon).
>
I want checkpointing for:
1) Protect against job interruption due to system crash, operator error,
power loss, whatever
2) Job migration. Even manual job migration would be nice.
* Re: State of linux checkpointing?
From: Thomas Davis @ 2004-04-29 16:31 UTC (permalink / raw)
To: Neal Becker; +Cc: linux-kernel
Neal Becker wrote:
>
> I want checkpointing for:
>
> 1) Protect against job interruption due to system crash, operator error,
> power loss, whatever
>
> 2) Job migration. Even manual job migration would be nice.
>
>
Two possible solutions:
1) http://ftg.lbl.gov/checkpoint
and
2) http://www.meiosys.com
thomas
* Re: State of linux checkpointing?
From: Tim Connors @ 2004-04-29 17:12 UTC (permalink / raw)
To: Neal Becker; +Cc: linux-kernel
Thomas Davis <tadavis@lbl.gov> said on Thu, 29 Apr 2004 09:31:14 -0700:
> Neal Becker wrote:
> >
> > I want checkpointing for:
> >
> > 1) Protect against job interruption due to system crash, operator error,
> > power loss, whatever
> >
> > 2) Job migration. Even manual job migration would be nice.
>
> Two possible solutions:
>
> 1) http://ftg.lbl.gov/checkpoint
Oooh. Shiny.
That looks relatively new? I haven't come across it before...
--
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
Yesterday, after years of trying, I finally managed to take a photo of a
subway train that said "INSTRUCTION CAR" just so that someday I can caption
it "...but where's the DATA CDR?" when I'm ready to make a joke that's
nerdy even by the standards of jokes about LISP. -- James "Kibo" Perry
* Re: State of linux checkpointing?
From: Nur Hussein @ 2004-05-10 16:23 UTC (permalink / raw)
To: linux-kernel
Tim Connors wrote:
> Oooh. Shiny.
>
There's also another checkpointing project called epckpt written
by Eduardo Pinheiro, which I've turned into a kernel patch, ported
to 2.4.22 and added some other stuff for my Masters project:
http://marauder.googgun.com/~obiwan/kernel
You can download both epckpt and the other stuff at the URL above, if
anyone's interested. Unfortunately though, it's also only for 2.4.x.
-= Nur Hussein =-