Re: Switching Kernels without Rebooting?

public inbox for linux-kernel@vger.kernel.org

* Re: Switching Kernels without Rebooting?
       [not found] <NOEJJDACGOHCKNCOGFOMOEKECGAA.davids@webmaster.com>
@ 2001-07-10 20:43 ` C. Slater
  2001-07-11  3:50   ` FORT David
  2001-07-11  9:10   ` Helge Hafting
  0 siblings, 2 replies; 56+ messages in thread
From: C. Slater @ 2001-07-10 20:43 UTC (permalink / raw)
  To: linux-kernel

>
> >     - Replace all saved structures
>
> > what if the layout of these changes as it often does?
>
> You would want to convert all structures into a neutral encoding scheme
> that would support transferring structures across versions. BER comes to
> mind, as it provides for an easy way to ignore stuff you don't understand
> and support multiple versions of the same object in a single encoding.
>
> However, this would be a truly massive task. And the big challenge would be
> what to do when an older kernel doesn't understand something essential. It
> could be simplified significantly by supporting live replacement only of
> kernels of the same version, but this seems to defeat much of the purpose.
>
> DS

I don't think that it would be possible to switch kernels when one was not
properly set up to do it, if that's what you mean. You could only switch
between kernels that have been compiled to support live switching.

I do see your point with the data structures changing. We would need to use
some format that all properly set-up kernels could understand, then we would
only need to write enough to convert the structs to the middle format and
back when they change. I am not familiar with BER, but if it is suitable, it
may help.
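
As a rough illustration of the "middle format" idea, here is a minimal sketch in Python, assuming a simple tag-length-value layout in the spirit of BER. It is not real BER, and the tags and fields are hypothetical, not taken from any kernel. The point it shows is that a reader can use the length field to skip records it does not understand:

```python
import struct

# Hypothetical tags for illustration only; not taken from any kernel.
TAG_PID = 1
TAG_NICE = 2
TAG_NEW_FIELD = 99  # something only a newer kernel would know about

def encode(fields):
    """Pack (tag, bytes) pairs as tag (1 byte) + length (2 bytes, big-endian) + value."""
    out = bytearray()
    for tag, value in fields:
        out += struct.pack(">BH", tag, len(value))
        out += value
    return bytes(out)

def decode(blob, known_tags):
    """Return {tag: value} for known tags; skip unknown ones using their length."""
    result = {}
    offset = 0
    while offset < len(blob):
        tag, length = struct.unpack_from(">BH", blob, offset)
        offset += 3  # size of the tag + length header
        value = blob[offset:offset + length]
        offset += length
        if tag in known_tags:
            result[tag] = value
    return result

blob = encode([
    (TAG_PID, struct.pack(">I", 1234)),
    (TAG_NEW_FIELD, b"opaque"),
    (TAG_NICE, struct.pack(">b", -5)),
])
# An "older" reader that only knows two of the three tags.
old_kernel_view = decode(blob, known_tags={TAG_PID, TAG_NICE})
```

Skipping unknown records only helps when the skipped data is optional; it does nothing for the case where the older side misses something essential.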

Are you saying that swapping the kernels out altogether would be a massive
task, or that saving/restoring the data structures would be a massive task?

  Colin


* Re: Switching Kernels without Rebooting?
  2001-07-10 20:43 ` Switching Kernels without Rebooting? C. Slater
@ 2001-07-11  3:50   ` FORT David
  2001-07-11  9:10   ` Helge Hafting
  1 sibling, 0 replies; 56+ messages in thread
From: FORT David @ 2001-07-11  3:50 UTC (permalink / raw)
  To: C. Slater; +Cc: linux-kernel

I remember that this topic was discussed at length 1 or 2 years ago on
linux-future and came to no conclusion.

-- 
 HomePage: http://www.enlightened-popo.net  
-- This was sent by Djinn running Linux 2.4.5 -- 





* Re: Switching Kernels without Rebooting?
  2001-07-10 20:43 ` Switching Kernels without Rebooting? C. Slater
  2001-07-11  3:50   ` FORT David
@ 2001-07-11  9:10   ` Helge Hafting
  2001-07-11 15:41     ` C. Slater
  2001-07-11 22:12     ` Paul Jakma
  1 sibling, 2 replies; 56+ messages in thread
From: Helge Hafting @ 2001-07-11  9:10 UTC (permalink / raw)
  To: C. Slater; +Cc: linux-kernel

"C. Slater" wrote:

> I don't think that it would be possible to switch kernels when one was not
> properly set up to do it, if that's what you mean. You could only switch
> between kernels that have been compiled to support live switching.
>
Sure.
> I do see your point with the data structures changing. We would need to use
> some format that all properly set-up kernels could understand,

That seems completely out of the question.  The structures a 2.4.7
kernel understands might be insufficient to express the setup
a future 2.6.9 kernel is using to do its stuff better.  (And vice
versa, if future kernels drop a 2.4.7 feature deemed obsolete.
But what if that feature is in use when you decide to upgrade?)
You can easily deal with simple stuff like struct
rearrangement and type conversions, but what to do when whole data
structures change completely?

Example: something changes from a representation using two linked lists to a
single tree or 4 hashtables.  You'll have a very hard time inventing
a generic data format to deal with that kind of change.  It might
happen.  Look at the differences between the 2.2 and 2.4 VM, with the big
pagecache change in early 2.3.  And the dentry cache that suddenly appeared.

And of course the rules change too, from time to time.
Many releases have a list of "active pages".  What kind of page exactly is that?
The rules may change. What do you do if the new kernel doesn't allow
one particular kind of page on that list, but the old running kernel
has a bunch?

These were just some made-up examples; I guess you'll run into a ton
of such issues.  New releases aren't simply fixes and tweaks; there
are frequent design changes.

> Are you saying that swapping the kernels out altogether would be a massive
> task, or that saving/restoring the data structures would be a massive task?

All you need to swap kernel images is memory.  Swapping structures
can't be done in a generic way; you'll need code that converts the
structures of one particular kernel release to those of a
particular other kernel.  And I don't think you'll have the usual
kernel developers do that.
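
A minimal sketch of what that per-pair conversion code might look like, in Python for brevity. Everything here is made up for illustration: the registry, the "2.4.8" layout, and the field names are assumptions, not real kernel structures:

```python
# Hypothetical registry: one hand-written converter per (old, new) release pair.
CONVERTERS = {}

def converter(old, new):
    """Decorator that registers a conversion function for one release pair."""
    def register(func):
        CONVERTERS[(old, new)] = func
        return func
    return register

@converter("2.4.7", "2.4.8")
def lists_to_table(state):
    # Made-up layout change: two separate lists become one lookup table.
    table = {page: "active" for page in state["active"]}
    table.update({page: "inactive" for page in state["inactive"]})
    return {"pages": table}

def convert(state, old, new):
    """Convert state between two specific releases, or fail loudly."""
    try:
        func = CONVERTERS[(old, new)]
    except KeyError:
        raise LookupError(f"no converter from {old} to {new}") from None
    return func(state)

old_state = {"active": [1, 2], "inactive": [3]}
new_state = convert(old_state, "2.4.7", "2.4.8")
```

Any pair without a hand-written entry simply cannot be switched, which is the maintenance burden in a nutshell: the number of converters grows with every release pair you want to support.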

A "long-term uptime" distro might do this kind of work for a few
selected kernels, but I cannot imagine it happening for the regular
ones.

Helge Hafting


* Re: Switching Kernels without Rebooting?
  2001-07-11  9:10   ` Helge Hafting
@ 2001-07-11 15:41     ` C. Slater
  2001-07-11 18:11       ` Switching Kernels without Rebooting? [MOSIX] Carlos O'Donell Jr.
  2001-07-12 10:16       ` Switching Kernels without Rebooting? Helge Hafting
  2001-07-11 22:12     ` Paul Jakma
  1 sibling, 2 replies; 56+ messages in thread
From: C. Slater @ 2001-07-11 15:41 UTC (permalink / raw)
  To: linux-kernel

Unless we find some other way to do it, I think we will have to limit this
to only switching between kernels with the same minor version. We probably
would not be able to swap between 2.4 and 2.6 anyways, though it depends on
what changes are made.

  Colin



* Re: Switching Kernels without Rebooting? [MOSIX]
  2001-07-11 15:41     ` C. Slater
@ 2001-07-11 18:11       ` Carlos O'Donell Jr.
  2001-07-12 10:16       ` Switching Kernels without Rebooting? Helge Hafting
  1 sibling, 0 replies; 56+ messages in thread
From: Carlos O'Donell Jr. @ 2001-07-11 18:11 UTC (permalink / raw)
  To: linux-kernel

On Wed, Jul 11, 2001 at 11:41:54AM -0400, C. Slater wrote:
> Unless we find some other way to do it, I think we will have to limit this
> to only switching between kernels with the same minor version. We probably
> would not be able to swap between 2.4 and 2.6 anyways, though it depends on
> what changes are made.
> 
>   Colin

Just thinking...

If you had enough money, and were inclined enough, you could set up the
following system:

- 2 Boxes, Running MOSIX (similar processors).

a. Start processes on Box 1.
b. Migrate processes to Box 2.

If the need to upgrade the kernel arises, you can migrate the processes
back to Box 1. Upgrade the kernel on Box 2, recompile MOSIX.
If the first two digits of the MOSIX version are the same, you can migrate
the processes back to Box 2 (now running the latest kernel).
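
The compatibility rule above can be sketched as a simple check. This assumes dotted version strings and treats "first two digits" as the first two dot-separated components; the version numbers are hypothetical and this is not MOSIX's actual code:

```python
def same_series(version_a, version_b):
    """True when the first two dot-separated components match."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]

# Hypothetical version strings, for illustration only.
can_migrate_back = same_series("1.0.5", "1.0.6")
needs_matching_upgrade = not same_series("1.0.5", "1.1.0")
```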

The stubs in place for your process will run local kernel functions that
are not specifically host-dependent, thus taking advantage of the newer
kernel features, and possibly newer hardware on Box 2, at an application
level.

Obviously, Box 1 could be smaller and less expensive.
Take note that if Box 1 were to fail, your process would die, since the
kernel stubs need to be in place on the original machine.

There are many cons to this system, but I will not ruin the decidedly
happy mood of this linux-future-istic conversation ;)

Cheers,
Carlos O'Donell Jr.
-------------------------
Baldric Project
http://www.baldric.uwo.ca
-------------------------


* Re: Switching Kernels without Rebooting?
  2001-07-11 15:41     ` C. Slater
  2001-07-11 18:11       ` Switching Kernels without Rebooting? [MOSIX] Carlos O'Donell Jr.
@ 2001-07-12 10:16       ` Helge Hafting
  1 sibling, 0 replies; 56+ messages in thread
From: Helge Hafting @ 2001-07-12 10:16 UTC (permalink / raw)
  To: C. Slater, linux-kernel

"C. Slater" wrote:
> 
> Unless we find some other way to do it, I think we will have to limit this
> to only switching between kernels with the same minor version. We probably
> would not be able to swap between 2.4 and 2.6 anyways, though it depends on
> what changes are made.

Minor versions won't help you.  Different minor versions try to stay
interface-compatible with each other.  But data structures not
exposed to interfaces can still be rewritten completely.

Lots of nice ideas and implementations have piled up for 2.5.  Those
that prove immensely successful in 2.5 may get backported to 2.4
once they get enough testing.  Try reading a few months' worth of
kernel patches and you'll see that things change in stable kernels
too.

Helge Hafting


* Re: Switching Kernels without Rebooting?
  2001-07-11  9:10   ` Helge Hafting
  2001-07-11 15:41     ` C. Slater
@ 2001-07-11 22:12     ` Paul Jakma
  2001-07-11 22:14       ` Rik van Riel
                         ` (3 more replies)
  1 sibling, 4 replies; 56+ messages in thread
From: Paul Jakma @ 2001-07-11 22:12 UTC (permalink / raw)
  To: Helge Hafting; +Cc: C. Slater, linux-kernel

On Wed, 11 Jul 2001, Helge Hafting wrote:

> That seems completely out of the question.  The structures a 2.4.7
> kernel understands might be insufficient to express the setup
> a future 2.6.9 kernel is using to do its stuff better.

However, it might be handy if, say, you needed to upgrade a stable
kernel due to a bug fix or security update.

no?

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org
PGP5 key: http://www.clubi.ie/jakma/publickey.txt
-------------------------------------------
Fortune:
I found Rome a city of bricks and left it a city of marble.
		-- Augustus Caesar



* Re: Switching Kernels without Rebooting?
  2001-07-11 22:12     ` Paul Jakma
@ 2001-07-11 22:14       ` Rik van Riel
  2001-07-11 22:36         ` C. Slater
                           ` (2 more replies)
  2001-07-11 22:46       ` Switching Kernels without Rebooting? Kip Macy
                         ` (2 subsequent siblings)
  3 siblings, 3 replies; 56+ messages in thread
From: Rik van Riel @ 2001-07-11 22:14 UTC (permalink / raw)
  To: Paul Jakma; +Cc: Helge Hafting, C. Slater, linux-kernel

On Wed, 11 Jul 2001, Paul Jakma wrote:
> On Wed, 11 Jul 2001, Helge Hafting wrote:
>
> > That seems completely out of the question.  The structures a 2.4.7
> > kernel understands might be insufficient to express the setup
> > a future 2.6.9 kernel is using to do its stuff better.
>
> however, it might be handy if say you needed to upgrade a stable
> kernel due to a bug fix or security update.

One thing which always surprises me in this discussion
(it comes up about once a year, it seems) is that
nobody participating in this discussion ever starts
writing any code for it.

Is this a feature which is only wanted by people who
don't want to code, or is this just a signal that the
amount of trouble involved just isn't worth it?

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: Switching Kernels without Rebooting?
  2001-07-11 22:14       ` Rik van Riel
@ 2001-07-11 22:36         ` C. Slater
  2001-07-11 23:44           ` Andreas Dilger
  2001-07-12 15:32           ` Rik van Riel
  2001-07-11 22:36         ` David Schwartz
  2001-07-12  7:23         ` Kai Henningsen
  2 siblings, 2 replies; 56+ messages in thread
From: C. Slater @ 2001-07-11 22:36 UTC (permalink / raw)
  To: linux-kernel

Does it come up often? Well, I have a SourceForge project set up and am
currently only waiting on finalizing how it's going to be done. So we have
just about proved the first possibility wrong, and if you hear anything more
about this in a while, we will have proved the second wrong too. So, while
we are at it, I'll say that if anyone wants to help with it, email me. We
especially need people who either have ideas on how to do this or have a
good knowledge of the kernel, mainly memory, processes, and initialization.

  Colin



* Re: Switching Kernels without Rebooting?
  2001-07-11 22:36         ` C. Slater
@ 2001-07-11 23:44           ` Andreas Dilger
  2001-07-12  1:17             ` C. Slater
  2001-07-12 10:12             ` Ralf Baechle
  2001-07-12 15:32           ` Rik van Riel
  1 sibling, 2 replies; 56+ messages in thread
From: Andreas Dilger @ 2001-07-11 23:44 UTC (permalink / raw)
  To: C. Slater; +Cc: linux-kernel

Colin Slater writes:
> Does it come up often? Well, I have a SourceForge project set up and am
> currently only waiting on finalizing how it's going to be done. So we have
> just about proved the first possibility wrong, and if you hear anything more
> about this in a while, we will have proved the second wrong too. So, while
> we are at it, I'll say that if anyone wants to help with it, email me. We
> especially need people who either have ideas on how to do this or have a
> good knowledge of the kernel, mainly memory, processes, and initialization.

Not to be overly negative, I don't intend this email as an insult, but rather
as a "shed a little light on the discussion" email.  I would be _happy_ if
you actually succeed in your project, but your comments come out as follows:

a) we want this "sounds real good" feature
b) we don't know how we will do it, beyond some hand waving ideas
c) we want kernel experts who know what they are doing to help us
d) kernel experts who have replied so far (negatively) don't know what
   they are talking about, so please butt out
e) you have "started coding" by setting up a sourceforge project

Note that you are talking about a VERY difficult problem, which is
not available on 99.9% of systems out there.  Maybe on a few highly
specialized *nixes which were designed for this (Sequent or such),
and probably have extra hardware support to help along.  I'm _pretty_
sure that Solaris and AIX and HP-UX do NOT do this, and don't you think
they would want to if it were easy?  It would be easier than under
Linux from the perspective that their kernels change far less often,
and have relatively static interfaces.

The best proposal I've heard so far was to use MOSIX to do live job
migration between machines, and then upgrade the kernel like normal.
In the end, it is the jobs that are running on the kernel, and not
the kernel or the individual machine that are the most important.  One
person pointed out that there is a single point of failure in the
MOSIX "stub" machine, which doesn't help you in the end (how do you
update the kernel there?).  If you can figure a way to enhance MOSIX
to allow migrating the MOSIX "stub" processes to another machine, you
will have solved your problem in a much easier way, IMHO.

Note also that you need to look at the _specific_ reason why you want to
do live kernel upgrades, besides it "sounds real good".  If you have such
tight uptime deadlines that you can't take 5 minutes of downtime to boot
a new kernel, then you are probably using a load balancing cluster anyways
in case of hardware failure, so live kernel updates are not needed here.

Note that all real-world high-availability systems I ever worked on
still allowed for SCHEDULED maintenance downtime, but highly frowned
upon UNSCHEDULED downtime.  Even IBM's S/390 99.999% uptime numbers
exclude downtime for SCHEDULED outages, which are simply a fact of life.
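
For scale, here is a quick calculation of what 99.999% allows per year, assuming a 365-day year. The function name is just for illustration:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, ignoring leap years

def downtime_minutes_per_year(availability_percent):
    """Minutes per year a system may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

five_nines = downtime_minutes_per_year(99.999)
```

That works out to roughly 5.3 minutes per year, so a single 5-minute reboot would use up nearly the whole budget if scheduled outages were counted.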

Please prove everyone wrong by developing a way to do this, or even
showing a proof-of-concept (i.e. a user-space framework for translating
every kernel data structure from one kernel version to another, that
works across, say, a large fraction of the 2.2 kernel, or maybe from
2.4.0-test until 2.4.current).  It doesn't have to be in-kernel (yet).

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


* Re: Switching Kernels without Rebooting?
  2001-07-11 23:44           ` Andreas Dilger
@ 2001-07-12  1:17             ` C. Slater
  2001-07-12 15:39               ` Rik van Riel
  2001-07-12 10:12             ` Ralf Baechle
  1 sibling, 1 reply; 56+ messages in thread
From: C. Slater @ 2001-07-12  1:17 UTC (permalink / raw)
  To: linux-kernel

I will say that you are incredibly correct. Actually rather funny.

> Not to be overly negative, I don't intend this email as an insult, but rather
> as a "shed a little light" on the discussion email.  I would be _happy_ if
> you actually succeed in your project, but your comments come out as follows:

> a) we want this "sounds real good" feature
 But at least it sounds good.

> b) we don't know how we will do it, beyond some hand waving ideas
We don't. We would like to change that.

> c) we want kernel experts who know what they are doing to help us
  Quite correct

> d) kernel experts who have replied so far (negatively) don't know what
>    they are talking about, so please butt out
We would like any information that they have. I hope they do not butt out.

> e) you have "started coding" by setting up a sourceforge project
That line is hilarious to me. And you are right! I merely intended to show
that we are trying to go somewhere beyond a mailing list thread. To avoid
anything more I will say *trying* again.

> Note that you are talking about a VERY difficult problem, which is
> not available on 99.9% of systems out there.  Maybe on a few highly
> specialized *nixes which were designed for this (Sequent or such),
> and probably have extra hardware support to help along.  I'm _pretty_
> sure that Solaris and AIX and HP-UX do NOT do this, and don't you think
> they would want to if it were easy?  It would be easier than under
> Linux from the perspective that their kernels change far less often,
> and have relatively static interfaces.
>
> The best proposal I've heard so far was to use MOSIX to do live job
> migration between machines, and then upgrade the kernel like normal.
> In the end, it is the jobs that are running on the kernel, and not
> the kernel or the individual machine that are the most important.  One
> person pointed out that there is a single point of failure in the
> MOSIX "stub" machine, which doesn't help you in the end (how do you
> update the kernel there?).  If you can figure a way to enhance MOSIX
> to allow migrating the MOSIX "stub" processes to another machine, you
> will have solved your problem in a much easier way, IMHO.

Unfortunately I have not heard this yet. I have not been able to look at the
list archives to see all of what has been posted there.

> Note also that you need to look at the _specific_ reason why you want to
> do live kernel upgrades, besides it "sounds real good".  If you have such
> tight uptime deadlines that you can't take 5 minutes of downtime to boot
> a new kernel, then you are probably using a load balancing cluster anyways
> in case of hardware failure, so live kernel updates are not needed here.
>
> Note that all real-world high-availability systems I ever worked on
> still allowed for SCHEDULED maintenance downtime, but highly frowned
> upon UNSCHEDULED downtime.  Even IBM's S/390 99.999% uptime numbers
> exclude downtime for SCHEDULED outages, which are simply a fact of life

> Please prove everyone wrong by developing a way to do this, or even
> showing a proof-of-concept (i.e. a user-space framework for translating
> every kernel data structure from one kernel version to another, that
> works across, say, a large fraction of the 2.2 kernel, or maybe from
> 2.4.0-test until 2.4.current).  It doesn't have to be in-kernel (yet).
>
> Cheers, Andreas
> --
> Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
>                  \  would they cancel out, leaving him still hungry?"
> http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

Thanks for your insight. Will try.



* Re: Switching Kernels without Rebooting?
  2001-07-12  1:17             ` C. Slater
@ 2001-07-12 15:39               ` Rik van Riel
  2001-07-12 16:23                 ` Albert D. Cahalan
  0 siblings, 1 reply; 56+ messages in thread
From: Rik van Riel @ 2001-07-12 15:39 UTC (permalink / raw)
  To: C. Slater; +Cc: linux-kernel

On Wed, 11 Jul 2001, C. Slater wrote:

> > a) we want this "sounds real good" feature
>  But at least it sounds good.

And nothing wrong with that. It seems an excellent
opportunity to learn lots about every part of the
kernel.

> > b) we don't know how we will do it, beyond some hand waving ideas
> We don't. We would like to change that.
>
> > c) we want kernel experts who know what they are doing to help us
>   Quite correct

I guess there are two things to do here:

(1) analyse the general idea of what you want to achieve,
    breaking it down into sub-goals which may be achievable

(2) learn about how the kernel works, you may want to go to

	http://kernelnewbies.org/

I won't have time to put into a project as huge and difficult
as upgrading the kernel "live", but I'll be around to try
and teach people about how the kernel works.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: Switching Kernels without Rebooting?
  2001-07-12 15:39               ` Rik van Riel
@ 2001-07-12 16:23                 ` Albert D. Cahalan
  2001-07-12 17:37                   ` Mike Borrelli
                                     ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Albert D. Cahalan @ 2001-07-12 16:23 UTC (permalink / raw)
  To: Rik van Riel; +Cc: C. Slater, linux-kernel

Rik van Riel writes:

> I won't have time to put into a project as huge and difficult
> as upgrading the kernel "live", but I'll be around to try
> and teach people about how the kernel works.

I think I see a business opportunity here.

Live upgrades require data structure conversion and other horrors.
You can't just write the code and expect it to maintain itself.
You'd need to rewrite half of it every time, for every patch level.

The 24x7 places might be willing to pay somebody to do this.
It's consulting work really. The customer says "I want to go
from 2.4.8 to 2.4.12", you say "OK, $320405 please.", and you
make a custom upgrade procedure for them.


* Re: Switching Kernels without Rebooting?
  2001-07-12 16:23                 ` Albert D. Cahalan
@ 2001-07-12 17:37                   ` Mike Borrelli
  2001-07-12 18:05                   ` Rik van Riel
  2001-07-12 18:48                   ` Chris Friesen
  2 siblings, 0 replies; 56+ messages in thread
From: Mike Borrelli @ 2001-07-12 17:37 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: Rik van Riel, C. Slater, linux-kernel

How often would a company that demands 24x7 uptime /want/ to upgrade their
kernel?  It seems to me that by the time the decision has been made to take
that kind of a step in a production environment, someone has done lots of
tests with the new target kernel, so that even if they don't have the
extra hardware to bring up another server in parallel, the most downtime
that would be suffered would be the time it takes to do two boots (boot
the new kernel, find out it doesn't work, reboot the old one).

Not to discourage anyone, but is this really necessary, or is it something 
to be worked on just to say that it can be done?

Just a random comment from someone who knows very little.

Regards,
Mike



* Re: Switching Kernels without Rebooting?
  2001-07-12 16:23                 ` Albert D. Cahalan
  2001-07-12 17:37                   ` Mike Borrelli
@ 2001-07-12 18:05                   ` Rik van Riel
  2001-07-13 10:07                     ` Pau Aliagas
  2001-07-12 18:48                   ` Chris Friesen
  2 siblings, 1 reply; 56+ messages in thread
From: Rik van Riel @ 2001-07-12 18:05 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: C. Slater, linux-kernel

On Thu, 12 Jul 2001, Albert D. Cahalan wrote:

> I think I see a business opportunity here.

	[snip technically risky idea]

> The 24x7 places might be willing to pay somebody to do this.

Unlikely. They need hardware redundancy anyway, so they'll
just upgrade their cluster node-by-node, without doing
risky and potentially data-corrupting things like live
kernel upgrades.
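
A node-by-node upgrade could be sketched roughly as follows. This is a toy model in Python, assuming a cluster described as a dictionary; the node names and the `min_in_service` parameter are hypothetical, and the kernel assignment stands in for an ordinary install and reboot:

```python
def rolling_upgrade(nodes, new_kernel, min_in_service=1):
    """Upgrade one node at a time; `nodes` maps name -> {"kernel", "in_service"}."""
    for name, node in nodes.items():
        # Count the other nodes that would keep serving while this one is down.
        others_up = sum(
            1 for other_name, other in nodes.items()
            if other_name != name and other["in_service"]
        )
        if others_up < min_in_service:
            raise RuntimeError(f"taking {name} down would leave too few nodes")
        node["in_service"] = False   # drain: stop sending work to this node
        node["kernel"] = new_kernel  # stand-in for install + ordinary reboot
        node["in_service"] = True    # rejoin once it is back up

cluster = {
    "node1": {"kernel": "2.4.8", "in_service": True},
    "node2": {"kernel": "2.4.8", "in_service": True},
}
rolling_upgrade(cluster, "2.4.12")
```

Each node goes through a normal reboot, so no kernel state ever has to be converted in place.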

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12 18:05                   ` Rik van Riel
@ 2001-07-13 10:07                     ` Pau Aliagas
  0 siblings, 0 replies; 56+ messages in thread
From: Pau Aliagas @ 2001-07-13 10:07 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Albert D. Cahalan, C. Slater, linux-kernel

On Thu, 12 Jul 2001, Rik van Riel wrote:

> On Thu, 12 Jul 2001, Albert D. Cahalan wrote:
>
> > I think I see a business opportunity here.
>
> 	[snip technically risky idea]
>
> > The 24x7 places might be willing to pay somebody to do this.
>
> Unlikely. They need hardware redundancy anyway, so they'll
> just upgrade their cluster node-by-node, without doing
> risky and potentially data-corrupting things like live
> kernel upgrades.

I see business in a different way: instead of ISP or ASP you provide a
backup cluster node where you can migrate your processes before rebooting.
Everything keeps on working, no magic involved.

So we can invent the CNP (Cluster Node Provider)

Pau


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12 16:23                 ` Albert D. Cahalan
  2001-07-12 17:37                   ` Mike Borrelli
  2001-07-12 18:05                   ` Rik van Riel
@ 2001-07-12 18:48                   ` Chris Friesen
  2 siblings, 0 replies; 56+ messages in thread
From: Chris Friesen @ 2001-07-12 18:48 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: linux-kernel

"Albert D. Cahalan" wrote:

> The 24x7 places might be willing to pay somebody to do this.
> It's consulting work really. The customer says "I want to go
> from 2.4.8 to 2.4.12", you say "OK, $320405 please.", and you
> make a custom upgrade procedure for them.

Speaking as someone who is working on what will eventually be a five 9's project
based on linux, there is almost zero chance that we would make use of something
like this.  Applications and kernels are tested together and verified together,
and the likelihood of changing either one and not the other one is very low (and
in fact they are shipped together as a single image).

We have hardware redundancy, and upgrades are controlled by the application,
since it knows exactly what state must be transferred and what the differences
are between versions.  After all the state has been transferred we then do an IP
takeover so that the rest of the system knows to talk to the new side.  At this
point we can test the new side for a while.  If we're satisfied with how it's
performing, we can then take down the inactive side and upgrade it and then
bring it back into sync with the active side.  If we don't like it, we can
always abort and switch back to the old version.
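
The sequence above (transfer state, IP takeover, trial period, then commit
or abort) can be read as a small state machine. The following is only a
sketch of that reading: the type and function names are made up for
illustration and are not taken from any real product.

```c
/* Hypothetical model of a two-sided upgrade; all names are illustrative. */
enum side  { SIDE_A, SIDE_B };
enum phase { PH_TRANSFER, PH_TRIAL, PH_COMMITTED, PH_ABORTED };

struct upgrade {
    enum phase phase;
    enum side  active;   /* side that currently owns the service IP */
};

static enum side other(enum side s)
{
    return s == SIDE_A ? SIDE_B : SIDE_A;
}

/* Start an upgrade: the application begins copying its state across. */
void upgrade_begin(struct upgrade *u, enum side active)
{
    u->phase = PH_TRANSFER;
    u->active = active;
}

/* State transfer is complete: the IP takeover makes the new side active. */
int upgrade_takeover(struct upgrade *u)
{
    if (u->phase != PH_TRANSFER)
        return -1;               /* takeover is only legal after transfer */
    u->active = other(u->active);
    u->phase = PH_TRIAL;
    return 0;
}

/* End the trial: keep the new side, or switch back to the old version. */
int upgrade_finish(struct upgrade *u, int satisfied)
{
    if (u->phase != PH_TRIAL)
        return -1;               /* nothing to commit or abort yet */
    if (satisfied) {
        u->phase = PH_COMMITTED; /* old side may now be upgraded and resynced */
    } else {
        u->active = other(u->active);
        u->phase = PH_ABORTED;
    }
    return 0;
}
```

The point of modelling it this way is that the abort path stays available
for the whole trial phase, which is what makes the approach low-risk
compared with replacing the kernel in place.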

-- 
Chris Friesen                    | MailStop: 043/33/F10  
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-11 23:44           ` Andreas Dilger
  2001-07-12  1:17             ` C. Slater
@ 2001-07-12 10:12             ` Ralf Baechle
  1 sibling, 0 replies; 56+ messages in thread
From: Ralf Baechle @ 2001-07-12 10:12 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: C. Slater, linux-kernel

On Wed, Jul 11, 2001 at 05:44:45PM -0600, Andreas Dilger wrote:

> The best proposal I've heard so far was to use MOSIX to do live job
> migration between machines, and then upgrade the kernel like normal.
> In the end, it is the jobs that are running on the kernel, and not
> the kernel or the individual machine that are the most important.  One
> person pointed out that there is a single point of failure in the
> MOSIX "stub" machine, which doesn't help you in the end (how do you
> update the kernel there?).  If you can figure a way to enhance MOSIX
> to allow migrating the MOSIX "stub" processes to another machine, you
> will have solved your problem in a much easier way, IMHO.

Virtual machines a la VM are also nice for this.  Build a HA cluster from
two VMs, then upgrade one after another.  All that's required is HA stuff
as it already is available.

  Ralf

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-11 22:36         ` C. Slater
  2001-07-11 23:44           ` Andreas Dilger
@ 2001-07-12 15:32           ` Rik van Riel
  1 sibling, 0 replies; 56+ messages in thread
From: Rik van Riel @ 2001-07-12 15:32 UTC (permalink / raw)
  To: C. Slater; +Cc: linux-kernel

On Wed, 11 Jul 2001, C. Slater wrote:

> Does it come up often? Well, I have a sourceforge project setup and am
> currently only waiting on finalizing how it's going to be done.

I hope you have fun waiting.

If you're really serious about this feature, however,
you may want to start looking into the technical
details behind your wish to get an idea of exactly
how much work it would be to implement this feature.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: Switching Kernels without Rebooting?
  2001-07-11 22:14       ` Rik van Riel
  2001-07-11 22:36         ` C. Slater
@ 2001-07-11 22:36         ` David Schwartz
  2001-07-12  7:23         ` Kai Henningsen
  2 siblings, 0 replies; 56+ messages in thread
From: David Schwartz @ 2001-07-11 22:36 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel


> One thing which always surprises me in this discussion
> (it comes up about once a year, it seems) is that
> nobody participating in this discussion ever starts
> writing any code for it.

> Is this a feature which is only wanted by people who
> don't want to code, or is this just a signal that the
> amount of trouble involved just isn't worth it?

> Rik
> --

	Doesn't it make sense to decide on a feature set and method of
implementation _before_ you begin coding? Or does it make sense to just
start coding something that might never work or do what anybody wants?

	When you decide to implement something, do you usually code before you
decide exactly what it is you're trying to implement and whether anybody
wants it? I certainly don't.

	This isn't a very good example because this is a rather bad idea overall. But
if you think it's stupid and will never work, just say that. Kill with legal
blows, especially when you're right.

	DS


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-11 22:14       ` Rik van Riel
  2001-07-11 22:36         ` C. Slater
  2001-07-11 22:36         ` David Schwartz
@ 2001-07-12  7:23         ` Kai Henningsen
  2001-07-12 10:05           ` Helge Hafting
                             ` (2 more replies)
  2 siblings, 3 replies; 56+ messages in thread
From: Kai Henningsen @ 2001-07-12  7:23 UTC (permalink / raw)
  To: linux-kernel

riel@conectiva.com.br (Rik van Riel)  wrote on 11.07.01 in <Pine.LNX.4.33L.0107111913010.9899-100000@imladris.rielhome.conectiva>:

> One thing which always surprises me in this discussion
> (it comes up about once a year, it seems) is that
> nobody participating in this discussion ever starts
> writing any code for it.
>
> Is this a feature which is only wanted by people who
> don't want to code, or is this just a signal that the
> amount of trouble involved just isn't worth it?

Maybe it's a sign that the people who *would* be able to contribute have  
all looked at the problem already (surely most people are annoyed how a  
reboot interrupts everything), and have already concluded for themselves  
that it's not possible with reasonable effort ... but there is a steady  
influx of new people who don't understand enough of the problem and have  
to ask.

What I'd *really* like (but don't see how to get there) would be a "save  
system state, shutdown, change kernel and/or hardware, reboot, restore  
state" system (where state is like "I'm logged in on this console, in this  
current directory, and under X I have Netscape running and this page  
displayed" but I don't care about the exact state of Squid or even if my  
ISDN line is dialled in, because those "fix themselves").

I suspect to do this right would need a means of storing per-process state  
controlled by the process (because only that process knows what needs to  
be saved, and what can easily be reconstructed - for example, open file  
descriptors to a place where we store cookies don't need to be saved, just  
routinely reopened), and then every user-visible non-transient program  
needs to implement it - and I don't see *that* happen in the next ten  
years.

But it *does* have the advantage of not needing to save kernel-internal  
state.
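
A minimal sketch of what "state controlled by the process" could look like,
assuming a browser-like program. The struct, the function names and the
key=value file format are all hypothetical. The part that matters is the
split: the working directory and the displayed URL are saved, while the
cookie file handle is simply reopened on restore.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 256

/* Hypothetical session state for a browser-like program. */
struct session {
    char cwd[FIELD_MAX];   /* saved: cannot be guessed after a reboot */
    char url[FIELD_MAX];   /* saved: the page the user was looking at */
    FILE *cookies;         /* not saved: simply reopened on restore   */
};

/* Write only the fields that cannot be reconstructed. */
int session_save(const struct session *s, const char *path)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "cwd=%s\nurl=%s\n", s->cwd, s->url) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Copy one value up to the newline, truncating to the field size. */
static void copy_field(char *dst, const char *src)
{
    size_t n = strcspn(src, "\n");

    if (n >= FIELD_MAX)
        n = FIELD_MAX - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Read the saved fields back and rebuild everything else from scratch. */
int session_restore(struct session *s, const char *path,
                    const char *cookie_path)
{
    char line[FIELD_MAX + 8];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    memset(s, 0, sizeof *s);
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "cwd=", 4) == 0)
            copy_field(s->cwd, line + 4);
        else if (strncmp(line, "url=", 4) == 0)
            copy_field(s->url, line + 4);
        /* unknown keys are skipped, so files from newer writers still load */
    }
    fclose(f);
    s->cookies = fopen(cookie_path, "a+");  /* reconstructed, not restored */
    return s->cookies ? 0 : -1;
}
```

Nothing kernel-internal appears in the saved file, so it does not matter
which kernel is running when the session is restored.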

MfG Kai

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12  7:23         ` Kai Henningsen
@ 2001-07-12 10:05           ` Helge Hafting
  2001-07-13  6:50             ` Kai Henningsen
  2001-07-12 17:58           ` Hua Zhong
  2001-07-12 23:24           ` swsusp again [was Re: Switching Kernels without Rebooting?] Pavel Machek
  2 siblings, 1 reply; 56+ messages in thread
From: Helge Hafting @ 2001-07-12 10:05 UTC (permalink / raw)
  To: Kai Henningsen; +Cc: linux-kernel

Kai Henningsen wrote:

> What I'd *really* like (but don't see how to get there) would be a "save
> system state, shutdown, change kernel and/or hardware, reboot, restore
> state" system (where state is like "I'm logged in on this console, in this
> current directory, and under X I have Netscape running and this page
> displayed" but I don't care about the exact state of Squid or even if my
> ISDN line is dialled in, because those "fix themselves").

Consider OS/2 then.  All workplace-shell aware programs are supposed to
save state in this way.  And yes - they do start up in the same state after
reboot if you want them to.  Editors come up on the page you left, filesystem
folders come up, and so on.

> and then every user-visible non-transient program
> needs to implement it - and I don't see *that* happen in the next ten
> years.

Consider a patch for konqueror or a few other webpage/fs-view programs
and you'll go a long way - all in userspace.

Helge Hafting

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12 10:05           ` Helge Hafting
@ 2001-07-13  6:50             ` Kai Henningsen
  0 siblings, 0 replies; 56+ messages in thread
From: Kai Henningsen @ 2001-07-13  6:50 UTC (permalink / raw)
  To: linux-kernel

helgehaf@idb.hist.no (Helge Hafting)  wrote on 12.07.01 in <3B4D7685.9AC1DED@idb.hist.no>:

> Kai Henningsen wrote:
>
> > What I'd *really* like (but don't see how to get there) would be a "save
> > system state, shutdown, change kernel and/or hardware, reboot, restore
> > state" system (where state is like "I'm logged in on this console, in this
> > current directory, and under X I have Netscape running and this page
> > displayed" but I don't care about the exact state of Squid or even if my
> > ISDN line is dialled in, because those "fix themselves").
>
> Consider OS/2 then.  All workplace-shell aware programs are supposed to
> save state in this way.

The keyword is "supposed". Because I remember from my OS/2 days that most  
didn't.

OTOH, Borland's DOS IDE does. It's a mixed bag.

>  And yes - they do start up in the same state after
> reboot if you want to.  Editors come up on the page you left, filesystem
> folders come up, and so on.

Most programs from IBM got it right, most others didn't, as far as I can  
recall.

> > and then every user-visible non-transient program
> > needs to implement it - and I don't see *that* happen in the next ten
> > years.
>
> Consider a patch for konqueror or a few other webpage/fs-view programs
> and you'll go a long way - all in userspace.

Well, Netscape *can* sort of do it (for one window).

But how do I make it happen for bash? login? xdm? And so on ... anyway, I  
simply don't have the time for such a project. I'm spread too thin as it  
is.

MfG Kai

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12  7:23         ` Kai Henningsen
  2001-07-12 10:05           ` Helge Hafting
@ 2001-07-12 17:58           ` Hua Zhong
  2001-07-12 23:24           ` swsusp again [was Re: Switching Kernels without Rebooting?] Pavel Machek
  2 siblings, 0 replies; 56+ messages in thread
From: Hua Zhong @ 2001-07-12 17:58 UTC (permalink / raw)
  To: Kai Henningsen; +Cc: linux-kernel

-> kaih@khms.westfalen.de (Kai Henningsen)  wrote:
> riel@conectiva.com.br (Rik van Riel)  wrote on 11.07.01 in <Pine.LNX.4.33L.0107111913010.9899-100000@imladris.rielhome.conectiva>: 
> I suspect to do this right would need a means of storing per-process state  
> controlled by the process (because only that process knows what needs to  
> be saved, and what can easily be reconstructed - for example, open file  
> descriptors to a place where we store cookies don't need to be saved, just  
> routinely reopened), and then every user-visible non-transient program  
> needs to implement it - and I don't see *that* happen in the next ten  
> years.

This would be the easiest way to do it, in the sense that application
authors take care of their own stuff, and kernel developers only need to
define rules/interfaces.

One scheme is to define a new signal number (e.g., SIGCKPT).  When we send
that signal to a process, it checkpoints itself (saves everything it needs
for a restart).  Then we define another signal (e.g., SIGRSUM).  When we
send that one, the process knows it should resume from the last checkpoint.
This is user-level checkpoint/restart, and there are already packages
available for it (Condor, libckpt, etc).

If we want total transparency (i.e., applications don't need to be aware
and everything is taken care of by the kernel), then the kernel needs
substantial changes (I've written a kernel module to do this).
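
A minimal user-level sketch of the signal scheme, under stated assumptions:
SIGCKPT does not exist, so SIGUSR1 stands in for it; struct app_state and
the ckpt_* helpers are invented for illustration and are not the Condor or
libckpt API. The handler only sets a flag, and the program writes its state
at a point it considers safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* SIGCKPT is not a real signal; SIGUSR1 stands in for it in this sketch. */
#define SIGCKPT SIGUSR1

/* Hypothetical application state: whatever the program needs to resume. */
struct app_state {
    long iteration;
    double partial_sum;
};

static volatile sig_atomic_t ckpt_requested = 0;

/* Only set a flag here; file I/O is not safe inside a signal handler. */
static void on_ckpt(int signo)
{
    (void)signo;
    ckpt_requested = 1;
}

/* Install the checkpoint handler. Returns 0 on success. */
int ckpt_install(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_ckpt;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCKPT, &sa, NULL);
}

/* Write the state to a file. The raw struct layout is only valid for
 * the same binary on the same machine; a real format needs versioning. */
int ckpt_save(const char *path, const struct app_state *st)
{
    FILE *f = fopen(path, "wb");
    size_t n;

    if (!f)
        return -1;
    n = fwrite(st, sizeof *st, 1, f);
    if (fclose(f) != 0 || n != 1)
        return -1;
    return 0;
}

/* Read the state back, e.g. at startup or on a resume request. */
int ckpt_load(const char *path, struct app_state *st)
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (!f)
        return -1;
    n = fread(st, sizeof *st, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Call from the main loop at a safe point.
 * Returns 1 if a checkpoint was written, 0 if none was requested,
 * -1 if writing failed. */
int ckpt_poll(const char *path, const struct app_state *st)
{
    if (!ckpt_requested)
        return 0;
    ckpt_requested = 0;
    return ckpt_save(path, st) == 0 ? 1 : -1;
}
```

A program would call ckpt_install() once and ckpt_poll() once per loop
iteration. The resume side (the proposed SIGRSUM) is reduced here to calling
ckpt_load() on restart. Open files, sockets and pipes are exactly the parts
this does not cover.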

^ permalink raw reply	[flat|nested] 56+ messages in thread

* swsusp again [was Re: Switching Kernels without Rebooting?]
  2001-07-12  7:23         ` Kai Henningsen
  2001-07-12 10:05           ` Helge Hafting
  2001-07-12 17:58           ` Hua Zhong
@ 2001-07-12 23:24           ` Pavel Machek
  2001-07-13 21:08             ` Alan Cox
  2 siblings, 1 reply; 56+ messages in thread
From: Pavel Machek @ 2001-07-12 23:24 UTC (permalink / raw)
  To: Kai Henningsen, linux-kernel

Hi!

> What I'd *really* like (but don't see how to get there) would be a "save  
> system state, shutdown, change kernel and/or hardware, reboot, restore  
> state" system (where state is like "I'm logged in on this console, in this  
> current directory, and under X I have Netscape running and this page  
> displayed" but I don't care about the exact state of Squid or even if my  
> ISDN line is dialled in, because those "fix themselves").

Suspend-to-disk, change hardware, restore-from-disk, load necessary
modules seems quite easy to do with swsusp. It is very different from
suspend-to-disk, change kernel, restore-from-disk (which is guaranteed
to kill you if kernel changes size).

								Pavel
-- 
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: swsusp again [was Re: Switching Kernels without Rebooting?]
  2001-07-12 23:24           ` swsusp again [was Re: Switching Kernels without Rebooting?] Pavel Machek
@ 2001-07-13 21:08             ` Alan Cox
  0 siblings, 0 replies; 56+ messages in thread
From: Alan Cox @ 2001-07-13 21:08 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Kai Henningsen, linux-kernel

> Suspend-to-disk, change hardware, restore-from-disk, load necessary
> modules seems quite easy to do with swsusp. It is very different from
> suspend-to-disk, change kernel, restore-from-disk (which is guaranteed
> to kill you if kernel changes size).

It works for most hw changes. I've used swsusp to replace a burned out 3c509
without rebooting 8)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-11 22:12     ` Paul Jakma
  2001-07-11 22:14       ` Rik van Riel
@ 2001-07-11 22:46       ` Kip Macy
  2001-07-11 23:02         ` Rik van Riel
  2001-07-12  0:31         ` Jesse Pollard
  2001-07-11 23:36       ` H. Peter Anvin
  2001-07-12  7:23       ` Ville Herva
  3 siblings, 2 replies; 56+ messages in thread
From: Kip Macy @ 2001-07-11 22:46 UTC (permalink / raw)
  To: Paul Jakma; +Cc: Helge Hafting, C. Slater, linux-kernel

In the future when Linux is more heavily used at the enterprise level
there will likely be upgrade/revert modules to allow such a transition to
take place.

			-Kip

On Wed, 11 Jul 2001, Paul Jakma wrote:

> On Wed, 11 Jul 2001, Helge Hafting wrote:
> 
> > That seems completely out of question.  The structures a 2.4.7
> > kernel understands might be insufficient to express the setup
> > a future 2.6.9 kernel is using to do its stuff better.
> 
> however, it might be handy if say you needed to upgrade a stable
> kernel due to a bug fix or security update.
> 
> no?
> 
> regards,
> -- 
> Paul Jakma	paul@clubi.ie	paul@jakma.org
> PGP5 key: http://www.clubi.ie/jakma/publickey.txt
> -------------------------------------------
> Fortune:
> I found Rome a city of bricks and left it a city of marble.
> 		-- Augustus Caesar
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-11 22:46       ` Switching Kernels without Rebooting? Kip Macy
@ 2001-07-11 23:02         ` Rik van Riel
  2001-07-12  0:31         ` Jesse Pollard
  1 sibling, 0 replies; 56+ messages in thread
From: Rik van Riel @ 2001-07-11 23:02 UTC (permalink / raw)
  To: Kip Macy; +Cc: Paul Jakma, Helge Hafting, C. Slater, linux-kernel

On Wed, 11 Jul 2001, Kip Macy wrote:

> In the future when Linux is more heavily used at the enterprise level
> there will likely be upgrade/revert modules to allow such a transition
> to take place.

Only if somebody takes the trouble to write them, which
isn't something I see happening in the near future.

Not only would this feature be a LOT of work, it would
(probably) also be very invasive all over the kernel.
OTOH, if the kernel was compiled with -g maybe it'd have
enough info to locate its data structures ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-11 22:46       ` Switching Kernels without Rebooting? Kip Macy
  2001-07-11 23:02         ` Rik van Riel
@ 2001-07-12  0:31         ` Jesse Pollard
  2001-07-12  1:10           ` Hua Zhong
  1 sibling, 1 reply; 56+ messages in thread
From: Jesse Pollard @ 2001-07-12  0:31 UTC (permalink / raw)
  To: Kip Macy, Paul Jakma; +Cc: Helge Hafting, C. Slater, linux-kernel

On Wed, 11 Jul 2001, Kip Macy wrote:
>In the future when Linux is more heavily used at the enterprise level
>there will likely be upgrade/revert modules to allow such a transition to
>take place.

I use some of the largest UNIX supercomputers ever built (IBM SP, Cray T3E,
SV1, YMP, XMP, J90, SGI Origin). None of them can start a new kernel from an
earlier version. There are too many things that will fail:

	Any network activity
	Active disk I/O
	Locked memory
	File modification
	File structures
	Disk structures (yes they change...)
	Clock Synchronization (SMP and cluster)
	Shared memory (SMP and cluster)
	semaphores (SMP and cluster)
	login sessions
	device status
	shared disks and distributed file systems (cluster)
	pipes

Before you even try switching kernels, first implement a process
checkpoint/restart. The process must be resumed after a boot using the same
kernel, with all I/O resumed. Now get it accepted into the kernel.

Anything else is just another name for "reboot using new kernel".

-- 
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: jesse@cats-chateau.net

Any opinions expressed are solely my own.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12  0:31         ` Jesse Pollard
@ 2001-07-12  1:10           ` Hua Zhong
  0 siblings, 0 replies; 56+ messages in thread
From: Hua Zhong @ 2001-07-12  1:10 UTC (permalink / raw)
  To: jesse; +Cc: Kip Macy, Paul Jakma, Helge Hafting, C. Slater, linux-kernel

-> Jesse Pollard <jesse@cats-chateau.net>  wrote:
> Before you even try switching kernels, first implement a process
> checkpoint/restart. The process must be resumed after a boot using the same
> kernel, with all I/O resumed. Now get it accepted into the kernel.
> 
> Anything else is just another name for "reboot using new kernel".

Exactly.  You may want to take a look at http://www.checkpointing.org



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-11 22:12     ` Paul Jakma
  2001-07-11 22:14       ` Rik van Riel
  2001-07-11 22:46       ` Switching Kernels without Rebooting? Kip Macy
@ 2001-07-11 23:36       ` H. Peter Anvin
  2001-07-12  7:23       ` Ville Herva
  3 siblings, 0 replies; 56+ messages in thread
From: H. Peter Anvin @ 2001-07-11 23:36 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <Pine.LNX.4.33.0107112310590.962-100000@fogarty.jakma.org>
By author:    Paul Jakma <paul@clubi.ie>
In newsgroup: linux.dev.kernel
>
> On Wed, 11 Jul 2001, Helge Hafting wrote:
> 
> > That seems completely out of question.  The structures a 2.4.7
> > kernel understands might be insufficient to express the setup
> > a future 2.6.9 kernel is using to do its stuff better.
> 
> however, it might be handy if say you needed to upgrade a stable
> kernel due to a bug fix or security update.
> 
> no?
> 

No.  You have no guarantee that the state or state mangler won't
propagate the bug into the new kernel, even if it has been fixed.
Since many, if not most, bug fixes or security upgrades are related to
state getting mucked up, this is a very serious thing.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-11 22:12     ` Paul Jakma
                         ` (2 preceding siblings ...)
  2001-07-11 23:36       ` H. Peter Anvin
@ 2001-07-12  7:23       ` Ville Herva
  3 siblings, 0 replies; 56+ messages in thread
From: Ville Herva @ 2001-07-12  7:23 UTC (permalink / raw)
  To: linux-kernel

On Wed, Jul 11, 2001 at 11:12:12PM +0100, you [Paul Jakma] claimed:
> On Wed, 11 Jul 2001, Helge Hafting wrote:
> 
> > That seems completely out of question.  The structures a 2.4.7
> > kernel understands might be insufficient to express the setup
> > a future 2.6.9 kernel is using to do its stuff better.
> 
> however, it might be handy if say you needed to upgrade a stable
> kernel due to a bug fix or security update.
> 
> no?

<clueless>
In that case you might get away with a simpler approach. Perhaps you could
just replace the changed function(s) with new ones and scan the kernel for
calls to them. Each call should then be changed to point to the new
function. This might work provided the function interfaces don't change
(which might just be true for simple maintenance bug fixes and security
fixes.) It might even be useful for kernel development.

Of course this takes complex locking and the details are probably very
thorny.

I'm not sure if this is possible, IANAKH. But AFAIK this is roughly what
MSVC6.0 edit and continue does for userspace programs. 
</clueless>
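
A user-space sketch of a related but simpler technique, not the call-site
scanning described above: if every caller goes through one function pointer,
replacing the function is a single atomic store instead of a hunt for call
instructions. The checksum functions and the checksum_replace() helper are
invented for illustration, and the sketch assumes a C11 compiler with
<stdatomic.h>.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef int (*checksum_fn)(const unsigned char *buf, size_t len);

/* "Old" version with a deliberate bug: it skips the last byte. */
int checksum_v1(const unsigned char *buf, size_t len)
{
    int sum = 0;

    for (size_t i = 0; i + 1 < len; i++)
        sum += buf[i];
    return sum;
}

/* "Fixed" version with the same interface. */
int checksum_v2(const unsigned char *buf, size_t len)
{
    int sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* The single indirection point that every caller goes through. */
static _Atomic(checksum_fn) checksum_impl = checksum_v1;

/* Callers use this wrapper and never call a version directly. */
int checksum(const unsigned char *buf, size_t len)
{
    checksum_fn fn = atomic_load(&checksum_impl);

    return fn(buf, len);
}

/* Swap in a new implementation; returns the one that was replaced.
 * Calls already running inside the old function finish in the old code. */
checksum_fn checksum_replace(checksum_fn new_fn)
{
    return atomic_exchange(&checksum_impl, new_fn);
}
```

This only works under the condition already named above: the interface must
not change. It also says nothing about when the old code can be freed or
what happens if the fix changes the meaning of data the old code already
wrote, which is where the thorny details live.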

-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
@ 2001-07-13  1:11 tas
  2001-07-13  3:45 ` Ian Stirling
  0 siblings, 1 reply; 56+ messages in thread
From: tas @ 2001-07-13  1:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ian Stirling

 > I've just suspended to disk after the list line, pulled the power supplies,
 > taken the RAM chip out, shorted the pins to make really sure, then powered
 > back up.

FYI: Taking the memory module out and shorting its pins together is a 
great way to unnecessarily risk zapping your RAM with ESD, and a 
terrible way to ensure that its contents are erased.  When the DRAM is 
not being accessed (by definition true when you remove power), the gate 
capacitors that form the DRAM array are floating unconnected and cannot 
be intentionally discharged.  You just have to wait for good old leakage 
to kill the bits.  A minute should be more than enough.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-13  1:11 tas
@ 2001-07-13  3:45 ` Ian Stirling
  0 siblings, 0 replies; 56+ messages in thread
From: Ian Stirling @ 2001-07-13  3:45 UTC (permalink / raw)
  To: linux-kernel

> 
>  > I've just suspended to disk after the list line, pulled the power supplies,
>  > taken the RAM chip out, shorted the pins to make really sure, then powered
>  > back up.
> 
> FYI: Taking the memory module out and shorting its pins together is a 
> great way to unnecessarily risk zapping your RAM with ESD, and a 
> terrible way to ensure that its contents are erased.  When the DRAM is 
> not being accessed (by definition true when you remove power), the gate 
> capacitors that form the DRAM array are floating unconnected and cannot 
> be intentionally discharged.  You just have to wait for good old leakage 
> to kill the bits.  A minute should be more than enough.

I know, I observed antistatic precautions, and did wait a couple of minutes
(while making a coffee).

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
@ 2001-07-12 15:32 Jesse Pollard
  0 siblings, 0 replies; 56+ messages in thread
From: Jesse Pollard @ 2001-07-12 15:32 UTC (permalink / raw)
  To: ralf, Jesse Pollard; +Cc: Andreas Dilger, C. Slater, linux-kernel

---------  Received message begins Here  ---------

> 
> On Thu, Jul 12, 2001 at 07:23:06AM -0500, Jesse Pollard wrote:
> 
> > That isn't even the same problem.
> 
> Sure - the original problem is hard to solve so I suggest to cheat a bit :)
> 
> > First, processes do not survive the upgrade.
> 
> You care about services to continue or only want an entry for an uptime
> contest?

Yes to the first, no to the second.

Processes need to continue if it takes days to arrive at a solution. If
the system DOES need to go down, then the process needs to be checkpointed.
After the outage, the process is resumed.

This is NOT easy. The last system that did it reliably (in the systems
I work with) is UNICOS 7. It did not try to save processes that had open
network connections (even NFS) or pipes. Between  UNICOS 7-10, it was
attempted to include pipes and sockets, provided both ends of the communication
were controlled by the same host (socket to a local daemon, both processes
in the pipe within the same batch job). This didn't work (well, partly worked:
pipes seem to work, but sockets didn't). During this time more and more
processes failed on restart, unless they were constrained to only single
process events. Cluster systems - no chance. It seems impossible to force
a synchronous checkpoint across a cluster (well - theoretically possible).

The problem was that it may take 10-20 minutes to checkpoint a single process.
During that time the corresponding process on another node approaches the
checkpoint location, and fails due to a network timeout. Distributed batch
job dies.

I've seen some processes (single process now) take over half an hour
to checkpoint (120 MWords (64-bit words) = 960 MB being written to disk).
First it has to stop the process synchronously with all file activity (might
take 5 minutes for all buffers to complete). Then the kernel saves the active
process memory (the 960 MB - 5-10 minutes), then all outstanding I/O buffers
and status structures (scatter/gather, reformat, write - might take another
5 minutes).
During the entire time, the system would be doing other I/O for other processes
not being checkpointed (daemons, interactive logins, etc). When the process
reached 4-8GB in size, stopping a batch stream could take over an hour.
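
As a rough consistency check on these figures, assuming 1 MWord = 10^6
words of 8 bytes, decimal megabytes, and counting only the 5-10 minute
memory-save phase (the buffer drain and I/O structure phases are ignored):

```latex
\begin{align*}
120 \times 10^{6}\ \text{words} \times 8\ \text{B/word}
  &= 960 \times 10^{6}\ \text{B} = 960\ \text{MB} \\
\frac{960\ \text{MB}}{600\ \text{s}} = 1.6\ \text{MB/s}
  &\quad\text{to}\quad
  \frac{960\ \text{MB}}{300\ \text{s}} = 3.2\ \text{MB/s} \\
\frac{8000\ \text{MB}}{3.2\ \text{MB/s}} = 2500\ \text{s} \approx 42\ \text{min}
  &\quad\text{to}\quad
  \frac{8000\ \text{MB}}{1.6\ \text{MB/s}} = 5000\ \text{s} \approx 83\ \text{min}
\end{align*}
```

At the same effective rate, an 8 GB process would need somewhere between
42 and 83 minutes for the memory image alone, which is in line with the
"over an hour" observation.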

During the outage, drivers could be updated, scheduling parameters altered,
hardware fixes like raid disk replacements or cpu, just low level activity.
Anything that affected the file structure (ie changing dates, relocated
files, renamed files...) would cause the checkpointed process to fail to
restart.

The restart procedure had to allocate memory for I/O buffers (cache buffers),
reload them, reload the process private structures, verify that files remained
consistent with parameters in the private structures, reset file pointer
locations for any open files, reload pipe buffers. Then repeat for the
process at the other end of the pipe. After all pipes and processes are
reloaded (without any consistency errors) all processes involved would
be entered in the run queue.

The architecture of the Cray YMP systems simplified a LOT of the activity.
1. The hardware did NOT support paging.
2. All data structures were contiguous in memory (excluding only the cache
   buffers for pipes and disk).
3. All data structures contained only offset locations (relative to the
   physical address of the process private data structure). The process
   memory ALWAYS followed the process private data structure.
4. Buffer cache pointers were independent of the user process; only the
   queue identifiers were needed in the process private space, not pointers
   to the queue.
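
Point 3 is the one that makes a checkpoint close to a plain memory copy. A
small sketch of the idea, using a made-up arena layout and invented names
(this is not UNICOS code): a linked list that stores offsets from the start
of its region instead of pointers still works after the region is copied
somewhere else.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE 1024
#define NIL 0u   /* offset 0 holds the header, so it can mean "no node" */

/* Hypothetical layout: a header followed by nodes, all linked by offset. */
struct arena_hdr { uint32_t used; uint32_t head; };
struct node      { uint32_t next; int32_t value; };

/* Prepare an empty arena. */
void arena_init(unsigned char *arena)
{
    struct arena_hdr h = { (uint32_t)sizeof(struct arena_hdr), NIL };

    memcpy(arena, &h, sizeof h);
}

/* Push a value on the front of the list. Returns -1 when the arena is full. */
int arena_push(unsigned char *arena, int32_t value)
{
    struct arena_hdr h;
    struct node n;

    memcpy(&h, arena, sizeof h);
    if (h.used + sizeof n > ARENA_SIZE)
        return -1;
    n.next = h.head;              /* an offset, never an address */
    n.value = value;
    memcpy(arena + h.used, &n, sizeof n);
    h.head = h.used;
    h.used += (uint32_t)sizeof n;
    memcpy(arena, &h, sizeof h);
    return 0;
}

/* Walk the list by offset and add up the values. */
long arena_sum(const unsigned char *arena)
{
    struct arena_hdr h;
    struct node n;
    long sum = 0;

    memcpy(&h, arena, sizeof h);
    for (uint32_t off = h.head; off != NIL; off = n.next) {
        memcpy(&n, arena + off, sizeof n);
        sum += n.value;
    }
    return sum;
}
```

Copying the whole arena with memcpy() into another buffer and calling
arena_sum() on the copy gives the same result as on the original. With real
pointers, every link would have to be found and fixed up after the copy,
which is the problem a paged, pointer-heavy kernel has.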

Note: a process that was swapped out was really swapped out (all memory). It
looked like (from the documentation) it was a slightly simplified form of
a checkpoint file.

None of this applies to other Cray hardware (T3, SV1). The SV1 is most
similar to the YMP line, but because of the more "cluster" operations
I'm less familiar with how the checkpoint/restart works across the SV1.

The uptime contest is still lost because the system DID go down.

Process checkpoint/restart has been advertised for SGI IRIX systems,
but I've not seen it (first release didn't work if files were open).

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
@ 2001-07-12 12:23 Jesse Pollard
  2001-07-12 14:55 ` Ralf Baechle
  0 siblings, 1 reply; 56+ messages in thread
From: Jesse Pollard @ 2001-07-12 12:23 UTC (permalink / raw)
  To: ralf, Andreas Dilger; +Cc: C. Slater, linux-kernel

Ralf Baechle <ralf@uni-koblenz.de>:
> On Wed, Jul 11, 2001 at 05:44:45PM -0600, Andreas Dilger wrote:
> 
> > The best proposal I've heard so far was to use MOSIX to do live job
> > migration between machines, and then upgrade the kernel like normal.
> > In the end, it is the jobs that are running on the kernel, and not
> > the kernel or the individual machine that are the most important.  One
> > person pointed out that there is a single point of failure in the
> > MOSIX "stub" machine, which doesn't help you in the end (how do you
> > update the kernel there?).  If you can figure a way to enhance MOSIX
> > to allow migrating the MOSIX "stub" processes to another machine, you
> > will have solved your problem in a much easier way, IMHO.
> 
> Virtual machines a la VM are also nice for this.  Build a HA cluster from
> two VMs, then upgrade one after another.  All that's required is HA stuff
> as it already is available.

That isn't even the same problem.
First, processes do not survive the upgrade.
Second, the upgrade must still be compatible with the host OS.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12 12:23 Jesse Pollard
@ 2001-07-12 14:55 ` Ralf Baechle
  0 siblings, 0 replies; 56+ messages in thread
From: Ralf Baechle @ 2001-07-12 14:55 UTC (permalink / raw)
  To: Jesse Pollard; +Cc: Andreas Dilger, C. Slater, linux-kernel

On Thu, Jul 12, 2001 at 07:23:06AM -0500, Jesse Pollard wrote:

> That isn't even the same problem.

Sure - the original problem is hard to solve, so I suggest cheating a bit :)

> First, processes do not survive the upgrade.

Do you care about services continuing, or do you only want an entry in an
uptime contest?

  Ralf

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
@ 2001-07-12  4:48 Frank Davis
  2001-07-12  5:08 ` John Alvord
  0 siblings, 1 reply; 56+ messages in thread
From: Frank Davis @ 2001-07-12  4:48 UTC (permalink / raw)
  To: linux-kernel

Hello all,
  I believe that if such a project is to be undertaken, it first
needs to be designed, then coded. I agree that it is a difficult problem. As
for its feasibility, I'm unsure. Maybe the reason this topic comes up
here from time to time is that it hasn't been shown to be a bad
idea. It might be, but if we don't start somewhere, then we'll never
really know, and the debate will continue. Just my .02 cents.
Regards,
-Frank 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12  4:48 Frank Davis
@ 2001-07-12  5:08 ` John Alvord
  2001-07-13  9:10   ` Chuck Hemker
  0 siblings, 1 reply; 56+ messages in thread
From: John Alvord @ 2001-07-12  5:08 UTC (permalink / raw)
  To: linux-kernel

On Thu, 12 Jul 2001 00:48:15 -0400 (EDT), Frank Davis
<fdavis@andrew.cmu.edu> wrote:

>Hello all,
>  I believe that if such a project is to be undertaken, it first
>needs to be designed, then coded. I agree that is a difficult problem...As
>for its feasiblity, I'm unsure. Maybe the reason this topic comes up
>here from time to time is because it hasn't been shown to be a bad
>idea. It might be be, but if we don't start somewhere, then we'll never
>really know, and the debate will continue. Just my .02 cents.
>Regards,

This topic comes up once or twice a year.

Usually this topic comes to a grinding halt when someone points out
that drivers can be created modular. They can be loaded and unloaded
without rebooting Linux. One project used that technique to
load/unload different schedulers. While this satisfies only part of
the need, it is usually enough to satisfy the tinkerer.

A more recent development is UML - User Mode Linux - where you can run
a nearly complete Linux image in user mode. That way you can fiddle
with file systems to your heart's content without rebooting the main
system. I suspect that will satisfy others.

john alvord

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12  5:08 ` John Alvord
@ 2001-07-13  9:10   ` Chuck Hemker
  0 siblings, 0 replies; 56+ messages in thread
From: Chuck Hemker @ 2001-07-13  9:10 UTC (permalink / raw)
  To: linux-kernel

On 12-Jul-01 John Alvord wrote:
> On Thu, 12 Jul 2001 00:48:15 -0400 (EDT), Frank Davis
> <fdavis@andrew.cmu.edu> wrote:
> 
>>Hello all,
>>  I believe that if such a project is to be undertaken, it first
>>needs to be designed, then coded. I agree that is a difficult problem...As
>>for its feasiblity, I'm unsure. Maybe the reason this topic comes up
>>here from time to time is because it hasn't been shown to be a bad
>>idea. It might be be, but if we don't start somewhere, then we'll never
>>really know, and the debate will continue. Just my .02 cents.
>>Regards,
> 
> This topic comes up once a twice a year.
> 
> Usually this topic comes to a grinding halt when someone points out
> that drivers can be created modular. They can be loaded and unloaded
> without rebooting Linux. One project used that technique to
> load/unload different schedulers. While this satisfies only part of
> the need, it is usually enough to satisfy the tinker-er.

One problem with this is many of the modules may be difficult to replace
because they are in use.

If someone did want to spend time on a project like this, one place they could
start would be to try to make some of the modules hot replaceable.

An example that comes to mind is a SCSI driver:

1. Tell the kernel to stop sending it commands.
2. Wait for things in progress to complete.
3. Save whatever state you need to.
4. Remove the old driver.
5. Start up the new one.
6. Start restoring state.
7. Reset the SCSI bus.
8. Reprobe for devices?
9. Finish restoring state.
10. Tell the kernel we are available.
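
The sequence above can be sketched as a toy model in Python. This is only an
illustration of the ordering (quiesce, drain, save, swap, restore, resume).
ToyDriver and hot_replace are hypothetical names and do not correspond to
any real kernel or module interface, and the bus reset and reprobe steps are
only marked by a comment.

```python
class ToyDriver:
    """Hypothetical stand-in for a driver module, not a kernel interface."""

    def __init__(self, version):
        self.version = version
        self.accepting = True   # may the kernel send us commands?
        self.in_flight = []     # commands accepted but not yet completed
        self.state = {}         # whatever must survive the swap

    def submit(self, command):
        if not self.accepting:
            raise RuntimeError("driver is quiesced")
        self.in_flight.append(command)

    def drain(self):
        """Complete everything in progress and return what was finished."""
        finished, self.in_flight = self.in_flight, []
        return finished


def hot_replace(old, new_version):
    old.accepting = False         # stop the kernel sending new commands
    finished = old.drain()        # wait for things in progress to complete
    saved = dict(old.state)       # save whatever state is needed
    new = ToyDriver(new_version)  # remove the old driver, start up the new
    new.accepting = False         # stay closed until the restore is done
    new.state.update(saved)       # restore state; on real hardware the bus
                                  # reset and reprobe would happen around here
    new.accepting = True          # tell the kernel we are available again
    return new, finished


driver = ToyDriver("1.0")
driver.state["negotiated_speed"] = 40
driver.submit("READ block 7")
replacement, finished = hot_replace(driver, "1.1")
```

The hard part in a real driver is deciding what belongs in the saved state
and making sure the new version can still interpret it.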

This example was chosen not because I think the SCSI drivers are buggy. :)
It was chosen as a type of module that someone might want to replace but
couldn't, because it is in use (a file system is mounted on it).
Maybe a network card would be easier to start with, with similar requirements.
Then you could hope all the patches will be for modules. :)

I also haven't looked at the code to see if it was possible. :)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: Switching Kernels without Rebooting?
@ 2001-07-12  1:03 Torrey Hoffman
  2001-07-12  1:24 ` C. Slater
  2001-07-12 20:47 ` Wilfried Weissmann
  0 siblings, 2 replies; 56+ messages in thread
From: Torrey Hoffman @ 2001-07-12  1:03 UTC (permalink / raw)
  To: 'jesse@cats-chateau.net', Kip Macy, Paul Jakma
  Cc: Helge Hafting, C. Slater, linux-kernel

Jesse Pollard wrote:

[why switching kernels is very hard, and...]

> Before you even try switching kernels, first implement a process
> checkpoint/restart. The process must be resumed after a boot 
> using the same
> kernel, with all I/O resumed. Now get it accepted into the kernel.

Hear, hear!  That would be a useful feature, maybe not for network servers,
but for pure number crunching apps it would save people having to write 
all the state saving and recovery that is needed now for long term 
computations.

For bonus points, make it work for clusters to synchronously save and
restore state for the apps running on all the nodes at once...

Torrey

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12  1:03 Torrey Hoffman
@ 2001-07-12  1:24 ` C. Slater
  2001-07-12 10:07   ` Jesse Pollard
  2001-07-12 23:17   ` Pavel Machek
  2001-07-12 20:47 ` Wilfried Weissmann
  1 sibling, 2 replies; 56+ messages in thread
From: C. Slater @ 2001-07-12  1:24 UTC (permalink / raw)
  To: linux-kernel

Would anyone else like to point out some other task somewhat related 
and have me do it? :-)

> > Before you even try switching kernels, first implement a process
> > checkpoint/restart. The process must be resumed after a boot
> > using the same
> > kernel, with all I/O resumed. Now get it accepted into the kernel.
> 
> Hear, hear!  That would be a useful feature, maybe not network servers, 
> but for pure number crunching apps it would save people having to write 
> all the state saving and recovery that is needed now for long term 
> computations.

Get a computer with hibernation support. That's just about what it is.

> 
> For bonus points, make it work for clusters to synchronously save and
> restore state for the apps running on all the nodes at once...

Bash script.

> 
> Torrey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12  1:24 ` C. Slater
@ 2001-07-12 10:07   ` Jesse Pollard
  2001-07-12 12:11     ` Ian Stirling
  2001-07-12 23:17   ` Pavel Machek
  1 sibling, 1 reply; 56+ messages in thread
From: Jesse Pollard @ 2001-07-12 10:07 UTC (permalink / raw)
  To: C. Slater, linux-kernel

On Wed, 11 Jul 2001, C. Slater wrote:
>Would anyone else like to point out some other task somewhat related 
>and have me do it? :-)
>
>> > Before you even try switching kernels, first implement a process
>> > checkpoint/restart. The process must be resumed after a boot
>> > using the same
>> > kernel, with all I/O resumed. Now get it accepted into the kernel.
>> 
>> Hear, hear!  That would be a useful feature, maybe not network servers, 
>> but for pure number crunching apps it would save people having to write 
>> all the state saving and recovery that is needed now for long term 
>> computations.
>
>Get a computer with hibernation support. That's just about what it is.

Bzzzt, wrong answer. Hibernation stops the entire kernel. Checkpoint/restart
stops processes and saves the entire state of each process. Hibernation
just halts the processor.

>> 
>> For bonus points, make it work for clusters to synchronously save and
>> restore state for the apps running on all the nodes at once...
>
>Bash script.

Doesn't work - remember, once the kernel is suspended it can't tell
another system that it has done so.

A full checkpoint/restart can potentially allow a process to migrate
from one node to another. It also allows other processing to be done
while the process is checkpointed:

	a. How do you reconstruct a software RAID 5 while the system
	   is "suspended"?
	b. How do you migrate to a different platform if the system is
	   suspended?

Answer - you can't.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: jesse@cats-chateau.net

Any opinions expressed are solely my own.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12 10:07   ` Jesse Pollard
@ 2001-07-12 12:11     ` Ian Stirling
  2001-07-12 12:54       ` Jesse Pollard
  0 siblings, 1 reply; 56+ messages in thread
From: Ian Stirling @ 2001-07-12 12:11 UTC (permalink / raw)
  To: linux-kernel

> 
> On Wed, 11 Jul 2001, C. Slater wrote:
> >Would anyone else like to point out some other task somewhat related 
> >and have me do it? :-)
> >
> >> > Before you even try switching kernels, first implement a process
> >> > checkpoint/restart. The process must be resumed after a boot
> >> > using the same
> >> > kernel, with all I/O resumed. Now get it accepted into the kernel.
> >> 
> >> Hear, hear!  That would be a useful feature, maybe not network servers, 
> >> but for pure number crunching apps it would save people having to write 
> >> all the state saving and recovery that is needed now for long term 
> >> computations.
> >
> >Get a computer with hibernation support. That's just about what it is.
> 
> Bzzzt wrong anser. Hibernation stops the entire kernel. checkpoint restart
> stops processes, saves the entire state of the process. hibernation
> is just halt the processor.

Hibernation may not be just that.
I've just suspended to disk after the last line, pulled the power supplies,
taken the RAM chip out, shorted the pins to make really sure, then powered
back up.
Everything just resumed fine.

All I'd need to do kernel migration is a quick vi of the
disk file.

(well, almost)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12 12:11     ` Ian Stirling
@ 2001-07-12 12:54       ` Jesse Pollard
  2001-07-12 14:15         ` Michael H. Warfield
  0 siblings, 1 reply; 56+ messages in thread
From: Jesse Pollard @ 2001-07-12 12:54 UTC (permalink / raw)
  To: root, linux-kernel

---------  Received message begins Here  ---------

> 
> > 
> > On Wed, 11 Jul 2001, C. Slater wrote:
> > >Would anyone else like to point out some other task somewhat related 
> > >and have me do it? :-)
> > >
> > >> > Before you even try switching kernels, first implement a process
> > >> > checkpoint/restart. The process must be resumed after a boot
> > >> > using the same
> > >> > kernel, with all I/O resumed. Now get it accepted into the kernel.
> > >> 
> > >> Hear, hear!  That would be a useful feature, maybe not network servers, 
> > >> but for pure number crunching apps it would save people having to write 
> > >> all the state saving and recovery that is needed now for long term 
> > >> computations.
> > >
> > >Get a computer with hibernation support. That's just about what it is.
> > 
> > Bzzzt wrong anser. Hibernation stops the entire kernel. checkpoint restart
> > stops processes, saves the entire state of the process. hibernation
> > is just halt the processor.
> 
> Hibernation may not be.
> I've just suspended to disk after the list line, pulled the power supplies,
> taken the RAM chip out, shorted the pins to make really sure, then powered
> back up.
> Everything just resumed fine.
> 
> All I'd need to do kernel migration is a quick vi of the
> disk file.
> 
> (well, almost)

That sounds more like a memory dump to disk, and reload after power restored.
Either that or possibly a separate power supply for RAM (something like a
trickle discharge capacitor; I've read that some capacitors can hold a charge
for about 3 days. Whether that would work for a large RAM or not, I have no
idea).

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12 12:54       ` Jesse Pollard
@ 2001-07-12 14:15         ` Michael H. Warfield
  0 siblings, 0 replies; 56+ messages in thread
From: Michael H. Warfield @ 2001-07-12 14:15 UTC (permalink / raw)
  To: Jesse Pollard; +Cc: root, linux-kernel

On Thu, Jul 12, 2001 at 07:54:10AM -0500, Jesse Pollard wrote:

> > > On Wed, 11 Jul 2001, C. Slater wrote:
> > > >Would anyone else like to point out some other task somewhat related 
> > > >and have me do it? :-)
> > > >
> > > >> > Before you even try switching kernels, first implement a process
> > > >> > checkpoint/restart. The process must be resumed after a boot
> > > >> > using the same
> > > >> > kernel, with all I/O resumed. Now get it accepted into the kernel.
> > > >> 
> > > >> Hear, hear!  That would be a useful feature, maybe not network servers, 
> > > >> but for pure number crunching apps it would save people having to write 
> > > >> all the state saving and recovery that is needed now for long term 
> > > >> computations.
> > > >
> > > >Get a computer with hibernation support. That's just about what it is.
> > > 
> > > Bzzzt wrong anser. Hibernation stops the entire kernel. checkpoint restart
> > > stops processes, saves the entire state of the process. hibernation
> > > is just halt the processor.
> > 
> > Hibernation may not be.
> > I've just suspended to disk after the list line, pulled the power supplies,
> > taken the RAM chip out, shorted the pins to make really sure, then powered
> > back up.
> > Everything just resumed fine.
> > 
> > All I'd need to do kernel migration is a quick vi of the
> > disk file.
> > 
> > (well, almost)

> That sounds more like a memory dump to disk, and reload after power restored.
> Either that or possibly a separate power supply for RAM (something like a
> trickle discharge capacitor; I've read that some capacitors can hold a charge
> for about 3 days. Whether that would work for a large RAM or not, I have no
> idea).

	It's a suspend to disk.  Lots of laptops can do it and my Toshiba
Tecra 8100 can do it from the BIOS if I have a magic Windows partition with
an appropriate suspend file in it (which would be unencrypted, which would
be unacceptable - so I had to look for a Linux solution for the suspend
to disk problem).

	Check out the swsusp project up at SourceForge
<http://sourceforge.net/projects/swsusp/>.  It allows me to suspend
into the swap space by hitting Alt-SysRq-D.  Great for changing
batteries on laptops (and, no, normal suspend does not survive a battery
change) but also REALLY GREAT for forensic security analysis of compromised
systems.  I hit the console of a compromised system and hit Alt-SysRq-D
and it flushes the dirty buffers, dumps memory to swap (preserving all
my "volatiles") and then shuts down.  I can snapshot the hard drive and
then restart the system where it left off for live running analysis.  If
that gets screwed up, I can restore the image again and restart again from
the same spot.  I've also got all the memory and CPU state in that
disk image for "in-vitro" analysis by tools like Wietse's "The Coroner's
Toolkit".

	But that doesn't solve ANY of the problems with changing the kernel
itself.  Suspending and restoring the system is the easy part (and swsusp
still has some problems restoring X Windows).  Restoring a system to
a different kernel is orders of magnitude worse, if not downright
impossible for all the reasons given over internal structures and
interfaces.

	I would LOVE to have something like swsusp in the main line kernel,
however, just so I didn't have to convince IT departments to apply this
custom kernel patch to their production systems BEFORE they get their butts
kicked by some snot-nosed script kiddie.  :-/

> -------------------------------------------------------------------------
> Jesse I Pollard, II
> Email: pollard@navo.hpc.mil
> 
> Any opinions expressed are solely my own.

-- 
 Michael H. Warfield    |  (770) 985-6132   |  mhw@WittsEnd.com
  (The Mad Wizard)      |  (678) 463-0932   |  http://www.wittsend.com/mhw/
  NIC whois:  MHW9      |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471    |  possible worlds.  A pessimist is sure of it!


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12  1:24 ` C. Slater
  2001-07-12 10:07   ` Jesse Pollard
@ 2001-07-12 23:17   ` Pavel Machek
  1 sibling, 0 replies; 56+ messages in thread
From: Pavel Machek @ 2001-07-12 23:17 UTC (permalink / raw)
  To: C. Slater, linux-kernel

Hi!

> Would anyone else like to point out some other task somewhat related 
> and have me do it? :-)

Ummm, I need someone to cook me lunch tomorrow ;-).

> > > Before you even try switching kernels, first implement a process
> > > checkpoint/restart. The process must be resumed after a boot
> > > using the same
> > > kernel, with all I/O resumed. Now get it accepted into the kernel.
> > 
> > Hear, hear!  That would be a useful feature, maybe not network servers, 
> > but for pure number crunching apps it would save people having to write 
> > all the state saving and recovery that is needed now for long term 
> > computations.
> 
> Get a computer with hibernation support. That's just about what it
> is.

No. Hibernation can be done (see sw_susp patches). Checkpointing is
per-process, so it is different. And you could implement that "live
upgrade" in a similar way. Checkpoint all. Reboot with new kernel. Restart
all. That's close enough to live upgrade.

(Ouch, what are you going to do with programs that behave differently
on different kernel releases? What if you have X using some kernel
driver that goes away in new release?)
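
The "checkpoint all, reboot, restart all" idea and the worry about vanished
drivers can be illustrated with a small Python sketch. It assumes each
checkpoint records the kernel features it depends on. That bookkeeping, and
the names missing_features and restart_all, are hypothetical and not part of
any existing patch.

```python
def missing_features(checkpoint, kernel_features):
    """Return the features a checkpoint needs that the kernel lacks."""
    return sorted(set(checkpoint["needs"]) - set(kernel_features))


def restart_all(checkpoints, kernel_features):
    """Split checkpoints into those that can restart and those that cannot."""
    restarted, refused = [], {}
    for checkpoint in checkpoints:
        missing = missing_features(checkpoint, kernel_features)
        if missing:
            refused[checkpoint["name"]] = missing
        else:
            restarted.append(checkpoint["name"])
    return restarted, refused


# Made-up example data: the new kernel has dropped a driver that X relied on.
checkpoints = [
    {"name": "number-cruncher", "needs": ["pipes"]},
    {"name": "X server", "needs": ["pipes", "old-video-driver"]},
]
new_kernel = ["pipes", "sockets"]
restarted, refused = restart_all(checkpoints, new_kernel)
```

A check like this only catches missing features. It says nothing about
programs that behave differently on a newer kernel.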

-- 
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-12  1:03 Torrey Hoffman
  2001-07-12  1:24 ` C. Slater
@ 2001-07-12 20:47 ` Wilfried Weissmann
  1 sibling, 0 replies; 56+ messages in thread
From: Wilfried Weissmann @ 2001-07-12 20:47 UTC (permalink / raw)
  To: Torrey Hoffman
  Cc: 'jesse@cats-chateau.net', Kip Macy, Paul Jakma,
	Helge Hafting, C. Slater, linux-kernel

Torrey Hoffman wrote:
> 
> Jesse Pollard wrote:
> 
> [why switching kernels is very hard, and...]
> 
> > Before you even try switching kernels, first implement a process
> > checkpoint/restart. The process must be resumed after a boot
> > using the same
> > kernel, with all I/O resumed. Now get it accepted into the kernel.
> 
> Hear, hear!  That would be a useful feature, maybe not network servers,
> but for pure number crunching apps it would save people having to write
> all the state saving and recovery that is needed now for long term
> computations.

There is a checkpointing and resuming library at
ftp://gutemine.geo.uni-koeln.de/pub/chkpt/
I am not sure if it has been ported to Linux yet, but it might be worth
a look.

> 
> For bonus points, make it work for clusters to synchronously save and
> restore state for the apps running on all the nodes at once...
> 
> Torrey

bye,
Wilfried

^ permalink raw reply	[flat|nested] 56+ messages in thread

[parent not found: <994895240.21189@whiskey.enposte.net>]

* Re: Switching Kernels without Rebooting?
       [not found] <994895240.21189@whiskey.enposte.net>
@ 2001-07-12  0:10 ` Stuart Lynne
  0 siblings, 0 replies; 56+ messages in thread
From: Stuart Lynne @ 2001-07-12  0:10 UTC (permalink / raw)
  To: linux-kernel

In article <994895240.21189@whiskey.enposte.net>,
Andreas Dilger <adilger@turbolinux.com> wrote:

>The best proposal I've heard so far was to use MOSIX to do live job
>migration between machines, and then upgrade the kernel like normal.
>In the end, it is the jobs that are running on the kernel, and not
>the kernel or the individual machine that are the most important.  One
>person pointed out that there is a single point of failure in the
>MOSIX "stub" machine, which doesn't help you in the end (how do you
>update the kernel there?).  If you can figure a way to enhance MOSIX
>to allow migrating the MOSIX "stub" processes to another machine, you
>will have solved your problem in a much easier way, IMHO.

If you then think of using VMWare or S/390 style methods of running multiple
copies of Linux on a single system you can now consider migrating processes
to a new kernel on the same system.


-- 
                                            __O 
Lineo - For Embedded Linux Solutions      _-\<,_ 
PGP Fingerprint: 28 E2 A0 15 99 62 9A 00 (_)/ (_) 88 EC A3 EE 2D 1C 15 68
Stuart Lynne <sl@fireplug.net>       www.fireplug.net        604-461-7532

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
@ 2001-07-11  9:52 David Balazic
  2001-07-11 10:08 ` Laramie Leavitt
  2001-07-11 15:19 ` C. Slater
  0 siblings, 2 replies; 56+ messages in thread
From: David Balazic @ 2001-07-11  9:52 UTC (permalink / raw)
  To: cslater, linux-kernel@vger.kernel.org

C. Slater (cslater@wcnet.org) wrote :

> Hi, i was just thinking about if it would be possible to switch kernels 
> without haveing to restart the entire system. Sort of a "Live kernel 
> replacement". It sort of goes along with the hot-swap-everything ideas. I 
> was thinking something like 
> - Take all the structs related to userspace memory and processes 
> - Save them to a reserved area of memory 
> - Halt the kernel, mostly 
> - Wipe kernel-space memory clean to avoid confusion 
> - Load new kernel into memory 
> - Replace all saved structures 
> - Start kernel running agin 
> 
> This seems like the easiest way to do it. The biggest problem is that there 
> would be somewhere about 30 seconds where all processes would be frozen. 

This is not a problem at all, because UNIX does not guarantee that
a process will get at least one CPU slice every X seconds.
( read : UNIX is not a real time system )

soft-suspend "freezes" processes for several hours anyway ...

Note that there is a patch for hot replacing a kernel, which is equivalent
to rebooting, but much faster :
Two Kernel Monte (Linux loading Linux on x86)
http://www.scyld.com/products/beowulf/software/monte.html


> This could cause problems with tcp/ip connections timeing out say on a 
> webserver, but it would be more managable than a few minutes downtime to 
> restart the machine.

[ rest snipped ]

-- 
David Balazic
--------------
"Be excellent to each other." - Bill & Ted
- - - - - - - - - - - - - - - - - - - - - -

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: Switching Kernels without Rebooting?
  2001-07-11  9:52 David Balazic
@ 2001-07-11 10:08 ` Laramie Leavitt
  2001-07-11 19:12   ` H. Peter Anvin
  2001-07-11 15:19 ` C. Slater
  1 sibling, 1 reply; 56+ messages in thread
From: Laramie Leavitt @ 2001-07-11 10:08 UTC (permalink / raw)
  To: David Balazic, cslater, linux-kernel

>
> This is not a problem at all, because UNIX does not guarantee that
> a process will get at least one CPU slice every X seconds.
> ( read : UNIX is not a real time system )
>
> soft-suspend "freezes" processes for several hours anyway ...
>
> Note that there is a patch for hot replacing a kernel, which is equivalent
> to rebooting, but much faster :
> Two Kernel Monte (Linux loading Linux on x86)
> http://www.scyld.com/products/beowulf/software/monte.html
>

So if the Two Kernel Monte patch were combined with the
system suspend/resume-to-swap patch, you could add some
transitions so that the code path does this:

1-  Suspend->Monte
2-  Monte->Load new Kernel
3-  Load->Resume.

If it was just for very similar kernels, i.e. most
-pre and -ac kernels it would probably work fine.
If not, then you could just do the Monte route.

Laramie


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-11 10:08 ` Laramie Leavitt
@ 2001-07-11 19:12   ` H. Peter Anvin
  0 siblings, 0 replies; 56+ messages in thread
From: H. Peter Anvin @ 2001-07-11 19:12 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <JKEGJJAJPOLNIFPAEDHLGEEFDFAA.laramie.leavitt@btinternet.com>
By author:    "Laramie Leavitt" <laramie.leavitt@btinternet.com>
In newsgroup: linux.dev.kernel
> 
> So if the Two Kernel Monte patch was combined with the
> system suspend/resume in swap patch then you add some
> transitions so that the code path does this:
> 
> 1-  Suspend->Monte
> 2-  Monte->Load new Kernel
> 3-  Load->Resume.
> 
> If it was just for very similar kernels, i.e. most
> -pre and -ac kernels it would probably work fine.
> If not, then you could just do the Monte route.
> 

The problem is that "freezing" the kernel state and then
reconstructing it into a form USABLE BY ANOTHER KERNEL (not even
necessarily another kernel version) is unbelievably hard; furthermore,
it imposes severe constraints on the kinds of changes you're
allowed to make during your kernel development.
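
To make the difficulty concrete, here is a minimal Python sketch. It assumes
two made-up record layouts, OLD and NEW, that are the same size but store
their fields in a different order. Neither is a real kernel structure.
Reading the saved bytes with the new layout silently produces wrong values,
so each layout change needs its own explicit converter
(convert_old_to_new, also hypothetical).

```python
import struct

# Hypothetical task record as the "old" kernel wrote it.
OLD = struct.Struct("<IHH")  # pid, priority, flags
# The "new" kernel keeps the same size but reorders two fields.
NEW = struct.Struct("<IHH")  # pid, flags, priority


def convert_old_to_new(blob: bytes) -> bytes:
    """Explicit conversion that someone must write for every layout change."""
    pid, priority, flags = OLD.unpack(blob)
    return NEW.pack(pid, flags, priority)


saved = OLD.pack(1234, 20, 1)  # pid=1234, priority=20, flags=1

# Naive restore: reinterpret the old bytes using the new layout.
naive_pid, naive_flags, naive_priority = NEW.unpack(saved)

# Careful restore: convert first, then read with the new layout.
pid, flags, priority = NEW.unpack(convert_old_to_new(saved))
```

The naive read raises no error. It simply swaps priority and flags, which is
why a maintained converter would be required for every structure that
changes between the two kernels.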

It's a bad idea, folks. Give it up.

     -hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-11  9:52 David Balazic
  2001-07-11 10:08 ` Laramie Leavitt
@ 2001-07-11 15:19 ` C. Slater
  1 sibling, 0 replies; 56+ messages in thread
From: C. Slater @ 2001-07-11 15:19 UTC (permalink / raw)
  To: linux-kernel

> This is not a problem at all, because UNIX does not guarantee that
> a process will get at least one CPU slice every X seconds.
> ( read : UNIX is not a real time system )

It is not a problem when a system is isolated from all other systems, but if
we do this while some program is in a tcp/ip session, like a webserver, the
program will not be able to respond to an outside computer while
we are swapping and initializing kernels.  The tcp connection will then time
out on the other computer's side. But this is still quite manageable
compared to a minute or 2 for a system to totally reboot itself.

> soft-suspend "freezes" processes for several hours anyway ...

Yes, so it will not be a problem at least with processes dying because they
did not get message X at time Y.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Switching Kernels without Rebooting?
@ 2001-07-10 18:42 C. Slater
  2001-07-10 18:50 ` Chris Wedgwood
  2001-07-10 21:11 ` Jesper Juhl
  0 siblings, 2 replies; 56+ messages in thread
From: C. Slater @ 2001-07-10 18:42 UTC (permalink / raw)
  To: linux-kernel

Hi, I was just wondering whether it would be possible to switch kernels
without having to restart the entire system. Sort of a "Live kernel
replacement". It sort of goes along with the hot-swap-everything ideas. I
was thinking something like
- Take all the structs related to userspace memory and processes
- Save them to a reserved area of memory
- Halt the kernel, mostly
- Wipe kernel-space memory clean to avoid confusion
- Load new kernel into memory
- Replace all saved structures
- Start kernel running again

This seems like the easiest way to do it. The biggest problem is that there
would be somewhere about 30 seconds where all processes would be frozen.
This could cause problems with tcp/ip connections timing out, say on a
webserver, but it would be more manageable than a few minutes downtime to
restart the machine. There is one other way I can think of, something like
- Copy entire kernel memory to another reserved area of memory
- Start new kernel running as a "secondary kernel"
- Transfer control from "Primary kernel" to "Secondary Kernel"
- Load new kernel where the kernel was previously located
- Start new kernel running as a "Secondary Kernel" again
- Transfer control between kernels
- Kill and remove temporary kernel

This system could result in nearly zero downtime, but would require more
memory, be more complicated, and would require significant modifications to
allow for a "Secondary Kernel" to be running. Anyways, I think this could be
a nice feature of the kernel in situations where zero downtime is required.
Yes, it might be a case of "creeping featurism", but if you think so, then
tell me. If you would be interested in helping with it, send me a message,
if there is any support for it. Please CC: me any messages, it would be
quite helpful since I do not receive the mailing list due to the excessive
volume. If you don't I will pick it up in the archives, but not as soon.
Thanks.

     Colin


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Switching Kernels without Rebooting?
  2001-07-10 18:42 C. Slater
@ 2001-07-10 18:50 ` Chris Wedgwood
  2001-07-10 21:11 ` Jesper Juhl
  1 sibling, 0 replies; 56+ messages in thread
From: Chris Wedgwood @ 2001-07-10 18:50 UTC (permalink / raw)
  To: C. Slater; +Cc: linux-kernel

On Tue, Jul 10, 2001 at 02:42:12PM -0400, C. Slater wrote:

    Hi, i was just thinking about if it would be possible to switch
    kernels without haveing to restart the entire system.

Before Solaris 8, Sun was promising this.

    Sort of a "Live kernel replacement". It sort of goes along with
    the hot-swap-everything ideas. I was thinking something like

    - Take all the structs related to userspace memory and processes
    - Save them to a reserved area of memory
    - Halt the kernel, mostly

What about timing-critical things? You mention networking, but there
are others.

    - Wipe kernel-space memory clean to avoid confusion
    - Load new kernel into memory

    - Replace all saved structures

What if the layout of these changes, as it often does?
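
To make the layout concern concrete, here is a minimal sketch of a
version-tolerant encoding: a simplified tag-length-value scheme (not real
BER) in which a reader skips any field it does not recognise. The tag names
and the choice of 8-byte integer values are assumptions made purely for
illustration.

```python
import struct

# Hypothetical field tags for a saved per-process record.
TAG_PID = 1
TAG_NICE = 2
TAG_CPU_MASK = 3   # pretend only the newer kernel knows this one


def encode(fields):
    """Pack {tag: int} as tag (1 byte), length (1 byte), value (8 bytes)."""
    out = bytearray()
    for tag, value in fields.items():
        payload = struct.pack(">q", value)
        out += struct.pack(">BB", tag, len(payload)) + payload
    return bytes(out)


def decode(blob, known_tags):
    """Return the known fields; skip tags this reader does not understand."""
    fields, pos = {}, 0
    while pos < len(blob):
        tag, length = struct.unpack_from(">BB", blob, pos)
        pos += 2
        if tag in known_tags:
            (fields[tag],) = struct.unpack_from(">q", blob, pos)
        pos += length   # the length prefix lets unknown fields be skipped
    return fields
```

A reader that knows only TAG_PID and TAG_NICE still decodes a record that
also carries TAG_CPU_MASK. What to do when the skipped field is essential is
a separate problem that this sketch does not address.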

    - Start kernel running again

    This seems like the easiest way to do it. The biggest problem is
    that there would be somewhere around 30 seconds where all processes
    would be frozen.

It seems like a difficult-to-implement solution for little gain. Linux
can be booted _very_ quickly on modern machines, probably in about 15s
for most hardware.  If you burn Linux into ROM or use a flash disk (or
similar solution), you can have everything rebooted in under five
seconds.

The zflinux chips/machines probably boot in half that, maybe less (as
tested on a prototype many months ago).

  --cw


* Re: Switching Kernels without Rebooting?
  2001-07-10 18:42 C. Slater
  2001-07-10 18:50 ` Chris Wedgwood
@ 2001-07-10 21:11 ` Jesper Juhl
  1 sibling, 0 replies; 56+ messages in thread
From: Jesper Juhl @ 2001-07-10 21:11 UTC (permalink / raw)
  To: C. Slater; +Cc: linux-kernel

C. Slater wrote:

> Hi, I was just wondering whether it would be possible to switch kernels
> without having to restart the entire system. Sort of a "Live kernel
> replacement". It sort of goes along with the hot-swap-everything ideas. I

I actually suggested the exact same thing back in 1998 (link to the post in
the archives: http://uwsg.iu.edu/hypermail/linux/kernel/9808.1/1282.html),
but I never received much response. As I remember it, the emails I
received were along the lines of "too much effort for too little gain,
use clustering instead". I would still be very interested in such a
feature, but, as in 1998, this is still *way* out of my league to
try to implement (but I'd be happy to help in testing :).

Best regards,
Jesper Juhl
juhl@eisenstein.dk


end of thread, other threads:[~2001-07-13 21:09 UTC | newest]

Thread overview: 56+ messages
     [not found] <NOEJJDACGOHCKNCOGFOMOEKECGAA.davids@webmaster.com>
2001-07-10 20:43 ` Switching Kernels without Rebooting? C. Slater
2001-07-11  3:50   ` FORT David
2001-07-11  9:10   ` Helge Hafting
2001-07-11 15:41     ` C. Slater
2001-07-11 18:11       ` Switching Kernels without Rebooting? [MOSIX] Carlos O'Donell Jr.
2001-07-12 10:16       ` Switching Kernels without Rebooting? Helge Hafting
2001-07-11 22:12     ` Paul Jakma
2001-07-11 22:14       ` Rik van Riel
2001-07-11 22:36         ` C. Slater
2001-07-11 23:44           ` Andreas Dilger
2001-07-12  1:17             ` C. Slater
2001-07-12 15:39               ` Rik van Riel
2001-07-12 16:23                 ` Albert D. Cahalan
2001-07-12 17:37                   ` Mike Borrelli
2001-07-12 18:05                   ` Rik van Riel
2001-07-13 10:07                     ` Pau Aliagas
2001-07-12 18:48                   ` Chris Friesen
2001-07-12 10:12             ` Ralf Baechle
2001-07-12 15:32           ` Rik van Riel
2001-07-11 22:36         ` David Schwartz
2001-07-12  7:23         ` Kai Henningsen
2001-07-12 10:05           ` Helge Hafting
2001-07-13  6:50             ` Kai Henningsen
2001-07-12 17:58           ` Hua Zhong
2001-07-12 23:24           ` swsusp again [was Re: Switching Kernels without Rebooting?] Pavel Machek
2001-07-13 21:08             ` Alan Cox
2001-07-11 22:46       ` Switching Kernels without Rebooting? Kip Macy
2001-07-11 23:02         ` Rik van Riel
2001-07-12  0:31         ` Jesse Pollard
2001-07-12  1:10           ` Hua Zhong
2001-07-11 23:36       ` H. Peter Anvin
2001-07-12  7:23       ` Ville Herva
2001-07-13  1:11 tas
2001-07-13  3:45 ` Ian Stirling
  -- strict thread matches above, loose matches on Subject: below --
2001-07-12 15:32 Jesse Pollard
2001-07-12 12:23 Jesse Pollard
2001-07-12 14:55 ` Ralf Baechle
2001-07-12  4:48 Frank Davis
2001-07-12  5:08 ` John Alvord
2001-07-13  9:10   ` Chuck Hemker
2001-07-12  1:03 Torrey Hoffman
2001-07-12  1:24 ` C. Slater
2001-07-12 10:07   ` Jesse Pollard
2001-07-12 12:11     ` Ian Stirling
2001-07-12 12:54       ` Jesse Pollard
2001-07-12 14:15         ` Michael H. Warfield
2001-07-12 23:17   ` Pavel Machek
2001-07-12 20:47 ` Wilfried Weissmann
     [not found] <994895240.21189@whiskey.enposte.net>
2001-07-12  0:10 ` Stuart Lynne
2001-07-11  9:52 David Balazic
2001-07-11 10:08 ` Laramie Leavitt
2001-07-11 19:12   ` H. Peter Anvin
2001-07-11 15:19 ` C. Slater
2001-07-10 18:42 C. Slater
2001-07-10 18:50 ` Chris Wedgwood
2001-07-10 21:11 ` Jesper Juhl
