public inbox for linux-kernel@vger.kernel.org
* Re: Just an offer
       [not found] <20020517122946.18213.qmail@bilmuh.ege.edu.tr>
@ 2002-05-17 12:41 ` Thomas 'Dent' Mirlacher
  2002-05-17 13:13   ` Tomasz Rola
  2002-05-17 13:01 ` Richard B. Johnson
  2002-05-21  2:42 ` Petro
  2 siblings, 1 reply; 11+ messages in thread
From: Thomas 'Dent' Mirlacher @ 2002-05-17 12:41 UTC (permalink / raw)
  To: Halil Demirezen; +Cc: alan, linux-kernel

On 17 May 2002, Halil Demirezen wrote:

> 
> I wonder if there is a way of making the kernel decide whether it can boot successfully or not. For example, say I am compiling an updated kernel not on the local machine but on some other PC, over telnet or ssh. Eventually it is time to reboot the machine and run the new kernel. However, there has been an error during the build, such as a misconfiguration. Normally the machine will fail to boot and halt. So, is there any way for it to reboot itself into the previous kernel once it realizes, after some time, that it cannot boot properly? Maybe there is such a way. If not, this is just an idea. I ask because I usually see these kinds of problems ;)


Usually you'd use a hardware watchdog for that purpose, but you need to
make sure your bootloader boots the old image when the watchdog reboots
your machine ...
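
For illustration, the userspace side of a watchdog can be sketched as
below. This is a minimal sketch, not the setup described above: it
assumes the Linux watchdog driver interface, where opening the device
node arms the timer and any write to it postpones the reset. The
function name pet_watchdog and its arguments are made up for this
example, and /dev/watchdog is only the conventional node name.

```shell
#!/bin/sh
# pet_watchdog [DEVICE] [INTERVAL] [COUNT]
# Hypothetical helper: keep DEVICE open and write one byte to it every
# INTERVAL seconds. COUNT=0 means "forever"; a non-zero COUNT is mainly
# useful for trying the loop against a scratch file.
pet_watchdog() {
    dev=${1:-/dev/watchdog}     # assumed device node
    interval=${2:-10}           # must be shorter than the hardware timeout
    count=${3:-0}
    i=0
    {
        while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
            printf '.' >&3      # any write postpones the reset
            i=$((i + 1))
            sleep "$interval"
        done
    } 3>"$dev"                  # the device stays open for the whole loop
}
```

Run early in the test kernel's boot scripts, for example as
"pet_watchdog /dev/watchdog 10 &". If the kernel or userspace hangs, the
writes stop and the hardware resets the box. Whether closing the device
stops the timer depends on the driver.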

	tm
-- 
in some way i do, and in some way i don't.



* Re: Just an offer
       [not found] <20020517122946.18213.qmail@bilmuh.ege.edu.tr>
  2002-05-17 12:41 ` Just an offer Thomas 'Dent' Mirlacher
@ 2002-05-17 13:01 ` Richard B. Johnson
  2002-05-17 14:19   ` Tomas Szepe
  2002-05-21  2:42 ` Petro
  2 siblings, 1 reply; 11+ messages in thread
From: Richard B. Johnson @ 2002-05-17 13:01 UTC (permalink / raw)
  To: Halil Demirezen; +Cc: alan, linux-kernel

On 17 May 2002, Halil Demirezen wrote:

> 
> I wonder if there is a way of making the kernel decide whether it
> can boot successfully or not. For example, say I am compiling an
> updated kernel not on the local machine but on some other PC, over
> telnet or ssh. Eventually it is time to reboot the machine and run
> the new kernel. However, there has been an error during the build,
> such as a misconfiguration. Normally the machine will fail to boot
> and halt. So, is there any way for it to reboot itself into the
> previous kernel once it realizes, after some time, that it cannot
> boot properly? Maybe there is such a way. If not, this is just an
> idea. I ask because I usually see these kinds of problems ;)
> 
>    Bye.


Initially, I thought this was a dumb question, but it's not! If you
are doing a lot of work on kernels remotely, it just might be a
good reason to configure your remote machine(s) to boot off the network.

Then, since the kernel you are booting is local to your machine,
the boot-server, you can change it at will until you get it right.

The remaining problem is how one trips a reboot if the remote machine
doesn't come up correctly. That problem can be handled by temporarily
changing panic() to a hard reset.
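
One way to get close to this without patching panic() is the kernel's
panic timeout: booting with "panic=N", or writing N to
/proc/sys/kernel/panic, makes the kernel reboot N seconds after a panic
instead of halting. A minimal sketch follows. The function name and the
optional second argument (there only so the function can be tried
against a scratch file) are assumptions of this example, and how
thorough the resulting restart is depends on the platform.

```shell
#!/bin/sh
# set_panic_timeout SECONDS [TARGET]
# Hypothetical helper: ask the kernel to reboot SECONDS after a panic.
# TARGET defaults to the real sysctl file.
set_panic_timeout() {
    seconds=$1
    target=${2:-/proc/sys/kernel/panic}
    case $seconds in
        ''|*[!0-9]*)
            echo "set_panic_timeout: need a non-negative integer" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$seconds" > "$target"
}
```

Calling "set_panic_timeout 10" as root from an early boot script covers
panics that happen after that point. For panics earlier in the boot, the
same value has to be passed on the kernel command line.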


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.



* Re: Just an offer
  2002-05-17 12:41 ` Just an offer Thomas 'Dent' Mirlacher
@ 2002-05-17 13:13   ` Tomasz Rola
  0 siblings, 0 replies; 11+ messages in thread
From: Tomasz Rola @ 2002-05-17 13:13 UTC (permalink / raw)
  To: Thomas 'Dent' Mirlacher; +Cc: Halil Demirezen, alan, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, 17 May 2002, Thomas 'Dent' Mirlacher wrote:

> On 17 May 2002, Halil Demirezen wrote:
> 
> > 
> > I wonder if there is a way of making the kernel decide whether it can boot successfully or not. For example, say I am compiling an updated kernel not on the local machine but on some other PC, over telnet or ssh. Eventually it is time to reboot the machine and run the new kernel. However, there has been an error during the build, such as a misconfiguration. Normally the machine will fail to boot and halt. So, is there any way for it to reboot itself into the previous kernel once it realizes, after some time, that it cannot boot properly? Maybe there is such a way. If not, this is just an idea. I ask because I usually see these kinds of problems ;)
> 
> 
> Usually you'd use a hardware watchdog for that purpose, but you need to
> make sure your bootloader boots the old image when the watchdog reboots
> your machine ...

And this can be rather difficult to do with telnet :-)...

bye
T.

- --
** A C programmer asked whether computer had Buddha's nature.      **
** As the answer, master did "rm -rif" on the programmer's home    **
** directory. And then the C programmer became enlightened...      **
**                                                                 **
** Tomasz Rola          mailto:tomasz_rola@bigfoot.com             **


-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 5.0i for non-commercial use
Charset: noconv

iQA/AwUBPOUCDxETUsyL9vbiEQIJIACgreomBqpaNZSzPl+uKRvCCvczudUAn3df
z/JbB/7CXjlwpJhHhE8lHMW3
=vaAU
-----END PGP SIGNATURE-----



* Re: Just an offer
       [not found] <Pine.LNX.3.95.1020517085300.4551A-100000@chaos.analogic.com>
@ 2002-05-17 13:25 ` Anton Altaparmakov
  0 siblings, 0 replies; 11+ messages in thread
From: Anton Altaparmakov @ 2002-05-17 13:25 UTC (permalink / raw)
  To: root; +Cc: Halil Demirezen, alan, linux-kernel

At 14:01 17/05/02, Richard B. Johnson wrote:
>The remaining problem is how one trips a reboot if the remote machine
>doesn't come up correctly. That problem can be handled by temporarily
>changing panic() to a hard reset.

As long as you have a second machine colocated with the first one, you can 
connect an optocoupler to one of the data lines on the parallel port of the 
"stable" machine and to the motherboard reset connector on the "unstable" 
machine. You then raise the appropriate data line on the "stable" machine 
and lower it again (I keep it at high for 100msecs) and the "unstable" 
machine is hard reset... So you just need an old 386 with a parallel port 
and a network card that will never be reset and that can reboot multiple 
machines (each data line can connect to another machine, etc...).
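
The pulse itself could be driven along these lines. This is only a
sketch under several assumptions that are not in the message above: the
"stable" machine runs Linux on x86 and exposes I/O ports through
/dev/port, the parallel port's data register sits at the conventional
LPT1 address 0x378 (decimal 888), and sleep accepts fractional seconds.
The function names are made up for the example.

```shell
#!/bin/sh
# write_port_byte FILE OFFSET VALUE
# Write the single byte VALUE (0-255) at byte OFFSET of FILE.
write_port_byte() {
    # %b with a \0ddd octal escape emits exactly one raw byte.
    printf '%b' "\\0$(printf '%o' "$3")" |
        dd of="$1" bs=1 seek="$2" conv=notrunc 2>/dev/null
}

# pulse_reset MASK [HOLD] [FILE] [OFFSET]
# Raise the data lines selected by MASK (1 = D0, 2 = D1, ...), hold them
# for HOLD seconds, then lower all lines again.
pulse_reset() {
    mask=$1
    hold=${2:-0.1}              # 100 ms, as in the message above
    port_file=${3:-/dev/port}   # assumed interface to the I/O ports
    offset=${4:-888}            # 0x378, assumed LPT1 data register
    write_port_byte "$port_file" "$offset" "$mask" || return 1
    sleep "$hold"
    write_port_byte "$port_file" "$offset" 0
}
```

With one machine wired per data line, "pulse_reset 1" would reset the
machine on D0 and "pulse_reset 4" the one on D2. Note that this sketch
drops every line to low afterwards, so it cannot hold one line high
while pulsing another.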

Best regards,

         Anton

ps. Kudos for the idea go to Rogier Wolff and not to me... I just 
implemented his idea locally and can only say it works wonders. (-:


-- 
   "I've not lost my mind. It's backed up on tape somewhere." - Unknown
-- 
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/



* Re: Just an offer
  2002-05-17 13:01 ` Richard B. Johnson
@ 2002-05-17 14:19   ` Tomas Szepe
  2002-05-17 14:48     ` Richard B. Johnson
  0 siblings, 1 reply; 11+ messages in thread
From: Tomas Szepe @ 2002-05-17 14:19 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Halil Demirezen, alan, linux-kernel

> The remaining problem is how one trips a reboot if the remote machine
> doesn't come up correctly. That problem can be handled by temporarily
> changing panic() to a hard reset.

Trouble is, this couldn't "detect" problems like unresolved symbols in
ethernet drivers or a troublesome fix that makes init/mount malfunction
and many more common issues that make you have to get in the car and
drive off to reset the damn beast.

-- 
"when you do things right, people won't be sure you've done anything at all."
- god to bender


* Re: Just an offer
  2002-05-17 14:19   ` Tomas Szepe
@ 2002-05-17 14:48     ` Richard B. Johnson
  2002-05-17 16:30       ` Athanasius
  0 siblings, 1 reply; 11+ messages in thread
From: Richard B. Johnson @ 2002-05-17 14:48 UTC (permalink / raw)
  To: Tomas Szepe; +Cc: Halil Demirezen, alan, linux-kernel

On Fri, 17 May 2002, Tomas Szepe wrote:

> > The remaining problem is how one trips a reboot if the remote machine
> > doesn't come up correctly. That problem can be handled by temporarily
> > changing panic() to a hard reset.
> 
> Trouble is, this couldn't "detect" problems like unresolved symbols in
> ethernet drivers or a troublesome fix that makes init/mount malfunction
> and many more common issues that make you have to get in the car and
> drive off to reset the damn beast.

Where there is a will, there is a way. As others have reported, you
can have an old "always-on" machine at the remote site. You can have
LILO redirect kernel messages out the serial port to be viewed
from your always-on machine, you can reset the hung machine with an
opto-isolator driven off your always-on machine's parallel port, etc.

If you are really serious about doing remote updates, you can also
boot using initrd, and install a bunch of disk drivers until one
(or more) don't fail to install, install a bunch of ethernet drivers
until one (or more) don't fail to install, etc. This can all be
handled in the initrd boot-script. --And that boot-script can be
a full-fledged 'C' program that can do anything a root-privileged
program can do, including mounting an alternate root file-system
(maybe a CDROM) if all else fails. You don't have to use the default
"run-off-the-end-of-the-script" initrd process. Any program that
you write to replace the ash.static/initrd script has complete
control of the machine.

Booting an initial RAM-Disk requires NO hardware drivers! It is
loaded by LILO through the BIOS services, before the transition to
protected-mode, so you don't need anything installed.
Your program or script can try all possible drivers, trying to
get a root file-system on-line. The same for a network card.

Anyway, once it boots and you can review what actually got installed
from the network, you can make a final initrd boot-script.
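
The "try drivers until one installs" step might look roughly like this
in a shell-based initrd script. It is a sketch, not the script described
above: the function name is invented, the loader command is passed in as
a parameter (insmod or modprobe would be the usual choices), and the
module names in the usage note are only examples.

```shell
#!/bin/sh
# load_first_working LOADER MODULE...
# Try each MODULE with LOADER in order. Print the first one that loads
# and return 0, or return 1 if none of them does.
load_first_working() {
    loader=$1
    shift
    for module in "$@"; do
        if "$loader" "$module" >/dev/null 2>&1; then
            echo "$module"
            return 0
        fi
    done
    return 1
}
```

A boot script could then call, for instance,
"load_first_working modprobe eepro100 3c59x tulip" for the network card
and fall back to an alternate root file-system when the call for the
disk drivers returns non-zero.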

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.



* Re: Just an offer
  2002-05-17 14:48     ` Richard B. Johnson
@ 2002-05-17 16:30       ` Athanasius
  2002-05-17 16:38         ` andrew may
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Athanasius @ 2002-05-17 16:30 UTC (permalink / raw)
  To: lkml

On Fri, May 17, 2002 at 10:48:21AM -0400, Richard B. Johnson wrote:
> Where there is a will, there is a way. As others have reported, you
> can have an old "always-on" machine at the remote site. You can have
> LILO redirect kernel messages out the serial port to be viewed

   It strikes me that this is also in part a LILO 'problem'.  We could
use some way to tell LILO to only boot a given image _once_ as the
default, and thence reboot to the normal default.  Combine this with any
of the methods for remote reboot (hardware watchdog, other machine wired
to reset, whatever) and you can easily recover from a futzed new kernel.
   I'm sure LILO can find room for a single byte 'flag' for such things
and an extra per-config option in /etc/lilo.conf.

-Ath, checking the LILO docs to see if it does something like this
already...
-- 
- Athanasius = Athanasius(at)miggy.org.uk / http://www.clan-lovely.org/~athan/
                  Finger athan(at)fysh.org for PGP key
	   "And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence." Paula Cole - ME


* Re: Just an offer
  2002-05-17 16:30       ` Athanasius
@ 2002-05-17 16:38         ` andrew may
  2002-05-17 16:50         ` Richard B. Johnson
  2002-05-17 21:52         ` Stevie O
  2 siblings, 0 replies; 11+ messages in thread
From: andrew may @ 2002-05-17 16:38 UTC (permalink / raw)
  To: Athanasius, lkml

On Fri, May 17, 2002 at 05:30:24PM +0100, Athanasius wrote:
>    It strikes me that this is also in part a LILO 'problem'.  We could
> use some way to tell LILO to only boot a given image _once_ as the
> default, and thence reboot to the normal default.  Combine this with any

lilo -R test-image

Just in case you fail in your check.
 
> -Ath, checking the LILO docs to see if it does something like this
> already...


* Re: Just an offer
  2002-05-17 16:30       ` Athanasius
  2002-05-17 16:38         ` andrew may
@ 2002-05-17 16:50         ` Richard B. Johnson
  2002-05-17 21:52         ` Stevie O
  2 siblings, 0 replies; 11+ messages in thread
From: Richard B. Johnson @ 2002-05-17 16:50 UTC (permalink / raw)
  To: Athanasius; +Cc: lkml

On Fri, 17 May 2002, Athanasius wrote:

> On Fri, May 17, 2002 at 10:48:21AM -0400, Richard B. Johnson wrote:
> > Where there is a will, there is a way. As others have reported, you
> > can have an old "always-on" machine at the remote site. You can have
> > LILO redirect kernel messages out the serial port to be viewed
> 
>    It strikes me that this is also in part a LILO 'problem'.  We could
> use some way to tell LILO to only boot a given image _once_ as the
> default, and thence reboot to the normal default.  Combine this with any
> of the methods for remote reboot (hardware watchdog, other machine wired
> to reset, whatever) and you can easily recover from a futzed new kernel.
>    I'm sure LILO can find room for a single byte 'flag' for such things
> and an extra per-config option in /etc/lilo.conf.
> 
> -Ath, checking the LILO docs to see if it does something like this
> already...
> -- 

Of course there is a 'default' entry to be booted by LILO. But
this is what it will boot if there's nobody at the remote site
to hit the ALT key and change to another.

LILO could be modified to look for a printer-port bit pattern
to decide which entry it will boot. This is trivial but makes
it incompatible with systems that don't have anything connected
(which could cause any random bit-pattern to be latched).

LILO could also be modified to use the BIOS int 0x14 RS-232C
interface to send prompts out there as well as the screen/keyboard.
This allows an always-on remote computer to do the boot configuration.

Nevertheless, there is no way for LILO to keep track of boots and
decide for itself if it should try the 'next' kernel because the
previous didn't work. This would require that it write something
into media that is not always writable. The PC, itself, has
nowhere to store such information except in the CMOS, all locations
of which may already have been 'taken' or, at least included in
the checksum, resulting in no boot sequence at all because the
BIOS will say "CMOS Checksum failure... hit [F2]..." or whatever
else non-standard may exist.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.



* Re: Just an offer
  2002-05-17 16:30       ` Athanasius
  2002-05-17 16:38         ` andrew may
  2002-05-17 16:50         ` Richard B. Johnson
@ 2002-05-17 21:52         ` Stevie O
  2 siblings, 0 replies; 11+ messages in thread
From: Stevie O @ 2002-05-17 21:52 UTC (permalink / raw)
  To: Athanasius, lkml

At 05:30 PM 5/17/2002 +0100, Athanasius wrote:

>   It strikes me that this is also in part a LILO 'problem'.  We could
>use some way to tell LILO to only boot a given image _once_ as the
>default, and thence reboot to the normal default.  Combine this with any
>of the methods for remote reboot (hardware watchdog, other machine wired
>to reset, whatever) and you can easily recover from a futzed new kernel.
>   I'm sure LILO can find room for a single byte 'flag' for such things
>and an extra per-config option in /etc/lilo.conf.

Erm, this *IS* possible.

excerpt from `man lilo`:
---
       /sbin/lilo -R - set default command line for next reboot

-R command line
   This option sets the default command for the boot loader the next
   time it executes. The boot loader will then erase this line: this is
   a once-only command. It is typically used in reboot scripts, just
   before calling `shutdown -r'.

---

/etc/lilo.conf:

image = /vmlinuz-stable
        label = Stable_Kernel
        root = /dev/hda1
        read-only

image = /vmlinuz-test
        label = Test_Kernel
        root = /dev/hda1
        read-only

---

test_kernel.sh:

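#!/bin/sh

# -R takes the command line as it would be typed at the boot prompt:
# the image label, optionally followed by kernel parameters.
lilo -R Test_Kernel && reboot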

---------------------

Normally, LILO will boot the first image listed (in this case, the Stable_Kernel) by default.
However, running 'test_kernel.sh' will -- for one time only -- make the default kernel be the Test_Kernel.

The only thing left is to make the kernel (or, if something goes wrong there, userspace) reboot if something isn't working okay.
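
A userspace version of that last step could be sketched as follows. It
assumes the test kernel gets far enough to run boot scripts. The
function name, the check commands and the address in the usage note are
hypothetical. The reboot command is a parameter so the logic can be
tried without actually rebooting.

```shell
#!/bin/sh
# verify_boot REBOOT_CMD CHECK...
# Run each CHECK (a shell command string) in order. On the first failure,
# report it, run REBOOT_CMD and return 1. Return 0 if every check passes.
verify_boot() {
    reboot_cmd=$1
    shift
    for check in "$@"; do
        if ! sh -c "$check" >/dev/null 2>&1; then
            echo "verify_boot: check failed: $check" >&2
            $reboot_cmd         # unquoted on purpose: may carry arguments
            return 1
        fi
    done
    return 0
}
```

Called late in the test kernel's startup as, say,
"verify_boot reboot 'ping -c 3 192.0.2.1'", a failed check reboots the
machine. Because the -R command line was already erased, LILO then falls
back to the Stable_Kernel entry. A kernel that hangs before userspace
starts still needs a watchdog or an external reset.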


--
Stevie-O

Real programmers use COPY CON PROGRAM.EXE



* Re: Just an offer
       [not found] <20020517122946.18213.qmail@bilmuh.ege.edu.tr>
  2002-05-17 12:41 ` Just an offer Thomas 'Dent' Mirlacher
  2002-05-17 13:01 ` Richard B. Johnson
@ 2002-05-21  2:42 ` Petro
  2 siblings, 0 replies; 11+ messages in thread
From: Petro @ 2002-05-21  2:42 UTC (permalink / raw)
  To: linux-kernel

On Fri, May 17, 2002 at 12:29:46PM -0000, Halil Demirezen wrote:
> 
> I wonder if there is a way of making the kernel decide whether it
> can boot successfully or not. For example, say I am compiling an
> updated kernel not on the local machine but on some other PC, over
> telnet or ssh. Eventually it is time to reboot the machine and run
> the new kernel. However, there has been an error during the build,
> such as a misconfiguration. Normally the machine will fail to boot
> and halt. So, is there any way for it to reboot itself into the
> previous kernel once it realizes, after some time, that it cannot
> boot properly? Maybe there is such a way. If not, this is just an
> idea. I ask because I usually see these kinds of problems ;)

    (1) Serial console is your buddy. 

    (2) Remote power switches save your butt: 
    http://www.apc.com/resource/include/techspec_index.cfm?base_sku=
    AP9211&language=en&LOCAL.APCCountryCode=us

    There is only so much software can do. 
    

-- 
My last cigarette was roughly 28 days, 17 hours, 11 minutes ago.
YHBW
