linuxppc-dev.lists.ozlabs.org archive mirror
* [RFC] BOOKE watchdog and kexec
@ 2007-05-22 23:53 Dave Jiang
  2007-05-23  0:18 ` Geoff Levand
  2007-05-23  3:36 ` Michael Ellerman
  0 siblings, 2 replies; 5+ messages in thread
From: Dave Jiang @ 2007-05-22 23:53 UTC (permalink / raw)
  To: linuxppc-dev

What would be the appropriate way to deal with the BOOKE watchdog in order to
properly kexec? The BOOKE watchdog cannot be disabled. With the current
implementation, a watchdog daemon in userland is required to poke
/dev/watchdog continuously in order to keep it from going off. In the kexec
situation, the watchdog daemon in userland goes away when the new kernel is
executed. It is very possible that the new kernel will time out on a
certain hardware device initialization (e.g. SCSI discovery/timeout), causing
the watchdog to go off and reset the hardware. The reset is of course not
wanted in this situation.

Several solutions come to mind:
1. Have the kernel timer poke the watchdog. This would ensure the situation
described above never happens. I think x86 does this with the NMI watchdog.

2. Have the watchdog driver spawn a kernel thread to poke the watchdog
periodically. Or perhaps use the delayed-work mechanism to do that.

3. Set the highest bit of the watchdog register so that it does not expire for
2^32 ticks.

IMHO, #2 seems to be a reasonable approach. Comments please?

-- 

------------------------------------------------------
Dave Jiang
Software Engineer
MontaVista Software, Inc.
http://www.mvista.com
------------------------------------------------------


* Re: [RFC] BOOKE watchdog and kexec
  2007-05-22 23:53 [RFC] BOOKE watchdog and kexec Dave Jiang
@ 2007-05-23  0:18 ` Geoff Levand
  2007-05-23  0:29   ` Dave Jiang
  2007-05-23  3:36 ` Michael Ellerman
  1 sibling, 1 reply; 5+ messages in thread
From: Geoff Levand @ 2007-05-23  0:18 UTC (permalink / raw)
  To: Dave Jiang; +Cc: linuxppc-dev

Dave Jiang wrote:
> What would be the appropriate way to deal with the BOOKE watchdog in order to
> properly kexec? The BOOKE watchdog cannot be disabled. With the current
> implementation, a watchdog daemon in userland is required to poke
> /dev/watchdog continuously in order to keep it from going off. In the kexec
> situation, the watchdog daemon in userland goes away when the new kernel is
> executed. It is very possible that the new kernel will time out on a
> certain hardware device initialization (e.g. SCSI discovery/timeout), causing
> the watchdog to go off and reset the hardware. The reset is of course not
> wanted in this situation.

I would think the same situation exists when the bootloader loads the first
kernel.  If that works, then you should be able to use the same mechanism to
get the second kernel up.

-Geoff


* Re: [RFC] BOOKE watchdog and kexec
  2007-05-23  0:18 ` Geoff Levand
@ 2007-05-23  0:29   ` Dave Jiang
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Jiang @ 2007-05-23  0:29 UTC (permalink / raw)
  To: Geoff Levand; +Cc: linuxppc-dev

Geoff Levand wrote:
> Dave Jiang wrote:
>> What would be the appropriate way to deal with the BOOKE watchdog in order to
>> properly kexec? The BOOKE watchdog cannot be disabled. With the current
>> implementation, a watchdog daemon in userland is required to poke
>> /dev/watchdog continuously in order to keep it from going off. In the kexec
>> situation, the watchdog daemon in userland goes away when the new kernel is
>> executed. It is very possible that the new kernel will time out on a
>> certain hardware device initialization (e.g. SCSI discovery/timeout), causing
>> the watchdog to go off and reset the hardware. The reset is of course not
>> wanted in this situation.
> 
> I would think the same situation exists when the bootloader loads the first
> kernel.  If that works, then you should be able to use the same mechanism to
> get the second kernel up.
> 
> -Geoff
> 

Not really. The bootloader starts from a hardware reset, and the watchdog is
off after a hardware reset. Right now the kernel driver has to turn the
watchdog on explicitly, either via the kernel command line or when the watchdog
device /dev/watchdog is opened. So technically this issue already exists even
without kexec. If the watchdog is turned on via a kernel parameter and a device
initialization takes too long to time out, we will get a watchdog reset. With
the current implementation there is a window of uncertainty between the
watchdog being turned on and the userland watchdog daemon starting.
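
For reference, the userland side of this looks roughly like the sketch below.
It is only an illustration, not the daemon actually in use: the names
wdt_poke() and wdt_keepalive_loop() are made up, and it assumes the usual Linux
watchdog character-device behaviour where any write to the device counts as a
keepalive. The loop is bounded by a count so that it can be exercised against
an ordinary file; a real daemon would loop forever.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Write one byte to the watchdog device. Under the standard Linux watchdog
 * interface any write is treated as a keepalive.
 * Returns 0 on success, -1 on failure. */
static int wdt_poke(int fd)
{
    assert(fd >= 0);
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Hypothetical daemon loop: open the device, poke it `count` times and
 * sleep `interval_s` seconds after each poke.
 * Returns the number of successful pokes, or -1 if the open fails. */
static int wdt_keepalive_loop(const char *path, unsigned interval_s, int count)
{
    int ok = 0;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;

    for (int i = 0; i < count; i++) {
        if (wdt_poke(fd) == 0)
            ok++;
        sleep(interval_s);
    }

    /* Once this process is gone nothing pokes the device any more, which
     * is exactly the window described above. */
    close(fd);
    return ok;
}
```

Something like wdt_keepalive_loop("/dev/watchdog", 10, 6) would poke every 10
seconds for a minute; the interval has to be comfortably shorter than the
watchdog period for this to be of any use.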

-- 

------------------------------------------------------
Dave Jiang
Software Engineer
MontaVista Software, Inc.
http://www.mvista.com
------------------------------------------------------


* Re: [RFC] BOOKE watchdog and kexec
  2007-05-22 23:53 [RFC] BOOKE watchdog and kexec Dave Jiang
  2007-05-23  0:18 ` Geoff Levand
@ 2007-05-23  3:36 ` Michael Ellerman
  2007-05-23  6:10   ` Kumar Gala
  1 sibling, 1 reply; 5+ messages in thread
From: Michael Ellerman @ 2007-05-23  3:36 UTC (permalink / raw)
  To: Dave Jiang; +Cc: linuxppc-dev

On Tue, 2007-05-22 at 16:53 -0700, Dave Jiang wrote:
> What would be the appropriate way to deal with the BOOKE watchdog in order to
> properly kexec? The BOOKE watchdog cannot be disabled. With the current
> implementation, a watchdog daemon in userland is required to poke
> /dev/watchdog continuously in order to keep it from going off. In the kexec
> situation, the watchdog daemon in userland goes away when the new kernel is
> executed. It is very possible that the new kernel will time out on a
> certain hardware device initialization (e.g. SCSI discovery/timeout), causing
> the watchdog to go off and reset the hardware. The reset is of course not
> wanted in this situation.
> 
> Several solutions come to mind:
> 1. Have the kernel timer poke the watchdog. This would ensure the situation
> described above never happens. I think x86 does this with the NMI watchdog.
> 
> 2. Have the watchdog driver spawn a kernel thread to poke the watchdog
> periodically. Or perhaps use the delayed-work mechanism to do that.
> 
> 3. Set the highest bit of the watchdog register so that it does not expire for
> 2^32 ticks.

#3 sounds the easiest. You'd set it in machine_kexec_prepare() and then
have the second kernel restore a sane value. I assume 2^32 ticks is long
enough to boot?
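
Whether that is long enough depends on the timebase frequency. A rough way to
put numbers on it is sketched below; the helper name ticks_pow2_to_seconds() is
made up, the frequencies mentioned afterwards are only illustrative and not
taken from any particular board, and it treats the period as exactly 2^exp
ticks, as the proposal above does.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds needed for 2^exp timebase ticks to elapse when the timebase
 * runs at tb_hz ticks per second. */
static double ticks_pow2_to_seconds(unsigned exp, uint64_t tb_hz)
{
    assert(exp < 64 && tb_hz > 0);
    return (double)((uint64_t)1 << exp) / (double)tb_hz;
}
```

With a timebase of about 33.3 MHz, ticks_pow2_to_seconds(32, 33333333) comes
out at roughly 129 seconds; at 400 MHz it is closer to 11 seconds, which may
not be enough to get through a slow device probe.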

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


* Re: [RFC] BOOKE watchdog and kexec
  2007-05-23  3:36 ` Michael Ellerman
@ 2007-05-23  6:10   ` Kumar Gala
  0 siblings, 0 replies; 5+ messages in thread
From: Kumar Gala @ 2007-05-23  6:10 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev


On May 22, 2007, at 10:36 PM, Michael Ellerman wrote:

> On Tue, 2007-05-22 at 16:53 -0700, Dave Jiang wrote:
>> What would be the appropriate way to deal with the BOOKE watchdog in order to
>> properly kexec? The BOOKE watchdog cannot be disabled. With the current
>> implementation, a watchdog daemon in userland is required to poke
>> /dev/watchdog continuously in order to keep it from going off. In the kexec
>> situation, the watchdog daemon in userland goes away when the new kernel is
>> executed. It is very possible that the new kernel will time out on a
>> certain hardware device initialization (e.g. SCSI discovery/timeout), causing
>> the watchdog to go off and reset the hardware. The reset is of course not
>> wanted in this situation.
>>
>> Several solutions come to mind:
>> 1. Have the kernel timer poke the watchdog. This would ensure the situation
>> described above never happens. I think x86 does this with the NMI watchdog.
>>
>> 2. Have the watchdog driver spawn a kernel thread to poke the watchdog
>> periodically. Or perhaps use the delayed-work mechanism to do that.
>>
>> 3. Set the highest bit of the watchdog register so that it does not expire for
>> 2^32 ticks.
>
> #3 sounds the easiest. You'd set it in machine_kexec_prepare() and then
> have the second kernel restore a sane value. I assume 2^32 ticks is long
> enough to boot?

I haven't looked at the 4xx side, but on fsl parts you can pick any of the
64 bits in the time base as the transition point, so you should have enough
time. However, setting it to something like the panic_timeout, or making it
configurable, may be the best choice.

That said, I agree that just tweaking the time and restoring it is the best
option.
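
As a rough sketch of what "configurable" could mean here: given a timebase
frequency and a wanted timeout in seconds (panic_timeout or a config value),
pick the smallest power-of-two period that covers it. The function name
wdt_pick_period_exp() is made up, and mapping the returned exponent onto the
actual watchdog period field is part-specific and not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Return the smallest exponent such that 2^exp timebase ticks last at
 * least want_s seconds at tb_hz ticks per second. The result is capped
 * at 63, the largest exponent a 64-bit timebase allows. */
static unsigned wdt_pick_period_exp(uint64_t tb_hz, unsigned want_s)
{
    assert(tb_hz > 0);

    for (unsigned exp = 0; exp < 63; exp++) {
        /* Integer division floors, so this holds exactly when
         * 2^exp >= want_s * tb_hz, without risking overflow. */
        if (((uint64_t)1 << exp) / tb_hz >= want_s)
            return exp;
    }
    return 63;
}
```

For example, with an illustrative 33.3 MHz timebase and a 180 second timeout
this picks 2^33 ticks, since 2^32 ticks only last about 129 seconds.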

- k
