linux-watchdog.vger.kernel.org archive mirror
* watchdogs and kdump
@ 2011-10-27 20:30 Don Zickus
  2011-10-27 21:43 ` Pádraig Brady
  2011-10-28 15:46 ` Alejandro Cabrera
  0 siblings, 2 replies; 7+ messages in thread
From: Don Zickus @ 2011-10-27 20:30 UTC (permalink / raw)
  To: linux-watchdog; +Cc: kexec, vgoyal, amwang

Hi,

I was assisting a customer the other day debugging a kdump[1] problem, when we
noticed the real problem was the hardware watchdog was firing and
rebooting the box.

Of course, this can be inconvenient if the panic happens right before the
watchdog is supposed to be kicked, leading to a spontaneous reboot before
the second kernel finishes booting and loading the watchdog module.

I was trying to think of a way to solve this and thought, one way to
minimize the problem is to kick the watchdog before we jump into the kdump
kernel.  Another way is to disable the watchdog entirely, but that doesn't
work on all hardware I believe.

Anyway, I was posting on the watchdog mailing list to see if anyone had any
ideas that might help.  And if my above idea to kick the watchdog before
jumping into the kdump kernel seems ok, then an api would need to be
developed.

I am willing to do any coding and testing necessary, but before I did, I
wanted help to get a direction to go in first.

Thoughts?

Cheers,
Don

[1] - I am ignorantly assuming everyone knows what kdump is.  Kdumping is
the ability to jump into a previously loaded kernel in the case of a
panic.  This kdump (second) kernel would run in reserved memory, copy the
first kernel's memory to a file and save it to a pre-determined location.
There is no system reboot in between the first and second kernel, so no
chance for the watchdog to disarm itself.



* Re: watchdogs and kdump
  2011-10-27 20:30 watchdogs and kdump Don Zickus
@ 2011-10-27 21:43 ` Pádraig Brady
  2011-10-28 13:39   ` Don Zickus
  2011-10-28 15:46 ` Alejandro Cabrera
  1 sibling, 1 reply; 7+ messages in thread
From: Pádraig Brady @ 2011-10-27 21:43 UTC (permalink / raw)
  To: Don Zickus; +Cc: linux-watchdog, kexec, vgoyal, amwang

On 10/27/2011 09:30 PM, Don Zickus wrote:
> Hi,
> 
> I was assisting a customer the other day debugging a kdump[1] problem, when we
> noticed the real problem was the hardware watchdog was firing and
> rebooting the box.
> 
> Of course, this can be inconvenient if the panic happens right before the
> watchdog is supposed to be kicked, leading to a spontaneous reboot before
> the second kernel finishes booting and loading the watchdog module.
> 
> I was trying to think of a way to solve this and thought, one way to
> minimize the problem is to kick the watchdog before we jump into the kdump
> kernel.  Another way is to disable the watchdog entirely, but that doesn't
> work on all hardware I believe.
> 
> Anyway, I was posting on the watchdog mailing list to see if anyone had any
> ideas that might help.  And if my above idea to kick the watchdog before
> jumping into the kdump kernel seems ok, then an api would need to be
> developed.
> 
> I am willing to do any coding and testing necessary, but before I did, I
> wanted help to get a direction to go in first.
> 
> Thoughts?

Seems like the appropriate thing to do is to call all the
reboot notifiers that each watchdog registers.
Since one is not doing a full SYS_RESTART (SYS_DOWN) though,
i.e. not running through the BIOS code again,
it might be worth having a different SYS_JUMP code in notifier.h
that would allow you to kick rather than stop the watchdogs
as the reboot notifiers generally do at the moment.
I think it would be important not to stop the watchdog if possible,
given the large amount of logic that's going to be executed
after the jump.

cheers,
Pádraig.
--
To unsubscribe from this list: send the line "unsubscribe linux-watchdog" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: watchdogs and kdump
  2011-10-27 21:43 ` Pádraig Brady
@ 2011-10-28 13:39   ` Don Zickus
  0 siblings, 0 replies; 7+ messages in thread
From: Don Zickus @ 2011-10-28 13:39 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: linux-watchdog, kexec, vgoyal, amwang

On Thu, Oct 27, 2011 at 10:43:58PM +0100, Pádraig Brady wrote:
> On 10/27/2011 09:30 PM, Don Zickus wrote:
> > Hi,
> > 
> > I was assisting a customer the other day debugging a kdump[1] problem, when we
> > noticed the real problem was the hardware watchdog was firing and
> > rebooting the box.
> > 
> > Of course, this can be inconvenient if the panic happens right before the
> > watchdog is supposed to be kicked, leading to a spontaneous reboot before
> > the second kernel finishes booting and loading the watchdog module.
> > 
> > I was trying to think of a way to solve this and thought, one way to
> > minimize the problem is to kick the watchdog before we jump into the kdump
> > kernel.  Another way is to disable the watchdog entirely, but that doesn't
> > work on all hardware I believe.
> > 
> > Anyway, I was posting on the watchdog mailing list to see if anyone had any
> > ideas that might help.  And if my above idea to kick the watchdog before
> > jumping into the kdump kernel seems ok, then an api would need to be
> > developed.
> > 
> > I am willing to do any coding and testing necessary, but before I did, I
> > wanted help to get a direction to go in first.
> > 
> > Thoughts?
> 
> Seems like the appropriate thing to do is to call all the
> reboot notifiers that each watchdog registers.
> Since one is not doing a full SYS_RESTART (SYS_DOWN) though,
> i.e. not running through the BIOS code again,
> it might be worth having a different SYS_JUMP code in notifier.h
> that would allow you to kick rather than stop the watchdogs
> as the reboot notifiers generally do at the moment.

That is an interesting idea.  Not sure if calling a blocking notifier in
the kdump path would be acceptable to the kexec folks.  Then again using
the reboot notifier in the panic path may not be a good idea either, it
might lead to false expectations. :-/

> I think it would be important not to stop the watchdog if possible,
> given the large amount of logic that's going to be executed
> after the jump.

I agree.  Especially since kdump is still not 100% reliable.

Thanks for the feedback!

Cheers,
Don

* Re: watchdogs and kdump
  2011-10-27 20:30 watchdogs and kdump Don Zickus
  2011-10-27 21:43 ` Pádraig Brady
@ 2011-10-28 15:46 ` Alejandro Cabrera
  2011-10-28 15:48   ` Don Zickus
  1 sibling, 1 reply; 7+ messages in thread
From: Alejandro Cabrera @ 2011-10-28 15:46 UTC (permalink / raw)
  To: Don Zickus, linux-watchdog

Hi

I don't know kdump :), but from its description I think that you could
use a temporary thread, executed in the context of kdump, that pings the
watchdog at regular intervals, like watchdogd does in user space.

Regards
Alejandro




Take part in Universidad 2012, February 13 to 17, 2012.
Havana, Cuba: http://www.congresouniversidad.cu
Consult the collaborative Cuban encyclopedia. http://www.ecured.cu

Take part in the Second Congress on Built Environment and
Sustainable Development (MACDES 2011), December 6 to 9, 2011,
Hotel Nacional, Havana, Cuba: http://macdes.cujae.edu.cu


* Re: watchdogs and kdump
  2011-10-28 15:46 ` Alejandro Cabrera
@ 2011-10-28 15:48   ` Don Zickus
  2011-10-28 16:13     ` Alejandro Cabrera
  0 siblings, 1 reply; 7+ messages in thread
From: Don Zickus @ 2011-10-28 15:48 UTC (permalink / raw)
  To: Alejandro Cabrera; +Cc: linux-watchdog

On Fri, Oct 28, 2011 at 11:46:30AM -0400, Alejandro Cabrera wrote:
> Hi
> 
> I don't know kdump :), but from its description I think that you
> could use a temporary thread, executed in the context of kdump, that
> pings the watchdog at regular intervals, like watchdogd does in
> user space.

Sure.  Adding something like watchdogd isn't difficult.  My problem is
getting enough time to boot the second kernel to run that daemon.
Depending on when the watchdog was last kicked, the machine may reboot
while trying to initialize the cpu in the second kernel. :-(

Cheers,
Don


* Re: watchdogs and kdump
  2011-10-28 15:48   ` Don Zickus
@ 2011-10-28 16:13     ` Alejandro Cabrera
  2011-10-28 16:22       ` Don Zickus
  0 siblings, 1 reply; 7+ messages in thread
From: Alejandro Cabrera @ 2011-10-28 16:13 UTC (permalink / raw)
  To: Don Zickus, linux-watchdog

On 10/28/2011 11:48 AM, Don Zickus wrote:
> On Fri, Oct 28, 2011 at 11:46:30AM -0400, Alejandro Cabrera wrote:
>> Hi
>>
>> I don't know kdump :), but from its description I think that you
>> could use a temporary thread, executed in the context of kdump, that
>> pings the watchdog at regular intervals, like watchdogd does in
>> user space.
> Sure.  Adding something like watchdogd isn't difficult.  My problem is
> getting enough time to boot the second kernel to run that daemon.
> Depending on when the watchdog was last kicked, the machine may reboot
> while trying to initialize the cpu in the second kernel. :-(

You can create the daemon in kdump before the second kernel boots and
manage that daemon as a kdump thread.

While the second kernel is booting, kdump will act as the owner of the
watchdog device, and once the second kernel is stable, kdump passes
control of it to the watchdog kernel device driver.
If it is not a problem for two threads (the kdump daemon and the second
kernel's watchdog driver) to access the device simultaneously and without
synchronization (IMHO that is not an issue for pinging the watchdog), you
can skip the part where the second kernel's wdt driver waits for a
notification from kdump before it starts working with the watchdog device.

Regards
Alejandro




* Re: watchdogs and kdump
  2011-10-28 16:13     ` Alejandro Cabrera
@ 2011-10-28 16:22       ` Don Zickus
  0 siblings, 0 replies; 7+ messages in thread
From: Don Zickus @ 2011-10-28 16:22 UTC (permalink / raw)
  To: Alejandro Cabrera; +Cc: linux-watchdog

On Fri, Oct 28, 2011 at 12:13:06PM -0400, Alejandro Cabrera wrote:
> On 10/28/2011 11:48 AM, Don Zickus wrote:
> >On Fri, Oct 28, 2011 at 11:46:30AM -0400, Alejandro Cabrera wrote:
> >>Hi
> >>
> >>I don't know kdump :), but from its description I think that you
> >>could use a temporary thread, executed in the context of kdump, that
> >>pings the watchdog at regular intervals, like watchdogd does in
> >>user space.
> >Sure.  Adding something like watchdogd isn't difficult.  My problem is
> >getting enough time to boot the second kernel to run that daemon.
> >Depending on when the watchdog was last kicked, the machine may reboot
> >while trying to initialize the cpu in the second kernel. :-(
> 
> You can create the daemon in kdump before the second kernel boots
> and manage that daemon as a kdump thread.

Unfortunately, you can't do that as only one kernel runs at any given
time.  When the first kernel panics, it does some quick and dirty
shutdowns of platform specific chips, forces all the cpus into a halt
state and then longjmps into the second kernel.

Cheers,
Don



end of thread, other threads:[~2011-10-28 16:23 UTC | newest]

Thread overview: 7+ messages
2011-10-27 20:30 watchdogs and kdump Don Zickus
2011-10-27 21:43 ` Pádraig Brady
2011-10-28 13:39   ` Don Zickus
2011-10-28 15:46 ` Alejandro Cabrera
2011-10-28 15:48   ` Don Zickus
2011-10-28 16:13     ` Alejandro Cabrera
2011-10-28 16:22       ` Don Zickus
