* iTCO_wdt watchdog on Asus P10S-WS motherboard FREEZES MOTHERBOARD COMPLETELY
From: David Madore @ 2016-09-08 17:01 UTC (permalink / raw)
  To: Linux Kernel mailing-list

TL;DR: the iTCO_wdt watchdog on the Asus P10S-WS motherboard, instead
of rebooting the machine, places the motherboard in a completely
nonfunctional state, from which it can be revived only by a hard power
cycle.  I suspect this is a BIOS bug: seeking advice on how/where to
report this, and what to do generally.  Maybe Linux can work around?


Dear list,

I have an Asus P10S-WS motherboard (Intel C236 chipset).  I have been
trying to get the iTCO_wdt hardware watchdog to work (I have been
successfully using this driver with similar Intel chipset based Asus
motherboards before, and I know it to work reliably).  I am using
Linux 4.7.3.

I trigger a reboot by killing (with kill -9) the wd_keepalive daemon
once it has opened the watchdog device.
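
For readers unfamiliar with the watchdog device interface, the test
above can be pictured with the following minimal sketch (in Python; it
is not the actual wd_keepalive source).  It assumes the standard Linux
/dev/watchdog behaviour, in which opening the device arms the timer and
any write counts as a keepalive.  The function name and its parameters
are made up for illustration.

```python
import os
import time


def keepalive(dev_path="/dev/watchdog", interval=1.0, pings=3, disarm=False):
    """Open the watchdog device, ping it `pings` times, then close it.

    Returns the number of keepalive writes performed.  Hypothetical
    helper for illustration only.
    """
    fd = os.open(dev_path, os.O_WRONLY)  # opening the device arms the timer
    sent = 0
    try:
        for _ in range(pings):
            os.write(fd, b"\0")  # any write resets the countdown
            sent += 1
            time.sleep(interval)
        if disarm:
            # "Magic close": writing 'V' just before close asks the driver
            # to stop the timer (only if the driver supports it and
            # nowayout is not set).
            os.write(fd, b"V")
    finally:
        os.close(fd)
    return sent
```

A daemon killed with kill -9 never gets to write the 'V' character, so
on drivers that honour magic close the timer keeps running and
eventually fires, which is the situation being tested here.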

Sadly, it appears that on this motherboard, the watchdog does not
reboot the machine (or at least, does not successfully reboot it).
Instead, the machine enters a "frozen" state (fans spinning, screen
black, all peripherals unresponsive) from which it cannot be woken up
by pressing the reset button, or even the power button twice (the
first press does turn the machine off, but it returns to the same
nonfunctional state after power on).  Instead, power has to be cut
completely, at the power supply level.

In this nonfunctional state, the Asus POST status display shows the
number "62", which according to the motherboard manual is the code for
"installation of the PCH runtime services" (I have no idea what that
means).

I suspect that this is a BIOS ^W UEFI bug and in no way Linux's fault.
It could also be a hardware problem, a chipset bug, or something else.
And even if it is a firmware bug, it is conceivable that there is a
way to work around the problem from Linux.  So I ask for guidance from
the wisdom of this list:

* Is there something Linux can do about the problem?

* Is there a chance some kernel developer knows someone at Asus and
  can bring this problem to their attention?

* Can someone report success using the iTCO_wdt watchdog with other
  motherboards having the same Intel C236 chipset?  (Note: for it to
  work, the i2c_smbus module needs to be loaded; it took me a long
  time to figure that out.)

* Is all hope lost for my motherboard?  (I badly need a hardware
  watchdog: if there is no way to get it to work on this motherboard,
  I will need to buy a new one.)
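
Regarding the module note in the list above, here is a small sketch (in
Python) of how one might check that a module is loaded by parsing
/proc/modules.  The helper names are hypothetical, and the module name
is simply the one given above.  Note that code built into the kernel
does not appear in /proc/modules, so a negative result is not
conclusive.

```python
def module_loaded(name, proc_modules_text):
    """Return True if `name` is listed in the given /proc/modules text.

    The kernel lists module names with underscores, so dashes in `name`
    are normalised before comparing.
    """
    wanted = name.replace("-", "_")
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == wanted:  # first field is the module name
            return True
    return False


def read_proc_modules(path="/proc/modules"):
    """Read the list of loaded modules from the running kernel."""
    with open(path) as f:
        return f.read()
```

Typical use would be module_loaded("i2c_smbus", read_proc_modules()).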

Any suggestions are welcome (or even words of comfort :-).

-- 
     David A. Madore
   ( http://www.madore.org/~david/ )


* Re: iTCO_wdt watchdog on Asus P10S-WS motherboard FREEZES MOTHERBOARD COMPLETELY
From: Mika Westerberg @ 2016-09-20 12:50 UTC (permalink / raw)
  To: David Madore; +Cc: Linux Kernel mailing-list

On Thu, Sep 08, 2016 at 07:01:09PM +0200, David Madore wrote:
> TL;DR: the iTCO_wdt watchdog on the Asus P10S-WS motherboard, instead
> of rebooting the machine, places the motherboard in a completely
> nonfunctional state, from which it can be revived only by a hard power
> cycle.  I suspect this is a BIOS bug: seeking advice on how/where to
> report this, and what to do generally.  Maybe Linux can work around?
> 
> 
> Dear list,
> 
> I have an Asus P10S-WS motherboard (Intel C236 chipset).  I have been
> trying to get the iTCO_wdt hardware watchdog to work (I have been
> successfully using this driver with similar Intel chipset based Asus
> motherboards before, and I know it to work reliably).  I am using
> Linux 4.7.3.
> 
> I trigger a reboot by killing (with kill -9) the wd_keepalive daemon
> once it has opened the watchdog device.
> 
> Sadly, it appears that on this motherboard, the watchdog does not
> reboot the machine (or at least, does not successfully reboot it).
> Instead, the machine enters a "frozen" state (fans spinning, screen
> black, all peripherals unresponsive) from which it cannot be woken up
> by pressing the reset button, or even the power button twice (the
> first press does turn the machine off, but it returns to the same
> nonfunctional state after power on).  Instead, power has to be cut
> completely, at the power supply level.
> 
> In this nonfunctional state, the Asus POST status display shows the
> number "62", which according to the motherboard manual is the code for
> "installation of the PCH runtime services" (I have no idea of what
> that means).
> 
> I suspect that this is a BIOS ^W UEFI bug and in no way Linux's fault.
> It could also be a hardware problem, a chipset bug, or something else.
> And even if it is a firmware bug, it is conceivable that there is a
> way to work around the problem from Linux.  So I ask for guidance from
> the wisdom of this list:
> 
> * Is there something Linux can do about the problem?
> 
> * Is there a chance some kernel developer knows someone at Asus and
>   can bring this problem to their attention?
> 
> * Can someone report success using the iTCO_wdt watchdog with other
>   motherboards having the same Intel C236 chipset?  (Note: for it to
>   work, the i2c_smbus module needs to be loaded: it took me a long
>   time to figure out.)
> 
> * Is all hope lost for my motherboard?  (I badly need a hardware
>   watchdog: if there is no way to get it to work on this motherboard,
>   I will need to buy a new one.)
> 
> Any suggestions are welcome (or even words of comfort :-).

Does the machine have a WDAT ACPI table (see /sys/firmware/acpi/tables/*)?
If it does, you can try the new WDAT watchdog driver instead [1]. It
still uses the same hardware, but through a set of instructions
provided by the BIOS, which should work (assuming the vendor has tested
it on Windows).

[1] http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1230607.html
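
A quick way to run that check could look like the sketch below (in
Python).  It only assumes that the firmware's ACPI tables are exposed as
files named after their signature under /sys/firmware/acpi/tables; the
function name is hypothetical.

```python
import os


def has_acpi_table(signature, tables_dir="/sys/firmware/acpi/tables"):
    """Return True if a table file named `signature` exists in tables_dir.

    Returns False if the directory is missing or unreadable.
    """
    try:
        names = os.listdir(tables_dir)
    except OSError:
        return False
    return signature in names
```

On a real system, has_acpi_table("WDAT") would tell whether the WDAT
driver has anything to bind to.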


* Re: iTCO_wdt watchdog on Asus P10S-WS motherboard FREEZES MOTHERBOARD COMPLETELY
From: David Madore @ 2016-09-21 12:10 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: Linux Kernel mailing-list

On Tue, Sep 20, 2016 at 03:50:09PM +0300, Mika Westerberg wrote:
> Does the machine have WDAT ACPI table (see /sys/firmware/acpi/tables/*)?
> If it does, you can try the new WDAT watchdog driver instead [1]. It
> still uses the same hardware, though but via set of instructions
> provided by the BIOS that should work (given the vendor has tested
> it on Windows).
> 
> [1] http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1230607.html

Thanks for pointing this out.  My motherboard's BIOS does not have
this ACPI table, unfortunately, but it's at least good to know that
some do, and take the hardware watchdog seriously.

-- 
     David A. Madore
   ( http://www.madore.org/~david/ )


* Re: iTCO_wdt watchdog on Asus P10S-WS motherboard FREEZES MOTHERBOARD COMPLETELY
From: Henrique de Moraes Holschuh @ 2016-09-21 13:24 UTC (permalink / raw)
  To: David Madore; +Cc: Mika Westerberg, Linux Kernel mailing-list

On Wed, 21 Sep 2016, David Madore wrote:
> On Tue, Sep 20, 2016 at 03:50:09PM +0300, Mika Westerberg wrote:
> > Does the machine have WDAT ACPI table (see /sys/firmware/acpi/tables/*)?
> > If it does, you can try the new WDAT watchdog driver instead [1]. It
> > still uses the same hardware, though but via set of instructions
> > provided by the BIOS that should work (given the vendor has tested
> > it on Windows).
> > 
> > [1] http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1230607.html
> 
> Thanks for pointing this out.  My motherboard's BIOS does not have
> this ACPI table, unfortunately, but it's at least good to know that
> some do, and take the hardware watchdog seriously.

The ones that take the hardware watchdog seriously will command the
power supply to do a power cycle when it triggers, which pretty much
cuts power to everything that is not hanging off the +5VSB (standby
power) line.

Let's just say that SSDs don't like it, at all.  Avoid at *all* *costs*.

I have been using the kernel's software watchdog on most systems because
of that: it just soft-reboots, which is good enough almost every time
and doesn't mess with the SSDs.

The proper fix is to have two levels of watchdogs: a soft reboot at time
T for the first level, and a power cycle at time 5T (to give the BIOS
enough of a time window to reset the second-level watchdog during a soft
reboot).
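
As a rough illustration of that two-level policy (a sketch in Python,
not an existing kernel interface; the function name, the action labels
and the default factor of 5 taken from the figure above are all just
for illustration):

```python
def watchdog_action(seconds_since_last_ping, soft_timeout, hard_factor=5):
    """Decide what a two-level watchdog would do after a period of silence.

    First level: soft reboot once `soft_timeout` (T) has elapsed.
    Second level: power cycle once `hard_factor * soft_timeout` (5T by
    default) has elapsed without the watchdog being reset.
    """
    hard_timeout = soft_timeout * hard_factor
    if seconds_since_last_ping >= hard_timeout:
        return "power-cycle"
    if seconds_since_last_ping >= soft_timeout:
        return "soft-reboot"
    return "ok"
```

With T = 60 seconds, the soft reboot would fire after one minute of
silence, and the power cycle only if the second level has still not
been reset after five minutes.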

-- 
  Henrique Holschuh

