public inbox for linux-ia64@vger.kernel.org
* MCA Recovery for Enterprise Server
@ 2003-10-20  6:19 Hidetoshi Seto
  2003-10-20 17:02 ` Luck, Tony
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Hidetoshi Seto @ 2003-10-20  6:19 UTC
  To: linux-ia64

Hi.

I am now considering how to apply Linux to mission-critical enterprise
systems on IPF (Itanium Processor Family) servers. Enterprise servers
generally require high reliability and high availability, so I regard the
following features as fundamental:

 - Recovery from device error
 - Recovery from intermittent corrected errors (e.g. single-bit ECC errors)
 - Structured Error logging

The aims of these are:

 - Keep the system stable.
 - Enable quick maintenance through early error detection and reporting.


The features we are working on are realized by functions that recover the
system from hardware errors and block the affected device, based on the
severity of the CPU/memory/chipset error. An outline follows:

 a) Fault Location and Error Classification
    Detect the affected unit and determine the error severity at the time of
    the interrupt.

 b) Recovery from device error
    If the error is local, disable the affected devices and block operations
    targeting them. Otherwise, reboot the system immediately.

 c) Error Logging
    A structured error log helps maintenance engineers, remote maintenance
    systems, and policy-based error observers.

 d) Error Prediction (from intermittent corrected error)
    To prevent anticipated failures in a degrading component, check every
    corrected error and alert the user once a problem is confirmed. This
    feature will be implemented as a daemon in user-land.
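
As a rough illustration of items a) and b), the decision between logging,
isolating a device, and rebooting could be sketched as below. Every name here
(err_report, decide_action, the enum values) is hypothetical and is not taken
from the ia64 kernel sources; how severity and scope would actually be derived
from the hardware error records is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical severity levels, for illustration only. */
enum err_severity { ERR_CORRECTED, ERR_RECOVERABLE, ERR_FATAL };

/* Hypothetical scope: does the error affect one unit or the whole system? */
enum err_scope { SCOPE_LOCAL, SCOPE_GLOBAL };

/* Actions corresponding to the outline: log, isolate the device, or reboot. */
enum recovery_action { ACT_LOG_ONLY, ACT_ISOLATE_DEVICE, ACT_REBOOT };

/* Result of step a): which unit failed, how badly, and how widely. */
struct err_report {
    int unit_id;
    enum err_severity severity;
    enum err_scope scope;
};

/* Step b): choose a recovery action from the classified error. */
enum recovery_action decide_action(const struct err_report *r)
{
    assert(r != NULL);

    /* Hardware already corrected the error; only record it. */
    if (r->severity == ERR_CORRECTED)
        return ACT_LOG_ONLY;

    /* A recoverable error confined to one unit: disable that unit. */
    if (r->severity == ERR_RECOVERABLE && r->scope == SCOPE_LOCAL)
        return ACT_ISOLATE_DEVICE;

    /* Anything fatal or not confined to one unit: reboot immediately. */
    return ACT_REBOOT;
}
```

In this sketch a caller would fill in an err_report from the error record and
act on the returned value; treating every non-local uncorrected error as a
reboot case is an assumption that follows the "Else, reboot" rule in b).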

I am planning to offer a) to c) by mid-March 2004, and d) by the end of
2005.
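
For item d), one possible check in the user-land daemon is to count corrected
errors per unit within a time window and raise an alert when a threshold is
reached. The sketch below assumes a threshold of 3 errors within one hour;
both values and all names (ce_tracker, ce_record) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define WINDOW_SECONDS  3600  /* assumed observation window */
#define ALERT_THRESHOLD 3     /* assumed number of errors that triggers an alert */

/* Per-unit tracker: ring buffer holding the most recent error timestamps. */
struct ce_tracker {
    time_t stamps[ALERT_THRESHOLD];
    int count;  /* number of valid entries, at most ALERT_THRESHOLD */
    int next;   /* slot that the next timestamp will overwrite */
};

/*
 * Record one corrected error at time `now`. Returns true when the unit has
 * had ALERT_THRESHOLD corrected errors within the last WINDOW_SECONDS.
 * The tracker must be zero-initialized before first use.
 */
bool ce_record(struct ce_tracker *t, time_t now)
{
    assert(t != NULL);

    t->stamps[t->next] = now;
    t->next = (t->next + 1) % ALERT_THRESHOLD;
    if (t->count < ALERT_THRESHOLD)
        t->count++;

    /* Not enough history yet to reach the threshold. */
    if (t->count < ALERT_THRESHOLD)
        return false;

    /* After the write, t->next indexes the oldest retained timestamp. */
    return difftime(now, t->stamps[t->next]) <= WINDOW_SECONDS;
}
```

The daemon would keep one ce_tracker per monitored unit and call ce_record()
for each corrected error it reads from the log; a true return value is the
point at which the user would be alerted.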


However, some of these features seem to depend on the platform implementation.
So I am designing a Platform-MCA (Machine Check Abort) handler for our IPF
machine.

Are there any guidelines for implementing a Platform-MCA handler?
I have found a symbol named PLATFORM_MCA_HANDLERS in /arch/ia64/kernel/mca.c,
but it does not seem to work.

Also, if you know any good techniques for debugging MCA code, please let me
know.


Thanks.

------

H.Seto <seto.hidetoshi@jp.fujitsu.com>



end of thread (newest message: 2003-10-27  6:44 UTC)

Thread overview: 4+ messages
2003-10-20  6:19 MCA Recovery for Enterprise Server Hidetoshi Seto
2003-10-20 17:02 ` Luck, Tony
2003-10-20 20:42 ` David Mosberger
2003-10-27  6:44 ` Hidetoshi Seto
