public inbox for linux-ia64@vger.kernel.org
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: linux-ia64@vger.kernel.org
Subject: MCA Recovery for Enterprise Server
Date: Mon, 20 Oct 2003 06:19:14 +0000
Message-ID: <marc-linux-ia64-106663088408345@msgid-missing>

Hi.

I am currently considering how to apply Linux to mission-critical enterprise
systems on IPF (Itanium Processor Family) servers. Enterprise servers generally
require high reliability and high availability, so I regard the following
features as fundamental:

 - Recovery from device errors
 - Recovery from intermittent corrected errors (e.g. single-bit ECC errors)
 - Structured error logging

The aims of these features are:

 - Keep the system stable.
 - Enable quick maintenance through early error detection and reporting.


The features we are working on are realized by functions that recover the
system from hardware errors and block the affected device, based on the
severity of the CPU/memory/chipset error. Here is an outline:

 a) Fault Location and Error Classification
    Identify the affected unit and determine the error severity at the time
    of the interrupt.

 b) Recovery from device error
    If the error is local, disable the affected devices and block operations
    that target them. Otherwise, reboot the system immediately.

 c) Error Logging
    A structured error log helps maintenance engineers, remote maintenance
    systems, and policy-based error observers.

 d) Error Prediction (from intermittent corrected errors)
    To prevent an anticipated failure of a degrading component, check every
    corrected error and alert the user once a problem is confirmed. This
    feature will be implemented by a user-land daemon.
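
As a rough illustration of a), b), and d), the decision flow could be sketched
as below. This is only a sketch under assumptions: every name here
(err_severity, err_action, unit_state, decide_action, handle_error,
needs_alert, CORRECTED_ALERT_THRESHOLD) is hypothetical and is not taken from
the kernel, from SAL, or from any existing MCA code, and the threshold value
is arbitrary.

```c
#include <stdbool.h>

/* Hypothetical severity classes produced by step a). */
enum err_severity {
    ERR_CORRECTED,  /* already corrected by hardware, e.g. single-bit ECC */
    ERR_LOCAL,      /* uncorrected, but confined to one device */
    ERR_GLOBAL      /* uncorrected and not containable */
};

/* Hypothetical actions chosen in step b). */
enum err_action {
    ACT_LOG_ONLY,        /* record the event and keep running */
    ACT_DISABLE_DEVICE,  /* block the affected device */
    ACT_REBOOT           /* reboot the system immediately */
};

/* Hypothetical per-unit bookkeeping. */
struct unit_state {
    int id;                        /* which CPU/memory/chipset unit */
    bool disabled;                 /* set once the unit has been blocked */
    unsigned int corrected_count;  /* corrected errors seen so far, for d) */
};

/* Assumed value; a real policy would be tuned per component. */
#define CORRECTED_ALERT_THRESHOLD 3u

/* Step b): map an error severity to a recovery action. */
enum err_action decide_action(enum err_severity sev)
{
    switch (sev) {
    case ERR_CORRECTED:
        return ACT_LOG_ONLY;
    case ERR_LOCAL:
        return ACT_DISABLE_DEVICE;
    case ERR_GLOBAL:
    default:
        return ACT_REBOOT;
    }
}

/* Update the unit's state for one error and return the chosen action. */
enum err_action handle_error(struct unit_state *unit, enum err_severity sev)
{
    enum err_action act = decide_action(sev);

    if (act == ACT_LOG_ONLY)
        unit->corrected_count++;   /* feeds the prediction in step d) */
    else if (act == ACT_DISABLE_DEVICE)
        unit->disabled = true;     /* later operations should be refused */

    return act;
}

/* Step d): report whether corrected errors have reached the threshold. */
bool needs_alert(const struct unit_state *unit)
{
    return unit->corrected_count >= CORRECTED_ALERT_THRESHOLD;
}
```

In this sketch the check in needs_alert() stands in for the user-land daemon
of d); the counting could equally be done from the error log of c).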

I am planning to offer a) to c) by mid-March 2004, and d) by the end of
2005.


However, some of these features seem to depend on the platform implementation.
So I am designing a Platform-MCA (Machine Check Abort) handler for our IPF
machine.

Are there any guidelines for implementing a Platform-MCA handler?
I have found a symbol named PLATFORM_MCA_HANDLERS in /arch/ia64/kernel/mca.c,
but it does not seem to work.

Also, if you know of any techniques for debugging MCA code, please point me
to a good approach.


Thanks.

------

H.Seto <seto.hidetoshi@jp.fujitsu.com>


Thread overview: 4+ messages
2003-10-20  6:19 Hidetoshi Seto [this message]
2003-10-20 17:02 ` MCA Recovery for Enterprise Server Luck, Tony
2003-10-20 20:42 ` David Mosberger
2003-10-27  6:44 ` Hidetoshi Seto
