From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: linux-ia64@vger.kernel.org
Subject: MCA Recovery for Enterprise Server
Date: Mon, 20 Oct 2003 06:19:14 +0000
Message-ID: <marc-linux-ia64-106663088408345@msgid-missing>
Hi.
I am considering how to apply Linux to mission-critical enterprise systems
on IPF (Itanium Processor Family) servers. Enterprise servers generally
require high reliability and high availability, so I regard the following
features as fundamental:
- Recovery from device error
- Recovery from intermittent corrected errors (e.g. single-bit ECC errors)
- Structured Error logging
The aims of these features are:
- Keep the system stable.
- Enable quick maintenance through early error detection and reporting.
The features we are working on are realized by functions that recover the
system from hardware errors and block the affected device, based on the
severity of the CPU/memory/chipset error. An outline follows:
a) Fault Location and Error Classification
Detect the affected unit and determine the error severity at the time of the
interrupt.
b) Recovery from device error
If the error is local, disable the affected devices and block operations
targeting them. Otherwise, reboot the system immediately.
c) Error Logging
A structured error log helps maintenance engineers, remote maintenance
systems, and policed error observers.
d) Error Prediction (from intermittent corrected errors)
To prevent anticipated errors on a failing component, check every corrected
error and alert the user once a problem is confirmed. This feature will be
realized by a daemon in user-land.
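As a rough illustration of the outline in a) to d), the decision flow could be sketched in C as below. This is only a sketch under stated assumptions, not the planned implementation: every name in it (err_severity, choose_action, note_corrected_error, the threshold of 3, and so on) is hypothetical and does not come from the kernel, SAL, or PAL. Treating a corrected error as "log only" is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical severity classes for step a); illustrative names only. */
enum err_severity { ERR_CORRECTED, ERR_LOCAL, ERR_GLOBAL };

/* Hypothetical recovery actions for step b). */
enum recovery_action { ACT_LOG_ONLY, ACT_DISABLE_DEVICE, ACT_REBOOT };

/* Result of fault location and classification (step a). */
struct err_record {
    int unit_id;                 /* which unit reported the error */
    enum err_severity severity;  /* how severe the error was judged to be */
};

/*
 * Step b): a local error disables the affected device, anything wider
 * reboots the system. Corrected errors are only logged (assumption).
 */
enum recovery_action choose_action(const struct err_record *rec)
{
    assert(rec != NULL);
    switch (rec->severity) {
    case ERR_CORRECTED:
        return ACT_LOG_ONLY;
    case ERR_LOCAL:
        return ACT_DISABLE_DEVICE;
    case ERR_GLOBAL:
    default:
        return ACT_REBOOT;
    }
}

/* Step d): per-unit counters a user-land daemon might keep. */
#define MAX_UNITS 8
#define CORRECTED_ALERT_THRESHOLD 3  /* arbitrary value for illustration */

struct predictor {
    unsigned corrected[MAX_UNITS];   /* corrected-error count per unit */
};

/*
 * Record one corrected error for unit_id. Returns true once the unit has
 * reached the alert threshold, false otherwise or if unit_id is out of range.
 */
bool note_corrected_error(struct predictor *p, int unit_id)
{
    assert(p != NULL);
    if (unit_id < 0 || unit_id >= MAX_UNITS)
        return false;
    p->corrected[unit_id]++;
    return p->corrected[unit_id] >= CORRECTED_ALERT_THRESHOLD;
}
```

In this sketch a handler would build an err_record during classification, call choose_action() to pick the recovery path, and pass corrected errors on to a daemon that calls note_corrected_error() and alerts the user when it returns true.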
I am planning to offer a) to c) by mid-March 2004, and d) by the end of
2005.
However, some of these features seem to depend on the platform implementation.
So I am designing a Platform-MCA (Machine Check Abort) handler for our IPF
machine.
Are there any guidelines for implementing a Platform-MCA handler?
I have found a symbol named PLATFORM_MCA_HANDLERS in /arch/ia64/kernel/mca.c,
but it does not seem to work.
Also, if you know any techniques for debugging MCA code, please let me
know.
Thanks.
------
H.Seto <seto.hidetoshi@jp.fujitsu.com>