From: "Alberto Munoz" <amunoz@vmware.com>
To: linux-ia64@vger.kernel.org
Subject: RE: [RFC] Better MCA recovery on IPF
Date: Mon, 03 Nov 2003 17:51:27 +0000
Message-ID: <marc-linux-ia64-106788207929468@msgid-missing>
In-Reply-To: <marc-linux-ia64-106724227826901@msgid-missing>

When I was at HP (a good number of years ago), we (HP and Intel) spent a lot
of time trying to architect machine check behavior. Actually, all of the
things you guys have been discussing were considered. Because I have not
followed this area for many years, I am not sure how much of the work
we did actually made it into the official architecture documents, although I
do know that some of it did.

The main idea was that each layer of the machine check handling code will
either be able to transparently (to that layer) recover the error, or pass
the information up to the next layer (this information always included a flag
that would be set if the error was considered non-recoverable by the lower
layer, like for example a tag parity error on a dirty data cache line). The
layers we defined and the order in which they were executed when a machine
check abort occurred were PAL, SAL and the OS. I have seen some of this
information (although I have not checked how complete it is) in chapter 4 of
the SAL spec (Itanium Processor Family System Abstraction Layer
Specification) and section 13.3.i of the architecture spec (Intel Itanium
Architecture Software Developer's Manual, Volume 2: System Architecture). The
SAL_GET_STATE_INFO call was to be central to getting all this information to
the OS.
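
To make the layering concrete, here is a minimal sketch in C of the idea
described above: each layer either recovers the error or passes a record up,
and the record carries a "non-recoverable" flag set by a lower layer. All of
the names (struct mc_record, pal_layer, sal_layer, os_layer,
handle_machine_check) and the three error kinds are hypothetical and chosen
only for illustration. This is not the real PAL/SAL interface and not how
SAL_GET_STATE_INFO is actually called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error kinds, for illustration only. */
enum mc_kind {
    MC_CORRECTABLE_ECC,       /* lowest layer can fix this transparently */
    MC_MEMORY_UNCORRECTABLE,  /* needs help from the OS */
    MC_DIRTY_TAG_PARITY       /* tag parity error on a dirty cache line */
};

/* Hypothetical record handed from one layer to the next. */
struct mc_record {
    enum mc_kind kind;
    unsigned long latched_addr;  /* address captured by hardware (assumed) */
    unsigned long phys_addr;     /* filled in once a layer identifies it */
    bool addr_valid;
    bool non_recoverable;        /* set by a lower layer, never cleared */
};

enum mc_result { MC_RECOVERED, MC_PASS_UP };

typedef enum mc_result (*mc_handler)(struct mc_record *rec);

/* Stand-in for PAL: fixes what it can, flags what nobody can fix. */
static enum mc_result pal_layer(struct mc_record *rec)
{
    if (rec->kind == MC_CORRECTABLE_ECC)
        return MC_RECOVERED;
    if (rec->kind == MC_DIRTY_TAG_PARITY)
        rec->non_recoverable = true;
    return MC_PASS_UP;
}

/* Stand-in for SAL: adds platform information, then passes the record up. */
static enum mc_result sal_layer(struct mc_record *rec)
{
    if (!rec->non_recoverable && rec->kind == MC_MEMORY_UNCORRECTABLE) {
        rec->phys_addr = rec->latched_addr;
        rec->addr_valid = true;
    }
    return MC_PASS_UP;
}

/* Stand-in for the OS: recovers only if the flag is clear and it knows
 * which physical address failed. A real OS would then act on that address. */
static enum mc_result os_layer(struct mc_record *rec)
{
    if (rec->non_recoverable || !rec->addr_valid)
        return MC_PASS_UP;
    return MC_RECOVERED;
}

/* Runs the layers in order (PAL, SAL, OS). Returns the index of the layer
 * that recovered the error, or -1 if no layer could. */
int handle_machine_check(struct mc_record *rec)
{
    static const mc_handler layers[] = { pal_layer, sal_layer, os_layer };

    assert(rec != NULL);
    for (size_t i = 0; i < sizeof(layers) / sizeof(layers[0]); i++) {
        if (layers[i](rec) == MC_RECOVERED)
            return (int)i;
    }
    return -1;
}
```

In this sketch a correctable error never leaves the first layer, a memory
uncorrectable error is recovered at the OS layer once the SAL stand-in has
supplied the address, and a dirty-line tag parity error falls through all
three layers with the flag set.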

Bert Munoz

> -----Original Message-----
> From: Russ Anderson [mailto:rja@sgi.com]
> Sent: Monday, November 03, 2003 9:09 AM
> To: linux-ia64@vger.kernel.org
> Cc: rja@sgi.com
> Subject: Re: [RFC] Better MCA recovery on IPF
>
>
> Grant Grundler wrote:
> > On Fri, Oct 31, 2003 at 02:09:12PM +0900, Hidetoshi Seto wrote:
> >> In the case of platform premising IPF, I think it is
> >> better to regard the Intel's Chipset as the de facto
> >> standard.
> >
> > hmm...given ia64 intel boxes I've played with have no error containment
> > and softfail on everything, I'm not sure that's a good choice.
> > Or has enough been published about the chipset to change those
> > behaviors?
>
> There are some errors on ia64 that are recoverable, with the right
> SW (PAL,SAL,Linux) and chipset support.
>
> There are some errors on ia64 that are not recoverable, but hopefully
> will be in newer cpu & chipset versions.
>
> As Matthias points out, some of the recovery should be abstracted out
> in linux to hide the underlying hardware implementation.
>
> For example, in the case of an application hitting a memory
> uncorrectable on a multi-processor system, the MCA will be handled
> by PAL and SAL. If SAL can determine the failing HW physical address,
> it could pass that information up to linux. Linux could look at the
> physical address and figure out which application has that address
> mapped and kill the application, without crashing the system. Linux
> should also not allow that physical memory to be reused by any other
> process.
>
> Part of that recovery is platform specific (HW, PAL, SAL) but
> part of it is platform independent (linux converting the physical
> address, shooting the app, page handling).
>
> As for IPF being "the defacto standard", IPF is certainly the
> platform I'm interested in (hence posting to linux-ia64), but others
> will have their own preference. The platform independent parts of
> linux should have interfaces designed to work on any platform (duh).
> Actual implementation will likely be done on several different
> architectures.
>
> --
> Russ Anderson, OS RAS/Partitioning Project Lead
> SGI - Silicon Graphics Inc rja@sgi.com