From: Frank van der Linden <Frank.Vanderlinden@Sun.COM>
To: "Jiang, Yunhong" <yunhong.jiang@intel.com>
Cc: Gavin Maltby <Gavin.Maltby@Sun.COM>,
Christoph Egger <Christoph.Egger@amd.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Keir Fraser <keir.fraser@eu.citrix.com>,
"Ke, Liping" <liping.ke@intel.com>
Subject: Re: Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Date: Fri, 20 Feb 2009 14:01:14 -0700
Message-ID: <499F1A1A.2080808@Sun.COM>
In-Reply-To: <E2263E4A5B2284449EEBD0AAB751098401C7AACC2B@PDSMSX501.ccr.corp.intel.com>
I had some time to look over the patches in more detail and the previous
discussions that were referenced.
From your patches, what you write, and your slides, I gather the following:
* Corrected errors (found through polling and CMCI):
1) Collect error data (telemetry)
2) Inform dom0 through the VIRQ.
* Uncorrected errors:
1) See if any immediate action can be taken (CPU offline,
page retire)
2) Collect telemetry
3) Deliver vMCE to dom0 (and possibly domU)
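To make sure I read the flow correctly, here is a minimal sketch in C of the two paths listed above. Every name in it (mc_severity, mc_step, mc_plan) is made up for illustration and does not come from the patches; it only encodes which steps are attempted for which class of error.

```c
#include <stdbool.h>

/* Hypothetical names; none of these appear in the posted patches. */
enum mc_severity { MC_CORRECTED, MC_UNCORRECTED };

enum mc_step {
    STEP_IMMEDIATE_ACTION  = 1 << 0, /* try CPU offline or page retire */
    STEP_COLLECT_TELEMETRY = 1 << 1, /* save the error data */
    STEP_VIRQ_DOM0         = 1 << 2, /* notify dom0 through the VIRQ */
    STEP_VMCE_DOM0         = 1 << 3, /* deliver a vMCE to dom0 */
    STEP_VMCE_DOMU         = 1 << 4  /* possibly deliver a vMCE to a domU */
};

/* Return the set of steps attempted for an error of the given severity. */
unsigned int mc_plan(enum mc_severity sev, bool domu_affected)
{
    unsigned int steps = STEP_COLLECT_TELEMETRY;

    if (sev == MC_CORRECTED)
        return steps | STEP_VIRQ_DOM0;

    /* Uncorrected: look for an immediate action first, then deliver vMCE. */
    steps |= STEP_IMMEDIATE_ACTION | STEP_VMCE_DOM0;
    if (domu_affected)
        steps |= STEP_VMCE_DOMU;
    return steps;
}
```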
I think it's fine that the hypervisor takes some immediate action in
some cases. It is good to do this as quickly as possible, and only the
hypervisor has all the information immediately available.
What would be needed for the Solaris framework, however, is to provide
information on what action was taken, along with the telemetry. As
Christoph noted, the Solaris FMA code checks, at bootup, if there were
components that previously had errors, and if so, it disables them again
to prevent further errors. To be able to do this, it needs the full
information not just on the error data, but also on any action taken by
the hypervisor, so that it can repeat this action. It may take some
modifications in the FMA code to account for the case where an action
has already been taken (to avoid trying to take conflicting action), but
I think that shouldn't be a big problem, although I don't know that part
of our code very well.
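As a rough illustration of what I mean by shipping the action along with the telemetry, a record could look something like the sketch below. This is not an existing Xen or Solaris structure; the type, field, and function names are assumptions on my part, and the real layout would have to fit whatever interface the telemetry already uses.

```c
#include <stdint.h>

/* Hypothetical action codes; not taken from the patches or from FMA. */
enum mc_action {
    MC_ACTION_NONE,
    MC_ACTION_CPU_OFFLINE,
    MC_ACTION_PAGE_RETIRE
};

/* Hypothetical telemetry record that also says what Xen already did. */
struct mc_report {
    unsigned int   bank;    /* MCA bank the error was read from */
    uint64_t       status;  /* raw bank status value */
    uint64_t       addr;    /* raw bank address value */
    uint64_t       misc;    /* raw bank misc value */
    enum mc_action action;  /* action taken by the hypervisor, if any */
    uint64_t       target;  /* CPU id or machine frame the action hit */
};

/* Nonzero if dom0 must remember this action and repeat it after reboot. */
int mc_needs_replay(const struct mc_report *r)
{
    return r->action != MC_ACTION_NONE;
}

/* Nonzero if the hypervisor already did what dom0 was about to do. */
int mc_already_done(const struct mc_report *r, enum mc_action wanted,
                    uint64_t target)
{
    return wanted != MC_ACTION_NONE &&
           r->action == wanted && r->target == target;
}
```

With something like this, the dom0 side could skip an action that was already taken and still record it for the next boot.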
The part that I still have doubts about is the vMCE code. As far as I
can tell, it takes the information out of the MCA banks, and stores it,
per event, in a linked list. Per vMCE, the head of the list is taken and
used as an MSR context. The rdmsr instruction is trapped and redirected
to that information. It seems that the wrmsr instruction is accepted,
but has no effect (except that if the trap handler writes a value and
then reads it back again immediately, the values will be the same).
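To check my understanding, here is a simplified model in C of the behavior described above. It is not the code from the patches: the structure and function names are invented, the bank count is an arbitrary assumption, and only the per-bank MSRs (starting at IA32_MC0_CTL, 0x400, four registers per bank) are handled.

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_MC0_CTL 0x400u /* IA32_MC0_CTL; each bank has 4 MSRs */
#define NR_BANKS    4      /* assumption for this sketch */

/* One saved event: a snapshot of CTL, STATUS, ADDR, MISC for each bank. */
struct vmce_event {
    struct vmce_event *next;
    uint64_t bank[NR_BANKS][4];
};

/* Per-guest context; the head of the list backs the current vMCE. */
struct vmce_ctx {
    struct vmce_event *head;
};

/* Map a bank MSR number to its slot in the snapshot, or NULL. */
static uint64_t *vmce_lookup(struct vmce_event *ev, uint32_t msr)
{
    uint32_t off;

    if (ev == NULL || msr < MSR_MC0_CTL ||
        msr >= MSR_MC0_CTL + 4u * NR_BANKS)
        return NULL;
    off = msr - MSR_MC0_CTL;
    return &ev->bank[off / 4][off % 4];
}

/* Trapped rdmsr: served from the snapshot, never from hardware. */
int vmce_rdmsr(const struct vmce_ctx *c, uint32_t msr, uint64_t *val)
{
    uint64_t *slot = vmce_lookup(c->head, msr);

    if (slot == NULL)
        return -1;
    *val = *slot;
    return 0;
}

/* Trapped wrmsr: accepted, but only the snapshot changes, so the sole
 * visible effect is that a later read returns the written value. */
int vmce_wrmsr(struct vmce_ctx *c, uint32_t msr, uint64_t val)
{
    uint64_t *slot = vmce_lookup(c->head, msr);

    if (slot == NULL)
        return -1;
    *slot = val;
    return 0;
}

/* Handler finished: drop the head so the next event becomes current.
 * The caller owns the returned event and is responsible for freeing it. */
struct vmce_event *vmce_done(struct vmce_ctx *c)
{
    struct vmce_event *ev = c->head;

    if (ev != NULL)
        c->head = ev->next;
    return ev;
}
```

In this model a guest handler that clears a bank status with wrmsr changes nothing on the physical machine, which is the point I am making about the wrmsr path.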
The main argument for the vMCE code seems to be that it allows existing
MCA handlers to be reused. However, I don't see the advantage in this.
Basically, it allows the handler to retrieve the MCA banks through plain
rdmsr instructions, which is fine, but that's as far as it goes. Without
any additional information, that feature does not seem useful, and wrmsr
instructions have no effect.
To take further action, the MCA code in dom0 (or a domU) needs to know
that it is running under Xen, and it needs to have detailed physical
information on the system. In other words, the existing code that can be
used is only the code that gathers some information. So, the only thing
that vMCE is good for is that you can run unmodified error logging
code. But you can't interpret any of the error information further
without knowing more. Especially for a domU, which might not know
anything, this doesn't seem useful. What would the user of a domU do
with that information?
To recap, I think the part where Xen itself takes action is fine, with
some modifications. But I don't see any advantages in vMCE delivery,
unless I'm missing something, of course.
- Frank