From: "Christoph Egger" <Christoph.Egger@amd.com>
To: xen-devel@lists.xensource.com
Cc: Gavin Maltby <Gavin.Maltby@sun.com>
Subject: Re: RFC: MCA/MCE concept
Date: Fri, 1 Jun 2007 10:11:35 +0200
Message-ID: <200706011011.35336.Christoph.Egger@amd.com>
In-Reply-To: <907625E08839C4409CE5768403633E0B02561D81@sefsexmb1.amd.com>

On Wednesday 30 May 2007 17:03:55 Petersson, Mats wrote:
> [snip]
>
> > My feeling is that the hypervisor and dom0 own the hardware
> > and as such
> > all hardware fault management should reside there.  So we should never
> > deliver any form of #MC to a domU, nor should a poll of MCA state from
> > a domU ever observe valid state (e.g, make the RDMSR return 0).
> > So all handling, logging and diagnosis as well as hardware
> > response actions
> > (such as to deploy an online spare chip-select) are controlled
> > in the hypervisor/dom0 combination.  That seems a consistent
> > model - e.g.,
> > if a domU is migrated to another system it should not carry the
> > diagnosis state of the original system across etc, since that
> > belongs with
> > the one domain that cannot migrate.
>
> I agree entirely with this.
>
> > But that is not to say that (I think at a future phase) domU
> > should not
> > participate in a higher-level fault management function, at
> > the direction
> > of the hypervisor/dom0 combo.  For example if/when we can isolate an
> > uncorrectable error to a single domU we could forward such an event to
> > the affected domU if it has registered its ability/interest in such
> > events.  These won't be in the form of a faked #MC or anything,
> > instead they'd be some form of synchronous trap experienced when next
> > the affected domU context resumes on CPU.  The intelligent
> > domU handler
> > can then decide whether the domU must panic, whether it could simply
> > kill the affected process etc.  Those details are clearly
> > sketchy, but the
> > idea is to up-level the communication to a domU to be more like
> > "you're broken" rather than "here's a machine-level hardware error for
> > you to interpret and decide what to do with".
>
> Yes, this makes much more sense than forwarding #MC, as the guest would
> have a hard time to actually do anything really useful with this. As far
> as I know, most uncorrectable errors are near enough entirely fatal in
> most commercial non-Enterprise OS's anyways - e.g. in Windows XP or
> Server 2K3, it always ends in a blue-screen - which is hardly any better
> than the guest being "humanely euthenazed" by Dom0.
>
> I take it this would be some sort of hypercall (available through the
> regular PV-driver interface for HVM guests) to say "Let me know if I'm
> broken - trap on vector X".

In short, guests with a PV MCA driver will see a certain event
(assuming the event mechanism is used for the notification),
and guests without a PV MCA driver will see a "General Protection Fault".
Is that right?
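
A minimal sketch in C of the distinction asked about above, to make the
two cases concrete. Every name in it (struct domu, has_pv_mca_driver,
enum mca_notify, mca_notify_method) is hypothetical and chosen only for
illustration; none of it is an existing Xen interface, and the thread
has not settled whether the notification would use the event mechanism.

```c
#include <stdbool.h>

/* Hypothetical: how an isolated uncorrectable error reaches a domU. */
enum mca_notify {
    MCA_NOTIFY_EVENT, /* guest has a PV MCA driver: deliver an event */
    MCA_NOTIFY_GPF    /* no PV MCA driver: guest sees a #GP fault    */
};

/* Hypothetical per-guest state; a real domain structure holds far more. */
struct domu {
    int  id;
    bool has_pv_mca_driver; /* set when the guest registers its interest */
};

/*
 * Pick the notification path for one affected guest.
 * Nothing here forwards raw MCA state or a faked #MC to the guest.
 */
enum mca_notify mca_notify_method(const struct domu *d)
{
    return d->has_pv_mca_driver ? MCA_NOTIFY_EVENT : MCA_NOTIFY_GPF;
}
```

Under these assumptions, a guest that registered a PV MCA driver gets
MCA_NOTIFY_EVENT and any other guest gets MCA_NOTIFY_GPF.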

> --
> Mats
>
> > Gavin
> >

-- 
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address):
   Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent the company:
   AMD Saxony LLC (registered office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy

