From: Christoph Egger
Subject: Re: Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Date: Mon, 2 Mar 2009 15:58:36 +0100
Message-ID: <200903021558.37334.Christoph.Egger@amd.com>
References: <49A580C0.7050501@Sun.COM>
To: "Jiang, Yunhong"
Cc: "xen-devel@lists.xensource.com", Gavin Maltby, "Ke, Liping", "Frank.Vanderlinden@Sun.COM", Keir Fraser, "Kleen, Andi"
List-Id: xen-devel@lists.xenproject.org

On Thursday 26 February 2009 03:16:29 Jiang, Yunhong wrote:
> Christopher/Egger, thanks very much for the reply; see comments below.
>
> >-----Original Message-----
> >From: Frank.Vanderlinden@Sun.COM [mailto:Frank.Vanderlinden@Sun.COM]
> >Sent: February 26, 2009 1:33
> >To: Christoph Egger
> >Cc: Jiang, Yunhong; Kleen, Andi;
> >xen-devel@lists.xensource.com; Keir Fraser; Ke, Liping; Gavin Maltby
> >Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> >
> >Christoph Egger wrote:
> >> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote:
> >>> So, Frank/Egger, can I assume the following is the current consensus?
> >>>
> >>> 1) MCE is handled entirely by the Xen HV, while the guest's vMCE
> >>>    handler only works for itself.
> >>> 2) Xen presents a virtual #MC to the guest through MSR access
> >>>    emulation (Xen will do the translation if needed).
> >>> 3) The guest's unmodified MCE handler will handle the injected vMCE.
> >>> 4) Dom0 will get all logs/telemetry through a hypercall.
> >>> 5) The action taken by Xen will be passed to dom0 through the
> >>>    telemetry mechanism.
> >>
> >> Mostly.
> >> Regarding 2), I would like to discuss first how to handle errors
> >> impacting multiple contiguous physical pages which are non-contiguous
> >> in guest physical space.
> >>
> >> I also want to discuss how to do recovery actions requiring PCI
> >> access. One example of this is Shanghai's "L3 Cache Index Disable"
> >> feature. Xen delegates PCI config space to Dom0 and, via PCI
> >> passthrough, partly to DomU. That means that if registers in PCI
> >> config space are independently accessible by Xen, Dom0 and/or DomU,
> >> they can interfere with each other.
> >> Therefore, we need to
> >> a) clearly define who handles what,
> >> b) define some rules based on a), and
> >> c) discuss how to handle Dom0/DomU going wild
> >>    and breaking the rules defined in b).
> >
> >I also agree on the approach in principle, but would like to see these
> >points addressed. For non-contiguous pages, I suppose Xen could deliver
> >multiple #vMCEs to the guest, split into contiguous parts. The vmce code
> >seems to be set up to be able to do this.

For virtual MCEs that is OK. But note that, for unmodified guests, the MC
handler is written with the assumption that the CPU shuts down when a #MC
happens before the handler has cleared the MCIP bit in the MCG_STATUS MSR.

>
> For the contiguous pages, I agree with Gavin that such a contiguous page
> error should be triggered as multiple #MCs, and so it is OK.
>
> For the PCI config space issue, Christoph, can you please share more
> information on it (or provide some documentation, as Frank suggested)?
> For example: is it for CE (correctable error) or UC (uncorrectable
> error); is it in the PCI range or the PCI-E range (i.e. accessed through
> 0xCF8/0xCFC or through MMCONFIG); how is the device's BDF calculated;
> etc. Following is some of my understanding.

I would like to see a generic solution that works with any feature
requiring access to PCI config space, rather than a per-feature solution.
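To make the 0xCF8/0xCFC versus MMCONFIG question concrete, here is a minimal C sketch of how a bus/device/function (BDF) plus register offset is encoded under each access mechanism. The function names are hypothetical, the MMCONFIG base is assumed to come from the ACPI MCFG table and to correspond to bus 0, and the helpers only compute addresses without doing any port or memory I/O. With the standard 0xCF8 mechanism the register field is only 8 bits wide, so offsets of 0x100 and above cannot be reached that way and need MMCONFIG (or a vendor-specific extension).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: they only compute addresses, no I/O is performed. */

/* Standard configuration mechanism #1: the value to write to port 0xCF8
 * before accessing the data dword at port 0xCFC. The register field is
 * bits 7:2, so only dword-aligned offsets 0x00..0xFC are reachable. */
uint32_t pci_cf8_address(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
    assert(dev < 32 && fn < 8 && reg < 0x100);
    return 0x80000000u                 /* bit 31: enable config access */
         | ((uint32_t)bus << 16)       /* bits 23:16: bus number */
         | ((uint32_t)dev << 11)       /* bits 15:11: device number */
         | ((uint32_t)fn << 8)         /* bits 10:8: function number */
         | (uint32_t)(reg & 0xfc);     /* bits 7:2: dword register */
}

/* MMCONFIG (PCI Express enhanced configuration access): each function
 * owns a 4 KiB window, so the extended range 0x100..0xFFF is reachable.
 * Assumption: mmcfg_base was taken from the ACPI MCFG table and maps
 * bus 0. */
uint64_t pci_mmcfg_address(uint64_t mmcfg_base, uint8_t bus, uint8_t dev,
                           uint8_t fn, uint16_t reg)
{
    assert(dev < 32 && fn < 8 && reg < 0x1000);
    return mmcfg_base
         + ((uint64_t)bus << 20)       /* bits 27:20: bus number */
         + ((uint64_t)dev << 15)       /* bits 19:15: device number */
         + ((uint64_t)fn << 12)        /* bits 14:12: function number */
         + reg;                        /* bits 11:0: register offset */
}

/* Nonzero if the offset lies in the PCI-E extended config range. */
int pci_reg_needs_mmcfg(uint16_t reg)
{
    return reg >= 0x100;
}
```

For example, `pci_cf8_address(0, 0x18, 3, 0x44)` yields 0x8000C344, whereas an offset such as 0x1BC on the same function trips the assertion in `pci_cf8_address` and has to go through `pci_mmcfg_address` instead. That is why a recovery action touching extended-range registers would require MMCONFIG support in Xen.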
> Firstly, if it is a CE, Xen will do nothing and dom0 will take recovery
> action. If it is a UC, Xen will take action when all CPUs are in softirq
> context, and dom0 will not take action, so it should be OK.
>
> Secondly, in the Xen environment, per my understanding, the CPU is owned
> by the Xen HV, so I'm not sure whether Xen should be aware when dom0
> disables the L3 cache (if it is a CE). That is, should dom0 disable the
> cache directly, or should it use a hypercall to ask Xen to do that? Keir
> can give us more suggestions.
>
> For item c), currently Xen and dom0 can both access configuration space,
> while domU does so through PCI frontend/backend. Because the PCI backend
> only covers devices assigned to domU, we don't need to worry about domU,
> and dom0 should be trusted. However, one thing left is that if this range
> is beyond 0x100 (i.e. in the PCI-E range), we need to add MMCONFIG
> support in Xen, although it can be added simply.
>
> Thanks
> -- Yunhong Jiang
>
> >As for the Shanghai feature: Christoph, are there any documents
> >available on that feature? What kind of errors are delivered
> >(corrected/correctable)?
> >
> >- Frank

-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Managing Directors: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Registered office: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Register court: Muenchen, HRB Nr. 43632
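On the earlier point about an error that spans machine-contiguous pages scattered across guest physical space: the following is a minimal C sketch of Frank's suggestion of splitting such an error into guest-physically contiguous runs, one vMCE per run, together with the MCIP constraint Christoph raises. It is not the actual Xen vmce code; the names `struct gfn_run`, `split_into_guest_runs` and `can_inject_next_vmce`, and the input array of guest frame numbers (one per failed machine page), are hypothetical. Only the MCIP bit position (bit 2 of MCG_STATUS) comes from the x86 machine-check architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCG_STATUS_MCIP (1ULL << 2)  /* machine-check in progress */

/* One guest-physically contiguous run; each would become one vMCE. */
struct gfn_run {
    uint64_t gfn_start;
    size_t   nr_pages;
};

/* Hypothetical: gfns[i] is the guest frame backing the i-th machine page
 * of the failed range. Writes at most max_out runs to out and returns the
 * count. If out[] is too small, the remaining pages are not reported, so
 * callers should size it to nr (worst case: one run per page). */
size_t split_into_guest_runs(const uint64_t *gfns, size_t nr,
                             struct gfn_run *out, size_t max_out)
{
    size_t nruns = 0;

    assert(gfns != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++) {
        if (nruns > 0 &&
            gfns[i] == out[nruns - 1].gfn_start + out[nruns - 1].nr_pages) {
            out[nruns - 1].nr_pages++;   /* extends the current run */
            continue;
        }
        if (nruns == max_out)
            break;                       /* out of space: truncate */
        out[nruns].gfn_start = gfns[i];  /* start a new run */
        out[nruns].nr_pages = 1;
        nruns++;
    }
    return nruns;
}

/* An unmodified guest expects the CPU to shut down if a #MC arrives while
 * MCIP is still set, so the next vMCE is held back until the guest's
 * handler has cleared MCIP in its virtual MCG_STATUS. */
int can_inject_next_vmce(uint64_t guest_mcg_status)
{
    return (guest_mcg_status & MCG_STATUS_MCIP) == 0;
}
```

In this sketch, a caller would queue the runs and inject the next vMCE only when `can_inject_next_vmce` returns true for the guest's virtual MCG_STATUS, since an unmodified handler clears MCIP only at the end of its processing.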