From mboxrd@z Thu Jan 1 00:00:00 1970
From: Frank Van Der Linden
Subject: Re: Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Date: Mon, 16 Feb 2009 10:58:37 -0700
Message-ID: <4999A94D.5020500@Sun.COM>
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset=ISO-8859-1
Content-Transfer-Encoding: 7BIT
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: Gavin Maltby, Christoph Egger, "xen-devel@lists.xensource.com", "Jiang, Yunhong", "Ke, Liping"
List-Id: xen-devel@lists.xenproject.org

Keir Fraser wrote:
> On 16/02/2009 14:18, "Christoph Egger" wrote:
>
>> IMO, any design change should be discussed first and not changed
>> silently, since this will confuse everyone and noone will know
>> what is the right thing to do in Xen and in Dom0 and this
>> in turn will lead to error prone, unmaintainable code in both
>> Xen and in Dom0
>>
>
> I certainly think we should have a shared approach for x86 machine-check
> handling, rather than completely different architectures for AMD and Intel.
> Fortunately Sun are an interested and active third party regarding this
> feature. I'll be interested in their opinion.
>
> -- Keir
>

Today is a holiday here in the US, so I have only taken a superficial
look at the patches.

However, my initial impression is that I share Christoph's concern. I
like the original design, where the hypervisor deals with low-level
information collection, passes it on to dom0, which then can make a
high-level decision and instructs the hypervisor to take high-level
action via a hypercall. The hypervisor does the actual MSR reads and
writes, dom0 only acts on the values provided via hypercalls.

We added the physcpuinfo hypercall to stay in this framework: get
physical information needed for analysis, but don't access any registers
directly.
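
As a rough illustration of that split, here is a minimal sketch in C. It
is not the actual Xen or dom0 interface: every function and type name is
hypothetical, the MSR banks are simulated with plain arrays, and page
offlining merely stands in for whatever high-level action dom0 might
request. Only the two status bits (VAL, bit 63, and UC, bit 61) follow
the architectural MCi_STATUS layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the real Xen/dom0 interfaces. */
#define NBANKS       4
#define STATUS_VALID (1ULL << 63)  /* MCi_STATUS.VAL: bank holds an error */
#define STATUS_UC    (1ULL << 61)  /* MCi_STATUS.UC: error was uncorrected */

struct mc_record { unsigned bank; uint64_t status; uint64_t addr; };
enum mc_action { MC_ACTION_NONE, MC_ACTION_OFFLINE_PAGE };

/* ---- Hypervisor side: the only code that touches the (simulated) MSRs ---- */
static uint64_t sim_status[NBANKS], sim_addr[NBANKS];
static uint64_t offlined_addr;
static int offlined;

/* Test hook standing in for hardware logging an error in a bank. */
void sim_inject(unsigned bank, uint64_t status, uint64_t addr)
{
    assert(bank < NBANKS);
    sim_status[bank] = status;
    sim_addr[bank] = addr;
}

/* "Hypercall" 1: read and clear every valid bank, return the telemetry. */
size_t hypercall_fetch_telemetry(struct mc_record *out, size_t max)
{
    size_t n = 0;
    assert(out != NULL);
    for (unsigned b = 0; b < NBANKS && n < max; b++) {
        if (!(sim_status[b] & STATUS_VALID))
            continue;
        out[n].bank = b;
        out[n].status = sim_status[b];
        out[n].addr = sim_addr[b];
        n++;
        sim_status[b] = 0;  /* clearing the bank stays in the hypervisor */
    }
    return n;
}

/* "Hypercall" 2: carry out a high-level action that dom0 decided on. */
int hypercall_request_action(enum mc_action act, uint64_t addr)
{
    if (act == MC_ACTION_OFFLINE_PAGE) {
        offlined = 1;
        offlined_addr = addr;
    }
    return 0;
}

/* Reports whether a page was offlined and, if so, which address. */
int hv_last_offlined(uint64_t *addr)
{
    if (offlined && addr != NULL)
        *addr = offlined_addr;
    return offlined;
}

/* ---- dom0 side: policy only, no register access ---- */

/* Deliberately simplistic policy: act only on uncorrected errors. */
enum mc_action dom0_decide(const struct mc_record *r)
{
    return (r->status & STATUS_UC) ? MC_ACTION_OFFLINE_PAGE : MC_ACTION_NONE;
}

/* Fetch telemetry, decide per record, and return the number of actions. */
size_t dom0_handle_events(void)
{
    struct mc_record recs[NBANKS];
    size_t n = hypercall_fetch_telemetry(recs, NBANKS);
    size_t actions = 0;

    for (size_t i = 0; i < n; i++) {
        enum mc_action act = dom0_decide(&recs[i]);
        if (act != MC_ACTION_NONE) {
            hypercall_request_action(act, recs[i].addr);
            actions++;
        }
    }
    return actions;
}
```

The point of the sketch is only the boundary: dom0_decide() and
dom0_handle_events() see values handed over by the two "hypercalls" and
never read or write a register themselves.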

It seems that these new patches blur this distinction, especially the
virtualized MSR reads/writes. I am not sure what added value they have,
except for being able to run an unmodified MCA handler. However, I think
that any active MCA decision making should be centralized, and that
centralized place would be dom0. Dom0 is already very much aware of the
hypervisor, so I don't see the advantage of having an unmodified MCA
handler there (our MCA handlers are virtually unmodified, it's just that
the part where the telemetry is collected is inside Xen for the dom0
case).

I also agree that different behavior for AMD and Intel chips would not
be good.

Perhaps the Intel folks can explain what the advantages of their
approach are, and give some scenarios where their approach would be
better? My first impression is that staying within the general framework
as provided by Christoph's original work is the better option.

- Frank