From: Frank van der Linden
Subject: Re: [Patch 0/3]RAS(Part II)--Intel MCA enalbing in XEN
Date: Fri, 20 Mar 2009 17:46:19 -0600
Message-ID: <49C42ACB.9030207@Sun.COM>
To: "Ke, Liping"
Cc: "xen-devel@lists.xensource.com", Keir Fraser
List-Id: xen-devel@lists.xenproject.org

Ke, Liping wrote:
> The patches are for MCA enabling in XEN. Those patches based on AMD and SUN's MCA related jobs.
> We have some discussions with AMD/SUN and did refinements from the last sending. Also we rebase it after
> SUN's latest improvements. We will have following patches for recovery actions. This is a basic framework
> for Intel MCA.

I looked the patches over a little more closely, and merged them with
my -unstable tree. I found a few minor issues:

* some compile issues with printk format strings in the case of DEBUG
  and 32-bit
* in severity_scan, use mca_rdmsrl and mca_wrmsrl to work correctly for
  simulated errors using injection
* in severity_scan, if the MSR values were injected for debugging
  purposes, don't panic but keep going: the injected values will be
  lost at reboot, and since this is just a simulated #MC anyway, there
  is no danger of losing state

I'll attach a little patch to fix these issues. I haven't tested this
patch yet, although the compile fixes have been "tested".

One final question:

> 2) When MCE# happens, all CPUs enter MCA context. The first CPU who read&clear the error MSR bank will be this
> MCE# owner. Necessary locks/synchronization will help to judge the owner and select most severe error.

Is it always true (at least, for Intel CPUs of family 6 and 15) that
when a #MC happens, *all* CPUs will receive a #MC trap?
I couldn't find this anywhere in the documentation. If this is true,
I'll change the MCE injection code to simulate #MC on all CPUs in the
case of an Intel system.

- Frank
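
For illustration only, the second and third points in the list above
(injection-aware MSR access, and not panicking on injected values) can
be sketched roughly as below. This is a standalone simulation, not the
attached patch and not Xen code: inject_msr, sketch_rdmsr, sketch_wrmsr,
scan_action, hw_rdmsr and the table size are all hypothetical names
made up for the example, and "valid + uncorrected + context corrupt
means fatal" is a simplified assumption about the severity policy. Only
the IA32_MC0_STATUS number and the status bit positions come from the
architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_MC0_STATUS 0x401u
#define MCi_STATUS_VAL (1ULL << 63)   /* register contents are valid */
#define MCi_STATUS_UC  (1ULL << 61)   /* uncorrected error */
#define MCi_STATUS_PCC (1ULL << 57)   /* processor context corrupt */

#define MAX_INJECTED 8                /* hypothetical table size */

/* One simulated MSR value, as set up by an injection interface. */
struct injected_msr {
    uint32_t msr;
    uint64_t value;
    bool     valid;
};

static struct injected_msr injected[MAX_INJECTED];

/* Stand-in for a real hardware read; this sketch always returns 0. */
static uint64_t hw_rdmsr(uint32_t msr) { (void)msr; return 0; }

/* Record a simulated value; returns false if the table is full. */
bool inject_msr(uint32_t msr, uint64_t value)
{
    for (size_t i = 0; i < MAX_INJECTED; i++) {
        if (!injected[i].valid || injected[i].msr == msr) {
            injected[i] = (struct injected_msr){ msr, value, true };
            return true;
        }
    }
    return false;
}

/* Read an MSR, preferring an injected value over the hardware one. */
uint64_t sketch_rdmsr(uint32_t msr, bool *was_injected)
{
    assert(was_injected != NULL);
    for (size_t i = 0; i < MAX_INJECTED; i++) {
        if (injected[i].valid && injected[i].msr == msr) {
            *was_injected = true;
            return injected[i].value;
        }
    }
    *was_injected = false;
    return hw_rdmsr(msr);
}

/* Write an MSR; an injected entry absorbs the write so that clearing
 * a simulated error never touches the real register. */
void sketch_wrmsr(uint32_t msr, uint64_t value)
{
    for (size_t i = 0; i < MAX_INJECTED; i++) {
        if (injected[i].valid && injected[i].msr == msr) {
            injected[i].value = value;
            return;
        }
    }
    /* A real implementation would write the hardware MSR here. */
}

enum scan_result { ACT_CONTINUE, ACT_PANIC };

/* Simplified policy: a fatal-looking status only panics when it came
 * from real hardware; injected values just let the scan continue. */
enum scan_result scan_action(uint64_t status, bool was_injected)
{
    const uint64_t fatal = MCi_STATUS_VAL | MCi_STATUS_UC | MCi_STATUS_PCC;

    if ((status & fatal) != fatal)
        return ACT_CONTINUE;
    return was_injected ? ACT_CONTINUE : ACT_PANIC;
}
```

The point of routing both reads and writes through the wrappers is that
the scan code sees the same values whether the error is real or
simulated, and only the final panic decision needs to know the
difference.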