From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Xen-3.x fix pagefault in cmci handler
Date: Thu, 1 Mar 2012 12:22:43 +0000
Message-ID: <4F4F6A13.8020703@citrix.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------000000080902030508090607"
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "keith.coleman@n2servers.com", "xen-devel@lists.xensource.com"
Cc: Keir Fraser
List-Id: xen-devel@lists.xenproject.org

--------------000000080902030508090607
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit

Hello,

XenServer has just had a support escalation, which resulted in this fix.

It appears that certain Intel CPUs (X56xx series) with hyperthreading
enabled have a race condition between an SMI (from the legacy USB
support in this case) and an AP (the other hyperthread) which is
between receiving SIPI/INIT and trying to execute its first trampoline
instruction.

The race condition results in a CMCI, complaining about a parity error
in an instruction cache, being delivered to Xen as soon as the SMI
handler returns. On Xen-3.x, the CMCI handler dereferences the dom0
pointer, which is still null at this point in boot, resulting in a
pagefault.

Intel are working to track down why the CMCI is occurring in the first
place, but as it is apparently benign, Xen should continue to boot
regardless.

This error does not affect Xen-4.x, as it correctly checks that the
dom0 pointer is not null before trying to use it.
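
For anyone unfamiliar with the pattern, the guard can be sketched in
isolation roughly as follows. This is only an illustration, not Xen
code: the struct layouts, the single-argument guest_enabled_event()
and the cmci_deliver() helper are simplified, hypothetical stand-ins.
The only part taken from the fix is the "dom0 &&" test in front of
the dereference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Xen's structures. */
struct vcpu { bool virq_mca_enabled; };
struct domain { struct vcpu *vcpu[1]; };

/* NULL until dom0 has been constructed, as during early boot. */
static struct domain *dom0 = NULL;

/* Stand-in for the real check; it dereferences its argument. */
static bool guest_enabled_event(const struct vcpu *v)
{
    return v->virq_mca_enabled;
}

/*
 * Returns true if the event would be forwarded to dom0, and false if
 * it should be handled locally (for example, dumped to the console).
 */
static bool cmci_deliver(void)
{
    /* && short-circuits: dom0->vcpu[0] is not evaluated when dom0 is NULL. */
    if (dom0 && guest_enabled_event(dom0->vcpu[0]))
        return true;
    return false;
}
```

With dom0 still NULL, cmci_deliver() takes the local path instead of
faulting; once dom0 points at a domain whose first vcpu has the event
enabled, it takes the forwarding path.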

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

--------------000000080902030508090607
Content-Type: text/x-patch; name="cmci-fix-pagefault.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cmci-fix-pagefault.patch"

# HG changeset patch
# Parent ac68ad6fe4b779ca0b894ca3845b66662dd2dd9c
CMCI: Fix pagefault

The CMCI handler blindly dereferences the dom0 pointer.  If a CMCI occurs
during early boot, this results in a page fault and reboot.

Check the dom0 pointer first.  If it is null, then the CMCI information is
dumped onto the Xen console, and booting continues happily.

Signed-off-by: Andrew Cooper

diff -r ac68ad6fe4b7 xen/arch/x86/cpu/mcheck/mce_intel.c
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -705,7 +705,7 @@ fastcall void smp_cmci_interrupt(struct
         MCA_CMCI_HANDLER, __get_cpu_var(mce_banks_owned), &bs);
 
     if (bs.errcnt && mctc != NULL) {
-        if (guest_enabled_event(dom0->vcpu[0], VIRQ_MCA)) {
+        if (dom0 && guest_enabled_event(dom0->vcpu[0], VIRQ_MCA)) {
             mctelem_commit(mctc);
             printk(KERN_DEBUG "CMCI: send CMCI to DOM0 through virq\n");
             send_guest_global_virq(dom0, VIRQ_MCA);

--------------000000080902030508090607
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--------------000000080902030508090607--