From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ashwin Pankaj
Subject: Xen PANIC in MCE interrupt context : can global variable dom0 be NULL ?
Date: Mon, 15 Feb 2010 19:49:53 +0530
Message-ID: <4B795809.5070304@lsi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Hi,

I am using Xen 3.4.1. Sometimes, when an MCE error occurs, Xen panics
due to a page fault with the following stack trace:

http://pastebin.com/f30f67342

After some digging, the probable culprit seems to be smp_cmci_interrupt():

>     if (bs.errcnt && mctc != NULL) {
>         if (guest_enabled_event(dom0->vcpu[0], VIRQ_MCA)) {    <---- here
>             mctelem_commit(mctc);
>             printk(KERN_DEBUG "CMCI: send CMCI to DOM0 through virq\n");
>             send_guest_global_virq(dom0, VIRQ_MCA);
>         } else {
>             x86_mcinfo_dump(mctelem_dataptr(mctc));
>             mctelem_dismiss(mctc);
>         }

It looks like dom0 is NULL here (the offset of vcpu[0] is 0x468).
Is this possible?

Other functions, such as mce_softirq(), perform a NULL check on dom0
before accessing its members:

>     /* Step2: Send Log to DOM0 through vIRQ */
>     if (dom0 && guest_enabled_event(dom0->vcpu[0], VIRQ_MCA)) {
>         printk(KERN_DEBUG "MCE: send MCE# to DOM0 through virq\n");
>         send_guest_global_virq(dom0, VIRQ_MCA);
>     }

Also note that this system printed the MCE warning message
("(XEN) MCE: The hardware reports a non fatal, correctable incident
occured on CPU 0") twice before panicking. So this code worked properly
and entered x86_mcinfo_dump() at least twice before the panic.

- Regards,
Ashwin
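
For illustration, here is a minimal standalone sketch of what the CMCI path could look like with the same NULL guard that mce_softirq() uses. It is a model only, not the Xen source and not a tested patch: the struct layouts, the VIRQ_MCA value, the counters, and the helper name cmci_deliver() are hypothetical stand-ins, and the stubbed guest_enabled_event() and send_guest_global_virq() only imitate the calls quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Xen types; NOT the real definitions. */
struct vcpu   { bool virq_mca_enabled; };
struct domain { struct vcpu *vcpu[1]; };

#define VIRQ_MCA 1              /* placeholder value, assumed for this sketch */

static struct domain *dom0 = NULL;  /* models the global; may still be NULL */
static int virq_sent;               /* counts simulated vIRQ deliveries */
static int dumped;                  /* counts simulated local dumps */

/* Stub: reports whether the (modelled) vcpu has the event enabled. */
static bool guest_enabled_event(struct vcpu *v, int virq)
{
    (void)virq;
    return v != NULL && v->virq_mca_enabled;
}

/* Stub: records that a global vIRQ would have been sent. */
static void send_guest_global_virq(struct domain *d, int virq)
{
    (void)d;
    (void)virq;
    virq_sent++;
}

/*
 * Guarded delivery: check dom0 before dereferencing it, as mce_softirq()
 * does. If dom0 is NULL or the event is not enabled, fall back to the
 * local path (standing in for x86_mcinfo_dump() + mctelem_dismiss()).
 */
static bool cmci_deliver(void)
{
    if (dom0 != NULL && guest_enabled_event(dom0->vcpu[0], VIRQ_MCA)) {
        send_guest_global_virq(dom0, VIRQ_MCA);
        return true;
    }
    dumped++;
    return false;
}
```

With dom0 unset, cmci_deliver() takes the fallback branch instead of faulting on dom0->vcpu[0]; whether falling back to the dump path is the right behaviour in the real handler is an open question for the list.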