From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Cooper Subject: IO-APIC line level race condition Date: Fri, 17 Feb 2012 13:27:24 +0000 Message-ID: <4F3E55BC.4050004@citrix.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------070701020402010101020007" Return-path: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: "xen-devel@lists.xensource.com" , Keir Fraser , Jan Beulich List-Id: xen-devel@lists.xenproject.org --------------070701020402010101020007 Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sadly, we have discovered another line level interrupt race condition in Xen-4.1. The result was that an outstanding un-eoi'd interrupt at the IO-APIC resulted in the mptsas controller offlining the root filesystem. This is now two separate IO-APIC bugs found recently. 1) Cisco C210 M2 server - EOI Broadcast Suppression, io_apci_ack=old 2) Dell R710 - No EOI Broadcast Suppression, io_apic_ack=new Both servers use IO-APIC version 0x20 and have an mptsas controller for their disks, using Legacy PCI line level interrupts. Workload on both servers appear to have more active vcpus Case 1 is now considered stable by the customer after I provided a private fix which caused Xen to never consider turning on EOI Broadcast Suppression. I have re-attached a patch which allows this problem to be "fixed" by specifying "ioapic_ack=new" on the command line, rather than requiring a patch and recompile of Xen. Case 2 has only been seen once (this morning) so we currently have no idea as to its reproducibility. However, given that this hardware is fairly common in our test infrastructure, i would say that it is fairly rare. With Case1 and the new patch, Case1 becomes the same as Case2 with respect to IO-APIC setup, presumably meaning that the Case2 bug still exists with Case1. I will start working on cleaning up the IO-APIC code as soon as I can, as reducing the unnecessary complexity should make race conditions like this easier to find. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com --------------070701020402010101020007 Content-Type: text/x-patch; name="ioapic-ack-fix.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="ioapic-ack-fix.patch" # HG changeset patch # Parent 9ad1e42c341bc78463b6f6610a6300f75b535fbb IO-APIC: Prevent using EOI broadcast suppression if user specified ioapic_ack=new on the command line. Currently, if EOI broadcast suppression is advertised on the BSP LAPIC, Xen will discard any user specified option regarding IO-APIC ack mode. This patch introduces a check which prevents EOI Broadcast suppression from forcing the IO-APIC ack mode to old if the user has explicitly asked for the new ack mode on the command line. Signed-off-by: Andrew Cooper diff -r 9ad1e42c341b xen/arch/x86/apic.c --- a/xen/arch/x86/apic.c +++ b/xen/arch/x86/apic.c @@ -424,9 +424,16 @@ int __init verify_local_APIC(void) */ if ( reg0 & APIC_LVR_DIRECTED_EOI ) { - ioapic_ack_new = 0; - directed_eoi_enabled = 1; - printk("Enabled directed EOI with ioapic_ack_old on!\n"); + if ( ioapic_ack_new == 1 && ioapic_ack_forced == 1 ) + printk("Not enabling EOI broadcast suppression because " + "ioapic_ack_new has been forced on the command line\n"); + else + { + ioapic_ack_new = 0; + directed_eoi_enabled = 1; + printk("Enabled EOI broadcast suppression with ioapic_ack_old " + "on!\n"); + } } /* diff -r 9ad1e42c341b xen/arch/x86/io_apic.c --- a/xen/arch/x86/io_apic.c +++ b/xen/arch/x86/io_apic.c @@ -44,6 +44,7 @@ static DEFINE_SPINLOCK(ioapic_lock); bool_t __read_mostly skip_ioapic_setup; bool_t __read_mostly ioapic_ack_new = 1; +bool_t __read_mostly ioapic_ack_forced = 0; #ifndef sis_apic_bug /* @@ -1543,9 +1544,15 @@ static unsigned int startup_level_ioapic static void __init setup_ioapic_ack(char *s) { if ( !strcmp(s, "old") ) + { ioapic_ack_new = 0; + ioapic_ack_forced = 1; + } else if ( !strcmp(s, "new") ) + { ioapic_ack_new = 1; + ioapic_ack_forced = 1; + } else printk("Unknown ioapic_ack value specified: '%s'\n", s); } diff -r 9ad1e42c341b xen/include/asm-x86/io_apic.h --- a/xen/include/asm-x86/io_apic.h +++ b/xen/include/asm-x86/io_apic.h @@ -180,6 +180,7 @@ static inline void io_apic_modify(unsign /* 1 if "noapic" boot option passed */ extern bool_t skip_ioapic_setup; extern bool_t ioapic_ack_new; +extern bool_t ioapic_ack_forced; #ifdef CONFIG_ACPI_BOOT extern int io_apic_get_unique_id (int ioapic, int apic_id); --------------070701020402010101020007 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel --------------070701020402010101020007--