From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: cpuidle and un-eoid interrupts at the local apic
Date: Wed, 31 Jul 2013 10:47:44 +0100
Message-ID: <51F8DD40.2090207@citrix.com>
References: <51A908CA.7050604@citrix.com> <51F8CB15.1070608@digithi.de>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------090602070107010302050706"
Return-path:
In-Reply-To: <51F8CB15.1070608@digithi.de>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "Thimo E."
Cc: Keir Fraser , Jan Beulich , Xen-devel List
List-Id: xen-devel@lists.xenproject.org

--------------090602070107010302050706
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit

On 31/07/13 09:30, Thimo E. wrote:
> Hello all,
>
> I have also a Haswell system. I am running XenServer 6.2 (with Xen
> 4.1.5) on it and I am experiencing the same issue. Do you already have
> a solution for this problem ?
>
> Best regards
> Thimo

Hi,

We are still none the wiser on this issue. I have a debugging patch to
get more information, but the problem hasn't reoccurred since. This is
now 2 crashes on Xen 4.1 and a single crash on Xen 4.2 that I have seen.

For the benefit of anyone else who runs over this issue in the meantime,
the patch (against Xen-4.3) is attached.

Thimo: I shall put a new version of the XenServer 6.2 Xen with the
debugging patch on the forum thread.
~Andrew

>
> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
> (XEN) ----[ Xen-4.1.5.debug x86_64 debug=y Not tainted ]----
> (XEN) CPU: 1
> (XEN) RIP: e008:[] do_IRQ+0x3ba/0x6d9
> (XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
> (XEN) rax: 0000000000000001 rbx: ffff83081f080f00 rcx: ffff83081f05b340
> (XEN) rdx: 0000000000000001 rsi: 000000000000002b rdi: 0000000000000001
> (XEN) rbp: ffff83081f057d88 rsp: ffff83081f057d18 r8: ffff83081f05b63c
> (XEN) r9: 000070044fb97100 r10: ffff8300b858c060 r11: 000020f3f5a4dea5
> (XEN) r12: 000000000000002b r13: ffff83081f004e80 r14: 000000000000001d
> (XEN) r15: 0000000000000002 cr0: 000000008005003b cr4: 00000000001026f0
> (XEN) cr3: 000000045915f000 cr2: 0000000000150008
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff83081f057d18:
> (XEN) 000000000000001d 000000000000001d ffff83081f080f00 0000000000000000
> (XEN) 00000000ffffffea ffff83081f080f00 0000000000000000 0000000000000000
> (XEN) ffffffffffffffff ffff83081f057f18 ffff83081f06bb00 ffff83081f06bb90
> (XEN) ffff8300b858c000 0000000000000002 00007cf7e0fa8247 ffff82c480161a66
> (XEN) 0000000000000002 ffff8300b858c000 ffff83081f06bb90 ffff83081f06bb00
> (XEN) ffff83081f057ef0 ffff83081f057f18 000020f3f5a4dea5 ffff8300b858c060
> (XEN) 000070044fb97100 ffff83081f05bb80 0000000000007f40 0000000000000001
> (XEN) 0000000000000000 000020f3c755a972 ffff83081f06bb90 0000002b00000000
> (XEN) ffff82c4801a21f0 000000000000e008 0000000000000246 ffff83081f057e48
> (XEN) 000000000000e010 ffff83081f057ef0 ffff82c4801a3dc4 000020f3f595c09c
> (XEN) 000020f3f596987e ffff8306383e3010 ffff83081f05b100 ffffffffffffffff
> (XEN) 0000000000000001 0000000000000001 ffffffffffffffff ffff83081f057f18
> (XEN) 00000000802d4680 0000000000000000 0000000000000000 ffff82c4802d4680
> (XEN) 000002a80000024b ffff8300b8586000 ffff83081f057f18 ffff8300b8586000
> (XEN) ffff8300b858c000 ffff8300b858c000 0000000000000002 ffff83081f057f10
> (XEN) ffff82c48015a261 ffff82c480126ccd 0000000000000001 ffff83081f057d18
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000246 ffff88001a8093a0
> (XEN) 0000000100885e0f 000000000000000f 0000000000000000 ffffffff802063aa
> (XEN) 0000000000000001 00000000deadbeef 00000000deadbeef 0000010000000000
> (XEN) Xen call trace:
> (XEN) [] do_IRQ+0x3ba/0x6d9
> (XEN) [] common_interrupt+0x26/0x30
> (XEN) [] lapic_timer_nop+0x0/0x6
> (XEN) [] idle_loop+0x48/0x59
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
> On 31.05.2013 22:32, Andrew Cooper wrote:
>> Recently our automated testing system has caught a curious assertion
>> while testing Xen 4.1.5 on a HaswellDT system.
>>
>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
>> (XEN) ----[ Xen-4.1.5 x86_64 debug=n Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e008:[] do_IRQ+0x514/0x750
>> (XEN) RFLAGS: 0000000000010093 CONTEXT: hypervisor
>> (XEN) rax: 000000000000002f rbx: ffff830249841e80 rcx: ffff82c4803127c0
>> (XEN) rdx: 0000000000000004 rsi: 0000000000000027 rdi: 0000000000000001
>> (XEN) rbp: 0000000000001e00 rsp: ffff82c4802bfd48 r8: ffff82c480312abc
>> (XEN) r9: ffff8302498a5948 r10: 0000000000000009 r11: ffff8302498c6c80
>> (XEN) r12: ffff830243b07f50 r13: ffff8300a24f8000 r14: 00000af8373788e3
>> (XEN) r15: ffff830249841e80 cr0: 000000008005003b cr4: 00000000001026f0
>> (XEN) cr3: 00000002479e6000 cr2: 00000000e6d3c090
>> (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
>> (XEN) Xen stack trace from rsp=ffff82c4802bfd48:
>> (XEN) ffff830249841eb4 ffff82c480312ec0 000000000000001e 0000001e00000000
>> (XEN) 0000000000000000 00000000498a5670 ffff830249841d80 ffff830249840080
>> (XEN) ffff830249841db4 0000000000000000 ffff8302498a55e0 ffff8302498a5670
>> (XEN) ffff8300a24f8000 00000af8373788e3 00000af83736b8ed ffff82c480162ca0
>> (XEN) 00000af83736b8ed 00000af8373788e3 ffff8300a24f8000 ffff8302498a5670
>> (XEN) ffff8302498a55e0 0000000000000000 ffff8302498c6c80 0000000000000009
>> (XEN) ffff8302498a5948 ffff82c480313000 0000000000007f40 0000000000000001
>> (XEN) 0000000000000000 0000000000000000 00000af80db652fd 0000002700000000
>> (XEN) ffff82c4801a50a0 000000000000e008 0000000000000246 ffff82c4802bfe78
>> (XEN) 0000000000000000 ffff8302498a5670 ffff82c4801a6a56 ffffffffffffffff
>> (XEN) ffff830249818000 0000000000000000 ffff8300a24f8000 ffff82c480122c11
>> (XEN) 00000af839021119 0000000000000000 0000000000000000 00000000802bff18
>> (XEN) 0000025c0000013b ffff82c4802e7580 ffff82c4802bff18 ffff8300a2838000
>> (XEN) ffff82c4802f61a0 ffff8300a24f8000 0000000000000002 00000af837304b45
>> (XEN) ffff82c48015b67a 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 00000000ee8a3f8c 0000000000000001
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 00000000ee8a3f74 0000000000000af8
>> (XEN) 0000000000000001 0000010000000000 00000000c01013a7 0000000000000061
>> (XEN) 0000000000000246 00000000ee8a3f70 0000000000000069 0000000000000000
>> (XEN) Xen call trace:
>> (XEN) [] do_IRQ+0x514/0x750
>> (XEN) 15[] common_interrupt+0x20/0x30
>> (XEN) 32[] lapic_timer_nop+0x0/0x10
>> (XEN) 38[] acpi_processor_idle+0x376/0x740
>> (XEN) 43[] do_block+0x71/0xd0
>> (XEN) 56[] idle_loop+0x1a/0x50
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
>> (XEN) ****************************************
>>
>> And the disassembly before the assertion:
>>
>> ffff82c48016b29f: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
>> ffff82c48016b2a6: 00
>> ffff82c48016b2a7: 0f b6 44 11 ff movzbl -0x1(%rcx,%rdx,1),%eax
>> ffff82c48016b2ac: 39 c6 cmp %eax,%esi
>> ffff82c48016b2ae: 0f 8f 5c ff ff ff jg ffff82c48016b210
>> ffff82c48016b2b4: 0f 0b ud2
>>
>>
>> Xen has been woken up by an interrupt of vector 0x27, but has a vector
>> 0x2f on the top of the pending EOI stack for the local APIC.
>>
>> I have put in more debugging to dump the LAPIC state of the two
>> interesting vectors and the IOAPIC state, but I have no idea if/when the
>> problem might reoccur.
>>
>> My understanding of LAPIC priority leads me to think that Xen really
>> shouldn't be woken up by a lower priority vector if a higher priority
>> one is still un-eoi'd. There is not yet sufficient information to tell
>> whether this is truly the case, or whether Xen has simply gotten confused
>> about which vectors it eoi'd.
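
For anyone following along, the ordering rule that the failing assertion checks can be sketched in isolation. This is a minimal standalone model in plain C and not the Xen code: the type pending_eoi_model, the bound MAX_PENDING and all three helper names are hypothetical and exist only for illustration. It assumes the usual x86 LAPIC rule that the priority class of a vector is its upper four bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not the Xen definitions. */
#define MAX_PENDING 16

struct pending_eoi_model {
    uint8_t vector[MAX_PENDING]; /* vectors awaiting EOI, bottom to top */
    unsigned int sp;             /* number of entries currently stacked */
};

/* Assumption: LAPIC priority class is the upper nibble of the vector. */
static unsigned int priority_class(uint8_t vector)
{
    return vector >> 4;
}

/*
 * Same condition as the failing assertion: either the stack is empty,
 * or the new vector is strictly greater than the one on top.
 */
static bool push_is_ordered(const struct pending_eoi_model *s, uint8_t vector)
{
    return (s->sp == 0) || (s->vector[s->sp - 1] < vector);
}

/*
 * Push a vector. Returns false and leaves the stack untouched if the
 * stack is full or the ordering rule would be violated.
 */
static bool push_pending_eoi(struct pending_eoi_model *s, uint8_t vector)
{
    assert(s->sp <= MAX_PENDING); /* internal sanity check on the model */

    if ( s->sp >= MAX_PENDING || !push_is_ordered(s, vector) )
        return false;

    s->vector[s->sp++] = vector;
    return true;
}
```

With 0x2f already on the stack, push_pending_eoi(&s, 0x27) is rejected, which is the situation in the first crash above. In this model priority_class() gives 2 for both 0x27 and 0x2f, so under the stated assumption 0x27 is not even in a higher class than the vector still awaiting EOI.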
>>
>> Having said that, we do keep line level interrupts un-eoi'd for extended
>> periods while guests service the interrupt. Given that vectors are
>> chosen at random, we could get into a situation where a line interrupt
>> has a vector 0xdf and stays pending for 150ms (which I measured as a
>> not-overly-uncommon mean-time-till-eoi for line level interrupt). This
>> would starve any other guest interrupts for an extended period.
>>
>> Given directed-eoi support in the past few generations of processor, the
>> requirement for the pending EOI stack has disappeared as far as I am
>> aware. Would it be a sensible idea in general to make use of the pending
>> eoi stack conditional on not having/using directed EOI support?
>>
>> ~Andrew
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>

--------------090602070107010302050706
Content-Type: text/x-patch;
 name="ca-107844-debug.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="ca-107844-debug.patch"

# HG changeset patch
# Parent 3bc8894f281f3ee68406a565beb2f811c67c6b5e

diff -r 3bc8894f281f xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1100,7 +1100,7 @@ static inline void UNEXPECTED_IO_APIC(vo
 {
 }
 
-static void /*__init*/ __print_IO_APIC(void)
+void /*__init*/ __print_IO_APIC(void)
 {
     int apic, i;
     union IO_APIC_reg_00 reg_00;
diff -r 3bc8894f281f xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1115,6 +1115,8 @@ static void irq_guest_eoi_timer_fn(void
     spin_unlock_irqrestore(&desc->lock, flags);
 }
 
+static void dump_irqs(unsigned char key);
+void __print_IO_APIC(void);
 static void __do_IRQ_guest(int irq)
 {
     struct irq_desc *desc = irq_to_desc(irq);
@@ -1137,7 +1139,36 @@ static void __do_IRQ_guest(int irq)
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
-        ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
+        if ( unlikely( !((sp == 0) || (peoi[sp-1].vector < vector)) ))
+        {
+            printk("**Pending EOI error\n");
+            printk(" irq %d, vector 0x%x\n", irq, vector);
+
+            for ( i = sp-1; i >= 0; --i )
+            {
+                printk(" s[%d] irq %d, vec 0x%x, ready %u, "
+                       "ISR %u, TMR %u, IRR %u\n",
+                       i, peoi[i].irq, peoi[i].vector, peoi[i].ready,
+                       apic_isr_read(peoi[i].vector),
+                       apic_tmr_read(peoi[i].vector),
+                       apic_irr_read(peoi[i].vector) );
+            }
+
+            printk("All LAPIC state:\n");
+            printk("[vector] %8s %8s %8s\n", "ISR", "TMR", "IRR");
+            for ( i = 0; i < APIC_ISR_NR; ++i )
+                printk("[%02x:%02x] %08"PRIx32" %08"PRIx32" %08"PRIx32"\n",
+                       (i * 32)+31, i*32,
+                       apic_read(APIC_ISR + i*0x10),
+                       apic_read(APIC_TMR + i*0x10),
+                       apic_read(APIC_IRR + i*0x10) );
+
+            spin_unlock(&desc->lock);
+            dump_irqs('i');
+            __print_IO_APIC();
+
+            panic("CA-107844");
+        }
         ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
         peoi[sp].irq = irq;
         peoi[sp].vector = vector;
diff -r 3bc8894f281f xen/include/asm-x86/apic.h
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -152,6 +152,18 @@ static __inline bool_t apic_isr_read(u8
             (vector & 0x1f)) & 1;
 }
 
+static __inline bool_t apic_tmr_read(u8 vector)
+{
+    return (apic_read(APIC_TMR + ((vector & ~0x1f) >> 1)) >>
+            (vector & 0x1f)) & 1;
+}
+
+static __inline bool_t apic_irr_read(u8 vector)
+{
+    return (apic_read(APIC_IRR + ((vector & ~0x1f) >> 1)) >>
+            (vector & 0x1f)) & 1;
+}
+
 static __inline u32 get_apic_id(void) /* Get the physical APIC id */
 {
     u32 id = apic_read(APIC_ID);

--------------090602070107010302050706
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--------------090602070107010302050706--