From: Andrew Cooper <andrew.cooper3@citrix.com>
To: "Thimo E." <abc@digithi.de>
Cc: Keir Fraser <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
Xen-devel List <xen-devel@lists.xen.org>
Subject: Re: cpuidle and un-eoid interrupts at the local apic
Date: Wed, 31 Jul 2013 10:47:44 +0100
Message-ID: <51F8DD40.2090207@citrix.com>
In-Reply-To: <51F8CB15.1070608@digithi.de>
[-- Attachment #1: Type: text/plain, Size: 9270 bytes --]
On 31/07/13 09:30, Thimo E. wrote:
> Hello all,
>
> I also have a Haswell system. I am running XenServer 6.2 (with Xen
> 4.1.5) on it and am experiencing the same issue. Do you already have
> a solution for this problem?
>
> Best regards
> Thimo
Hi,
We are still none the wiser on this issue. I have a debugging patch to
get more information, but the problem hasn't reoccurred since. So far I
have seen two crashes on Xen 4.1 and a single crash on Xen 4.2.
For the benefit of anyone else who runs into this issue in the meantime,
the patch (against Xen-4.3) is attached.
Thimo: I shall put a new version of the XenServer 6.2 Xen with the
debugging patch on the forum thread.
~Andrew
>
> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at
> irq.c:1027
> (XEN) ----[ Xen-4.1.5.debug x86_64 debug=y Not tainted ]----
> (XEN) CPU: 1
> (XEN) RIP: e008:[<ffff82c480169662>] do_IRQ+0x3ba/0x6d9
> (XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
> (XEN) rax: 0000000000000001 rbx: ffff83081f080f00 rcx:
> ffff83081f05b340
> (XEN) rdx: 0000000000000001 rsi: 000000000000002b rdi:
> 0000000000000001
> (XEN) rbp: ffff83081f057d88 rsp: ffff83081f057d18 r8:
> ffff83081f05b63c
> (XEN) r9: 000070044fb97100 r10: ffff8300b858c060 r11:
> 000020f3f5a4dea5
> (XEN) r12: 000000000000002b r13: ffff83081f004e80 r14:
> 000000000000001d
> (XEN) r15: 0000000000000002 cr0: 000000008005003b cr4:
> 00000000001026f0
> (XEN) cr3: 000000045915f000 cr2: 0000000000150008
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff83081f057d18:
> (XEN) 000000000000001d 000000000000001d ffff83081f080f00
> 0000000000000000
> (XEN) 00000000ffffffea ffff83081f080f00 0000000000000000
> 0000000000000000
> (XEN) ffffffffffffffff ffff83081f057f18 ffff83081f06bb00
> ffff83081f06bb90
> (XEN) ffff8300b858c000 0000000000000002 00007cf7e0fa8247
> ffff82c480161a66
> (XEN) 0000000000000002 ffff8300b858c000 ffff83081f06bb90
> ffff83081f06bb00
> (XEN) ffff83081f057ef0 ffff83081f057f18 000020f3f5a4dea5
> ffff8300b858c060
> (XEN) 000070044fb97100 ffff83081f05bb80 0000000000007f40
> 0000000000000001
> (XEN) 0000000000000000 000020f3c755a972 ffff83081f06bb90
> 0000002b00000000
> (XEN) ffff82c4801a21f0 000000000000e008 0000000000000246
> ffff83081f057e48
> (XEN) 000000000000e010 ffff83081f057ef0 ffff82c4801a3dc4
> 000020f3f595c09c
> (XEN) 000020f3f596987e ffff8306383e3010 ffff83081f05b100
> ffffffffffffffff
> (XEN) 0000000000000001 0000000000000001 ffffffffffffffff
> ffff83081f057f18
> (XEN) 00000000802d4680 0000000000000000 0000000000000000
> ffff82c4802d4680
> (XEN) 000002a80000024b ffff8300b8586000 ffff83081f057f18
> ffff8300b8586000
> (XEN) ffff8300b858c000 ffff8300b858c000 0000000000000002
> ffff83081f057f10
> (XEN) ffff82c48015a261 ffff82c480126ccd 0000000000000001
> ffff83081f057d18
> (XEN) 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000246
> ffff88001a8093a0
> (XEN) 0000000100885e0f 000000000000000f 0000000000000000
> ffffffff802063aa
> (XEN) 0000000000000001 00000000deadbeef 00000000deadbeef
> 0000010000000000
> (XEN) Xen call trace:
> (XEN) [<ffff82c480169662>] do_IRQ+0x3ba/0x6d9
> (XEN) [<ffff82c480161a66>] common_interrupt+0x26/0x30
> (XEN) [<ffff82c4801a21f0>] lapic_timer_nop+0x0/0x6
> (XEN) [<ffff82c48015a261>] idle_loop+0x48/0x59
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at
> irq.c:1027
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
> On 31.05.2013 22:32, Andrew Cooper wrote:
>> Recently our automated testing system has caught a curious assertion
>> while testing Xen 4.1.5 on a HaswellDT system.
>>
>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at
>> irq.c:1030
>> (XEN) ----[ Xen-4.1.5 x86_64 debug=n Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e008:[<ffff82c48016b2b4>] do_IRQ+0x514/0x750
>> (XEN) RFLAGS: 0000000000010093 CONTEXT: hypervisor
>> (XEN) rax: 000000000000002f rbx: ffff830249841e80 rcx:
>> ffff82c4803127c0
>> (XEN) rdx: 0000000000000004 rsi: 0000000000000027 rdi:
>> 0000000000000001
>> (XEN) rbp: 0000000000001e00 rsp: ffff82c4802bfd48 r8:
>> ffff82c480312abc
>> (XEN) r9: ffff8302498a5948 r10: 0000000000000009 r11:
>> ffff8302498c6c80
>> (XEN) r12: ffff830243b07f50 r13: ffff8300a24f8000 r14:
>> 00000af8373788e3
>> (XEN) r15: ffff830249841e80 cr0: 000000008005003b cr4:
>> 00000000001026f0
>> (XEN) cr3: 00000002479e6000 cr2: 00000000e6d3c090
>> (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
>> (XEN) Xen stack trace from rsp=ffff82c4802bfd48:
>> (XEN) ffff830249841eb4 ffff82c480312ec0 000000000000001e
>> 0000001e00000000
>> (XEN) 0000000000000000 00000000498a5670 ffff830249841d80
>> ffff830249840080
>> (XEN) ffff830249841db4 0000000000000000 ffff8302498a55e0
>> ffff8302498a5670
>> (XEN) ffff8300a24f8000 00000af8373788e3 00000af83736b8ed
>> ffff82c480162ca0
>> (XEN) 00000af83736b8ed 00000af8373788e3 ffff8300a24f8000
>> ffff8302498a5670
>> (XEN) ffff8302498a55e0 0000000000000000 ffff8302498c6c80
>> 0000000000000009
>> (XEN) ffff8302498a5948 ffff82c480313000 0000000000007f40
>> 0000000000000001
>> (XEN) 0000000000000000 0000000000000000 00000af80db652fd
>> 0000002700000000
>> (XEN) ffff82c4801a50a0 000000000000e008 0000000000000246
>> ffff82c4802bfe78
>> (XEN) 0000000000000000 ffff8302498a5670 ffff82c4801a6a56
>> ffffffffffffffff
>> (XEN) ffff830249818000 0000000000000000 ffff8300a24f8000
>> ffff82c480122c11
>> (XEN) 00000af839021119 0000000000000000 0000000000000000
>> 00000000802bff18
>> (XEN) 0000025c0000013b ffff82c4802e7580 ffff82c4802bff18
>> ffff8300a2838000
>> (XEN) ffff82c4802f61a0 ffff8300a24f8000 0000000000000002
>> 00000af837304b45
>> (XEN) ffff82c48015b67a 0000000000000000 0000000000000000
>> 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 00000000ee8a3f8c
>> 0000000000000001
>> (XEN) 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 00000000ee8a3f74
>> 0000000000000af8
>> (XEN) 0000000000000001 0000010000000000 00000000c01013a7
>> 0000000000000061
>> (XEN) 0000000000000246 00000000ee8a3f70 0000000000000069
>> 0000000000000000
>> (XEN) Xen call trace:
>> (XEN) [<ffff82c48016b2b4>] do_IRQ+0x514/0x750
>> (XEN) 15[<ffff82c480162ca0>] common_interrupt+0x20/0x30
>> (XEN) 32[<ffff82c4801a50a0>] lapic_timer_nop+0x0/0x10
>> (XEN) 38[<ffff82c4801a6a56>] acpi_processor_idle+0x376/0x740
>> (XEN) 43[<ffff82c480122c11>] do_block+0x71/0xd0
>> (XEN) 56[<ffff82c48015b67a>] idle_loop+0x1a/0x50
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at
>> irq.c:1030
>> (XEN) ****************************************
>>
>> And the disassembly before the assertion:
>>
>> ffff82c48016b29f: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
>> ffff82c48016b2a6: 00
>> ffff82c48016b2a7: 0f b6 44 11 ff movzbl
>> -0x1(%rcx,%rdx,1),%eax
>> ffff82c48016b2ac: 39 c6 cmp %eax,%esi
>> ffff82c48016b2ae: 0f 8f 5c ff ff ff jg
>> ffff82c48016b210 <do_IRQ+0x470>
>> ffff82c48016b2b4: 0f 0b ud2
>>
>>
>> Xen has been woken up by an interrupt of vector 0x27, but has a vector
>> 0x2f on the top of the pending EOI stack for the local APIC.
>>
>> I have put in more debugging to dump the LAPIC state of the two
>> interesting vectors and the IOAPIC state, but I have no idea if/when the
>> problem might reoccur.
>>
>> My understanding of LAPIC priority leads me to think that Xen really
>> shouldn't be woken up by a lower priority vector if a higher priority
>> one is still un-eoi'd. There is not yet sufficient information to tell
>> whether this is truly the case, or whether Xen has simply gotten confused
>> about which vectors it eoi'd.
>>
>> Having said that, we do keep line level interrupts un-eoi'd for extended
>> periods while guests service the interrupt. Given that vectors are
>> chosen at random, we could get into a situation where a line interrupt
>> has a vector 0xdf and stays pending for 150ms (which I measured as a
>> not-overly-uncommon mean-time-till-eoi for line level interrupts). This
>> would starve any other guest interrupts for an extended period.
>>
>> Given directed-eoi support in the past few generations of processor, the
>> requirement for the pending EOI stack has disappeared as far as I am
>> aware. Would it be a sensible idea in general to make use of the pending
>> eoi stack conditional on not having/using directed EOI support?
>>
>> ~Andrew
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>
[-- Attachment #2: ca-107844-debug.patch --]
[-- Type: text/x-patch, Size: 2919 bytes --]
# HG changeset patch
# Parent 3bc8894f281f3ee68406a565beb2f811c67c6b5e
diff -r 3bc8894f281f xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1100,7 +1100,7 @@ static inline void UNEXPECTED_IO_APIC(vo
{
}
-static void /*__init*/ __print_IO_APIC(void)
+void /*__init*/ __print_IO_APIC(void)
{
int apic, i;
union IO_APIC_reg_00 reg_00;
diff -r 3bc8894f281f xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1115,6 +1115,8 @@ static void irq_guest_eoi_timer_fn(void
spin_unlock_irqrestore(&desc->lock, flags);
}
+static void dump_irqs(unsigned char key);
+void __print_IO_APIC(void);
static void __do_IRQ_guest(int irq)
{
struct irq_desc *desc = irq_to_desc(irq);
@@ -1137,7 +1139,36 @@ static void __do_IRQ_guest(int irq)
if ( action->ack_type == ACKTYPE_EOI )
{
sp = pending_eoi_sp(peoi);
- ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
+ if ( unlikely( !((sp == 0) || (peoi[sp-1].vector < vector)) ))
+ {
+ printk("**Pending EOI error\n");
+ printk(" irq %d, vector 0x%x\n", irq, vector);
+
+ for ( i = sp-1; i >= 0; --i )
+ {
+ printk(" s[%d] irq %d, vec 0x%x, ready %u, "
+ "ISR %u, TMR %u, IRR %u\n",
+ i, peoi[i].irq, peoi[i].vector, peoi[i].ready,
+ apic_isr_read(peoi[i].vector),
+ apic_tmr_read(peoi[i].vector),
+ apic_irr_read(peoi[i].vector) );
+ }
+
+ printk("All LAPIC state:\n");
+ printk("[vector] %8s %8s %8s\n", "ISR", "TMR", "IRR");
+ for ( i = 0; i < APIC_ISR_NR; ++i )
+ printk("[%02x:%02x] %08"PRIx32" %08"PRIx32" %08"PRIx32"\n",
+ (i * 32)+31, i*32,
+ apic_read(APIC_ISR + i*0x10),
+ apic_read(APIC_TMR + i*0x10),
+ apic_read(APIC_IRR + i*0x10) );
+
+ spin_unlock(&desc->lock);
+ dump_irqs('i');
+ __print_IO_APIC();
+
+ panic("CA-107844");
+ }
ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
peoi[sp].irq = irq;
peoi[sp].vector = vector;
diff -r 3bc8894f281f xen/include/asm-x86/apic.h
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -152,6 +152,18 @@ static __inline bool_t apic_isr_read(u8
(vector & 0x1f)) & 1;
}
+static __inline bool_t apic_tmr_read(u8 vector)
+{
+ return (apic_read(APIC_TMR + ((vector & ~0x1f) >> 1)) >>
+ (vector & 0x1f)) & 1;
+}
+
+static __inline bool_t apic_irr_read(u8 vector)
+{
+ return (apic_read(APIC_IRR + ((vector & ~0x1f) >> 1)) >>
+ (vector & 0x1f)) & 1;
+}
+
static __inline u32 get_apic_id(void) /* Get the physical APIC id */
{
u32 id = apic_read(APIC_ID);