From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
Date: Tue, 26 Mar 2013 16:12:56 +0000
Message-ID: <5151C908.2000807@citrix.com>
References: <5140E69F.9090803@invisiblethingslab.com>
 <20130315130240.GA8582@phenom.dumpdata.com>
 <514C79F3.5050504@invisiblethingslab.com>
 <20130322165651.GA4827@phenom.dumpdata.com>
 <515036BF.10105@invisiblethingslab.com>
 <20130325141701.GI11546@phenom.dumpdata.com>
 <515191CC.6060609@invisiblethingslab.com>
 <5151AC8C02000078000C88B9@nat28.tlf.novell.com>
 <5151A788.809@invisiblethingslab.com>
 <5151C2FC.2090205@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5151C2FC.2090205@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Marek Marczykowski
Cc: Konrad Rzeszutek Wilk, Jan Beulich, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 26/03/2013 15:47, Andrew Cooper wrote:
> On 26/03/2013 13:50, Marek Marczykowski wrote:
>> On 26.03.2013 14:11, Jan Beulich wrote:
>>>>>> On 26.03.13 at 13:17, Marek Marczykowski wrote:
>>>> Finally got serial console :)
>>>> The debug=y problem is (actually at resume):
>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>> (XEN) ----[ Xen-4.1.5-rc1 x86_64 debug=y Tainted: C ]----
>>>> (XEN) CPU: 0
>>>> (XEN) RIP: e008:[]
>>>> smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>> (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
>>>> (XEN) rax: 0000000000000000 rbx: 00000000000000e9 rcx: ffff82c48029ff18
>>>> (XEN) rdx: 00000000000000e9 rsi: 000000000000002a rdi: ffff830421060538
>>>> (XEN) rbp: ffff82c48029ff08 rsp: ffff82c48029feb8 r8: ffff88041820eb60
>>>> (XEN) r9: 0000000000000000 r10: 0000000000007ff0 r11: 0000000000000000
>>>> (XEN) r12: ffff830421080250 r13: ffff830421060534 r14: ffff82c48029ff18
>>>> (XEN) r15: ffff82c4802dd9e0 cr0: 000000008005003b cr4: 00000000000026f0
>>>> (XEN) cr3: 0000000300b81000 cr2: ffff880402070198
>>>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
>>>> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
>>>> (XEN) 0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
>>>> (XEN) ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
>>>> (XEN) ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
>>>> (XEN) 00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
>>>> (XEN) ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
>>>> (XEN) 0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
>>>> (XEN) 000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
>>>> (XEN) ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
>>>> (XEN) 000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>>>> (XEN) 0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
>>>> (XEN) 0000000000000000
>>>> (XEN) Xen call trace:
>>>> (XEN) [] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>> (XEN)
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 0:
>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>> (XEN) ****************************************
>>> To make sense of this, we need to know the register (and maybe
>>> stack) allocation at this point, to know which vector it was that
>>> triggered the assertion. You can either do this analysis for us, or
>>> point us at the xen-syms binary matching the xen.gz you used.
>> "info scope smp_irq_move_cleanup_interrupt" said vector is in %rbx, so 0xe9.
>>
>>> From the register values, the most likely candidates are vector 0xe9
>>> and 0x2a. The former having two registers set to this value seems
>>> more likely from that angle, but vectors in the 0xe? range should
>>> never end up in smp_irq_move_cleanup_interrupt().
>>>
>>> And if it's the 0x2a one, then we'd need to know what IRQ it was
>>> last used for. That can't be reconstructed from the data above, so
>>> would require you being able to reproduce this and adding some
>>> instrumentation to the code.
>>>
>>> Jan
>>>
> Could it be something to do with switching virtual wire mode, and having
> PIC compatibility stuff left in the IO-APIC after leaving the BIOS but
> before starting back up again?
>
> Looking at the stack dump, there is an extra exception frame under what
> is printed by the assertion failure.
>
> 0000002000000000 TRAP_syscall

Apologies - this is a vector 0x20 interrupt, not TRAP_syscall, which
makes sense as 0x20 is FIRST_DYNAMIC_IRQ which is also the cleanup IPI
vector.

The other comments still stand, especially as we appear to be
interrupting dom0 which is already running.

~Andrew

> ffffffff81a01db8 guest kernel addr
> 0000000000000246 FLAGS
> 000000000000e033 FLAT_RING3_CS64
> ffffffff8105dd5a guest kernel addr
> 000000000000e02b FLAT_RING3_SS{64,32}
>
> So it appears that we are already executing a guest (presumably dom0) by the time this assertion occurs. From the serial, is there any indication that dom0 has started up again?
>
> I would have thought that we should have successfully reset the IO-APIC back up properly before we would ever get back around to executing dom0.
>
> ~Andrew
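
For anyone following the frame decoding, here is a rough sketch of how the six stack words were read. It assumes the saved frame follows the tail of Xen's struct cpu_user_regs as I remember it (a 32-bit error_code and a 32-bit entry_vector packed into one 64-bit word, then rip, cs, rflags, rsp, ss); that layout, and the names FRAME_WORDS and decode_frame, are assumptions for illustration only and not taken from the tree.

```python
# Six consecutive words copied from the "Xen stack trace" dump quoted above.
FRAME_WORDS = [
    0x0000002000000000,  # error_code (low 32 bits) + entry_vector (high 32 bits), assumed layout
    0xffffffff8105dd5a,  # rip
    0x000000000000e033,  # cs
    0x0000000000000246,  # rflags
    0xffffffff81a01db8,  # rsp
    0x000000000000e02b,  # ss
]


def decode_frame(words):
    """Split the six frame words into named fields (assumed layout, see above)."""
    first, rip, cs, rflags, rsp, ss = words
    return {
        "error_code": first & 0xFFFFFFFF,  # low half of the first word
        "entry_vector": first >> 32,       # high half of the first word
        "rip": rip,
        "cs": cs,
        "rflags": rflags,
        "rsp": rsp,
        "ss": ss,
    }


frame = decode_frame(FRAME_WORDS)
print("entry_vector = %#x" % frame["entry_vector"])
print("cs = %#x (RPL %d)" % (frame["cs"], frame["cs"] & 3))
```

Under that assumed layout the high half of the first word is the vector, and the low two bits of cs give the privilege level of the interrupted context, which is how the "interrupting a running guest" reading falls out.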