From: Andrew Cooper
Subject: Re: [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
Date: Sun, 17 Jan 2016 23:12:28 +0000
Message-ID: <569C1FDC.4070303@citrix.com>
References: <5698D0C1.6000808@alstadheim.priv.no>
 <5698D297.8030700@citrix.com>
 <569BAA4A.8010501@alstadheim.priv.no>
 <569BB05B.1000801@citrix.com>
 <569BC1C1.6090307@alstadheim.priv.no>
 <569C1E98.8020802@alstadheim.priv.no>
In-Reply-To: <569C1E98.8020802@alstadheim.priv.no>
To: Håkon Alstadheim, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 17/01/2016 23:07, Håkon Alstadheim wrote:
> On 17 Jan 2016 17:30, Håkon Alstadheim wrote:
>> On 17 Jan 2016 16:16, Andrew Cooper wrote:
>>> On 17/01/16 14:50, Håkon Alstadheim wrote:
>>>> On 15 Jan 2016 12:05, Andrew Cooper wrote:
>>>>> On 15/01/16 10:58, Håkon Alstadheim wrote:
>>>>>> CPUINFO:
>>>>>> vendor_id : GenuineIntel
>>>>>> cpu family : 6
>>>>>> model : 63
>>>>>> model name : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>>>>>
>>>>>> # smbios-sys-info
>>>>>> Libsmbios version: 2.2.28
>>>>>> Product Name: Z10PE-D8 WS
>>>>>> Vendor: ASUSTeK COMPUTER INC.
>>>>>> BIOS Version: 3101
>>>>>>
>>>>>>
>>>>>> I have been experiencing issues with domains with passed through PCIe
>>>>>> devices since I first installed xen. Then at version 4.5.x , I'm now
>>>>>> at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci
>>>>>> pass through and interrupts (usb-cards, sound cards).
>>>>>>
>>>>>> Recently the system has been more stable, whether it is because I pass
>>>>>> through as few things as possible, or because of improvements in Xen I
>>>>>> do not know.
>>>>>> I have also taken to building with debug, which leads to
>>>>>> more abrupt but less mysterious failures. Earlier (w/o debug and under
>>>>>> xen 4.5 ) stuff would just gradually stop working and end up in total
>>>>>> hang of everything. So, hey, things are improving :-b
>>>>> This isn't the first time we have seen this on Haswell processors. Do
>>>>> you have microcode loading set up?
>>>>>
>>>>> ~Andrew
>>>>>
>>>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>>>> cpu microcode, using microcode from 20151106.
>>> Ok - I previously investigated this issue, but my repro evaporated from
>>> under my feet with a firmware update, and I never got to the bottom of it.
>>>
>>> Please can you start with the following patch which will dump some more
>>> information on crash.
>>>
>>> ---8<---
>>> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>>> index 1228568..588b562 100644
>>> --- a/xen/arch/x86/irq.c
>>> +++ b/xen/arch/x86/irq.c
>>> @@ -1165,6 +1165,13 @@ static void __do_IRQ_guest(int irq)
>>>      if ( action->ack_type == ACKTYPE_EOI )
>>>      {
>>>          sp = pending_eoi_sp(peoi);
>>> +        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
>>> +        {
>>> +            int p;
>>> +            for ( p = sp; p > 0; --p )
>>> +                printk("**peoi[%d] = {%d, 0x%u, %d}\n",
>>> +                       p-1, peoi[p-1].irq, peoi[p-1].vector,
>>> peoi[p-1].ready);
>>> +        }
>>>          ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>>>          ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>>>          peoi[sp].irq = irq;
>>>
>>>
>> Will do. Building now.
>> Seems there is a line accidentally folded "peoi[p-1].ready);" belongs at
>> the end of preceding line I presume?
>>
> There we go :-/ . Log attached from boot to assertion-failure with
> loglvl=all guest_loglvl=all . Some of the log output might be a bit
> cryptic, they are notes to myself from local boot-scripts, basically
> firing up my router/name-server/dhcp-server and waiting until services
> are ready before continuing.
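
For anyone following the thread, here is a minimal standalone sketch of the
ordering check that the quoted assertion expresses: a new entry may only be
pushed when the stack is empty or its vector is strictly greater than the
vector currently on top. This is not Xen's code. The names struct eoi_entry,
struct eoi_stack, eoi_push, eoi_dump and STACK_SLOTS are hypothetical, the
field layout is simplified, and the sketch assumes nothing beyond what the
assertion text and the debug patch show. Unlike the patch, it prints the
vector in hex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical capacity; not Xen's NR_DYNAMIC_VECTORS. */
#define STACK_SLOTS 16

/* Simplified stand-in for one pending-EOI record. */
struct eoi_entry {
    int irq;
    unsigned int vector;
    bool ready;
};

struct eoi_stack {
    struct eoi_entry slot[STACK_SLOTS];
    int sp;   /* number of entries currently on the stack */
};

/* Print every entry from the top of the stack down to the bottom. */
static void eoi_dump(const struct eoi_stack *s)
{
    for (int p = s->sp; p > 0; --p)
        printf("**peoi[%d] = {%d, 0x%x, %d}\n",
               p - 1, s->slot[p - 1].irq, s->slot[p - 1].vector,
               (int)s->slot[p - 1].ready);
}

/*
 * Push a new entry only if the stack is empty or the new vector is strictly
 * greater than the vector on top. Returns false, after dumping the stack,
 * when that ordering would be violated or the stack is full.
 */
static bool eoi_push(struct eoi_stack *s, int irq, unsigned int vector)
{
    assert(s->sp >= 0 && s->sp <= STACK_SLOTS);

    bool ordered = (s->sp == 0) || (s->slot[s->sp - 1].vector < vector);

    if (!ordered || s->sp >= STACK_SLOTS) {
        eoi_dump(s);
        return false;
    }

    s->slot[s->sp].irq = irq;
    s->slot[s->sp].vector = vector;
    s->slot[s->sp].ready = false;
    s->sp++;
    return true;
}
```

In this sketch, pushing vectors 0x30 and then 0x40 succeeds, while a later
push of 0x38 is rejected and the two existing entries are dumped, which is
the situation the extra printk in the patch is meant to expose.
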

Would you mind running with the second patch I sent? It gathers more
information.

~Andrew