From: "Jan Beulich"
Subject: Re: domU and dom0 hung with Xen console interrupt binding showing in-flight=1, (---M)
Date: Wed, 18 Aug 2010 09:47:36 +0100
Message-ID: <4C6BBA480200007800010949@vpn.id2.novell.com>
To: Keir Fraser, Bruce Edge
Cc: Xen-devel
List-Id: xen-devel@lists.xenproject.org

>>> On 17.08.10 at 20:01, Keir Fraser wrote:
> On 17/08/2010 18:28, "Bruce Edge" wrote:
>
>> On Tue, Jun 29, 2010 at 1:42 AM, Jan Beulich wrote:
>>>>>> On 28.06.10 at 20:22, Dante Cinco wrote:
>>>> I have an HP Proliant DL380-G6 (dual Xeon E5540 @ 2.53GHz) with Xen 4.0.0
>>>> and dom0 Linux 2.6.32.12 x86_64 pvops and domU Linux kernel 2.6.30.1 x86_64.
>>>> I'm using PCI passthrough (pci-stub) to pass my 4-port 8Gb PMC-Sierra Fibre
>>>> Channel HBA to domU. After running I/Os for several hours, both dom0 and
>>>> domU hangs and the Xen console shows the interrupt binding below where IRQ
>>>> 66 shows in-flight=1 and mask set (---M). What's the best way to debug this
>>>> problem?
>>>
>>> There are potentially two problems here: One is that the guest may
>>> fail to send the EOI notification. You would want to check whether
>>> pirq_guest_eoi() got run after that last occurrence of the interrupt.
>>>
>>> The more worrying part is that Xen should time out on a guest failing
>>> to send the EOI notification, and ack the interrupt nevertheless.
>>> Looking at the code I fail to see how the ack_APIC_irq() would get
>>> sent in this case: non-maskable MSIs get this issued from
>>> end_msi_irq(), but ->end doesn't get invoked from
>>> irq_guest_eoi_timer_fn() (only ->enable does). Keir, am I missing
>>> something?
>
> I don't think that timer logic is designed to handle non-maskable MSIs, only
> maskable ones. It ought to be not too hard to fix it up for non-maskable
> ones too by issuing the ->end() call from the timer handler?

Yes, that was what I was trying to hint at, but I wasn't sure whether
calling ->end() here has any unintended side effects and/or requires
any extra care (like preventing a subsequent guest initiated EOI to
call ->end() again).

While looking at this I came across another thing I don't understand:
__pirq_guest_eoi(), for the ACKTYPE_EOI case, calls __set_eoi_ready()
in a cpu_test_and_clear() conditional, but __set_eoi_ready() bails out
if it finds !cpu_test_and_clear() on the same bitmap - what's the
point of calling __set_eoi_ready() here then (or what am I missing)?

Jan
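
To make the cpu_test_and_clear() question concrete, here is a minimal
stand-alone sketch of the pattern as described above. It is not Xen
code: toy_mask_t, toy_test_and_clear(), toy_set_eoi_ready(),
toy_pirq_guest_eoi() and toy_eoi_ready_calls are all hypothetical
stand-ins, and the sketch assumes that both test-and-clear operations
really act on the same bitmap object.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a per-CPU bitmap; NOT Xen's cpumask_t. */
typedef unsigned long toy_mask_t;

/* Counts how often the "ready" path is actually reached. */
static int toy_eoi_ready_calls;

/* Clear the bit for `cpu` and report whether it was set before. */
static bool toy_test_and_clear(unsigned int cpu, toy_mask_t *mask)
{
    assert(cpu < sizeof(toy_mask_t) * 8);   /* keep the shift defined */
    toy_mask_t bit = 1UL << cpu;
    bool was_set = (*mask & bit) != 0;
    *mask &= ~bit;
    return was_set;
}

/* Models the callee: bails out unless the bit was still set. */
static void toy_set_eoi_ready(unsigned int cpu, toy_mask_t *mask)
{
    if (!toy_test_and_clear(cpu, mask))
        return;                 /* always taken if the caller cleared it */
    toy_eoi_ready_calls++;      /* the work that is meant to happen */
}

/* Models the caller: calls the callee only if the bit was set. */
static void toy_pirq_guest_eoi(unsigned int cpu, toy_mask_t *mask)
{
    if (toy_test_and_clear(cpu, mask))
        toy_set_eoi_ready(cpu, mask);
}
```

Under that assumption, calling toy_pirq_guest_eoi() with the CPU's bit
set clears the bit in the caller, so the callee's own test-and-clear
sees it clear and returns without incrementing toy_eoi_ready_calls. If
the caller instead tested a private copy of the bitmap, the callee
would still find the bit set in the original, so whether the two
operations share one bitmap is the thing to check in the real code.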