From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: Xen-unstable: pci-passthrough "irq 16: nobody cared" on HVM guest shutdown on irq of device not passed through.
Date: Thu, 25 Sep 2014 15:42:24 +0100
Message-ID: <542429D0.5000104@citrix.com>
References: <885160611.20140925163649@eikelenboom.it>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1XXAFZ-0000mt-LD for xen-devel@lists.xenproject.org; Thu, 25 Sep 2014 14:42:29 +0000
In-Reply-To: <885160611.20140925163649@eikelenboom.it>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Sander Eikelenboom, Jan Beulich, Konrad Rzeszutek Wilk
Cc: xen-devel
List-Id: xen-devel@lists.xenproject.org

On 25/09/14 15:36, Sander Eikelenboom wrote:
> Hi Jan / Konrad,
>
> I mentioned before seeing this sometimes, but since it happened infrequently it was hard to describe the case and log everything.
> Somehow it seems i can trigger it quite reliably at the moment, so here a extensive report.
>
> When shutting down a HVM guest with pci passthrough (in this case a VGA adapter),
> i *sometimes* run into this:
>
> [ 2265.395971] irq 16: nobody cared (try booting with the "irqpoll" option)
> [ 2265.422948] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc6-20140925-vanilla+ #1
> [ 2265.453314] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
> [ 2265.484046] ffff880057a1a290 ffff88005f603d88 ffffffff81b7d90e 0000000000000001
> [ 2265.513053] ffff880057a1a200 ffff88005f603db8 ffffffff8110d6c8 ffff88005f603db8
> [ 2265.542121] ffff880057a1a200 0000000000000010 0000000000000000 ffff88005f603e08
> [ 2265.571135] Call Trace:
> [ 2265.585507] [] dump_stack+0x46/0x58
> [ 2265.609694] [] __report_bad_irq+0x38/0xd0
> [ 2265.633625] [] note_interrupt+0x23a/0x290
> [ 2265.657572] [] ? add_interrupt_randomness+0x45/0x210
> [ 2265.684405] [] handle_irq_event_percpu+0x9d/0x150
> [ 2265.710379] [] handle_irq_event+0x43/0x70
> [ 2265.734213] [] ? handle_fasteoi_irq+0x2a/0x150
> [ 2265.759463] [] handle_fasteoi_irq+0x87/0x150
> [ 2265.784122] [] generic_handle_irq+0x1d/0x40
> [ 2265.808338] [] evtchn_fifo_handle_events+0x16a/0x170
> [ 2265.834898] [] __xen_evtchn_do_upcall+0x48/0x90
> [ 2265.860241] [] xen_evtchn_do_upcall+0x32/0x50
> [ 2265.885031] [] xen_do_hypervisor_callback+0x1e/0x30
> [ 2265.911279] [] ? xen_hypercall_sched_op+0xa/0x20
> [ 2265.938509] [] ? xen_hypercall_sched_op+0xa/0x20
> [ 2265.963981] [] ? xen_safe_halt+0x10/0x20
> [ 2265.987198] [] ? default_idle+0x18/0x20
> [ 2266.010032] [] ? arch_cpu_idle+0xa/0x10
> [ 2266.032827] [] ? cpu_startup_entry+0x281/0x2f0
> [ 2266.057481] [] ? rest_init+0xb4/0xc0
> [ 2266.079672] [] ? csum_partial_copy_generic+0x170/0x170
> [ 2266.106401] [] ? start_kernel+0x43f/0x44c
> [ 2266.129479] [] ? set_init_arg+0x58/0x58
> [ 2266.151971] [] ? x86_64_start_reservations+0x2a/0x2c
> [ 2266.177879] [] ? xen_start_kernel+0x59b/0x59d
> [ 2266.201994] handlers:
> [ 2266.214783] [] azx_interrupt
> [ 2266.234031] Disabling IRQ #16
>
> The system:
>
> - AMD
> - Xen-unstable xen_changeset: Wed Sep 24 11:19:57 2014 +0200 git:b67a26f-dirty
> - Both dom0 and domU (HVM guest using qemu-xen) run a 3.17-rc6 kernel
> - The device passed through is 09:00.0
>
> - This IRQ is *not* coupled to the passthrough device (09:00.0), but to the onboard
> soundcard (00:14.2 on the southbridge) and is in dom0 and not in active use (although the
> snd-hda-intel driver is loaded).
>
> - No "soundhw" option is specified in the guest config, so it also shouldn't be
> trying to use it that way.
>
>
>
> There are 2 things that can happen when trying to start and shut down a guest:
> A) It starts and shuts down OK (no "irq nobody cared" messages)
> B) It starts fine, but after shutdown the "irq nobody cared" message appears
>
> - B *can* happen both on: the first start-and-shutdown of the HVM guest, or only on a subsequent start-and-shutdown
> (so on the first start-and-shutdown it can work ok, but does not always)
>
> There seem to be some small differences between the two cases from the start of the domain:
>
> - When booting the HVM guest the irq count in /proc/interrupts stays the same when A happens, but when B happens, the number of interrupts has been
> doubled (so that seems like a reinit of the device that is not passed through).
>
> - When shutting down the HVM guest when A happens the number of interrupts in /proc/interrupts is still what it was, but when B happens it seems like an irq storm
> and after the "irq nobody cared" message it ends with (always that 200000, so perhaps a threshold?):
> 16: 200000 0 0 0 0 0 xen-pirq-ioapic-level snd_hda_intel
>
> - On the start when B happens, xl dmesg contains this message (when A happens it doesn't contain it):
> (XEN) [2014-09-25 13:39:48.149] d32767v2: Unsupported MSI delivery mode 3 for Dom2
>
> If i interpret that right in the logging the d32767 seems to be used for the IOMMU.
>
> I attached the complete serial log while doing this (hope it's not too large for the mailing list):
>
> - Cold boot of the host system
> - Dump with xl debug-keys of i, I, Q, M, z, e, v
> - Start of the HVM guest with pci device passed through.
> - Dump with xl debug-keys of i, I, Q, M, z, e, v
> - Shutdown of the HVM guest with pci device passed through, A happened.
> - Dump with xl debug-keys of i, I, Q, M, z, e, v
> - Start of the HVM guest with pci device passed through.
> - Dump with xl debug-keys of i, I, Q, M, z, e, v
> - Shutdown of the HVM guest with pci device passed through, B happened.
> - Dump with xl debug-keys of i, I, Q, M, z, e, v
>
> I also attached the output of lspci -vvvknn

Could you provide `lspci -tv` as well please?
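
As an aside, the before/after comparison of /proc/interrupts described in the report could be scripted so the A and B cases are easier to tell apart. The following is only a minimal sketch: `parse_interrupts` and `changed_counts` are hypothetical helper names (not part of any Xen or kernel tooling), and it assumes the usual /proc/interrupts layout of a CPU header line followed by `label: per-CPU counts description` rows.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (total_count, description)}.

    Assumes the first line is the CPU header (CPU0 CPU1 ...) and every
    following row looks like "<label>: <count per CPU>... <description>".
    """
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # number of per-CPU columns
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue                      # not a "label: ..." row
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break                     # some rows have fewer count columns
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def changed_counts(before, after):
    """Return {label: delta} for every IRQ whose total count changed."""
    deltas = {}
    for label, (total, _desc) in after.items():
        previous = before.get(label, (0, ""))[0]
        if total != previous:
            deltas[label] = total - previous
    return deltas


# Typical use on dom0 (snapshot before guest start, again after shutdown):
#   with open("/proc/interrupts") as f:
#       snapshot = parse_interrupts(f.read())
```

Taking one snapshot before starting the guest, one after it has booted, and one after shutdown, and then printing `changed_counts` for each pair, would show whether the count for irq 16 moved at all in a given run.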