From: Sander Eikelenboom <linux@eikelenboom.it>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: Xen-unstable: pci-passthrough "irq 16: nobody cared" on HVM guest shutdown on irq of device not passed through.
Date: Thu, 25 Sep 2014 16:47:30 +0200
Message-ID: <1086460412.20140925164730@eikelenboom.it>
In-Reply-To: <542429D0.5000104@citrix.com>


Thursday, September 25, 2014, 4:42:24 PM, you wrote:

> On 25/09/14 15:36, Sander Eikelenboom wrote:
>> Hi Jan / Konrad,
>>
>> I mentioned before seeing this sometimes, but since it happened infrequently it was hard to describe the case and log everything.
>> Somehow it seems I can trigger it quite reliably at the moment, so here is an extensive report.
>>
>> When shutting down an HVM guest with pci passthrough (in this case a VGA adapter),
>>  I *sometimes* run into this:
>>
>> [ 2265.395971] irq 16: nobody cared (try booting with the "irqpoll" option)
>> [ 2265.422948] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc6-20140925-vanilla+ #1
>> [ 2265.453314] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>> [ 2265.484046]  ffff880057a1a290 ffff88005f603d88 ffffffff81b7d90e 0000000000000001
>> [ 2265.513053]  ffff880057a1a200 ffff88005f603db8 ffffffff8110d6c8 ffff88005f603db8
>> [ 2265.542121]  ffff880057a1a200 0000000000000010 0000000000000000 ffff88005f603e08
>> [ 2265.571135] Call Trace:
>> [ 2265.585507]  <IRQ>  [<ffffffff81b7d90e>] dump_stack+0x46/0x58
>> [ 2265.609694]  [<ffffffff8110d6c8>] __report_bad_irq+0x38/0xd0
>> [ 2265.633625]  [<ffffffff8110dc1a>] note_interrupt+0x23a/0x290
>> [ 2265.657572]  [<ffffffff8155f0f5>] ? add_interrupt_randomness+0x45/0x210
>> [ 2265.684405]  [<ffffffff8110b45d>] handle_irq_event_percpu+0x9d/0x150
>> [ 2265.710379]  [<ffffffff8110b553>] handle_irq_event+0x43/0x70
>> [ 2265.734213]  [<ffffffff8110e29a>] ? handle_fasteoi_irq+0x2a/0x150
>> [ 2265.759463]  [<ffffffff8110e2f7>] handle_fasteoi_irq+0x87/0x150
>> [ 2265.784122]  [<ffffffff8110acbd>] generic_handle_irq+0x1d/0x40
>> [ 2265.808338]  [<ffffffff8152037a>] evtchn_fifo_handle_events+0x16a/0x170
>> [ 2265.834898]  [<ffffffff8151d4c8>] __xen_evtchn_do_upcall+0x48/0x90
>> [ 2265.860241]  [<ffffffff8151f0d2>] xen_evtchn_do_upcall+0x32/0x50
>> [ 2265.885031]  [<ffffffff81b8a76e>] xen_do_hypervisor_callback+0x1e/0x30
>> [ 2265.911279]  <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [ 2265.938509]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [ 2265.963981]  [<ffffffff81008d80>] ? xen_safe_halt+0x10/0x20
>> [ 2265.987198]  [<ffffffff81018bd8>] ? default_idle+0x18/0x20
>> [ 2266.010032]  [<ffffffff8101949a>] ? arch_cpu_idle+0xa/0x10
>> [ 2266.032827]  [<ffffffff810f84f1>] ? cpu_startup_entry+0x281/0x2f0
>> [ 2266.057481]  [<ffffffff81b741e4>] ? rest_init+0xb4/0xc0
>> [ 2266.079672]  [<ffffffff81b74130>] ? csum_partial_copy_generic+0x170/0x170
>> [ 2266.106401]  [<ffffffff82321079>] ? start_kernel+0x43f/0x44c
>> [ 2266.129479]  [<ffffffff82320a27>] ? set_init_arg+0x58/0x58
>> [ 2266.151971]  [<ffffffff82320608>] ? x86_64_start_reservations+0x2a/0x2c
>> [ 2266.177879]  [<ffffffff823240af>] ? xen_start_kernel+0x59b/0x59d
>> [ 2266.201994] handlers:
>> [ 2266.214783] [<ffffffff81945580>] azx_interrupt
>> [ 2266.234031] Disabling IRQ #16
>>
>> The system:
>>
>> - AMD
>> - Xen-unstable xen_changeset: Wed Sep 24 11:19:57 2014 +0200 git:b67a26f-dirty
>> - Both dom0 and domU (HVM guest using qemu-xen) run a 3.17-rc6 kernel
>> - The device passed through is 09:00.0
>>
>> - This IRQ is *not* coupled to the passthrough device (09:00.0), but to the onboard 
>>   soundcard (00:14.2 on the southbridge) and is in dom0 and not in active use (although the 
>>   snd-hda-intel driver is loaded).
>>
>> - No "soundhw" option is specified in the guest config, so it also shouldn't be 
>>   trying to use it that way.
>>
>>
>>
>> There are 2 things that can happen when trying to start and shut down a guest:
>> A) It starts and shuts down OK (no "irq nobody cared" messages)
>> B) It starts fine, but after shutdown the "irq nobody cared" message appears
>>
>> - B *can* happen both on: the first start-and-shutdown of the HVM guest, or only on a subsequent start-and-shutdown
>>   (so on the first start-and-shutdown it can work ok, but does not always)
>>
>> There seem to be some small differences between the two cases from the start of the domain:
>>
>> - When booting the HVM guest, the interrupt count for irq 16 in /proc/interrupts stays the same when A happens, but when B happens the count has
>>   doubled (so that seems like a reinit of the device that is not passed through).
>>
>> - When shutting down the HVM guest, when A happens the number of interrupts in /proc/interrupts is still what it was, but when B happens it seems like an irq storm
>>   and after the "irq nobody cared" message it ends with (always that 200000, so perhaps a threshold?):
>>   16:     200000          0          0          0          0          0  xen-pirq-ioapic-level  snd_hda_intel
>>
>> - On the start when B happens, xl dmesg contains this message (when A happens it doesn't contain it):
>>   (XEN) [2014-09-25 13:39:48.149] d32767v2: Unsupported MSI delivery mode 3 for Dom2
>>
>>   If I interpret that right in the logging, the d32767 seems to be used for the IOMMU.
>>
>> I attached the complete serial log while doing this (hope it's not too large for the mailing list):
>>
>> - Cold boot of the host system
>> - Dump with xl debug-keys of i, I, Q, M, z, e, v
>> - Start of the HVM guest with pci device passed through.
>> - Dump with xl debug-keys of i, I, Q, M, z, e, v
>> - Shutdown of the HVM guest with pci device passed through, A happened.
>> - Dump with xl debug-keys of i, I, Q, M, z, e, v
>> - Start of the HVM guest with pci device passed through.
>> - Dump with xl debug-keys of i, I, Q, M, z, e, v
>> - Shutdown of the HVM guest with pci device passed through, B happened.
>> - Dump with xl debug-keys of i, I, Q, M, z, e, v
>>
>> I also attached the output of lspci -vvvknn

> Could you provide `lspci -tv` as well please?

Sure:
~# lspci -tv
-[0000:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 Northbridge only single slot PCI-e GFX Hydra part
           +-00.2  Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory Management Unit (IOMMU)
           +-02.0-[0f]--+-00.0  Advanced Micro Devices [AMD] nee ATI RV620 LE [Radeon HD 3450]
           |            \-00.1  Advanced Micro Devices [AMD] nee ATI RV620 HDMI Audio [Radeon HD 3400 Series]
           +-03.0-[0e]--+-00.0  Advanced Micro Devices [AMD] nee ATI Turks [Radeon HD 6570]
           |            \-00.1  Advanced Micro Devices [AMD] nee ATI Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
           +-05.0-[0d]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
           +-06.0-[0c]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
           +-09.0-[0b]----00.0  NEC Corporation uPD720200 USB 3.0 Host Controller
           +-0a.0-[0a]----00.0  Conexant Systems, Inc. Device 8210
           +-0b.0-[09]--+-00.0  Advanced Micro Devices [AMD] nee ATI Turks [Radeon HD 6570]
           |            \-00.1  Advanced Micro Devices [AMD] nee ATI Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
           +-0c.0-[05-08]----00.0-[06-08]--+-01.0-[08]----00.0  NEC Corporation uPD720200 USB 3.0 Host Controller
           |                               \-02.0-[07]----00.0  Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller
           +-0d.0-[04]----00.0  NEC Corporation uPD720200 USB 3.0 Host Controller
           +-11.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
           +-12.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
           +-12.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
           +-13.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
           +-13.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
           +-14.0  Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller
           +-14.2  Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA)
           +-14.3  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC host controller
           +-14.4-[03]----06.0  C-Media Electronics Inc CMI8738/CMI8768 PCI Audio
           +-14.5  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
           +-15.0-[02]--
           +-16.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
           +-16.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
           +-18.0  Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
           +-18.1  Advanced Micro Devices [AMD] Family 10h Processor Address Map
           +-18.2  Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
           +-18.3  Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
           \-18.4  Advanced Micro Devices [AMD] Family 10h Processor Link Control
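
The /proc/interrupts comparison described in the quoted report (count for irq 16 unchanged in case A, doubled on guest start and ending at 200000 in case B) could be automated roughly as follows. This is only a sketch: the helper names `parse_irq_line` and `irq_total` are made up for illustration, and it assumes rows shaped like the one quoted above (irq label, one count per CPU, then the chip and driver names).

```python
def parse_irq_line(line):
    """Split one /proc/interrupts row into (label, per_cpu_counts, description).

    Hypothetical helper: assumes the row looks like
    '16:  200000  0  0  xen-pirq-ioapic-level  snd_hda_intel'.
    """
    label, _, rest = line.partition(":")
    fields = rest.split()
    counts = []
    for field in fields:
        # Per-CPU counters come first; stop at the first non-numeric column.
        if not field.isdigit():
            break
        counts.append(int(field))
    description = " ".join(fields[len(counts):])
    return label.strip(), counts, description


def irq_total(text, irq):
    """Return the count for `irq` summed over all CPUs, or None if absent."""
    for line in text.splitlines():
        label, counts, _ = parse_irq_line(line)
        if label == str(irq):
            return sum(counts)
    return None
```

Reading `/proc/interrupts` once before `xl create` and once after, and comparing `irq_total(text, 16)` between the two snapshots, would show whether a given run is heading for case A or case B before the guest is shut down.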

