* Hang (due to HW?) in qi_submit_sync()
@ 2015-01-06 0:57 Roland Dreier
[not found] ` <1420505840-30096-1-git-send-email-roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Roland Dreier @ 2015-01-06 0:57 UTC (permalink / raw)
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Jiang Liu
From: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Hi, we're running kernel 3.10.59 (pretty recent long-term kernel) on a
2-socket Xeon E5 v3 (Haswell) system. We're using vfio to access some
PCI devices from userspace, and occasionally when we kill a process,
we see the system hang in qi_submit_sync().
Based on a very old patch from Intel <https://lkml.org/lkml/2009/5/20/341>,
we added code to the dmar driver:
int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu )
{
//...
/*
* update the HW tail register indicating the presence of
* new descriptors.
*/
writel(qi->free_head << DMAR_IQ_SHIFT, iommu->reg + DMAR_IQT_REG);
start_time = get_cycles();
while (qi->desc_status[wait_index] != QI_DONE) {
/*
* We will leave the interrupts disabled, to prevent interrupt
* context to queue another cmd while a cmd is already submitted
* and waiting for completion on this cpu. This is to avoid
* a deadlock where the interrupt context can wait indefinitely
* for free slots in the queue.
*/
rc = qi_check_fault(iommu, index);
if (rc)
break;
raw_spin_unlock(&qi->q_lock);
// We added this -->
if (get_cycles() - start_time > DMAR_OPERATION_TIMEOUT) {
printk(KERN_EMERG "desc_status[%d] = %d.\n",
wait_index, qi->desc_status[wait_index]);
/* line 888: */ BUG();
}
// <-- to here
cpu_relax();
raw_spin_lock(&qi->q_lock);
}
and indeed when the system hangs, we see for example
desc_status[69] = 1.
------------[ cut here ]------------
kernel BUG at drivers/iommu/dmar.c:888!
CPU: 8 PID: 12211 Comm: foed Tainted: P O 3.10.59+ #201412290537+4e4984e.platinum
task: ffff88275ac643e0 ti: ffff8825d329a000 task.ti: ffff8825d329a000
RIP: 0010:[<ffffffff81529737>] [<ffffffff81529737>] qi_submit_sync+0x3f7/0x490
RSP: 0018:ffff8825d329ba10 EFLAGS: 00010092
RAX: 0000000000000014 RBX: 0000000000000044 RCX: ffff881fffb0ec00
RDX: 0000000000000000 RSI: ffff881fffb0d048 RDI: 0000000000000046
RBP: ffff8825d329ba78 R08: ffffffffffffffff R09: 000000000001a4a1
R10: 0000000000000051 R11: 00000000000000e4 R12: 00007068faa64fc8
R13: ffff881fff40c780 R14: 0000000000000114 R15: ffff883ffec01a00
FS: 00007f3c86ffb700(0000) GS:ffff881fffb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f996d3f1ba0 CR3: 00000026222f0000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff8825d329ba88 0000000000000450 0000000000000440 ffff881ff3215000
00000044d329bb18 0000000000000086 0000000000000044 ffff882500000045
ffff881ff12b1600 0000000000000000 0000000000000246 ffff881ff278e858
Call Trace:
[<ffffffff8152f6b5>] free_irte+0xc5/0x100
[<ffffffff81530834>] free_remapped_irq+0x44/0x60
[<ffffffff81027b23>] destroy_irq+0x33/0xd0
[<ffffffff81027ede>] native_teardown_msi_irq+0xe/0x10
[<ffffffff812a6a70>] default_teardown_msi_irqs+0x60/0x80
[<ffffffff812a64d9>] free_msi_irqs+0x99/0x150
[<ffffffff812a749d>] pci_disable_msix+0x3d/0x60
[<ffffffffa0078748>] vfio_msi_disable+0xc8/0xe0 [vfio_pci]
[<ffffffffa0078f86>] vfio_pci_set_msi_trigger+0x2a6/0x2d0 [vfio_pci]
[<ffffffffa007941c>] vfio_pci_set_irqs_ioctl+0x8c/0xa0 [vfio_pci]
[<ffffffffa00773b0>] vfio_pci_release+0x70/0x150 [vfio_pci]
[<ffffffffa006dcbc>] vfio_device_fops_release+0x1c/0x40 [vfio]
[<ffffffff8114d7db>] __fput+0xdb/0x220
[<ffffffff8114d92e>] ____fput+0xe/0x10
[<ffffffff810614ac>] task_work_run+0xbc/0xe0
[<ffffffff81043d0e>] do_exit+0x3ce/0xe50
[<ffffffff8104557f>] do_group_exit+0x3f/0xa0
[<ffffffff81054769>] get_signal_to_deliver+0x1a9/0x5b0
[<ffffffff810023f8>] do_signal+0x48/0x5e0
as far as I can understand the driver, this is a "shouldn't happen,
your hardware is broken" occurrence. However I haven't been able to
find any relevant looking sightings for our CPU.
Does anyone from Intel (or elsewhere) have any suggestions on how to
chase this further?
Thanks!
Roland
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Hang (due to HW?) in qi_submit_sync()
[not found] ` <1420505840-30096-1-git-send-email-roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2015-01-06 2:54 ` Alex Williamson
[not found] ` <1420512860.3541.77.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2015-01-06 2:54 UTC (permalink / raw)
To: Roland Dreier
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jiang Liu
On Mon, 2015-01-05 at 16:57 -0800, Roland Dreier wrote:
> From: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
>
> Hi, we're running kernel 3.10.59 (pretty recent long-term kernel) on a
> 2-socket Xeon E5 v3 (Haswell) system. We're using vfio to access some
> PCI devices from userspace, and occasionally when we kill a process,
> we see the system hang in qi_submit_sync().
>
> Based on a very old patch from Intel <https://lkml.org/lkml/2009/5/20/341>,
> we added code to the dmar driver:
>
> int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu )
> {
>
> //...
>
> /*
> * update the HW tail register indicating the presence of
> * new descriptors.
> */
> writel(qi->free_head << DMAR_IQ_SHIFT, iommu->reg + DMAR_IQT_REG);
>
> start_time = get_cycles();
> while (qi->desc_status[wait_index] != QI_DONE) {
> /*
> * We will leave the interrupts disabled, to prevent interrupt
> * context to queue another cmd while a cmd is already submitted
> * and waiting for completion on this cpu. This is to avoid
> * a deadlock where the interrupt context can wait indefinitely
> * for free slots in the queue.
> */
> rc = qi_check_fault(iommu, index);
> if (rc)
> break;
>
> raw_spin_unlock(&qi->q_lock);
>
> // We added this -->
> if (get_cycles() - start_time > DMAR_OPERATION_TIMEOUT) {
> printk(KERN_EMERG "desc_status[%d] = %d.\n",
> wait_index, qi->desc_status[wait_index]);
> /* line 888: */ BUG();
> }
> // <-- to here
>
> cpu_relax();
> raw_spin_lock(&qi->q_lock);
> }
>
> and indeed when the system hangs, we see for example
>
> desc_status[69] = 1.
> ------------[ cut here ]------------
> kernel BUG at drivers/iommu/dmar.c:888!
> CPU: 8 PID: 12211 Comm: foed Tainted: P O 3.10.59+ #201412290537+4e4984e.platinum
> task: ffff88275ac643e0 ti: ffff8825d329a000 task.ti: ffff8825d329a000
> RIP: 0010:[<ffffffff81529737>] [<ffffffff81529737>] qi_submit_sync+0x3f7/0x490
> RSP: 0018:ffff8825d329ba10 EFLAGS: 00010092
> RAX: 0000000000000014 RBX: 0000000000000044 RCX: ffff881fffb0ec00
> RDX: 0000000000000000 RSI: ffff881fffb0d048 RDI: 0000000000000046
> RBP: ffff8825d329ba78 R08: ffffffffffffffff R09: 000000000001a4a1
> R10: 0000000000000051 R11: 00000000000000e4 R12: 00007068faa64fc8
> R13: ffff881fff40c780 R14: 0000000000000114 R15: ffff883ffec01a00
> FS: 00007f3c86ffb700(0000) GS:ffff881fffb00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f996d3f1ba0 CR3: 00000026222f0000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Stack:
> ffff8825d329ba88 0000000000000450 0000000000000440 ffff881ff3215000
> 00000044d329bb18 0000000000000086 0000000000000044 ffff882500000045
> ffff881ff12b1600 0000000000000000 0000000000000246 ffff881ff278e858
> Call Trace:
> [<ffffffff8152f6b5>] free_irte+0xc5/0x100
> [<ffffffff81530834>] free_remapped_irq+0x44/0x60
> [<ffffffff81027b23>] destroy_irq+0x33/0xd0
> [<ffffffff81027ede>] native_teardown_msi_irq+0xe/0x10
> [<ffffffff812a6a70>] default_teardown_msi_irqs+0x60/0x80
> [<ffffffff812a64d9>] free_msi_irqs+0x99/0x150
> [<ffffffff812a749d>] pci_disable_msix+0x3d/0x60
> [<ffffffffa0078748>] vfio_msi_disable+0xc8/0xe0 [vfio_pci]
> [<ffffffffa0078f86>] vfio_pci_set_msi_trigger+0x2a6/0x2d0 [vfio_pci]
> [<ffffffffa007941c>] vfio_pci_set_irqs_ioctl+0x8c/0xa0 [vfio_pci]
> [<ffffffffa00773b0>] vfio_pci_release+0x70/0x150 [vfio_pci]
> [<ffffffffa006dcbc>] vfio_device_fops_release+0x1c/0x40 [vfio]
> [<ffffffff8114d7db>] __fput+0xdb/0x220
> [<ffffffff8114d92e>] ____fput+0xe/0x10
> [<ffffffff810614ac>] task_work_run+0xbc/0xe0
> [<ffffffff81043d0e>] do_exit+0x3ce/0xe50
> [<ffffffff8104557f>] do_group_exit+0x3f/0xa0
> [<ffffffff81054769>] get_signal_to_deliver+0x1a9/0x5b0
> [<ffffffff810023f8>] do_signal+0x48/0x5e0
>
> as far as I can understand the driver, this is a "shouldn't happen,
> your hardware is broken" occurrence. However I haven't been able to
> find any relevant looking sightings for our CPU.
>
> Does anyone from Intel (or elsewhere) have any suggestions on how to
> chase this further?
Try disabling CONFIG_NET_DMA
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Hang (due to HW?) in qi_submit_sync()
[not found] ` <1420512860.3541.77.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-01-06 4:39 ` Roland Dreier
[not found] ` <CAG4TOxPOgTOKYZa7q9Of8XzHV_wAadtJmXFC0bmyN2Qds7T9RA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Roland Dreier @ 2015-01-06 4:39 UTC (permalink / raw)
To: Alex Williamson
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jiang Liu
On Mon, Jan 5, 2015 at 6:54 PM, Alex Williamson
<alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Try disabling CONFIG_NET_DMA
We already have that disabled (well, in 3.10 it depends on BROKEN, and
we don't have BROKEN enabled :).
However I'm curious why you suggest that. Because some of the devices
we're accessing via vfio are in fact Intel DMA engines (we blacklist
ioatdma and use the DMA devices directly from userspace). Is there
some known interaction between the Intel DMA engines and interrupt
remapping?
Thanks!
Roland
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Hang (due to HW?) in qi_submit_sync()
[not found] ` <CAG4TOxPOgTOKYZa7q9Of8XzHV_wAadtJmXFC0bmyN2Qds7T9RA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-01-06 6:48 ` Alex Williamson
[not found] ` <1420526901.3541.96.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2015-01-06 6:48 UTC (permalink / raw)
To: Roland Dreier
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jiang Liu
On Mon, 2015-01-05 at 20:39 -0800, Roland Dreier wrote:
> On Mon, Jan 5, 2015 at 6:54 PM, Alex Williamson
> <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > Try disabling CONFIG_NET_DMA
>
> We already have that disabled (well, in 3.10 it depends on BROKEN, and
> we don't have BROKEN enabled :).
Only since v3.10.26 is it marked BROKEN.
> However I'm curious why you suggest that. Because some of the devices
> we're accessing via vfio are in fact Intel DMA engines (we blacklist
> ioatdma and use the DMA devices directly from userspace). Is there
> some known interaction between the Intel DMA engines and interrupt
> remapping?
I suggest it because I spent several weeks isolating why we hit the same
lockup in qi_submit_sync() as you're seeing after we backported a number
of iommu patches. In our case, finding that it's toggled by
CONFIG_NET_DMA, which has since been marked broken and removed is a
sufficient solution. I agree though that regardless of what terrible
things NET_DMA was doing, we seem to be hitting a "broken hardware"
condition, potentially invoked by the DMA engine.
What I observed was that it occurs when flushing an irte entry, the
queued invalidation queue is working prior to this flush, but the wait
descriptor value is never written to the status address, the queue head
never advances past that wait descriptor once it gets wedged, and the
status register never indicates any sort of error. Section 6.5.6 of the
VT-d spec on interrupt draining has an interesting statement:
Interrupt draining is performed on Interrupt Entry Cache (IEC)
invalidation requests. For IEC invalidations submitted through
the queued invalidation interface, interrupt draining must be
completed before the next Invalidation Wait Descriptor is
completed by hardware.
Given the circumstances of the hang, that certainly makes me suspect
that queued invalidation is failing to complete interrupt draining and
hardware is therefore unable to advance past the subsequent wait
descriptor or complete the wait descriptor because of this requirement.
I wouldn't be surprised if there's an errata hidden in there somewhere.
Thanks,
Alex
PS - Thanks for using vfio :)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Hang (due to HW?) in qi_submit_sync()
[not found] ` <1420526901.3541.96.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-01-08 22:36 ` Roland Dreier
[not found] ` <1420756610-20918-1-git-send-email-roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Roland Dreier @ 2015-01-08 22:36 UTC (permalink / raw)
To: Alex Williamson, Jiang Liu
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Hi,
So we've managed to find one magic setup and workload that reproduces
this reliably, and I've done a bit more debugging that leads me to
believe we probably need Intel's help to really get to the bottom of
this.
One question though: you mentioned that you saw this behavior until
you turned off CONFIG_NET_DMA. What platform was that on? We see
this on dual socket Xeon E5 v3 (Haswell EP / Grantley), and I don't
really have any other setup I can try. Did you see this on other
platforms (Ivy Bridge / Romley maybe)?
Anyway, I added the debugging patch at the end of this mail to our
kernel to dump some status when the driver detects a hung queue.
Below we see some example output. Things I notice:
- IQH == IQT, in other words the QI hardware thinks it has picked up
and validated all the descriptors submitted by software.
- The queue has successfully processed many operations (although
everything that we can see succeeded is type 2h, ie "IOTLB
invalidate" as opposed to the type 4h "Interrupt Entry Cache
invalidate" that we're stuck on.
- There haven't been any faults and FSTS is clear.
- The hardware hasn't executed the "Invalidation wait" descriptor.
Beyond that I'm not sure how to make much more progress without
insight into the hardware -- it looks like the driver is doing
everything right. Anyone from Intel have any thoughts?
Here's the actual output from the kernel log:
dmar: QI stuck waiting for index 252 / wait_index 253 IQH/T 00000fe0/00000fe0 FSTS 00000000
dmar: [ 0] stat 0 00000000000100f2 000001500027f000
dmar: [ 1] stat 0 0000000200000025 0000001fff412004
dmar: [ 2] stat 0 00000000000100f2 0000015000280000
dmar: [ 3] stat 0 0000000200000025 0000001fff41200c
dmar: [ 4] stat 0 00000000000100f2 0000015000281000
dmar: [ 5] stat 0 0000000200000025 0000001fff412014
dmar: [ 6] stat 0 00000000000100f2 0000015000282000
dmar: [ 7] stat 0 0000000200000025 0000001fff41201c
dmar: [ 8] stat 0 00000000000100f2 0000015000283000
dmar: [ 9] stat 0 0000000200000025 0000001fff412024
dmar: [ 10] stat 0 00000000000100f2 0000015000284000
dmar: [ 11] stat 0 0000000200000025 0000001fff41202c
dmar: [ 12] stat 0 00000000000100f2 0000015000285000
dmar: [ 13] stat 0 0000000200000025 0000001fff412034
dmar: [ 14] stat 0 00000000000100f2 0000015000286000
dmar: [ 15] stat 0 0000000200000025 0000001fff41203c
dmar: [ 16] stat 0 00000000000100f2 0000015000287000
dmar: [ 17] stat 0 0000000200000025 0000001fff412044
dmar: [ 18] stat 0 00000000000100f2 0000015000288000
dmar: [ 19] stat 0 0000000200000025 0000001fff41204c
dmar: [ 20] stat 0 00000000000100f2 0000015000289000
dmar: [ 21] stat 0 0000000200000025 0000001fff412054
dmar: [ 22] stat 0 00000000000100f2 000001500028a000
dmar: [ 23] stat 0 0000000200000025 0000001fff41205c
dmar: [ 24] stat 0 00000000000100f2 000001500026a000
dmar: [ 25] stat 0 0000000200000025 0000001fff412064
dmar: [ 26] stat 0 00000000000100f2 000001500026b000
dmar: [ 27] stat 0 0000000200000025 0000001fff41206c
dmar: [ 28] stat 0 00000000000100f2 000001500026c000
dmar: [ 29] stat 0 0000000200000025 0000001fff412074
dmar: [ 30] stat 0 00000000000100f2 000001500026d000
dmar: [ 31] stat 0 0000000200000025 0000001fff41207c
dmar: [ 32] stat 0 00000000000100f2 000001500026e000
dmar: [ 33] stat 0 0000000200000025 0000001fff412084
dmar: [ 34] stat 0 00000000000100f2 000001500026f000
dmar: [ 35] stat 0 0000000200000025 0000001fff41208c
dmar: [ 36] stat 0 00000000000100f2 0000015000270000
dmar: [ 37] stat 0 0000000200000025 0000001fff412094
dmar: [ 38] stat 0 00000000000100f2 0000015000271000
dmar: [ 39] stat 0 0000000200000025 0000001fff41209c
dmar: [ 40] stat 0 00000000000100f2 0000015000272000
dmar: [ 41] stat 0 0000000200000025 0000001fff4120a4
dmar: [ 42] stat 0 00000000000100f2 0000015000273000
dmar: [ 43] stat 0 0000000200000025 0000001fff4120ac
dmar: [ 44] stat 0 00000000000100f2 0000015000274000
dmar: [ 45] stat 0 0000000200000025 0000001fff4120b4
dmar: [ 46] stat 0 00000000000100f2 0000015000275000
dmar: [ 47] stat 0 0000000200000025 0000001fff4120bc
dmar: [ 48] stat 0 00000000000100f2 0000015000276000
dmar: [ 49] stat 0 0000000200000025 0000001fff4120c4
dmar: [ 50] stat 0 00000000000100f2 0000015000277000
dmar: [ 51] stat 0 0000000200000025 0000001fff4120cc
dmar: [ 52] stat 0 00000000000100f2 0000015000278000
dmar: [ 53] stat 0 0000000200000025 0000001fff4120d4
dmar: [ 54] stat 0 00000000000100f2 0000015000279000
dmar: [ 55] stat 0 0000000200000025 0000001fff4120dc
dmar: [ 56] stat 0 00000000000100f2 0000015000259000
dmar: [ 57] stat 0 0000000200000025 0000001fff4120e4
dmar: [ 58] stat 0 00000000000100f2 000001500025a000
dmar: [ 59] stat 0 0000000200000025 0000001fff4120ec
dmar: [ 60] stat 0 00000000000100f2 000001500025b000
dmar: [ 61] stat 0 0000000200000025 0000001fff4120f4
dmar: [ 62] stat 0 00000000000100f2 000001500025c000
dmar: [ 63] stat 0 0000000200000025 0000001fff4120fc
dmar: [ 64] stat 0 00000000000100f2 000001500025d000
dmar: [ 65] stat 0 0000000200000025 0000001fff412104
dmar: [ 66] stat 0 00000000000100f2 000001500025e000
dmar: [ 67] stat 0 0000000200000025 0000001fff41210c
dmar: [ 68] stat 0 00000000000100f2 000001500025f000
dmar: [ 69] stat 0 0000000200000025 0000001fff412114
dmar: [ 70] stat 0 00000000000100f2 0000015000260000
dmar: [ 71] stat 0 0000000200000025 0000001fff41211c
dmar: [ 72] stat 0 00000000000100f2 0000015000261000
dmar: [ 73] stat 0 0000000200000025 0000001fff412124
dmar: [ 74] stat 0 00000000000100f2 0000015000262000
dmar: [ 75] stat 0 0000000200000025 0000001fff41212c
dmar: [ 76] stat 0 00000000000100f2 0000015000263000
dmar: [ 77] stat 0 0000000200000025 0000001fff412134
dmar: [ 78] stat 0 00000000000100f2 0000015000264000
dmar: [ 79] stat 0 0000000200000025 0000001fff41213c
dmar: [ 80] stat 0 00000000000100f2 0000015000265000
dmar: [ 81] stat 0 0000000200000025 0000001fff412144
dmar: [ 82] stat 0 00000000000100f2 0000015000266000
dmar: [ 83] stat 0 0000000200000025 0000001fff41214c
dmar: [ 84] stat 0 00000000000100f2 0000015000267000
dmar: [ 85] stat 0 0000000200000025 0000001fff412154
dmar: [ 86] stat 0 00000000000100f2 0000015000268000
dmar: [ 87] stat 0 0000000200000025 0000001fff41215c
dmar: [ 88] stat 0 00000000000100f2 0000015000248000
dmar: [ 89] stat 0 0000000200000025 0000001fff412164
dmar: [ 90] stat 0 00000000000100f2 0000015000249000
dmar: [ 91] stat 0 0000000200000025 0000001fff41216c
dmar: [ 92] stat 0 00000000000100f2 000001500024a000
dmar: [ 93] stat 0 0000000200000025 0000001fff412174
dmar: [ 94] stat 0 00000000000100f2 000001500024b000
dmar: [ 95] stat 0 0000000200000025 0000001fff41217c
dmar: [ 96] stat 0 00000000000100f2 000001500024c000
dmar: [ 97] stat 0 0000000200000025 0000001fff412184
dmar: [ 98] stat 0 00000000000100f2 000001500024d000
dmar: [ 99] stat 0 0000000200000025 0000001fff41218c
dmar: [100] stat 0 00000000000100f2 000001500024e000
dmar: [101] stat 0 0000000200000025 0000001fff412194
dmar: [102] stat 0 00000000000100f2 000001500024f000
dmar: [103] stat 0 0000000200000025 0000001fff41219c
dmar: [104] stat 0 00000000000100f2 0000015000250000
dmar: [105] stat 0 0000000200000025 0000001fff4121a4
dmar: [106] stat 0 00000000000100f2 0000015000251000
dmar: [107] stat 0 0000000200000025 0000001fff4121ac
dmar: [108] stat 0 00000000000100f2 0000015000252000
dmar: [109] stat 0 0000000200000025 0000001fff4121b4
dmar: [110] stat 0 00000000000100f2 0000015000253000
dmar: [111] stat 0 0000000200000025 0000001fff4121bc
dmar: [112] stat 0 00000000000100f2 0000015000254000
dmar: [113] stat 0 0000000200000025 0000001fff4121c4
dmar: [114] stat 0 00000000000100f2 0000015000255000
dmar: [115] stat 0 0000000200000025 0000001fff4121cc
dmar: [116] stat 0 00000000000100f2 0000015000256000
dmar: [117] stat 0 0000000200000025 0000001fff4121d4
dmar: [118] stat 0 00000000000100f2 0000015000257000
dmar: [119] stat 0 0000000200000025 0000001fff4121dc
dmar: [120] stat 0 00000000000100f2 0000015000237000
dmar: [121] stat 0 0000000200000025 0000001fff4121e4
dmar: [122] stat 0 00000000000100f2 0000015000238000
dmar: [123] stat 0 0000000200000025 0000001fff4121ec
dmar: [124] stat 0 00000000000100f2 0000015000239000
dmar: [125] stat 0 0000000200000025 0000001fff4121f4
dmar: [126] stat 0 00000000000100f2 000001500023a000
dmar: [127] stat 0 0000000200000025 0000001fff4121fc
dmar: [128] stat 0 00000000000100f2 000001500023b000
dmar: [129] stat 0 0000000200000025 0000001fff412204
dmar: [130] stat 0 00000000000100f2 000001500023c000
dmar: [131] stat 0 0000000200000025 0000001fff41220c
dmar: [132] stat 0 00000000000100f2 000001500023d000
dmar: [133] stat 0 0000000200000025 0000001fff412214
dmar: [134] stat 0 00000000000100f2 000001500023e000
dmar: [135] stat 0 0000000200000025 0000001fff41221c
dmar: [136] stat 0 00000000000100f2 000001500023f000
dmar: [137] stat 0 0000000200000025 0000001fff412224
dmar: [138] stat 0 00000000000100f2 0000015000240000
dmar: [139] stat 0 0000000200000025 0000001fff41222c
dmar: [140] stat 0 00000000000100f2 0000015000241000
dmar: [141] stat 0 0000000200000025 0000001fff412234
dmar: [142] stat 0 00000000000100f2 0000015000242000
dmar: [143] stat 0 0000000200000025 0000001fff41223c
dmar: [144] stat 0 00000000000100f2 0000015000243000
dmar: [145] stat 0 0000000200000025 0000001fff412244
dmar: [146] stat 0 00000000000100f2 0000015000244000
dmar: [147] stat 0 0000000200000025 0000001fff41224c
dmar: [148] stat 0 00000000000100f2 0000015000245000
dmar: [149] stat 0 0000000200000025 0000001fff412254
dmar: [150] stat 0 00000000000100f2 0000015000246000
dmar: [151] stat 0 0000000200000025 0000001fff41225c
dmar: [152] stat 0 00000000000100f2 0000015000226000
dmar: [153] stat 0 0000000200000025 0000001fff412264
dmar: [154] stat 0 00000000000100f2 0000015000227000
dmar: [155] stat 0 0000000200000025 0000001fff41226c
dmar: [156] stat 0 00000000000100f2 0000015000228000
dmar: [157] stat 0 0000000200000025 0000001fff412274
dmar: [158] stat 0 00000000000100f2 0000015000229000
dmar: [159] stat 0 0000000200000025 0000001fff41227c
dmar: [160] stat 0 00000000000100f2 000001500022a000
dmar: [161] stat 0 0000000200000025 0000001fff412284
dmar: [162] stat 0 00000000000100f2 000001500022b000
dmar: [163] stat 0 0000000200000025 0000001fff41228c
dmar: [164] stat 0 00000000000100f2 000001500022c000
dmar: [165] stat 0 0000000200000025 0000001fff412294
dmar: [166] stat 0 00000000000100f2 000001500022d000
dmar: [167] stat 0 0000000200000025 0000001fff41229c
dmar: [168] stat 0 00000000000100f2 000001500022e000
dmar: [169] stat 0 0000000200000025 0000001fff4122a4
dmar: [170] stat 0 00000000000100f2 000001500022f000
dmar: [171] stat 0 0000000200000025 0000001fff4122ac
dmar: [172] stat 0 00000000000100f2 0000015000230000
dmar: [173] stat 0 0000000200000025 0000001fff4122b4
dmar: [174] stat 0 00000000000100f2 0000015000231000
dmar: [175] stat 0 0000000200000025 0000001fff4122bc
dmar: [176] stat 0 00000000000100f2 0000015000232000
dmar: [177] stat 0 0000000200000025 0000001fff4122c4
dmar: [178] stat 0 00000000000100f2 0000015000233000
dmar: [179] stat 0 0000000200000025 0000001fff4122cc
dmar: [180] stat 0 00000000000100f2 0000015000234000
dmar: [181] stat 0 0000000200000025 0000001fff4122d4
dmar: [182] stat 0 00000000000100f2 0000015000235000
dmar: [183] stat 0 0000000200000025 0000001fff4122dc
dmar: [184] stat 0 00000000000100f2 0000015000215000
dmar: [185] stat 0 0000000200000025 0000001fff4122e4
dmar: [186] stat 0 00000000000100f2 0000015000216000
dmar: [187] stat 0 0000000200000025 0000001fff4122ec
dmar: [188] stat 0 00000000000100f2 0000015000217000
dmar: [189] stat 0 0000000200000025 0000001fff4122f4
dmar: [190] stat 0 00000000000100f2 0000015000218000
dmar: [191] stat 0 0000000200000025 0000001fff4122fc
dmar: [192] stat 0 00000000000100f2 0000015000219000
dmar: [193] stat 0 0000000200000025 0000001fff412304
dmar: [194] stat 0 00000000000100f2 000001500021a000
dmar: [195] stat 0 0000000200000025 0000001fff41230c
dmar: [196] stat 0 00000000000100f2 000001500021b000
dmar: [197] stat 0 0000000200000025 0000001fff412314
dmar: [198] stat 0 00000000000100f2 000001500021c000
dmar: [199] stat 0 0000000200000025 0000001fff41231c
dmar: [200] stat 0 00000000000100f2 000001500021d000
dmar: [201] stat 0 0000000200000025 0000001fff412324
dmar: [202] stat 0 00000000000100f2 000001500021e000
dmar: [203] stat 0 0000000200000025 0000001fff41232c
dmar: [204] stat 0 00000000000100f2 000001500021f000
dmar: [205] stat 0 0000000200000025 0000001fff412334
dmar: [206] stat 0 00000000000100f2 0000015000220000
dmar: [207] stat 0 0000000200000025 0000001fff41233c
dmar: [208] stat 0 00000000000100f2 0000015000221000
dmar: [209] stat 0 0000000200000025 0000001fff412344
dmar: [210] stat 0 00000000000100f2 0000015000222000
dmar: [211] stat 0 0000000200000025 0000001fff41234c
dmar: [212] stat 0 00000000000100f2 0000015000223000
dmar: [213] stat 0 0000000200000025 0000001fff412354
dmar: [214] stat 0 00000000000100f2 0000015000224000
dmar: [215] stat 0 0000000200000025 0000001fff41235c
dmar: [216] stat 0 00000000000100f2 0000015000204000
dmar: [217] stat 0 0000000200000025 0000001fff412364
dmar: [218] stat 0 00000000000100f2 0000015000205000
dmar: [219] stat 0 0000000200000025 0000001fff41236c
dmar: [220] stat 0 00000000000100f2 0000015000206000
dmar: [221] stat 0 0000000200000025 0000001fff412374
dmar: [222] stat 0 00000000000100f2 0000015000207000
dmar: [223] stat 0 0000000200000025 0000001fff41237c
dmar: [224] stat 0 00000000000100f2 0000015000208000
dmar: [225] stat 0 0000000200000025 0000001fff412384
dmar: [226] stat 0 00000000000100f2 0000015000209000
dmar: [227] stat 0 0000000200000025 0000001fff41238c
dmar: [228] stat 0 00000000000100f2 000001500020a000
dmar: [229] stat 0 0000000200000025 0000001fff412394
dmar: [230] stat 0 00000000000100f2 000001500020b000
dmar: [231] stat 0 0000000200000025 0000001fff41239c
dmar: [232] stat 0 00000000000100f2 000001500020c000
dmar: [233] stat 0 0000000200000025 0000001fff4123a4
dmar: [234] stat 0 00000000000100f2 000001500020d000
dmar: [235] stat 0 0000000200000025 0000001fff4123ac
dmar: [236] stat 0 00000000000100f2 000001500020e000
dmar: [237] stat 0 0000000200000025 0000001fff4123b4
dmar: [238] stat 0 00000000000100f2 000001500020f000
dmar: [239] stat 0 0000000200000025 0000001fff4123bc
dmar: [240] stat 0 00000000000100f2 0000015000210000
dmar: [241] stat 0 0000000200000025 0000001fff4123c4
dmar: [242] stat 0 00000000000100f2 0000015000211000
dmar: [243] stat 0 0000000200000025 0000001fff4123cc
dmar: [244] stat 0 00000000000100f2 0000015000212000
dmar: [245] stat 0 0000000200000025 0000001fff4123d4
dmar: [246] stat 0 00000000000100f2 0000015000213000
dmar: [247] stat 0 0000000200000025 0000001fff4123dc
dmar: [248] stat 0 00000000000100f2 0000015000201000
dmar: [249] stat 0 0000000200000025 0000001fff4123e4
dmar: [250] stat 0 00000000000100f2 0000015000202000
dmar: [251] stat 0 0000000200000025 0000001fff4123ec
dmar: [252] stat 1 0000001410000014 0000000000000000
dmar: [253] stat 1 0000000200000025 0000001fff4123f4
dmar: [254] stat 0 00000000000100f2 000001500027e000
dmar: [255] stat 0 0000000200000025 0000001fff4123fc
------------[ cut here ]------------
kernel BUG at drivers/iommu/dmar.c:901!
invalid opcode: 0000 [#1] SMP
CPU: 1 PID: 5310 Comm: foed Tainted: G O 3.10.59+ #201501080140+8825998.platinum
task: ffff883fe5f743e0 ti: ffff883feb696000 task.ti: ffff883feb696000
RIP: 0010:[<ffffffff81529737>] [<ffffffff81529737>] qi_submit_sync+0x477/0x510
RSP: 0018:ffff883feb697a10 EFLAGS: 00010046
RAX: 0000000000000038 RBX: 0000000000000100 RCX: 00000000000000dd
RDX: 00000000000000e2 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff883feb697a78 R08: ffffffffffffffff R09: 0000000000003b72
R10: 0000000000000075 R11: 00000000000000d1 R12: ffff883ff2dda008
R13: 0000000000000400 R14: ffff881fff40c6c0 R15: ffff883ffec019c0
FS: 00007fb876bea700(0000) GS:ffff881fffa20000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f42963c9d50 CR3: 0000000003e0c000 CR4: 00000000001407e0
wait_for_sess_cmds: Completed cmd ffff882706266f70 tag 0x117514 tr_state 0x91 t_state 5 flags 0x4010b tas 0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff883feb697a88 0000000000000fd0 0000000000000fc0 ffff883ff2dd9000
0000000000000086 00000000000000fc 000000fc000000fd 000b415115c12cd0
ffff883fec30c800 0000000000000000 0000000000000246 ffff883fef949858
Call Trace:
[<ffffffff8152f6b5>] free_irte+0xc5/0x100
[<ffffffff81530834>] free_remapped_irq+0x44/0x60
[<ffffffff81027b23>] destroy_irq+0x33/0xd0
[<ffffffff81027ede>] native_teardown_msi_irq+0xe/0x10
[<ffffffff812a6a70>] default_teardown_msi_irqs+0x60/0x80
[<ffffffff812a64d9>] free_msi_irqs+0x99/0x150
[<ffffffff812a749d>] pci_disable_msix+0x3d/0x60
[<ffffffffa006b748>] vfio_msi_disable+0xc8/0xe0 [vfio_pci]
[<ffffffffa006bf86>] vfio_pci_set_msi_trigger+0x2a6/0x2d0 [vfio_pci]
[<ffffffffa006c41c>] vfio_pci_set_irqs_ioctl+0x8c/0xa0 [vfio_pci]
[<ffffffffa006a3b0>] vfio_pci_release+0x70/0x150 [vfio_pci]
[<ffffffffa0060cbc>] vfio_device_fops_release+0x1c/0x40 [vfio]
[<ffffffff8114d7db>] __fput+0xdb/0x220
[<ffffffff8114d92e>] ____fput+0xe/0x10
[<ffffffff810614ac>] task_work_run+0xbc/0xe0
[<ffffffff81043d0e>] do_exit+0x3ce/0xe50
[<ffffffff8104557f>] do_group_exit+0x3f/0xa0
[<ffffffff81054769>] get_signal_to_deliver+0x1a9/0x5b0
[<ffffffff810023f8>] do_signal+0x48/0x5e0
[<ffffffff81056778>] ? k_getrusage+0x368/0x3d0
[<ffffffff810595f0>] ? getrusage+0x30/0x40
[<ffffffff81009449>] ? read_tsc+0x9/0x20
[<ffffffff810029f5>] do_notify_resume+0x65/0x80
[<ffffffff81648ab8>] int_signal+0x12/0x17
Code: 89 de 4d 8b 04 24 48 c7 c7 b0 12 bd 81 83 c3 01 49 83 c4 10 42 8b
--------------------
And our debug patch (on top of 3.10.59):
diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index a7967ceb79e6..b9303fa53fbf 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -781,6 +782,8 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
* No new descriptors are fetched until the ITE is cleared.
*/
if (fault & DMA_FSTS_ITE) {
+ trace_printk("DMA_FSTS_ITE set (%08x)\n", fault);
+
head = readl(iommu->reg + DMAR_IQH_REG);
head = ((head >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH;
head |= 1;
@@ -799,8 +802,10 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
return -EAGAIN;
}
- if (fault & DMA_FSTS_ICE)
+ if (fault & DMA_FSTS_ICE) {
+ trace_printk("DMA_FSTS_ICE set (%08x)\n", fault);
writel(DMA_FSTS_ICE, iommu->reg + DMAR_FSTS_REG);
+ }
return 0;
}
@@ -816,6 +821,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
struct qi_desc *hw, wait_desc;
int wait_index, index;
unsigned long flags;
+ cycles_t start_time;
if (!qi)
return 0;
@@ -826,8 +832,16 @@ restart:
rc = 0;
raw_spin_lock_irqsave(&qi->q_lock, flags);
+ start_time = get_cycles();
while (qi->free_cnt < 3) {
raw_spin_unlock_irqrestore(&qi->q_lock, flags);
+
+ if (get_cycles() - start_time > DMAR_OPERATION_TIMEOUT) {
+ printk(KERN_EMERG "Only %d free entries in invalidation queue.\n",
+ qi->free_cnt);
+ BUG();
+ }
+
cpu_relax();
raw_spin_lock_irqsave(&qi->q_lock, flags);
}
@@ -857,6 +871,7 @@ restart:
*/
writel(qi->free_head << DMAR_IQ_SHIFT, iommu->reg + DMAR_IQT_REG);
+ start_time = get_cycles();
while (qi->desc_status[wait_index] != QI_DONE) {
/*
* We will leave the interrupts disabled, to prevent interrupt
@@ -870,6 +885,22 @@ restart:
break;
raw_spin_unlock(&qi->q_lock);
+
+ if (get_cycles() - start_time > DMAR_OPERATION_TIMEOUT) {
+ int i;
+
+ pr_emerg("QI stuck waiting for index %d / wait_index %d IQH/T %08x/%08x FSTS %08x\n",
+ index, wait_index,
+ readl(iommu->reg + DMAR_IQH_REG),
+ readl(iommu->reg + DMAR_IQT_REG),
+ readl(iommu->reg + DMAR_FSTS_REG));
+ for (i = 0; i < QI_LENGTH; ++i)
+ pr_emerg(" [%3d] stat %x %016llx %016llx\n",
+ i, qi->desc_status[i], hw[i].low, hw[i].high);
+
+ BUG();
+ }
+
cpu_relax();
raw_spin_lock(&qi->q_lock);
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: Hang (due to HW?) in qi_submit_sync()
[not found] ` <1420756610-20918-1-git-send-email-roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2015-01-08 23:39 ` Alex Williamson
0 siblings, 0 replies; 6+ messages in thread
From: Alex Williamson @ 2015-01-08 23:39 UTC (permalink / raw)
To: Roland Dreier
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jiang Liu
On Thu, 2015-01-08 at 14:36 -0800, Roland Dreier wrote:
> Hi,
>
> So we've managed to find one magic setup and workload that reproduces
> this reliably, and I've done a bit more debugging that leads me to
> believe we probably need Intel's help to really get to the bottom of
> this.
Can you share your test case? My test case was also using vfio,
repeatedly booting a pair of VMs with assigned GPUs, but I haven't seen
any sign of the issue without NET_DMA. I agree that it seems like the
hardware is getting somewhere that it shouldn't regardless of the
software, but are you able to reproduce on a more recent kernel?
I've also seen a report of the same problem without vfio by manipulating
IRQ affinity, but it was not a reliable test case.
> One question though: you mentioned that you saw this behavior until
> you turned off CONFIG_NET_DMA. What platform was that on? We see
> this on dual socket Xeon E5 v3 (Haswell EP / Grantley), and I don't
> really have any other setup I can try. Did you see this on other
> platforms (Ivy Bridge / Romley maybe)?
I saw it on a dual socket Xeon E5 v2 (Ivy Bridge EP / Patsburg), and the
other report I mention above was the same, different systems though.
> Anyway, I added the debugging patch at the end of this mail to our
> kernel to dump some status when the driver detects a hung queue.
> Below we see some example output. Things I notice:
>
> - IQH == IQT, in other words the QI hardware thinks it has picked up
> and validated all the descriptors submitted by software.
> - The queue has successfully processed many operations (although
> everything that we can see succeeded is type 2h, ie "IOTLB
> invalidate" as opposed to the type 4h "Interrupt Entry Cache
> invalidate" that we're stuck on.
> - There haven't been any faults and FSTS is clear.
> - The hardware hasn't executed the "Invalidation wait" descriptor.
This is all consistent with my observations as well.
> Beyond that I'm not sure how to make much more progress without
> insight into the hardware -- it looks like the driver is doing
> everything right. Anyone from Intel have any thoughts?
Agreed, the hardware appears to be getting wedged without and sort of
fault indication or recovery mechanism. Thanks,
Alex
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-01-08 23:39 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-06 0:57 Hang (due to HW?) in qi_submit_sync() Roland Dreier
[not found] ` <1420505840-30096-1-git-send-email-roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2015-01-06 2:54 ` Alex Williamson
[not found] ` <1420512860.3541.77.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-06 4:39 ` Roland Dreier
[not found] ` <CAG4TOxPOgTOKYZa7q9Of8XzHV_wAadtJmXFC0bmyN2Qds7T9RA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-06 6:48 ` Alex Williamson
[not found] ` <1420526901.3541.96.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-08 22:36 ` Roland Dreier
[not found] ` <1420756610-20918-1-git-send-email-roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2015-01-08 23:39 ` Alex Williamson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox