From: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Jiang Liu <jiang.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Subject: Re: Hang (due to HW?) in qi_submit_sync()
Date: Mon, 05 Jan 2015 19:54:20 -0700
Message-ID: <1420512860.3541.77.camel@redhat.com>
In-Reply-To: <1420505840-30096-1-git-send-email-roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Mon, 2015-01-05 at 16:57 -0800, Roland Dreier wrote:
> From: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
>
> Hi, we're running kernel 3.10.59 (pretty recent long-term kernel) on a
> 2-socket Xeon E5 v3 (Haswell) system. We're using vfio to access some
> PCI devices from userspace, and occasionally when we kill a process,
> we see the system hang in qi_submit_sync().
>
> Based on a very old patch from Intel <https://lkml.org/lkml/2009/5/20/341>,
> we added code to the dmar driver:
>
> int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
> {
>
>         //...
>
>         /*
>          * update the HW tail register indicating the presence of
>          * new descriptors.
>          */
>         writel(qi->free_head << DMAR_IQ_SHIFT, iommu->reg + DMAR_IQT_REG);
>
>         start_time = get_cycles();
>         while (qi->desc_status[wait_index] != QI_DONE) {
>                 /*
>                  * We will leave the interrupts disabled, to prevent interrupt
>                  * context to queue another cmd while a cmd is already submitted
>                  * and waiting for completion on this cpu. This is to avoid
>                  * a deadlock where the interrupt context can wait indefinitely
>                  * for free slots in the queue.
>                  */
>                 rc = qi_check_fault(iommu, index);
>                 if (rc)
>                         break;
>
>                 raw_spin_unlock(&qi->q_lock);
>
>                 // We added this -->
>                 if (get_cycles() - start_time > DMAR_OPERATION_TIMEOUT) {
>                         printk(KERN_EMERG "desc_status[%d] = %d.\n",
>                                wait_index, qi->desc_status[wait_index]);
>                         /* line 888: */ BUG();
>                 }
>                 // <-- to here
>
>                 cpu_relax();
>                 raw_spin_lock(&qi->q_lock);
>         }
>
> and indeed when the system hangs, we see for example
>
> desc_status[69] = 1.
> ------------[ cut here ]------------
> kernel BUG at drivers/iommu/dmar.c:888!
> CPU: 8 PID: 12211 Comm: foed Tainted: P O 3.10.59+ #201412290537+4e4984e.platinum
> task: ffff88275ac643e0 ti: ffff8825d329a000 task.ti: ffff8825d329a000
> RIP: 0010:[<ffffffff81529737>] [<ffffffff81529737>] qi_submit_sync+0x3f7/0x490
> RSP: 0018:ffff8825d329ba10 EFLAGS: 00010092
> RAX: 0000000000000014 RBX: 0000000000000044 RCX: ffff881fffb0ec00
> RDX: 0000000000000000 RSI: ffff881fffb0d048 RDI: 0000000000000046
> RBP: ffff8825d329ba78 R08: ffffffffffffffff R09: 000000000001a4a1
> R10: 0000000000000051 R11: 00000000000000e4 R12: 00007068faa64fc8
> R13: ffff881fff40c780 R14: 0000000000000114 R15: ffff883ffec01a00
> FS: 00007f3c86ffb700(0000) GS:ffff881fffb00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f996d3f1ba0 CR3: 00000026222f0000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Stack:
> ffff8825d329ba88 0000000000000450 0000000000000440 ffff881ff3215000
> 00000044d329bb18 0000000000000086 0000000000000044 ffff882500000045
> ffff881ff12b1600 0000000000000000 0000000000000246 ffff881ff278e858
> Call Trace:
> [<ffffffff8152f6b5>] free_irte+0xc5/0x100
> [<ffffffff81530834>] free_remapped_irq+0x44/0x60
> [<ffffffff81027b23>] destroy_irq+0x33/0xd0
> [<ffffffff81027ede>] native_teardown_msi_irq+0xe/0x10
> [<ffffffff812a6a70>] default_teardown_msi_irqs+0x60/0x80
> [<ffffffff812a64d9>] free_msi_irqs+0x99/0x150
> [<ffffffff812a749d>] pci_disable_msix+0x3d/0x60
> [<ffffffffa0078748>] vfio_msi_disable+0xc8/0xe0 [vfio_pci]
> [<ffffffffa0078f86>] vfio_pci_set_msi_trigger+0x2a6/0x2d0 [vfio_pci]
> [<ffffffffa007941c>] vfio_pci_set_irqs_ioctl+0x8c/0xa0 [vfio_pci]
> [<ffffffffa00773b0>] vfio_pci_release+0x70/0x150 [vfio_pci]
> [<ffffffffa006dcbc>] vfio_device_fops_release+0x1c/0x40 [vfio]
> [<ffffffff8114d7db>] __fput+0xdb/0x220
> [<ffffffff8114d92e>] ____fput+0xe/0x10
> [<ffffffff810614ac>] task_work_run+0xbc/0xe0
> [<ffffffff81043d0e>] do_exit+0x3ce/0xe50
> [<ffffffff8104557f>] do_group_exit+0x3f/0xa0
> [<ffffffff81054769>] get_signal_to_deliver+0x1a9/0x5b0
> [<ffffffff810023f8>] do_signal+0x48/0x5e0
>
> As far as I can understand the driver, this is a "shouldn't happen,
> your hardware is broken" occurrence. However, I haven't been able to
> find any relevant-looking sightings for our CPU.
>
> Does anyone from Intel (or elsewhere) have any suggestions on how to
> chase this further?

Try disabling CONFIG_NET_DMA
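
For reference, the check added to the wait loop quoted above is a bounded poll: keep re-reading a status word and give up once a budget is exhausted. A minimal userspace sketch of that pattern follows. It is not kernel code and not the dmar driver's implementation. The function name poll_until_done, the SKETCH_* constants, and the use of an iteration budget in place of get_cycles() and DMAR_OPERATION_TIMEOUT are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values for this sketch only. */
enum { SKETCH_IN_USE = 1, SKETCH_DONE = 2 };

/*
 * Poll *status until it equals done_value or max_polls reads have been
 * made. Returns 0 on completion and -1 when the budget runs out, so the
 * caller can report the stuck status instead of spinning forever.
 */
int poll_until_done(const volatile int *status, int done_value,
                    uint64_t max_polls)
{
        assert(status != NULL);

        for (uint64_t i = 0; i < max_polls; i++) {
                if (*status == done_value)
                        return 0;
        }
        return -1;
}
```

Returning an error rather than calling BUG() leaves the decision about how to handle a stuck descriptor to the caller; the quoted debugging patch deliberately crashes instead so that the hang is captured in a trace.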