Linux IOMMU Development
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Jiang Liu <jiang.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Subject: Re: Hang (due to HW?) in qi_submit_sync()
Date: Mon, 05 Jan 2015 23:48:21 -0700	[thread overview]
Message-ID: <1420526901.3541.96.camel@redhat.com> (raw)
In-Reply-To: <CAG4TOxPOgTOKYZa7q9Of8XzHV_wAadtJmXFC0bmyN2Qds7T9RA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Mon, 2015-01-05 at 20:39 -0800, Roland Dreier wrote:
> On Mon, Jan 5, 2015 at 6:54 PM, Alex Williamson
> <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > Try disabling CONFIG_NET_DMA
> 
> We already have that disabled (well, in 3.10 it depends on BROKEN, and
> we don't have BROKEN enabled :).

Only since v3.10.26 is it marked BROKEN.

> However I'm curious why you suggest that.  Because some of the devices
> we're accessing via vfio are in fact Intel DMA engines (we blacklist
> ioatdma and use the DMA devices directly from userspace).  Is there
> some known interaction between the Intel DMA engines and interrupt
> remapping?

I suggest it because I spent several weeks isolating why we hit the same
lockup in qi_submit_sync() as you're seeing after we backported a number
of iommu patches.  In our case, finding that it's toggled by
CONFIG_NET_DMA, which has since been marked broken and removed is a
sufficient solution.  I agree though that regardless of what terrible
things NET_DMA was doing, we seem to be hitting a "broken hardware"
condition, potentially invoked by the DMA engine.

What I observed was that it occurs when flushing an irte entry, the
queued invalidation queue is working prior to this flush, but the wait
descriptor value is never written to the status address, the queue head
never advances past that wait descriptor once it gets wedged, and the
status register never indicates any sort of error.  Section 6.5.6 of the
VT-d spec on interrupt draining has an interesting statement:

        Interrupt draining is performed on Interrupt Entry Cache (IEC)
        invalidation requests. For IEC invalidations submitted through
        the queued invalidation interface, interrupt draining must be
        completed before the next Invalidation Wait Descriptor is
        completed by hardware.

Given the circumstances of the hang, that certainly makes me suspect
that queued invalidation is failing to complete interrupt draining and
hardware is therefore unable to advance past the subsequent wait
descriptor or complete the wait descriptor because of this requirement.
I wouldn't be surprised if there's an errata hidden in there somewhere.
Thanks,

Alex

PS - Thanks for using vfio :)

  parent reply	other threads:[~2015-01-06  6:48 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-06  0:57 Hang (due to HW?) in qi_submit_sync() Roland Dreier
     [not found] ` <1420505840-30096-1-git-send-email-roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2015-01-06  2:54   ` Alex Williamson
     [not found]     ` <1420512860.3541.77.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-06  4:39       ` Roland Dreier
     [not found]         ` <CAG4TOxPOgTOKYZa7q9Of8XzHV_wAadtJmXFC0bmyN2Qds7T9RA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-06  6:48           ` Alex Williamson [this message]
     [not found]             ` <1420526901.3541.96.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-08 22:36               ` Roland Dreier
     [not found]                 ` <1420756610-20918-1-git-send-email-roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2015-01-08 23:39                   ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1420526901.3541.96.camel@redhat.com \
    --to=alex.williamson-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
    --cc=iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=jiang.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org \
    --cc=roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox