From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: xen-devel@lists.xenproject.org, ian.campbell@citrix.com,
	ian.jackson@eu.citrix.com, jbeulich@suse.com, keir@xen.org,
	tim@xen.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: [PATCH v5 for-xen-4.5 2/3] dpci: Safeguard against race with hvm_dirq_assist crashing and pt_irq_[create|destroy]_bind
Date: Fri, 19 Sep 2014 14:51:36 -0400
Message-ID: <1411152697-13765-3-git-send-email-konrad.wilk@oracle.com>
In-Reply-To: <1411152697-13765-1-git-send-email-konrad.wilk@oracle.com>

There is a potential race where pt_irq_destroy_bind is called after the
interrupt for the PIRQ has been taken and hvm_dirq_assist has been scheduled
for execution.

The timing window is short, but the issue we can encounter is that
hvm_dirq_assist can take a page fault while accessing the 'struct domain'
it fetched from the 'struct hvm_pirq_dpci'->dom field, because
pt_irq_destroy_bind has meanwhile reset that field to NULL. Consider this
example:

CPU0                              CPU1
   do_IRQ                        pt_irq_destroy_bind
     -> __do_IRQ_guest
        -> schedule_dpci_for
      ...                          pirq_guest_unbind
                                   [action unbound]
                                   pirq_dpci->dom = NULL;
      ...
      dpci_softirq
         hvm_dirq_assist
         pirq_dpci->dom dereference
         [BOOM!]
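
To make the crash concrete, the softirq side boils down to something like
the following (a simplified, hypothetical sketch - not the actual
dpci_softirq/hvm_dirq_assist code):

    static void dpci_softirq_sketch(struct hvm_pirq_dpci *pirq_dpci)
    {
        /* Snapshot the domain pointer; on another CPU pt_irq_destroy_bind
         * may have just reset pirq_dpci->dom to NULL. */
        struct domain *d = pirq_dpci->dom;

        /* If the unbind won the race, 'd' is NULL and this faults. */
        hvm_dirq_assist(d, pirq_dpci);
    }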

The way to fix this is to:
 a) Not reset the ->dom field to NULL at all on the unbind path. Only
    'dpci_kill' may reset the field, and only once it has established
    that hvm_dirq_assist is done.
 b) At the start of 'pt_irq_create_bind', check whether the softirq is
    still running (by using dpci_kill) and, if it is, spin - but only
    for about a second. If more than a second elapses, return -EAGAIN
    and let QEMU handle it (which means it reports to the guest OS that
    the MSI was not set up). If the softirq finishes within that second,
    restart the creation/re-use process.

Step b) could be done in 'pt_irq_destroy_bind' as well if that would be
better; either option works. The 'about a second' check is sketched below.
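
For reference, the shift in the hunk below is just a cheap "roughly one
second" test; a minimal standalone illustration (hypothetical placement,
with NOW() returning nanoseconds as Xen's s_time_t does):

    s_time_t start = NOW();

    while ( dpci_kill(pirq_dpci) )
    {
        process_pending_softirqs();
        /* (NOW() - start) >> 30 becomes non-zero once at least 2^30 ns,
         * i.e. about 1.07 seconds, have elapsed. */
        if ( (NOW() - start) >> 30 )
            return -EAGAIN;
    }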

The return of -EAGAIN could be removed, as QEMU is not smart enough to
handle the error.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/drivers/passthrough/io.c | 54 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 5f82aba..13433a5 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -163,6 +163,7 @@ int pt_irq_create_bind(
     struct pirq *info;
     int rc, pirq = pt_irq_bind->machine_irq;
 
+ restart:
     if ( pirq < 0 || pirq >= d->nr_pirqs )
         return -EINVAL;
 
@@ -194,15 +195,34 @@ int pt_irq_create_bind(
     }
     pirq_dpci = pirq_dpci(info);
 
-    /* Something is horrible wrong if it has been scheduled or is running. */
-    ASSERT(pirq_dpci->state == 0);
+    /*
+     * We can hit this scenario if the dpci_softirq is running while we are
+     * doing pt_irq_destroy_bind followed by this call on another CPU.
+     * We have to wait until the dpci_softirq is done.
+     */
+    if ( pirq_dpci->state )
+    {
+        s_time_t start;
 
-    /* Guest unbinds (pt_irq_destroy_bind) and rebinds it back. */
-    if ( !pirq_dpci->dom )
-        pirq_dpci->dom = d;
+        /* Need to drop the lock so hvm_dirq_assist can progress. */
+        spin_unlock(&d->event_lock);
+        start = NOW();
+        while ( dpci_kill(pirq_dpci) )
+        {
+            process_pending_softirqs();
+            if ( (NOW() - start) >> 30 )
+            {
+                return -EAGAIN;
+            }
+        }
+        /* And since we dropped the lock, another pt_irq_[destroy|create]_bind
+         * could be called. Let's start from scratch.
+         */
+        goto restart;
+    }
 
     /* Otherwise 'schedule_dpci_for' will crash. */
-    ASSERT(pirq_dpci->dom == d);
+    pirq_dpci->dom = d;
 
     switch ( pt_irq_bind->irq_type )
     {
@@ -229,9 +249,15 @@ int pt_irq_create_bind(
             {
                 pirq_dpci->gmsi.gflags = 0;
                 pirq_dpci->gmsi.gvec = 0;
-                pirq_dpci->dom = NULL;
                 pirq_dpci->flags = 0;
-                dpci_kill(pirq_dpci);
+                /*
+                 * If we fail, we have flags reset and hvm_dirq_assist will
+                 * just exit. We cannot touch pirq_dpci->dom unless
+                 * the softirq is truly dead - and we reap it at the start or
+                 * in pci_clean_dpci_irq.
+                 */
+                if ( !dpci_kill(pirq_dpci) )
+                    pirq_dpci->dom = NULL;
                 pirq_cleanup_check(info, d);
                 spin_unlock(&d->event_lock);
                 return rc;
@@ -332,12 +358,18 @@ int pt_irq_create_bind(
             {
                 if ( pt_irq_need_timer(pirq_dpci->flags) )
                     kill_timer(&pirq_dpci->timer);
-                pirq_dpci->dom = NULL;
                 list_del(&girq->list);
                 list_del(&digl->list);
                 hvm_irq_dpci->link_cnt[link]--;
                 pirq_dpci->flags = 0;
-                dpci_kill(pirq_dpci);
+                /*
+                 * If we fail, we have flags reset and hvm_dirq_assist will
+                 * just exit. We cannot touch pirq_dpci->dom unless
+                 * the softirq is truly dead - and we reap it at the start or
+                 * in pci_clean_dpci_irq.
+                 */
+                if ( !dpci_kill(pirq_dpci) )
+                    pirq_dpci->dom = NULL;
                 pirq_cleanup_check(info, d);
                 spin_unlock(&d->event_lock);
                 xfree(girq);
@@ -466,7 +498,6 @@ int pt_irq_destroy_bind(
         msixtbl_pt_unregister(d, pirq);
         if ( pt_irq_need_timer(pirq_dpci->flags) )
             kill_timer(&pirq_dpci->timer);
-        pirq_dpci->dom   = NULL;
         pirq_dpci->flags = 0;
         dpci_kill(pirq_dpci);
         pirq_cleanup_check(pirq, d);
@@ -491,7 +522,6 @@ void pt_pirq_init(struct domain *d, struct hvm_pirq_dpci *dpci)
 {
     INIT_LIST_HEAD(&dpci->digl_list);
     INIT_LIST_HEAD(&dpci->softirq_list);
-    dpci->dom = d;
     dpci->gmsi.dest_vcpu_id = -1;
 }
 
-- 
1.9.3
