xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6] Fix interrupt latency of HVM PCI passthrough devices.
@ 2014-09-23  2:10 Konrad Rzeszutek Wilk
  2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model Konrad Rzeszutek Wilk
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-23  2:10 UTC (permalink / raw)
  To: JBeulich, ian.campbell, ian.jackson, xen-devel, keir, tim

Changelog:

since v5 (http://lists.xen.org/archives/html/xen-devel/2014-09/msg02868.html)
 - Redid the series based on Jan's feedback
since v4 
(http://lists.xen.org/archives/html/xen-devel/2014-09/msg01676.html):
 - Ditch the domain centric mechansim.
 - Fix issues raised by Jan.

The big change is on ditching of having a domain centric view and
focus on having it all revolve around 'struct hvm_pirq_dpci'
structure. That all works - except that we must be very careful
with the 'dom' field. The patches optimize the mechanism we use
for injecting interrupts in a guest and reduce the latency of
delievering them to a guest.

 xen/arch/x86/domain.c         |   4 +-
 xen/drivers/passthrough/io.c  | 202 ++++++++++++++++++++++++++++++++++++------
 xen/drivers/passthrough/pci.c |  29 ++++--
 xen/include/xen/hvm/irq.h     |   3 +-
 xen/include/xen/pci.h         |   2 +-
 xen/include/xen/softirq.h     |   3 +
 6 files changed, 203 insertions(+), 40 deletions(-)

Konrad Rzeszutek Wilk (3):
      dpci: Move from domain centric model to hvm_dirq_dpci model.
      dpci: In hvm_dirq_assist stop using pt_pirq_iterate
      dpci: Replace tasklet with an softirq (v6)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model.
  2014-09-23  2:10 [PATCH v6] Fix interrupt latency of HVM PCI passthrough devices Konrad Rzeszutek Wilk
@ 2014-09-23  2:10 ` Konrad Rzeszutek Wilk
  2014-09-25 14:24   ` Jan Beulich
  2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 2/3] dpci: In hvm_dirq_assist stop using pt_pirq_iterate Konrad Rzeszutek Wilk
  2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 3/3] dpci: Replace tasklet with an softirq (v6) Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-23  2:10 UTC (permalink / raw)
  To: JBeulich, ian.campbell, ian.jackson, xen-devel, keir, tim
  Cc: Konrad Rzeszutek Wilk

When an interrupt for an PCI (or PCIe) passthrough device
is to be sent to a guest, we find the appropiate
'hvm_dirq_dpci' structure for the interrupt (PIRQ), set
a bit, and schedule an tasklet.

Then the 'hvm_dirq_assist' tasklet gets called with the 'struct
domain' from which we it iterates over all of the 'hvm_dirq_dpci'
which are mapped to the guest. However, it is important to
note that when we setup the 'hvm_dirq_dpci' we have a field
for the 'struct domain' that we can use instead of
having to pass in the 'struct domain'.

As such this patch moves the tasklet to the 'struct hvm_dirq_dpci'
and sets the 'dom' field to the domain.

We have to be careful with this as that means we cannot
set 'dom' to NULL until the tasklet has run. As such
this patch introduces 'pt_pirq_reset' which resets the
'hvm_dirq_dpci' structure for proper usage during setup.

The mechanism to tear it down is more complex as there
are two ways it can be executed:

 a) pci_clean_dpci_irq. This gets called when the guest is
    being destroyed. We end up calling 'tasklet_kill'.

    The scenarios in which the 'struct pirq' (and subsequently
    the 'hvm_pirq_dpci') gets destroyed is when:

    - guest did not use the pirq at all after setup.
    - guest did use pirq, but decided to mask and left it in that
      state.
    - guest did use pirq, but crashed.

    In all of those scenarios we end up calling 'tasklet_kill'
    which will spin on the tasklet if it is running.

 b) pt_irq_destroy_bind (guest disables the MSI). We double-check
    that the softirq has run by piggy-backing on the existing
    'pirq_cleanup_check' mechanism which calls 'pt_pirq_cleanup_check'.
    We add the extra call to 'pt_pirq_softirq_active' in
    'pt_pirq_cleanup_check'.

    NOTE: Guests that use event channels unbind first the
    event channel from PIRQs, so the 'pt_pirq_cleanup_check'
    won't be called as eventch is set to zero. In that case
    we either clean it up via the a) mechanism. It is OK
    to re-use the tasklet when 'pt_irq_create_bind' is called
    afterwards.

    There is an extra scenario regardless of eventch being
    set or not: the guest did 'pt_irq_destroy_bind' while an
    interrupt was triggered and tasklet was scheduled (but had not
    been run). It is OK to still run the tasklet as
    hvm_dirq_assist won't do anything (as the flags are
    set to zero).

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/drivers/passthrough/io.c  | 39 ++++++++++++++++++++++++++++++---------
 xen/drivers/passthrough/pci.c |  4 ++--
 xen/include/xen/hvm/irq.h     |  2 +-
 3 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 4cd32b5..59b5c09 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -90,6 +90,25 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
     xfree(dpci);
 }
 
+/*
+ * Reset the 'struct hvm_pirq_dpci' values to proper states.
+ *
+ * N.B:
+ * The 'pt_irq_create_bind' can be called right after 'pt_irq_destroy_bind'
+ * was called. The 'pirq_cleanup_check' which would free the structure
+ * is only called if the event channel for the PIRQ is active. However
+ * OS-es that use event channels usually bind the PIRQ to an event channel
+ * and also unbind it before 'pt_irq_destroy_bind' is called which means
+ * we end up re-using the 'dpci' structure. This can be easily reproduced
+ * with unloading and loading the driver for the device.
+ *
+ * As such on every 'pt_irq_create_bind' call we MUST reset the values.
+ */
+static void pt_pirq_reset(struct domain *d, struct hvm_pirq_dpci *dpci)
+{
+    dpci->dom = d;
+}
+
 int pt_irq_create_bind(
     struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
 {
@@ -114,9 +133,6 @@ int pt_irq_create_bind(
             spin_unlock(&d->event_lock);
             return -ENOMEM;
         }
-        softirq_tasklet_init(
-            &hvm_irq_dpci->dirq_tasklet,
-            hvm_dirq_assist, (unsigned long)d);
         for ( i = 0; i < NR_HVM_IRQS; i++ )
             INIT_LIST_HEAD(&hvm_irq_dpci->girq[i]);
 
@@ -130,6 +146,7 @@ int pt_irq_create_bind(
         return -ENOMEM;
     }
     pirq_dpci = pirq_dpci(info);
+    pt_pirq_reset(d, pirq_dpci);
 
     switch ( pt_irq_bind->irq_type )
     {
@@ -232,7 +249,6 @@ int pt_irq_create_bind(
         {
             unsigned int share;
 
-            pirq_dpci->dom = d;
             if ( pt_irq_bind->irq_type == PT_IRQ_TYPE_MSI_TRANSLATE )
             {
                 pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
@@ -258,7 +274,6 @@ int pt_irq_create_bind(
             {
                 if ( pt_irq_need_timer(pirq_dpci->flags) )
                     kill_timer(&pirq_dpci->timer);
-                pirq_dpci->dom = NULL;
                 list_del(&girq->list);
                 list_del(&digl->list);
                 hvm_irq_dpci->link_cnt[link]--;
@@ -391,7 +406,6 @@ int pt_irq_destroy_bind(
         msixtbl_pt_unregister(d, pirq);
         if ( pt_irq_need_timer(pirq_dpci->flags) )
             kill_timer(&pirq_dpci->timer);
-        pirq_dpci->dom   = NULL;
         pirq_dpci->flags = 0;
         pirq_cleanup_check(pirq, d);
     }
@@ -415,11 +429,18 @@ void pt_pirq_init(struct domain *d, struct hvm_pirq_dpci *dpci)
 {
     INIT_LIST_HEAD(&dpci->digl_list);
     dpci->gmsi.dest_vcpu_id = -1;
+    softirq_tasklet_init(&dpci->tasklet, hvm_dirq_assist, (unsigned long)dpci);
 }
 
 bool_t pt_pirq_cleanup_check(struct hvm_pirq_dpci *dpci)
 {
-    return !dpci->flags;
+    if ( !dpci->flags )
+    {
+        tasklet_kill(&dpci->tasklet);
+        dpci->dom = NULL;
+        return 1;
+    }
+    return 0;
 }
 
 int pt_pirq_iterate(struct domain *d,
@@ -459,7 +480,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
         return 0;
 
     pirq_dpci->masked = 1;
-    tasklet_schedule(&dpci->dirq_tasklet);
+    tasklet_schedule(&pirq_dpci->tasklet);
     return 1;
 }
 
@@ -564,7 +585,7 @@ static int _hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
 
 static void hvm_dirq_assist(unsigned long _d)
 {
-    struct domain *d = (struct domain *)_d;
+    struct domain *d = ((struct hvm_pirq_dpci *)_d)->dom;
 
     ASSERT(d->arch.hvm_domain.irq.dpci);
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 1eba833..81e8a3a 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -767,6 +767,8 @@ static int pci_clean_dpci_irq(struct domain *d,
         xfree(digl);
     }
 
+    tasklet_kill(&pirq_dpci->tasklet);
+
     return 0;
 }
 
@@ -784,8 +786,6 @@ static void pci_clean_dpci_irqs(struct domain *d)
     hvm_irq_dpci = domain_get_irq_dpci(d);
     if ( hvm_irq_dpci != NULL )
     {
-        tasklet_kill(&hvm_irq_dpci->dirq_tasklet);
-
         pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
 
         d->arch.hvm_domain.irq.dpci = NULL;
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index c89f4b1..94a550a 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -88,7 +88,6 @@ struct hvm_irq_dpci {
     DECLARE_BITMAP(isairq_map, NR_ISAIRQS);
     /* Record of mapped Links */
     uint8_t link_cnt[NR_LINK];
-    struct tasklet dirq_tasklet;
 };
 
 /* Machine IRQ to guest device/intx mapping. */
@@ -100,6 +99,7 @@ struct hvm_pirq_dpci {
     struct domain *dom;
     struct hvm_gmsi_info gmsi;
     struct timer timer;
+    struct tasklet tasklet;
 };
 
 void pt_pirq_init(struct domain *, struct hvm_pirq_dpci *);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 for-xen-4.5 2/3] dpci: In hvm_dirq_assist stop using pt_pirq_iterate
  2014-09-23  2:10 [PATCH v6] Fix interrupt latency of HVM PCI passthrough devices Konrad Rzeszutek Wilk
  2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model Konrad Rzeszutek Wilk
@ 2014-09-23  2:10 ` Konrad Rzeszutek Wilk
  2014-09-25 14:29   ` Jan Beulich
  2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 3/3] dpci: Replace tasklet with an softirq (v6) Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-23  2:10 UTC (permalink / raw)
  To: JBeulich, ian.campbell, ian.jackson, xen-devel, keir, tim
  Cc: Konrad Rzeszutek Wilk

When an interrupt for an PCI (or PCIe) passthrough device
is to be sent to a guest, we find the appropiate
'hvm_dirq_dpci' structure for the interrupt (PIRQ), set
a bit, and schedule an tasklet.

As we are now called from dpci_softirq with the outstanding
'struct hvm_pirq_dpci' there is no need to call pt_pirq_iterate
which will iterate over all of the PIRQs and call us with every
one that is mapped. And then _hvm_dirq_assist figuring out
which one to execute.

This is a inefficient and not fair as:
 - We iterate starting at PIRQ 0 and up every time. That means
   the PCIe devices that have lower PIRQs get to be called
   first.
 - If we have many PCIe devices passed in with many PIRQs and
   most of the time only the highest numbered PIRQ gets an
   interrupt - we iterate over many PIRQs.

Since we know which 'hvm_pirq_dpci' to check - as the tasklet is
called for a specific 'struct hvm_pirq_dpci' - we do that
in this patch.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/drivers/passthrough/io.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 59b5c09..d2a1214 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -534,9 +534,14 @@ void hvm_dpci_msi_eoi(struct domain *d, int vector)
     spin_unlock(&d->event_lock);
 }
 
-static int _hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
-                            void *arg)
+static void hvm_dirq_assist(unsigned long _d)
 {
+    struct hvm_pirq_dpci *pirq_dpci = (struct hvm_pirq_dpci *)_d;
+    struct domain *d = pirq_dpci->dom;
+
+    ASSERT(d->arch.hvm_domain.irq.dpci);
+
+    spin_lock(&d->event_lock);
     if ( test_and_clear_bool(pirq_dpci->masked) )
     {
         struct pirq *pirq = dpci_pirq(pirq_dpci);
@@ -547,13 +552,13 @@ static int _hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
             send_guest_pirq(d, pirq);
 
             if ( pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI )
-                return 0;
+                goto out;
         }
 
         if ( pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI )
         {
             vmsi_deliver_pirq(d, pirq_dpci);
-            return 0;
+            goto out;
         }
 
         list_for_each_entry ( digl, &pirq_dpci->digl_list, list )
@@ -566,7 +571,7 @@ static int _hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
         {
             /* for translated MSI to INTx interrupt, eoi as early as possible */
             __msi_pirq_eoi(pirq_dpci);
-            return 0;
+            goto out;
         }
 
         /*
@@ -579,18 +584,7 @@ static int _hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
         ASSERT(pt_irq_need_timer(pirq_dpci->flags));
         set_timer(&pirq_dpci->timer, NOW() + PT_IRQ_TIME_OUT);
     }
-
-    return 0;
-}
-
-static void hvm_dirq_assist(unsigned long _d)
-{
-    struct domain *d = ((struct hvm_pirq_dpci *)_d)->dom;
-
-    ASSERT(d->arch.hvm_domain.irq.dpci);
-
-    spin_lock(&d->event_lock);
-    pt_pirq_iterate(d, _hvm_dirq_assist, NULL);
+ out:
     spin_unlock(&d->event_lock);
 }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 for-xen-4.5 3/3] dpci: Replace tasklet with an softirq (v6)
  2014-09-23  2:10 [PATCH v6] Fix interrupt latency of HVM PCI passthrough devices Konrad Rzeszutek Wilk
  2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model Konrad Rzeszutek Wilk
  2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 2/3] dpci: In hvm_dirq_assist stop using pt_pirq_iterate Konrad Rzeszutek Wilk
@ 2014-09-23  2:10 ` Konrad Rzeszutek Wilk
  2014-09-25 14:55   ` Jan Beulich
  2 siblings, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-23  2:10 UTC (permalink / raw)
  To: JBeulich, ian.campbell, ian.jackson, xen-devel, keir, tim
  Cc: Konrad Rzeszutek Wilk

The existing tasklet mechanism has a single global
spinlock that is taken every-time the global list
is touched. And we use this lock quite a lot - when
we call do_tasklet_work which is called via an softirq
and from the idle loop. We take the lock on any
operation on the tasklet_list.

The problem we are facing is that there are quite a lot of
tasklets scheduled. The most common one that is invoked is
the one injecting the VIRQ_TIMER in the guest. Guests
are not insane and don't set the one-shot or periodic
clocks to be in sub 1ms intervals (causing said tasklet
to be scheduled for such small intervalls).

The problem appears when PCI passthrough devices are used
over many sockets and we have an mix of heavy-interrupt
guests and idle guests. The idle guests end up seeing
1/10 of its RUNNING timeslice eaten by the hypervisor
(and 40% steal time).

The mechanism by which we inject PCI interrupts is by
hvm_do_IRQ_dpci which schedules the hvm_dirq_assist
tasklet every time an interrupt is received.
The callchain is:

_asm_vmexit_handler
 -> vmx_vmexit_handler
    ->vmx_do_extint
        -> do_IRQ
            -> __do_IRQ_guest
                -> hvm_do_IRQ_dpci
                   tasklet_schedule(&dpci->dirq_tasklet);
                   [takes lock to put the tasklet on]

[later on the schedule_tail is invoked which is 'vmx_do_resume']

vmx_do_resume
 -> vmx_asm_do_vmentry
        -> call vmx_intr_assist
          -> vmx_process_softirqs
            -> do_softirq
              [executes the tasklet function, takes the
               lock again]

While on other CPUs they might be sitting in a idle loop
and invoked to deliver an VIRQ_TIMER, which also ends
up taking the lock twice: first to schedule the
v->arch.hvm_vcpu.assert_evtchn_irq_tasklet (accounted to
the guests' BLOCKED_state); then to execute it - which is
accounted for in the guest's RUNTIME_state.

The end result is that on a 8 socket machine with
PCI passthrough, where four sockets are busy with interrupts,
and the other sockets have idle guests - we end up with
the idle guests having around 40% steal time and 1/10
of its timeslice (3ms out of 30 ms) being tied up
taking the lock. The latency of the PCI interrupts delieved
to guest is also hindered.

With this patch the problem disappears completly.
That is removing the lock for the PCI passthrough use-case
(the 'hvm_dirq_assist' case) by not using tasklets at all.

The patch is simple - instead of scheduling an tasklet
we schedule our own softirq - HVM_DPCI_SOFTIRQ, which will
take care of running 'hvm_dirq_assist'. The information we need
on each CPU is which 'struct hvm_pirq_dpci' structure the
'hvm_dirq_assist' needs to run on. That is simple solved by
threading the 'struct hvm_pirq_dpci' through a linked list.
The rule of only running one 'hvm_dirq_assist' for only
one 'hvm_pirq_dpci' is also preserved by having
'schedule_dpci_for' ignore any subsequent calls for an domain
which has already been scheduled.

Since we are using 'hvm_pirq_dpci' structure it is setup in
'pt_irq_create_bind' - via 'pt_pirq_reset' we reset the
list for our usage.

The mechanism to tear it down is more complex as there
are three ways it can be executed. To make it simpler
everything revolves around 'pt_pirq_softirq_active'. If it
returns -EGAIN that means there is an outstanding softirq
that needs to finish running before we can continue tearing
down. With that in mind:

a) pci_clean_dpci_irq. This gets called when the guest is
   being destroyed. We end up calling 'pt_pirq_softirq_active'
   to see if it is OK to continue the destruction.

   The scenarios in which the 'struct pirq' (and subsequently
   the 'hvm_pirq_dpci') gets destroyed is when:

   - guest did not use the pirq at all after setup.
   - guest did use pirq, but decided to mask and left it in that
     state.
   - guest did use pirq, but crashed.

   In all of those scenarios we end up calling
   'pt_pirq_softirq_active' to check if the softirq is still
   active. Read below on the 'pt_pirq_softirq_active' loop.

b) pt_irq_destroy_bind (guest disables the MSI). We double-check
   that the softirq has run by piggy-backing on the existing
   'pirq_cleanup_check' mechanism which calls 'pt_pirq_cleanup_check'.
   We add the extra call to 'pt_pirq_softirq_active' in
   'pt_pirq_cleanup_check'.

   NOTE: Guests that use event channels unbind first the
   event channel from PIRQs, so the 'pt_pirq_cleanup_check'
   won't be called as eventch is set to zero. In that case
   we either clean it up via the a) or c) mechanism.

   There is an extra scenario regardless of eventch being
   set or not: the guest did 'pt_irq_destroy_bind' while an
   interrupt was triggered and softirq was scheduled (but had not
   been run). It is OK to still run the softirq as
   hvm_dirq_assist won't do anything (as the flags are
   set to zero).

c) pt_irq_create_bind (not a typo). The scenarios are:

     - guest disables the MSI and then enables it
       (rmmod and modprobe in a loop). We call 'pt_pirq_reset'
       which checks to see if the softirq has been scheduled.
       Imagine the 'b)' with interrupts in flight and c) getting
       called in a loop.

     - we hit once the error paths in 'pt_irq_create_bind' while
       an interrupt was triggered and softirq was scheduled.

In all of those cases we will spin up to a second on
'pt_pirq_softirq_active' (at the start of the 'pt_irq_create_bind')
with the event_lock spinlock dropped and calling
'process_pending_softirqs'. hvm_dirq_assist will be executed and
then the softirq will clear 'mapping' which signals that that we
can re-use the 'hvm_pirq_dpci' structure.

Note that the 'flags' variable is cleared at that
point, so hvm_dirq_assist won't actually do anything.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Suggested-by: Jan Beulich <JBeulich@suse.com>

---
v2: On top of ref-cnts also have wait loop for the outstanding
    'struct domain' that need to be processed.
v3: Add -ERETRY, fix up StyleGuide issues
v4: Clean it up mode, redo per_cpu, this_cpu logic
v5: Instead of threading struct domain, use hvm_pirq_dpci.
v6: Ditch the 'state' bit, expand description, simplify
    softirq and teardown sequence.
---
 xen/arch/x86/domain.c         |   4 +-
 xen/drivers/passthrough/io.c  | 153 ++++++++++++++++++++++++++++++++++++++----
 xen/drivers/passthrough/pci.c |  31 ++++++---
 xen/include/xen/hvm/irq.h     |   3 +-
 xen/include/xen/pci.h         |   2 +-
 xen/include/xen/softirq.h     |   3 +
 6 files changed, 172 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 7b1dfe6..12bf241 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1924,7 +1924,9 @@ int domain_relinquish_resources(struct domain *d)
     switch ( d->arch.relmem )
     {
     case RELMEM_not_started:
-        pci_release_devices(d);
+        ret = pci_release_devices(d);
+        if ( ret )
+            return ret;
 
         /* Tear down paging-assistance stuff. */
         paging_teardown(d);
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index d2a1214..4b2aa77 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -20,14 +20,58 @@
 
 #include <xen/event.h>
 #include <xen/iommu.h>
+#include <xen/cpu.h>
 #include <xen/irq.h>
 #include <asm/hvm/irq.h>
 #include <asm/hvm/iommu.h>
 #include <asm/hvm/support.h>
 #include <xen/hvm/irq.h>
-#include <xen/tasklet.h>
 
-static void hvm_dirq_assist(unsigned long _d);
+static DEFINE_PER_CPU(struct list_head, dpci_list);
+
+/*
+ * Should only be called from hvm_do_IRQ_dpci. We use the
+ * 'masked' as an gate to thwart multiple interrupts.
+ *
+ * The 'masked' is cleared by 'softirq_dpci' when it has
+ * completed executing 'hvm_dirq_assist'.
+ *
+ */
+static void schedule_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
+{
+    unsigned long flags;
+
+    if ( pirq_dpci->masked )
+        return;
+
+    pirq_dpci->masked = 1;
+    get_knownalive_domain(pirq_dpci->dom); /* To be put by 'dpci_softirq' */
+
+    local_irq_save(flags);
+    list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
+    local_irq_restore(flags);
+
+    raise_softirq(HVM_DPCI_SOFTIRQ);
+}
+
+/*
+ * If we are racing with softirq_dpci (masked is still set) we return
+ * -EAGAIN. Otherwise we return 0.
+ *
+ *  If it is -EAGAIN, it is the callers responsibility to kick the softirq
+ *  (with the event_lock dropped).
+ */
+int pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
+{
+    if ( pirq_dpci->masked )
+        return -EAGAIN;
+    /*
+     * If in the future we would call 'schedule_softirq_for' right away
+     * after 'pt_pirq_softirq_active' we MUST reset the list (otherwise it
+     * might have stale data).
+     */
+    return 0;
+}
 
 bool_t pt_irq_need_timer(uint32_t flags)
 {
@@ -104,9 +148,20 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
  *
  * As such on every 'pt_irq_create_bind' call we MUST reset the values.
  */
-static void pt_pirq_reset(struct domain *d, struct hvm_pirq_dpci *dpci)
+static int pt_pirq_reset(struct domain *d, struct hvm_pirq_dpci *dpci)
 {
+    /*
+     * We MUST check for this condition as the softirq could be scheduled
+     * and hasn't run yet. As such we MUST delay this reset until it has
+     * completed its job.
+     */
+    if ( !pt_pirq_cleanup_check(dpci) )
+            return -EAGAIN;
+
+    INIT_LIST_HEAD(&dpci->softirq_list);
     dpci->dom = d;
+
+    return 0;
 }
 
 int pt_irq_create_bind(
@@ -116,7 +171,9 @@ int pt_irq_create_bind(
     struct hvm_pirq_dpci *pirq_dpci;
     struct pirq *info;
     int rc, pirq = pt_irq_bind->machine_irq;
+    s_time_t start = NOW();
 
+ restart:
     if ( pirq < 0 || pirq >= d->nr_pirqs )
         return -EINVAL;
 
@@ -146,7 +203,17 @@ int pt_irq_create_bind(
         return -ENOMEM;
     }
     pirq_dpci = pirq_dpci(info);
-    pt_pirq_reset(d, pirq_dpci);
+    /* A crude 'while' loop with us dropping the spinlock and giving
+     * the softirq_dpci a chance to run. We do this up to one second
+     * at which point we give up. */
+    if ( pt_pirq_reset(d, pirq_dpci) )
+    {
+        spin_unlock(&d->event_lock);
+        process_pending_softirqs();
+        if ( ( NOW() - start ) >> 30 )
+            return -EAGAIN;
+        goto restart;
+    }
 
     switch ( pt_irq_bind->irq_type )
     {
@@ -429,14 +496,12 @@ void pt_pirq_init(struct domain *d, struct hvm_pirq_dpci *dpci)
 {
     INIT_LIST_HEAD(&dpci->digl_list);
     dpci->gmsi.dest_vcpu_id = -1;
-    softirq_tasklet_init(&dpci->tasklet, hvm_dirq_assist, (unsigned long)dpci);
 }
 
 bool_t pt_pirq_cleanup_check(struct hvm_pirq_dpci *dpci)
 {
-    if ( !dpci->flags )
+    if ( !dpci->flags && !pt_pirq_softirq_active(dpci) )
     {
-        tasklet_kill(&dpci->tasklet);
         dpci->dom = NULL;
         return 1;
     }
@@ -479,8 +544,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
          !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         return 0;
 
-    pirq_dpci->masked = 1;
-    tasklet_schedule(&pirq_dpci->tasklet);
+    schedule_softirq_for(pirq_dpci);
     return 1;
 }
 
@@ -534,15 +598,14 @@ void hvm_dpci_msi_eoi(struct domain *d, int vector)
     spin_unlock(&d->event_lock);
 }
 
-static void hvm_dirq_assist(unsigned long _d)
+static void hvm_dirq_assist(struct hvm_pirq_dpci *pirq_dpci)
 {
-    struct hvm_pirq_dpci *pirq_dpci = (struct hvm_pirq_dpci *)_d;
     struct domain *d = pirq_dpci->dom;
 
     ASSERT(d->arch.hvm_domain.irq.dpci);
 
     spin_lock(&d->event_lock);
-    if ( test_and_clear_bool(pirq_dpci->masked) )
+    if ( pirq_dpci->masked )
     {
         struct pirq *pirq = dpci_pirq(pirq_dpci);
         const struct dev_intx_gsi_link *digl;
@@ -640,3 +703,69 @@ void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi,
 unlock:
     spin_unlock(&d->event_lock);
 }
+
+static void dpci_softirq(void)
+{
+    struct hvm_pirq_dpci *pirq_dpci;
+    unsigned int cpu = smp_processor_id();
+    LIST_HEAD(our_list);
+
+    local_irq_disable();
+    list_splice_init(&per_cpu(dpci_list, cpu), &our_list);
+    local_irq_enable();
+
+    while ( !list_empty(&our_list) )
+    {
+        pirq_dpci = list_entry(our_list.next, struct hvm_pirq_dpci, softirq_list);
+        list_del(&pirq_dpci->softirq_list);
+
+        hvm_dirq_assist(pirq_dpci);
+
+        put_domain(pirq_dpci->dom);
+        pirq_dpci->masked = 0;
+    }
+}
+static int cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned int cpu = (unsigned long)hcpu;
+
+    switch ( action )
+    {
+    case CPU_UP_PREPARE:
+        INIT_LIST_HEAD(&per_cpu(dpci_list, cpu));
+        break;
+    case CPU_UP_CANCELED:
+    case CPU_DEAD:
+        /*
+         * On CPU_DYING this callback is called (on the CPU that is dying)
+         * with an possible HVM_DPIC_SOFTIRQ pending - at which point we can
+         * clear out any outstanding domains (by the virtue of the idle loop
+         * calling the softirq later). In CPU_DEAD case the CPU is deaf and
+         * there are no pending softirqs for us to handle so we can chill.
+         */
+        ASSERT(list_empty(&per_cpu(dpci_list, cpu)));
+        break;
+    default:
+        break;
+    }
+
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_nfb = {
+    .notifier_call = cpu_callback,
+};
+
+static int __init setup_dpci_softirq(void)
+{
+    unsigned int cpu;
+
+    for_each_online_cpu(cpu)
+        INIT_LIST_HEAD(&per_cpu(dpci_list, cpu));
+
+    open_softirq(HVM_DPCI_SOFTIRQ, dpci_softirq);
+    register_cpu_notifier(&cpu_nfb);
+    return 0;
+}
+__initcall(setup_dpci_softirq);
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 81e8a3a..d147189 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -767,40 +767,51 @@ static int pci_clean_dpci_irq(struct domain *d,
         xfree(digl);
     }
 
-    tasklet_kill(&pirq_dpci->tasklet);
-
-    return 0;
+    return pt_pirq_softirq_active(pirq_dpci);
 }
 
-static void pci_clean_dpci_irqs(struct domain *d)
+static int pci_clean_dpci_irqs(struct domain *d)
 {
     struct hvm_irq_dpci *hvm_irq_dpci = NULL;
 
     if ( !iommu_enabled )
-        return;
+        return -ESRCH;
 
     if ( !is_hvm_domain(d) )
-        return;
+        return -EINVAL;
 
     spin_lock(&d->event_lock);
     hvm_irq_dpci = domain_get_irq_dpci(d);
     if ( hvm_irq_dpci != NULL )
     {
-        pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
+        int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
+
+        if ( ret )
+        {
+            spin_unlock(&d->event_lock);
+            return ret;
+        }
 
         d->arch.hvm_domain.irq.dpci = NULL;
         free_hvm_irq_dpci(hvm_irq_dpci);
     }
     spin_unlock(&d->event_lock);
+    return 0;
 }
 
-void pci_release_devices(struct domain *d)
+int pci_release_devices(struct domain *d)
 {
     struct pci_dev *pdev;
     u8 bus, devfn;
+    int ret;
 
     spin_lock(&pcidevs_lock);
-    pci_clean_dpci_irqs(d);
+    ret = pci_clean_dpci_irqs(d);
+    if ( ret == -EAGAIN )
+    {
+        spin_unlock(&pcidevs_lock);
+        return ret;
+    }
     while ( (pdev = pci_get_pdev_by_domain(d, -1, -1, -1)) )
     {
         bus = pdev->bus;
@@ -811,6 +822,8 @@ void pci_release_devices(struct domain *d)
                    PCI_SLOT(devfn), PCI_FUNC(devfn));
     }
     spin_unlock(&pcidevs_lock);
+
+    return 0;
 }
 
 #define PCI_CLASS_BRIDGE_HOST    0x0600
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 94a550a..4fc6dcf 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -99,7 +99,7 @@ struct hvm_pirq_dpci {
     struct domain *dom;
     struct hvm_gmsi_info gmsi;
     struct timer timer;
-    struct tasklet tasklet;
+    struct list_head softirq_list;
 };
 
 void pt_pirq_init(struct domain *, struct hvm_pirq_dpci *);
@@ -109,6 +109,7 @@ int pt_pirq_iterate(struct domain *d,
                               struct hvm_pirq_dpci *, void *arg),
                     void *arg);
 
+int pt_pirq_softirq_active(struct hvm_pirq_dpci *);
 /* Modify state of a PCI INTx wire. */
 void hvm_pci_intx_assert(
     struct domain *d, unsigned int device, unsigned int intx);
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 91520bc..5f295f3 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -99,7 +99,7 @@ struct pci_dev *pci_lock_domain_pdev(
 
 void setup_hwdom_pci_devices(struct domain *,
                             int (*)(u8 devfn, struct pci_dev *));
-void pci_release_devices(struct domain *d);
+int pci_release_devices(struct domain *d);
 int pci_add_segment(u16 seg);
 const unsigned long *pci_get_ro_map(u16 seg);
 int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
diff --git a/xen/include/xen/softirq.h b/xen/include/xen/softirq.h
index 0895a16..f3da5df 100644
--- a/xen/include/xen/softirq.h
+++ b/xen/include/xen/softirq.h
@@ -8,6 +8,9 @@ enum {
     NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ,
     RCU_SOFTIRQ,
     TASKLET_SOFTIRQ,
+#ifdef HAS_PASSTHROUGH
+    HVM_DPCI_SOFTIRQ,
+#endif
     NR_COMMON_SOFTIRQS
 };
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model.
  2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model Konrad Rzeszutek Wilk
@ 2014-09-25 14:24   ` Jan Beulich
  2014-09-25 14:48     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2014-09-25 14:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

>>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
> @@ -130,6 +146,7 @@ int pt_irq_create_bind(
>          return -ENOMEM;
>      }
>      pirq_dpci = pirq_dpci(info);
> +    pt_pirq_reset(d, pirq_dpci);
>  
>      switch ( pt_irq_bind->irq_type )
>      {
> @@ -232,7 +249,6 @@ int pt_irq_create_bind(
>          {
>              unsigned int share;
>  
> -            pirq_dpci->dom = d;
>              if ( pt_irq_bind->irq_type == PT_IRQ_TYPE_MSI_TRANSLATE )
>              {
>                  pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
> @@ -258,7 +274,6 @@ int pt_irq_create_bind(
>              {
>                  if ( pt_irq_need_timer(pirq_dpci->flags) )
>                      kill_timer(&pirq_dpci->timer);
> -                pirq_dpci->dom = NULL;
>                  list_del(&girq->list);
>                  list_del(&digl->list);
>                  hvm_irq_dpci->link_cnt[link]--;
> @@ -391,7 +406,6 @@ int pt_irq_destroy_bind(
>          msixtbl_pt_unregister(d, pirq);
>          if ( pt_irq_need_timer(pirq_dpci->flags) )
>              kill_timer(&pirq_dpci->timer);
> -        pirq_dpci->dom   = NULL;
>          pirq_dpci->flags = 0;
>          pirq_cleanup_check(pirq, d);
>      }

Is all of the above really necessary? I.e. I can neither see why setting
->dom earlier is needed, nor why clearing it on the error paths should
be dropped.

> @@ -459,7 +480,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
>          return 0;
>  
>      pirq_dpci->masked = 1;
> -    tasklet_schedule(&dpci->dirq_tasklet);
> +    tasklet_schedule(&pirq_dpci->tasklet);

Please also drop the effectively no longer used local variable "dpci"
here (the NULL check of it needs to stay though, but you can simply
check the return value of domain_get_irq_dpci() without using a local
variable now).

> @@ -564,7 +585,7 @@ static int _hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
>  
>  static void hvm_dirq_assist(unsigned long _d)
>  {
> -    struct domain *d = (struct domain *)_d;
> +    struct domain *d = ((struct hvm_pirq_dpci *)_d)->dom;
>  
>      ASSERT(d->arch.hvm_domain.irq.dpci);
>  

This seems too little of a change - there's no point in calling
pt_pirq_iterate() here anymore, as you already have the struct
hvm_pirq_dpci instance that needs acting on (any others will get
dealt with in their own tasklets). I.e. the current hvm_dirq_assist()
can go away, and what right now is _hvm_dirq_assist() should
become hvm_dirq_assist().

Oh - peeking ahead I see this is patch 2 of your series. Not sure
this should be separate patches...

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 2/3] dpci: In hvm_dirq_assist stop using pt_pirq_iterate
  2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 2/3] dpci: In hvm_dirq_assist stop using pt_pirq_iterate Konrad Rzeszutek Wilk
@ 2014-09-25 14:29   ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2014-09-25 14:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

>>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -534,9 +534,14 @@ void hvm_dpci_msi_eoi(struct domain *d, int vector)
>      spin_unlock(&d->event_lock);
>  }
>  
> -static int _hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
> -                            void *arg)
> +static void hvm_dirq_assist(unsigned long _d)

Please name this "arg" or some such, the name "_d" really was
suitable only when the function took a (converted to scalar type)
domain pointer. Beyond that this looks quite okay (pending the
decision whether to fold into the first patch); personally I would
aim at not using goto-s...

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model.
  2014-09-25 14:24   ` Jan Beulich
@ 2014-09-25 14:48     ` Konrad Rzeszutek Wilk
  2014-09-25 15:04       ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-25 14:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

On Thu, Sep 25, 2014 at 03:24:53PM +0100, Jan Beulich wrote:
> >>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
> > @@ -130,6 +146,7 @@ int pt_irq_create_bind(
> >          return -ENOMEM;
> >      }
> >      pirq_dpci = pirq_dpci(info);
> > +    pt_pirq_reset(d, pirq_dpci);
> >  
> >      switch ( pt_irq_bind->irq_type )
> >      {
> > @@ -232,7 +249,6 @@ int pt_irq_create_bind(
> >          {
> >              unsigned int share;
> >  
> > -            pirq_dpci->dom = d;
> >              if ( pt_irq_bind->irq_type == PT_IRQ_TYPE_MSI_TRANSLATE )
> >              {
> >                  pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
> > @@ -258,7 +274,6 @@ int pt_irq_create_bind(
> >              {
> >                  if ( pt_irq_need_timer(pirq_dpci->flags) )
> >                      kill_timer(&pirq_dpci->timer);
> > -                pirq_dpci->dom = NULL;
> >                  list_del(&girq->list);
> >                  list_del(&digl->list);
> >                  hvm_irq_dpci->link_cnt[link]--;
> > @@ -391,7 +406,6 @@ int pt_irq_destroy_bind(
> >          msixtbl_pt_unregister(d, pirq);
> >          if ( pt_irq_need_timer(pirq_dpci->flags) )
> >              kill_timer(&pirq_dpci->timer);
> > -        pirq_dpci->dom   = NULL;
> >          pirq_dpci->flags = 0;
> >          pirq_cleanup_check(pirq, d);
> >      }
> 
> Is all of the above really necessary? I.e. I can neither see why setting
> ->dom earlier is needed, nor why clearing it on the error paths should
> be dropped.

Yes. We need the ->dom so that the hvm_dirq_assist can run without
hitting an NULL pointer exception. Please keep in mind that the moment
we setup the IRQ action handler, we are "live" - thought to be fair we
have not yet told QEMU of the PIRQ value so the device driver hasn't
written the value in. But I can see some drivers doing a mix of
pt_irq_create_bind->pt_irq_destroy_bind->pt_irq_create_bind->...

 setup MSI
	[calls QEMU, the pt_irq_create_bind, QEMU returns the PIRQ value]
 sets up IDT
 tells the hardware chip to use the PIRQ value that QEMU gave it.
 destroys MSI
 setup MSI
 destroys MSI
 setup MSI

.. while forgetting to tell the chip to stop sending interrupts. During those
windows the (destroy, and then create) we can have interrupts coming in
and since hvm_dirq_assist needs the ->dom, we cannot clear it.

The extra '->dom = d' in the pt_irq_create_bind was an extra one since the
pt_pirq_reset does it now.


> 
> > @@ -459,7 +480,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
> >          return 0;
> >  
> >      pirq_dpci->masked = 1;
> > -    tasklet_schedule(&dpci->dirq_tasklet);
> > +    tasklet_schedule(&pirq_dpci->tasklet);
> 
> Please also drop the effectively no longer used local variable "dpci"
> here (the NULL check of it needs to stay though, but you can simply
> check the return value of domain_get_irq_dpci() without using a local
> variable now).

That can certainly be done.
> 
> > @@ -564,7 +585,7 @@ static int _hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
> >  
> >  static void hvm_dirq_assist(unsigned long _d)
> >  {
> > -    struct domain *d = (struct domain *)_d;
> > +    struct domain *d = ((struct hvm_pirq_dpci *)_d)->dom;
> >  
> >      ASSERT(d->arch.hvm_domain.irq.dpci);
> >  
> 
> This seems too little of a change - there's no point in calling
> pt_pirq_iterate() here anymore, as you already have the struct
> hvm_pirq_dpci instance that needs acting on (any others will get
> dealt with in their own tasklets). I.e. the current hvm_dirq_assist()
> can go away, and what right now is _hvm_dirq_assist() should
> become hvm_dirq_assist().

Right :-)
> 
> Oh - peeking ahead I see this is patch 2 of your series. Not sure
> this should be separate patches...

I've been conditioned to do 'one logical change per patch' so figured
it would be far easier if I had split it apart.

Of course since you prefer to squash them I will do so promptly!
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 3/3] dpci: Replace tasklet with an softirq (v6)
  2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 3/3] dpci: Replace tasklet with an softirq (v6) Konrad Rzeszutek Wilk
@ 2014-09-25 14:55   ` Jan Beulich
  2014-09-25 15:27     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2014-09-25 14:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

>>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
> +/*
> + * If we are racing with softirq_dpci (masked is still set) we return
> + * -EAGAIN. Otherwise we return 0.
> + *
> + *  If it is -EAGAIN, it is the callers responsibility to kick the softirq
> + *  (with the event_lock dropped).

But pt_pirq_cleanup_check() doesn't do this - is the comment
misleading or that particular call site reacting wrongly? Actually the
other call site doesn't kick any softirq either - what am I missing here?

> @@ -104,9 +148,20 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
>   *
>   * As such on every 'pt_irq_create_bind' call we MUST reset the values.
>   */
> -static void pt_pirq_reset(struct domain *d, struct hvm_pirq_dpci *dpci)
> +static int pt_pirq_reset(struct domain *d, struct hvm_pirq_dpci *dpci)
>  {
> +    /*
> +     * We MUST check for this condition as the softirq could be scheduled
> +     * and hasn't run yet. As such we MUST delay this reset until it has
> +     * completed its job.
> +     */
> +    if ( !pt_pirq_cleanup_check(dpci) )
> +            return -EAGAIN;
> +
> +    INIT_LIST_HEAD(&dpci->softirq_list);

What is this good for? This isn't a list head, and simple list elements
don't need resetting of their link fields.

> @@ -116,7 +171,9 @@ int pt_irq_create_bind(
>      struct hvm_pirq_dpci *pirq_dpci;
>      struct pirq *info;
>      int rc, pirq = pt_irq_bind->machine_irq;
> +    s_time_t start = NOW();
>  
> + restart:
>      if ( pirq < 0 || pirq >= d->nr_pirqs )
>          return -EINVAL;

I don't think this check needs re-doing on each restart.

> @@ -146,7 +203,17 @@ int pt_irq_create_bind(
>          return -ENOMEM;
>      }
>      pirq_dpci = pirq_dpci(info);
> -    pt_pirq_reset(d, pirq_dpci);
> +    /* A crude 'while' loop with us dropping the spinlock and giving
> +     * the softirq_dpci a chance to run. We do this up to one second
> +     * at which point we give up. */

Please fix the comment style, but ...

> +    if ( pt_pirq_reset(d, pirq_dpci) )
> +    {
> +        spin_unlock(&d->event_lock);
> +        process_pending_softirqs();
> +        if ( ( NOW() - start ) >> 30 )
> +            return -EAGAIN;
> +        goto restart;
> +    }

... this still looks more like a hack, and I'm still not really certain
why between two uses (which is what I understand this is for) the
pIRQ (and hence it's softirq instance) won't be fully quiesced.

> +static void dpci_softirq(void)
> +{
> +    struct hvm_pirq_dpci *pirq_dpci;
> +    unsigned int cpu = smp_processor_id();
> +    LIST_HEAD(our_list);
> +
> +    local_irq_disable();
> +    list_splice_init(&per_cpu(dpci_list, cpu), &our_list);
> +    local_irq_enable();
> +
> +    while ( !list_empty(&our_list) )
> +    {
> +        pirq_dpci = list_entry(our_list.next, struct hvm_pirq_dpci, softirq_list);
> +        list_del(&pirq_dpci->softirq_list);
> +
> +        hvm_dirq_assist(pirq_dpci);
> +
> +        put_domain(pirq_dpci->dom);
> +        pirq_dpci->masked = 0;
> +    }
> +}
> +static int cpu_callback(

There is again/still a blank line missing here.

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model.
  2014-09-25 14:48     ` Konrad Rzeszutek Wilk
@ 2014-09-25 15:04       ` Jan Beulich
  2014-09-27  1:32         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2014-09-25 15:04 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

>>> On 25.09.14 at 16:48, <konrad.wilk@oracle.com> wrote:
> On Thu, Sep 25, 2014 at 03:24:53PM +0100, Jan Beulich wrote:
>> >>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
>> > @@ -130,6 +146,7 @@ int pt_irq_create_bind(
>> >          return -ENOMEM;
>> >      }
>> >      pirq_dpci = pirq_dpci(info);
>> > +    pt_pirq_reset(d, pirq_dpci);
>> >  
>> >      switch ( pt_irq_bind->irq_type )
>> >      {
>> > @@ -232,7 +249,6 @@ int pt_irq_create_bind(
>> >          {
>> >              unsigned int share;
>> >  
>> > -            pirq_dpci->dom = d;
>> >              if ( pt_irq_bind->irq_type == PT_IRQ_TYPE_MSI_TRANSLATE )
>> >              {
>> >                  pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
>> > @@ -258,7 +274,6 @@ int pt_irq_create_bind(
>> >              {
>> >                  if ( pt_irq_need_timer(pirq_dpci->flags) )
>> >                      kill_timer(&pirq_dpci->timer);
>> > -                pirq_dpci->dom = NULL;
>> >                  list_del(&girq->list);
>> >                  list_del(&digl->list);
>> >                  hvm_irq_dpci->link_cnt[link]--;
>> > @@ -391,7 +406,6 @@ int pt_irq_destroy_bind(
>> >          msixtbl_pt_unregister(d, pirq);
>> >          if ( pt_irq_need_timer(pirq_dpci->flags) )
>> >              kill_timer(&pirq_dpci->timer);
>> > -        pirq_dpci->dom   = NULL;
>> >          pirq_dpci->flags = 0;
>> >          pirq_cleanup_check(pirq, d);
>> >      }
>> 
>> Is all of the above really necessary? I.e. I can neither see why setting
>> ->dom earlier is needed, nor why clearing it on the error paths should
>> be dropped.
> 
> Yes. We need the ->dom so that the hvm_dirq_assist can run without
> hitting an NULL pointer exception. Please keep in mind that the moment
> we setup the IRQ action handler, we are "live" - [...]

But all you need is that this happens before respective
pirq_guest_bind() calls. I.e. in the PT_IRQ_TYPE_PCI and
PT_IRQ_TYPE_MSI_TRANSLATE cases it was already done early enough
(avoiding it remaining set on error paths), so all you'd need is adding
it for the PT_IRQ_TYPE_MSI path too.

I agree that the clearing of the field in error paths might need a
little care, but otoh you could equally well have hvm_dirq_assist()
bail when it finds it to be NULL?

> The extra '->dom = d' in the pt_irq_create_bind was an extra one since the
> pt_pirq_reset does it now.

Yeah, effectively pt_pirq_reset() is just replacing that assignment.
It's not even clear wrapping this in a function is really worthwhile.

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 3/3] dpci: Replace tasklet with an softirq (v6)
  2014-09-25 14:55   ` Jan Beulich
@ 2014-09-25 15:27     ` Konrad Rzeszutek Wilk
  2014-09-25 15:45       ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-25 15:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

On Thu, Sep 25, 2014 at 03:55:28PM +0100, Jan Beulich wrote:
> >>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
> > +/*
> > + * If we are racing with softirq_dpci (masked is still set) we return
> > + * -EAGAIN. Otherwise we return 0.
> > + *
> > + *  If it is -EAGAIN, it is the callers responsibility to kick the softirq
> > + *  (with the event_lock dropped).
> 
> But pt_pirq_cleanup_check() doesn't do this - is the comment
> misleading or that particular call site reacting wrongly? Actually the
> other call site doesn't kick any softirq either - what am I missing here?

The one call side that does is the 'pt_pirq_create..' which calls
'pt_pirq_reset'. The other ones:
 a) domain_kill->domain_relinquish_resources->pci_release_devices->pci_clean_dpci_irq
 b) pt_pirq_cleanup_check

are missing it. It is easy with a)- just add the process_pending_softirq()) in
when we are not holding the lock. But b) is much harder as we would need to
alter the whole 'pirq_cleanup_check' to return an error (as the callers of
'pirq_cleanup_check' are holding the lock) and perculate that up..

One way to do this is by ignoring the 'pt_pirq_cleanup_check' case as
the ramifications of that is that we would either re-use the 'pirq'
in pt_irq_create_bind or pick 'pirq' up at pci_clean_dpci_irq and then
remove it (and deal with the process_pending_softirq()).

That leaves a) case which is simple enough. I will add that in.
> 
> > @@ -104,9 +148,20 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
> >   *
> >   * As such on every 'pt_irq_create_bind' call we MUST reset the values.
> >   */
> > -static void pt_pirq_reset(struct domain *d, struct hvm_pirq_dpci *dpci)
> > +static int pt_pirq_reset(struct domain *d, struct hvm_pirq_dpci *dpci)
> >  {
> > +    /*
> > +     * We MUST check for this condition as the softirq could be scheduled
> > +     * and hasn't run yet. As such we MUST delay this reset until it has
> > +     * completed its job.
> > +     */
> > +    if ( !pt_pirq_cleanup_check(dpci) )
> > +            return -EAGAIN;
> > +
> > +    INIT_LIST_HEAD(&dpci->softirq_list);
> 
> What is this good for? This isn't a list head, and simple list elements
> don't need resetting of their link fields.

True. Left-over.
> 
> > @@ -116,7 +171,9 @@ int pt_irq_create_bind(
> >      struct hvm_pirq_dpci *pirq_dpci;
> >      struct pirq *info;
> >      int rc, pirq = pt_irq_bind->machine_irq;
> > +    s_time_t start = NOW();
> >  
> > + restart:
> >      if ( pirq < 0 || pirq >= d->nr_pirqs )
> >          return -EINVAL;
> 
> I don't think this check needs re-doing on each restart.

OK.
> 
> > @@ -146,7 +203,17 @@ int pt_irq_create_bind(
> >          return -ENOMEM;
> >      }
> >      pirq_dpci = pirq_dpci(info);
> > -    pt_pirq_reset(d, pirq_dpci);
> > +    /* A crude 'while' loop with us dropping the spinlock and giving
> > +     * the softirq_dpci a chance to run. We do this up to one second
> > +     * at which point we give up. */
> 
> Please fix the comment style, but ...
> 
> > +    if ( pt_pirq_reset(d, pirq_dpci) )
> > +    {
> > +        spin_unlock(&d->event_lock);
> > +        process_pending_softirqs();
> > +        if ( ( NOW() - start ) >> 30 )
> > +            return -EAGAIN;
> > +        goto restart;
> > +    }
> 
> ... this still looks more like a hack, and I'm still not really certain
> why between two uses (which is what I understand this is for) the
> pIRQ (and hence it's softirq instance) won't be fully quiesced.

Just to make it clear - the 'pirq_guest_unbind' (which is called in the
pt_irq_destroy_bind) will take care of removing the action. So no more
__do_IRQ calls using the 'pirq' after that.

But we might have a pending softirq after we finished with pt_irq_destroy_bind.
And this loop will take care of waiting it out. This problem had
existed prior to this patch - this wait loop was done inside the 'tasklet_kill'.

I added the 1 second timeout as I am not a fan of unbound loops. But
I can put it back in to make it simpler (and look less hacky).

> 
> > +static void dpci_softirq(void)
> > +{
> > +    struct hvm_pirq_dpci *pirq_dpci;
> > +    unsigned int cpu = smp_processor_id();
> > +    LIST_HEAD(our_list);
> > +
> > +    local_irq_disable();
> > +    list_splice_init(&per_cpu(dpci_list, cpu), &our_list);
> > +    local_irq_enable();
> > +
> > +    while ( !list_empty(&our_list) )
> > +    {
> > +        pirq_dpci = list_entry(our_list.next, struct hvm_pirq_dpci, softirq_list);
> > +        list_del(&pirq_dpci->softirq_list);
> > +
> > +        hvm_dirq_assist(pirq_dpci);
> > +
> > +        put_domain(pirq_dpci->dom);
> > +        pirq_dpci->masked = 0;
> > +    }
> > +}
> > +static int cpu_callback(
> 
> There is again/still a blank line missing here.

Grrr.

Thank you again for your excellent review!
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 3/3] dpci: Replace tasklet with an softirq (v6)
  2014-09-25 15:27     ` Konrad Rzeszutek Wilk
@ 2014-09-25 15:45       ` Jan Beulich
  2014-09-25 16:05         ` Konrad Rzeszutek Wilk
  2014-09-27  1:32         ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 17+ messages in thread
From: Jan Beulich @ 2014-09-25 15:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

>>> On 25.09.14 at 17:27, <konrad.wilk@oracle.com> wrote:
> On Thu, Sep 25, 2014 at 03:55:28PM +0100, Jan Beulich wrote:
>> >>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
>> > +/*
>> > + * If we are racing with softirq_dpci (masked is still set) we return
>> > + * -EAGAIN. Otherwise we return 0.
>> > + *
>> > + *  If it is -EAGAIN, it is the callers responsibility to kick the softirq
>> > + *  (with the event_lock dropped).
>> 
>> But pt_pirq_cleanup_check() doesn't do this - is the comment
>> misleading or that particular call site reacting wrongly? Actually the
>> other call site doesn't kick any softirq either - what am I missing here?
> 
> The one call side that does is the 'pt_pirq_create..' which calls
> 'pt_pirq_reset'. The other ones:
>  a) domain_kill->domain_relinquish_resources->pci_release_devices->pci_clean_dpci_irq
>  b) pt_pirq_cleanup_check
> 
> are missing it. It is easy with a)- just add the process_pending_softirq()) in
> when we are not holding the lock. But b) is much harder as we would need to
> alter the whole 'pirq_cleanup_check' to return an error (as the callers of
> 'pirq_cleanup_check' are holding the lock) and perculate that up..

Hmm, perhaps I'm misunderstanding "kick" then: If all you want is
for it to be executed, you don't need to do anything on the -EAGAIN
way out of domain_relinquish_resources().

> One way to do this is by ignoring the 'pt_pirq_cleanup_check' case as
> the ramifications of that is that we would either re-use the 'pirq'
> in pt_irq_create_bind or pick 'pirq' up at pci_clean_dpci_irq and then
> remove it (and deal with the process_pending_softirq()).

As long as that's safe to do...

>> > +    if ( pt_pirq_reset(d, pirq_dpci) )
>> > +    {
>> > +        spin_unlock(&d->event_lock);
>> > +        process_pending_softirqs();
>> > +        if ( ( NOW() - start ) >> 30 )
>> > +            return -EAGAIN;
>> > +        goto restart;
>> > +    }
>> 
>> ... this still looks more like a hack, and I'm still not really certain
>> why between two uses (which is what I understand this is for) the
>> pIRQ (and hence it's softirq instance) won't be fully quiesced.
> 
> Just to make it clear - the 'pirq_guest_unbind' (which is called in the
> pt_irq_destroy_bind) will take care of removing the action. So no more
> __do_IRQ calls using the 'pirq' after that.
> 
> But we might have a pending softirq after we finished with 
> pt_irq_destroy_bind.
> And this loop will take care of waiting it out. This problem had
> existed prior to this patch - this wait loop was done inside the 
> 'tasklet_kill'.
> 
> I added the 1 second timeout as I am not a fan of unbound loops. But
> I can put it back in to make it simpler (and look less hacky).

If a softirq doesn't get run in a timely manner we're in bigger trouble
than what would warrant a timeout here. Perhaps simply put a
comment there referring to tasklet_kill() doing effectively the same
thing?

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 3/3] dpci: Replace tasklet with an softirq (v6)
  2014-09-25 15:45       ` Jan Beulich
@ 2014-09-25 16:05         ` Konrad Rzeszutek Wilk
  2014-09-27  1:32         ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-25 16:05 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

On Thu, Sep 25, 2014 at 04:45:49PM +0100, Jan Beulich wrote:
> >>> On 25.09.14 at 17:27, <konrad.wilk@oracle.com> wrote:
> > On Thu, Sep 25, 2014 at 03:55:28PM +0100, Jan Beulich wrote:
> >> >>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
> >> > +/*
> >> > + * If we are racing with softirq_dpci (masked is still set) we return
> >> > + * -EAGAIN. Otherwise we return 0.
> >> > + *
> >> > + *  If it is -EAGAIN, it is the callers responsibility to kick the softirq
> >> > + *  (with the event_lock dropped).
> >> 
> >> But pt_pirq_cleanup_check() doesn't do this - is the comment
> >> misleading or that particular call site reacting wrongly? Actually the
> >> other call site doesn't kick any softirq either - what am I missing here?
> > 
> > The one call side that does is the 'pt_pirq_create..' which calls
> > 'pt_pirq_reset'. The other ones:
> >  a) domain_kill->domain_relinquish_resources->pci_release_devices->pci_clean_dpci_irq
> >  b) pt_pirq_cleanup_check
> > 
> > are missing it. It is easy with a)- just add the process_pending_softirq()) in
> > when we are not holding the lock. But b) is much harder as we would need to
> > alter the whole 'pirq_cleanup_check' to return an error (as the callers of
> > 'pirq_cleanup_check' are holding the lock) and perculate that up..
> 
> Hmm, perhaps I'm misunderstanding "kick" then: If all you want is
> for it to be executed, you don't need to do anything on the -EAGAIN
> way out of domain_relinquish_resources().

I used the wrong phrase. Meant to say 'need to have softirq' run at some
point to flush these stale ones out (if they are there of course).

> 
> > One way to do this is by ignoring the 'pt_pirq_cleanup_check' case as
> > the ramifications of that is that we would either re-use the 'pirq'
> > in pt_irq_create_bind or pick 'pirq' up at pci_clean_dpci_irq and then
> > remove it (and deal with the process_pending_softirq()).
> 
> As long as that's safe to do...
> 
> >> > +    if ( pt_pirq_reset(d, pirq_dpci) )
> >> > +    {
> >> > +        spin_unlock(&d->event_lock);
> >> > +        process_pending_softirqs();
> >> > +        if ( ( NOW() - start ) >> 30 )
> >> > +            return -EAGAIN;
> >> > +        goto restart;
> >> > +    }
> >> 
> >> ... this still looks more like a hack, and I'm still not really certain
> >> why between two uses (which is what I understand this is for) the
> >> pIRQ (and hence it's softirq instance) won't be fully quiesced.
> > 
> > Just to make it clear - the 'pirq_guest_unbind' (which is called in the
> > pt_irq_destroy_bind) will take care of removing the action. So no more
> > __do_IRQ calls using the 'pirq' after that.
> > 
> > But we might have a pending softirq after we finished with 
> > pt_irq_destroy_bind.
> > And this loop will take care of waiting it out. This problem had
> > existed prior to this patch - this wait loop was done inside the 
> > 'tasklet_kill'.
> > 
> > I added the 1 second timeout as I am not a fan of unbound loops. But
> > I can put it back in to make it simpler (and look less hacky).
> 
> If a softirq doesn't get run in a timely manner we're in bigger trouble
> than what would warrant a timeout here. Perhaps simply put a
> comment there referring to tasklet_kill() doing effectively the same
> thing?

Perfect!
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model.
  2014-09-25 15:04       ` Jan Beulich
@ 2014-09-27  1:32         ` Konrad Rzeszutek Wilk
  2014-09-29  7:21           ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-27  1:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

On Thu, Sep 25, 2014 at 04:04:46PM +0100, Jan Beulich wrote:
> >>> On 25.09.14 at 16:48, <konrad.wilk@oracle.com> wrote:
> > On Thu, Sep 25, 2014 at 03:24:53PM +0100, Jan Beulich wrote:
> >> >>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
> >> > @@ -130,6 +146,7 @@ int pt_irq_create_bind(
> >> >          return -ENOMEM;
> >> >      }
> >> >      pirq_dpci = pirq_dpci(info);
> >> > +    pt_pirq_reset(d, pirq_dpci);
> >> >  
> >> >      switch ( pt_irq_bind->irq_type )
> >> >      {
> >> > @@ -232,7 +249,6 @@ int pt_irq_create_bind(
> >> >          {
> >> >              unsigned int share;
> >> >  
> >> > -            pirq_dpci->dom = d;
> >> >              if ( pt_irq_bind->irq_type == PT_IRQ_TYPE_MSI_TRANSLATE )
> >> >              {
> >> >                  pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
> >> > @@ -258,7 +274,6 @@ int pt_irq_create_bind(
> >> >              {
> >> >                  if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> >                      kill_timer(&pirq_dpci->timer);
> >> > -                pirq_dpci->dom = NULL;
> >> >                  list_del(&girq->list);
> >> >                  list_del(&digl->list);
> >> >                  hvm_irq_dpci->link_cnt[link]--;
> >> > @@ -391,7 +406,6 @@ int pt_irq_destroy_bind(
> >> >          msixtbl_pt_unregister(d, pirq);
> >> >          if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> >              kill_timer(&pirq_dpci->timer);
> >> > -        pirq_dpci->dom   = NULL;
> >> >          pirq_dpci->flags = 0;
> >> >          pirq_cleanup_check(pirq, d);
> >> >      }
> >> 
> >> Is all of the above really necessary? I.e. I can neither see why setting
> >> ->dom earlier is needed, nor why clearing it on the error paths should
> >> be dropped.
> > 
> > Yes. We need the ->dom so that the hvm_dirq_assist can run without
> > hitting an NULL pointer exception. Please keep in mind that the moment
> > we setup the IRQ action handler, we are "live" - [...]
> 
> But all you need is that this happens before respective
> pirq_guest_bind() calls. I.e. in the PT_IRQ_TYPE_PCI and
> PT_IRQ_TYPE_MSI_TRANSLATE cases it was already done early enough
> (avoiding it remaining set on error paths), so all you'd need is adding
> it for the PT_IRQ_TYPE_MSI path too.

.. and with the below statement ["pt_pirq_reset() is just replacing
that assigment"] I believe having just
 pirq->dom = d;

before the switch would be correct.

> 
> I agree that the clearing of the field in error paths might need a
> little care, but otoh you could equally well have hvm_dirq_assist()
> bail when it finds it to be NULL?

Can't do it as is. I added in the 'get_knowalive_domain()' and 'put_domain()'
in the 'schedule_softirq_for' and 'dpci_softirq' respectively.

Which means that we would have an outstanding refcount as 'dpci_softirq'
could not access the '->dom' to get the domain and do proper refcounting.

Unless I rip out the refcounting there which I believe is OK.

The reason it was added was to make sure that the domain destruction
would delay until the last of the 'hvm_dirq_assist' would be called.

The 'domain_destroy' (as of patch #3) does -EAGAIN if it finds an
outstanding 'hvm_dirq_assist' to be called (with or without the refcounting).

That means if we outright kill the guest we still have the guarantee
that the 'hvm_dirq_assist' has run - and won't get an crash
(as pirq_dpci would not be cleared underneath it). 

> 
> > The extra '->dom = d' in the pt_irq_create_bind was an extra one since the
> > pt_pirq_reset does it now.
> 
> Yeah, effectively pt_pirq_reset() is just replacing that assignment.
> It's not even clear wrapping this in a function is really worthwhile.

OK. Moved it out to be a simple assigment.
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 3/3] dpci: Replace tasklet with an softirq (v6)
  2014-09-25 15:45       ` Jan Beulich
  2014-09-25 16:05         ` Konrad Rzeszutek Wilk
@ 2014-09-27  1:32         ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-27  1:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

On Thu, Sep 25, 2014 at 04:45:49PM +0100, Jan Beulich wrote:
> >>> On 25.09.14 at 17:27, <konrad.wilk@oracle.com> wrote:
> > On Thu, Sep 25, 2014 at 03:55:28PM +0100, Jan Beulich wrote:
> >> >>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
> >> > +/*
> >> > + * If we are racing with softirq_dpci (masked is still set) we return
> >> > + * -EAGAIN. Otherwise we return 0.
> >> > + *
> >> > + *  If it is -EAGAIN, it is the callers responsibility to kick the softirq
> >> > + *  (with the event_lock dropped).
> >> 
> >> But pt_pirq_cleanup_check() doesn't do this - is the comment
> >> misleading or that particular call site reacting wrongly? Actually the
> >> other call site doesn't kick any softirq either - what am I missing here?
> > 
> > The one call side that does is the 'pt_pirq_create..' which calls
> > 'pt_pirq_reset'. The other ones:
> >  a) domain_kill->domain_relinquish_resources->pci_release_devices->pci_clean_dpci_irq
> >  b) pt_pirq_cleanup_check
> > 
> > are missing it. It is easy with a)- just add the process_pending_softirq()) in
> > when we are not holding the lock. But b) is much harder as we would need to
> > alter the whole 'pirq_cleanup_check' to return an error (as the callers of
> > 'pirq_cleanup_check' are holding the lock) and perculate that up..
> 
> Hmm, perhaps I'm misunderstanding "kick" then: If all you want is
> for it to be executed, you don't need to do anything on the -EAGAIN
> way out of domain_relinquish_resources().
> 
> > One way to do this is by ignoring the 'pt_pirq_cleanup_check' case as
> > the ramifications of that is that we would either re-use the 'pirq'
> > in pt_irq_create_bind or pick 'pirq' up at pci_clean_dpci_irq and then
> > remove it (and deal with the process_pending_softirq()).
> 
> As long as that's safe to do...

It is.
> 
> >> > +    if ( pt_pirq_reset(d, pirq_dpci) )
> >> > +    {
> >> > +        spin_unlock(&d->event_lock);
> >> > +        process_pending_softirqs();
> >> > +        if ( ( NOW() - start ) >> 30 )
> >> > +            return -EAGAIN;
> >> > +        goto restart;
> >> > +    }
> >> 
> >> ... this still looks more like a hack, and I'm still not really certain
> >> why between two uses (which is what I understand this is for) the
> >> pIRQ (and hence it's softirq instance) won't be fully quiesced.
> > 
> > Just to make it clear - the 'pirq_guest_unbind' (which is called in the
> > pt_irq_destroy_bind) will take care of removing the action. So no more
> > __do_IRQ calls using the 'pirq' after that.
> > 
> > But we might have a pending softirq after we finished with 
> > pt_irq_destroy_bind.
> > And this loop will take care of waiting it out. This problem had
> > existed prior to this patch - this wait loop was done inside the 
> > 'tasklet_kill'.
> > 
> > I added the 1 second timeout as I am not a fan of unbound loops. But
> > I can put it back in to make it simpler (and look less hacky).
> 
> If a softirq doesn't get run in a timely manner we're in bigger trouble
> than what would warrant a timeout here. Perhaps simply put a
> comment there referring to tasklet_kill() doing effectively the same
> thing?

Yes. Let me do that.
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model.
  2014-09-27  1:32         ` Konrad Rzeszutek Wilk
@ 2014-09-29  7:21           ` Jan Beulich
  2014-10-07 15:40             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2014-09-29  7:21 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

>>> On 27.09.14 at 03:32, <konrad.wilk@oracle.com> wrote:
> On Thu, Sep 25, 2014 at 04:04:46PM +0100, Jan Beulich wrote:
>> >>> On 25.09.14 at 16:48, <konrad.wilk@oracle.com> wrote:
>> > On Thu, Sep 25, 2014 at 03:24:53PM +0100, Jan Beulich wrote:
>> >> >>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
>> >> > @@ -130,6 +146,7 @@ int pt_irq_create_bind(
>> >> >          return -ENOMEM;
>> >> >      }
>> >> >      pirq_dpci = pirq_dpci(info);
>> >> > +    pt_pirq_reset(d, pirq_dpci);
>> >> >  
>> >> >      switch ( pt_irq_bind->irq_type )
>> >> >      {
>> >> > @@ -232,7 +249,6 @@ int pt_irq_create_bind(
>> >> >          {
>> >> >              unsigned int share;
>> >> >  
>> >> > -            pirq_dpci->dom = d;
>> >> >              if ( pt_irq_bind->irq_type == PT_IRQ_TYPE_MSI_TRANSLATE )
>> >> >              {
>> >> >                  pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
>> >> > @@ -258,7 +274,6 @@ int pt_irq_create_bind(
>> >> >              {
>> >> >                  if ( pt_irq_need_timer(pirq_dpci->flags) )
>> >> >                      kill_timer(&pirq_dpci->timer);
>> >> > -                pirq_dpci->dom = NULL;
>> >> >                  list_del(&girq->list);
>> >> >                  list_del(&digl->list);
>> >> >                  hvm_irq_dpci->link_cnt[link]--;
>> >> > @@ -391,7 +406,6 @@ int pt_irq_destroy_bind(
>> >> >          msixtbl_pt_unregister(d, pirq);
>> >> >          if ( pt_irq_need_timer(pirq_dpci->flags) )
>> >> >              kill_timer(&pirq_dpci->timer);
>> >> > -        pirq_dpci->dom   = NULL;
>> >> >          pirq_dpci->flags = 0;
>> >> >          pirq_cleanup_check(pirq, d);
>> >> >      }
>> >> 
>> >> Is all of the above really necessary? I.e. I can neither see why setting
>> >> ->dom earlier is needed, nor why clearing it on the error paths should
>> >> be dropped.
>> > 
>> > Yes. We need the ->dom so that the hvm_dirq_assist can run without
>> > hitting an NULL pointer exception. Please keep in mind that the moment
>> > we setup the IRQ action handler, we are "live" - [...]
>> 
>> But all you need is that this happens before respective
>> pirq_guest_bind() calls. I.e. in the PT_IRQ_TYPE_PCI and
>> PT_IRQ_TYPE_MSI_TRANSLATE cases it was already done early enough
>> (avoiding it remaining set on error paths), so all you'd need is adding
>> it for the PT_IRQ_TYPE_MSI path too.
> 
> .. and with the below statement ["pt_pirq_reset() is just replacing
> that assigment"] I believe having just
>  pirq->dom = d;
> 
> before the switch would be correct.
> 
>> 
>> I agree that the clearing of the field in error paths might need a
>> little care, but otoh you could equally well have hvm_dirq_assist()
>> bail when it finds it to be NULL?
> 
> Can't do it as is. I added in the 'get_knowalive_domain()' and 
> 'put_domain()'
> in the 'schedule_softirq_for' and 'dpci_softirq' respectively.
> 
> Which means that we would have an outstanding refcount as 'dpci_softirq'
> could not access the '->dom' to get the domain and do proper refcounting.
> 
> Unless I rip out the refcounting there which I believe is OK.

No - instead the site clearing the field should then drop the
reference (if so needed as indicated by other state). I.e.
either there is a softirq being processed (in which case it's
there where the reference gets dropped) or you can stop it
from getting processed subsequently and drop the reference
right away. Takes maybe another cmpxchg() afaict (on the
->dom field) or in the worst case another flag to signal this.

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model.
  2014-09-29  7:21           ` Jan Beulich
@ 2014-10-07 15:40             ` Konrad Rzeszutek Wilk
  2014-10-07 16:10               ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-10-07 15:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

On Mon, Sep 29, 2014 at 08:21:53AM +0100, Jan Beulich wrote:
> >>> On 27.09.14 at 03:32, <konrad.wilk@oracle.com> wrote:
> > On Thu, Sep 25, 2014 at 04:04:46PM +0100, Jan Beulich wrote:
> >> >>> On 25.09.14 at 16:48, <konrad.wilk@oracle.com> wrote:
> >> > On Thu, Sep 25, 2014 at 03:24:53PM +0100, Jan Beulich wrote:
> >> >> >>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
> >> >> > @@ -130,6 +146,7 @@ int pt_irq_create_bind(
> >> >> >          return -ENOMEM;
> >> >> >      }
> >> >> >      pirq_dpci = pirq_dpci(info);
> >> >> > +    pt_pirq_reset(d, pirq_dpci);
> >> >> >  
> >> >> >      switch ( pt_irq_bind->irq_type )
> >> >> >      {
> >> >> > @@ -232,7 +249,6 @@ int pt_irq_create_bind(
> >> >> >          {
> >> >> >              unsigned int share;
> >> >> >  
> >> >> > -            pirq_dpci->dom = d;
> >> >> >              if ( pt_irq_bind->irq_type == PT_IRQ_TYPE_MSI_TRANSLATE )
> >> >> >              {
> >> >> >                  pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
> >> >> > @@ -258,7 +274,6 @@ int pt_irq_create_bind(
> >> >> >              {
> >> >> >                  if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> >> >                      kill_timer(&pirq_dpci->timer);
> >> >> > -                pirq_dpci->dom = NULL;
> >> >> >                  list_del(&girq->list);
> >> >> >                  list_del(&digl->list);
> >> >> >                  hvm_irq_dpci->link_cnt[link]--;
> >> >> > @@ -391,7 +406,6 @@ int pt_irq_destroy_bind(
> >> >> >          msixtbl_pt_unregister(d, pirq);
> >> >> >          if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> >> >              kill_timer(&pirq_dpci->timer);
> >> >> > -        pirq_dpci->dom   = NULL;
> >> >> >          pirq_dpci->flags = 0;
> >> >> >          pirq_cleanup_check(pirq, d);
> >> >> >      }
> >> >> 
> >> >> Is all of the above really necessary? I.e. I can neither see why setting
> >> >> ->dom earlier is needed, nor why clearing it on the error paths should
> >> >> be dropped.
> >> > 
> >> > Yes. We need the ->dom so that the hvm_dirq_assist can run without
> >> > hitting an NULL pointer exception. Please keep in mind that the moment
> >> > we setup the IRQ action handler, we are "live" - [...]
> >> 
> >> But all you need is that this happens before respective
> >> pirq_guest_bind() calls. I.e. in the PT_IRQ_TYPE_PCI and
> >> PT_IRQ_TYPE_MSI_TRANSLATE cases it was already done early enough
> >> (avoiding it remaining set on error paths), so all you'd need is adding
> >> it for the PT_IRQ_TYPE_MSI path too.
> > 
> > .. and with the below statement ["pt_pirq_reset() is just replacing
> > that assigment"] I believe having just
> >  pirq->dom = d;
> > 
> > before the switch would be correct.
> > 
> >> 
> >> I agree that the clearing of the field in error paths might need a
> >> little care, but otoh you could equally well have hvm_dirq_assist()
> >> bail when it finds it to be NULL?
> > 
> > Can't do it as is. I added in the 'get_knowalive_domain()' and 
> > 'put_domain()'
> > in the 'schedule_softirq_for' and 'dpci_softirq' respectively.
> > 
> > Which means that we would have an outstanding refcount as 'dpci_softirq'
> > could not access the '->dom' to get the domain and do proper refcounting.
> > 
> > Unless I rip out the refcounting there which I believe is OK.
> 
> No - instead the site clearing the field should then drop the
> reference (if so needed as indicated by other state). I.e.
> either there is a softirq being processed (in which case it's
> there where the reference gets dropped) or you can stop it
> from getting processed subsequently and drop the reference
> right away. Takes maybe another cmpxchg() afaict (on the
> ->dom field) or in the worst case another flag to signal this.

I've been looking at this and I've come up with something that does
address this - but instead of using 'cmpxchg' it ends up using
set_bit/clear_bit with two different states. Using 'cmpxchg' was
getting a bit too complex - or more likely I was getting too
fatigued of the code.

Anyhow here is an RFC on top of this patchset that I believe does
in spirit what you described. Naturally the whole big piece
of code that does the ref_counting on failure paths needs to be done
in an function. Please advise if this is acceptable for you
and if so I can properly flesh it out.

commit e0647b03be8319907d4f2dc424c1fcaa4895b31a
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Sun Oct 5 09:44:14 2014 -0400

    Two bits using clear/set

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 47c8ed5..cae61f1 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -29,6 +29,10 @@
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
+enum {
+    STATE_SCHED, /* Bit 0 */
+    STATE_RUN,
+};
 /*
  * Should only be called from hvm_do_IRQ_dpci. We use the
  * 'masked' as an gate to thwart multiple interrupts.
@@ -37,14 +41,14 @@ static DEFINE_PER_CPU(struct list_head, dpci_list);
  * completed executing 'hvm_dirq_assist'.
  *
  */
-static void schedule_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
+static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
 {
     unsigned long flags;
 
-    if ( pirq_dpci->masked )
+    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->masked) )
         return;
 
-    pirq_dpci->masked = 1;
+    get_knownalive_domain(pirq_dpci->dom);
 
     local_irq_save(flags);
     list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
@@ -55,9 +59,9 @@ static void schedule_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
 
 /*
  * If we are racing with softirq_dpci (masked is still set) we return
- * -EAGAIN. Otherwise we return 0.
+ * -ERESTART. Otherwise we return 0.
  *
- *  If it is -EAGAIN, it is the callers responsibility to make sure
+ *  If it is -ERESTART, it is the callers responsibility to make sure
  *  that the softirq (with the event_lock dropped) has ran. We need
  *  to flush out the outstanding 'dpci_softirq' (no more of them
  *  will be added for this pirq as the IRQ action handler has been
@@ -65,10 +69,13 @@ static void schedule_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
  */
 int pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 {
-    if ( pirq_dpci->masked )
-        return -EAGAIN;
+    if ( test_bit(STATE_RUN, &pirq_dpci->masked) )
+        return -ERESTART;
+
+    if ( test_bit(STATE_SCHED, &pirq_dpci->masked) )
+        return -ERESTART;
     /*
-     * If in the future we would call 'schedule_softirq_for' right away
+     * If in the future we would call 'raise_softirq_for' right away
      * after 'pt_pirq_softirq_active' we MUST reset the list (otherwise it
      * might have stale data).
      */
@@ -143,7 +150,6 @@ int pt_irq_create_bind(
     struct hvm_pirq_dpci *pirq_dpci;
     struct pirq *info;
     int rc, pirq = pt_irq_bind->machine_irq;
-    s_time_t start = NOW();
 
     if ( pirq < 0 || pirq >= d->nr_pirqs )
         return -EINVAL;
@@ -188,8 +194,6 @@ int pt_irq_create_bind(
     {
         spin_unlock(&d->event_lock);
         process_pending_softirqs();
-        if ( ( NOW() - start ) > SECONDS(1) )
-            return -EAGAIN;
         goto restart;
     }
     /*
@@ -230,7 +234,35 @@ int pt_irq_create_bind(
             {
                 pirq_dpci->gmsi.gflags = 0;
                 pirq_dpci->gmsi.gvec = 0;
-                pirq_dpci->dom = NULL;
+
+                /* The softirq has saved it so we are safe to reset it. */
+                if ( test_bit(STATE_RUN, &pirq_dpci->masked) )
+                {
+                    pirq_dpci->dom = NULL;
+                }
+                else if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->masked) )
+                {
+                    /* pirq_guest_unbind has made sure we won't be getting
+                     * any more interrupts (no raise_softirq_for), so the only
+                     * ones that can be are the ones that got scheduled _right_
+                     * before the pirq_guest_unbind. If we can de-schedule them
+                     * that is good. The problem #1 is that the softirq might be
+                     * running and it has just set STATE_RUN while we cleared
+                     * STATE_SCHED. That is OK, as right after the STATE_RUN it
+                     * will check the STATE_SCHED and if cleared it will unclear
+                     * STATE_RUN and ignore this pirq. We MUST put the refcount
+                     * down. */
+                    put_domain(pirq_dpci->dom);
+                    pirq_dpci->dom = NULL;
+                }
+                else
+                {
+                    /* !STATE_RUN (stale) and !STATE_SCHED.
+                     * softirq will ignore this pirq, but we MUST put the refcount
+                     * down. */
+                    put_domain(pirq_dpci->dom);
+                    pirq_dpci->dom = NULL;
+                }
                 pirq_dpci->flags = 0;
                 pirq_cleanup_check(info, d);
                 spin_unlock(&d->event_lock);
@@ -332,7 +364,7 @@ int pt_irq_create_bind(
             {
                 if ( pt_irq_need_timer(pirq_dpci->flags) )
                     kill_timer(&pirq_dpci->timer);
-                pirq_dpci->dom = NULL;
+                /* TODO - function. pirq_dpci->dom = NULL; */
                 list_del(&girq->list);
                 list_del(&digl->list);
                 hvm_irq_dpci->link_cnt[link]--;
@@ -538,7 +570,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
          !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         return 0;
 
-    schedule_softirq_for(pirq_dpci);
+    raise_softirq_for(pirq_dpci);
     return 1;
 }
 
@@ -592,11 +624,10 @@ void hvm_dpci_msi_eoi(struct domain *d, int vector)
     spin_unlock(&d->event_lock);
 }
 
-static void hvm_dirq_assist(struct hvm_pirq_dpci *pirq_dpci)
+static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
 {
-    struct domain *d = pirq_dpci->dom;
-
     /*
+     * TODO: Remove
      * We can be racing with 'pt_irq_destroy_bind' - with us being scheduled
      * right before 'pirq_guest_unbind' gets called - but us not yet executed.
      *
@@ -604,8 +635,6 @@ static void hvm_dirq_assist(struct hvm_pirq_dpci *pirq_dpci)
      * 'mapping' - which is OK as later in this code we would
      * do nothing except clear the ->masked field anyhow.
      */
-    if ( !d )
-        return;
 
     ASSERT(d->arch.hvm_domain.irq.dpci);
 
@@ -725,11 +754,24 @@ static void dpci_softirq(void)
 
     while ( !list_empty(&our_list) )
     {
+        struct domain *d;
+
         pirq_dpci = list_entry(our_list.next, struct hvm_pirq_dpci, softirq_list);
         list_del(&pirq_dpci->softirq_list);
 
-        hvm_dirq_assist(pirq_dpci);
-        pirq_dpci->masked = 0;
+        d = pirq_dpci->dom;
+        barrier(); /* 'd' MUST be saved before we set/clear the bits. */
+        if ( test_and_set_bit(STATE_RUN, &pirq_dpci->masked) )
+            BUG();
+        /* Asked to be descheduled. */
+        if ( !test_and_clear_bit(STATE_SCHED, &pirq_dpci->masked) )
+        {
+            clear_bit(STATE_RUN, &pirq_dpci->masked);
+            continue;
+        }
+        hvm_dirq_assist(d, pirq_dpci);
+        put_domain(d);
+        clear_bit(STATE_RUN, &pirq_dpci->masked);
     }
 }
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index d147189..1660750 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -775,7 +775,7 @@ static int pci_clean_dpci_irqs(struct domain *d)
     struct hvm_irq_dpci *hvm_irq_dpci = NULL;
 
     if ( !iommu_enabled )
-        return -ESRCH;
+        return -ENODEV;
 
     if ( !is_hvm_domain(d) )
         return -EINVAL;
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 4fc6dcf..4a9324e 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -93,7 +93,7 @@ struct hvm_irq_dpci {
 /* Machine IRQ to guest device/intx mapping. */
 struct hvm_pirq_dpci {
     uint32_t flags;
-    bool_t masked;
+    unsigned long masked;
     uint16_t pending;
     struct list_head digl_list;
     struct domain *dom;
of 'latch'
> 
> Jan
> 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model.
  2014-10-07 15:40             ` Konrad Rzeszutek Wilk
@ 2014-10-07 16:10               ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2014-10-07 16:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, keir, ian.jackson, ian.campbell, tim

>>> On 07.10.14 at 17:40, <konrad.wilk@oracle.com> wrote:
> @@ -65,10 +69,13 @@ static void schedule_softirq_for(struct hvm_pirq_dpci 
> *pirq_dpci)
>   */
>  int pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
>  {
> -    if ( pirq_dpci->masked )
> -        return -EAGAIN;
> +    if ( test_bit(STATE_RUN, &pirq_dpci->masked) )
> +        return -ERESTART;
> +
> +    if ( test_bit(STATE_SCHED, &pirq_dpci->masked) )
> +        return -ERESTART;

Just check both in one go?

> @@ -230,7 +234,35 @@ int pt_irq_create_bind(
>              {
>                  pirq_dpci->gmsi.gflags = 0;
>                  pirq_dpci->gmsi.gvec = 0;
> -                pirq_dpci->dom = NULL;
> +
> +                /* The softirq has saved it so we are safe to reset it. */
> +                if ( test_bit(STATE_RUN, &pirq_dpci->masked) )
> +                {
> +                    pirq_dpci->dom = NULL;
> +                }
> +                else if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->masked) )
> +                {
> +                    /* pirq_guest_unbind has made sure we won't be getting
> +                     * any more interrupts (no raise_softirq_for), so the only
> +                     * ones that can be are the ones that got scheduled _right_
> +                     * before the pirq_guest_unbind. If we can de-schedule them
> +                     * that is good. The problem #1 is that the softirq might be
> +                     * running and it has just set STATE_RUN while we cleared
> +                     * STATE_SCHED. That is OK, as right after the STATE_RUN it
> +                     * will check the STATE_SCHED and if cleared it will unclear
> +                     * STATE_RUN and ignore this pirq. We MUST put the refcount
> +                     * down. */
> +                    put_domain(pirq_dpci->dom);
> +                    pirq_dpci->dom = NULL;
> +                }
> +                else
> +                {
> +                    /* !STATE_RUN (stale) and !STATE_SCHED.
> +                     * softirq will ignore this pirq, but we MUST put the refcount
> +                     * down. */
> +                    put_domain(pirq_dpci->dom);
> +                    pirq_dpci->dom = NULL;
> +                }

First of all you don't seem to convert the setting of ->dom back to
what it was before (you said this is an incremental patch).

And then the else path is suspicious: Only when STATE_SCHED
is set there is a ref to be put. I.e. it always has to be the site
clearing that flag which also drops the ref. This is supported by
both else-if and else doing exactly the same (which can't really
be right).

> @@ -332,7 +364,7 @@ int pt_irq_create_bind(
>              {
>                  if ( pt_irq_need_timer(pirq_dpci->flags) )
>                      kill_timer(&pirq_dpci->timer);
> -                pirq_dpci->dom = NULL;
> +                /* TODO - function. pirq_dpci->dom = NULL; */

I wonder whether _here_ you really need this: Is it possible
for the softirq to get invoked when pirq_guest_bind() fails? If not,
only the other error path above would seem to need extra care.

> @@ -725,11 +754,24 @@ static void dpci_softirq(void)
>  
>      while ( !list_empty(&our_list) )
>      {
> +        struct domain *d;
> +
>          pirq_dpci = list_entry(our_list.next, struct hvm_pirq_dpci, softirq_list);
>          list_del(&pirq_dpci->softirq_list);
>  
> -        hvm_dirq_assist(pirq_dpci);
> -        pirq_dpci->masked = 0;
> +        d = pirq_dpci->dom;
> +        barrier(); /* 'd' MUST be saved before we set/clear the bits. */

smp_mb() I think.

> +        if ( test_and_set_bit(STATE_RUN, &pirq_dpci->masked) )
> +            BUG();
> +        /* Asked to be descheduled. */
> +        if ( !test_and_clear_bit(STATE_SCHED, &pirq_dpci->masked) )

Invert the condition and ...

> +        {
> +            clear_bit(STATE_RUN, &pirq_dpci->masked);
> +            continue;
> +        }
> +        hvm_dirq_assist(d, pirq_dpci);
> +        put_domain(d);
> +        clear_bit(STATE_RUN, &pirq_dpci->masked);

... do the clear_bit() just once?

> --- a/xen/include/xen/hvm/irq.h
> +++ b/xen/include/xen/hvm/irq.h
> @@ -93,7 +93,7 @@ struct hvm_irq_dpci {
>  /* Machine IRQ to guest device/intx mapping. */
>  struct hvm_pirq_dpci {
>      uint32_t flags;
> -    bool_t masked;
> +    unsigned long masked;

Perhaps better "state" or some such?

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-10-07 16:10 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-23  2:10 [PATCH v6] Fix interrupt latency of HVM PCI passthrough devices Konrad Rzeszutek Wilk
2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model Konrad Rzeszutek Wilk
2014-09-25 14:24   ` Jan Beulich
2014-09-25 14:48     ` Konrad Rzeszutek Wilk
2014-09-25 15:04       ` Jan Beulich
2014-09-27  1:32         ` Konrad Rzeszutek Wilk
2014-09-29  7:21           ` Jan Beulich
2014-10-07 15:40             ` Konrad Rzeszutek Wilk
2014-10-07 16:10               ` Jan Beulich
2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 2/3] dpci: In hvm_dirq_assist stop using pt_pirq_iterate Konrad Rzeszutek Wilk
2014-09-25 14:29   ` Jan Beulich
2014-09-23  2:10 ` [PATCH v6 for-xen-4.5 3/3] dpci: Replace tasklet with an softirq (v6) Konrad Rzeszutek Wilk
2014-09-25 14:55   ` Jan Beulich
2014-09-25 15:27     ` Konrad Rzeszutek Wilk
2014-09-25 15:45       ` Jan Beulich
2014-09-25 16:05         ` Konrad Rzeszutek Wilk
2014-09-27  1:32         ` Konrad Rzeszutek Wilk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).