All of lore.kernel.org
 help / color / mirror / Atom feed
* [for xen-4.5 PATCH v2] Fix list corruption in dpci_softirq.
@ 2014-11-19 22:21 Konrad Rzeszutek Wilk
  2014-11-19 22:21 ` [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked Konrad Rzeszutek Wilk
  2014-11-19 22:21 ` [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-19 22:21 UTC (permalink / raw)
  To: xen-devel, linux, jbeulich, andrew.cooper3

Hey,

Attached are two patches that fix the dpci_softirq list corruption
that Sander was observing.


 xen/drivers/passthrough/io.c | 55 +++++++++++++++++++++++++++++++-------------
 1 file changed, 39 insertions(+), 16 deletions(-)

Konrad Rzeszutek Wilk (2):
      dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
      dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
  2014-11-19 22:21 [for xen-4.5 PATCH v2] Fix list corruption in dpci_softirq Konrad Rzeszutek Wilk
@ 2014-11-19 22:21 ` Konrad Rzeszutek Wilk
  2014-11-20 10:43   ` Jan Beulich
  2014-11-19 22:21 ` [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-19 22:21 UTC (permalink / raw)
  To: xen-devel, linux, jbeulich, andrew.cooper3; +Cc: Konrad Rzeszutek Wilk

If we pass in INTx type devices to a guest on an over-subscribed
machine - and in an over-worked guest - we can cause the
pirq_dpci->softirq_list to become corrupted.

The reason for this is that the 'pt_irq_guest_eoi' ends up
setting the 'state' to zero value. However the 'state' value
(STATE_SCHED, STATE_RUN) is used to communicate between
 'raise_softirq_for' and 'dpci_softirq' to determine whether the
'struct hvm_pirq_dpci' can be re-scheduled. We are ignoring the
teardown path for simplicity for right now. The 'pt_irq_guest_eoi' was
not adhering to the proper dialogue and was not using locked cmpxchg or
test_bit operations and ended setting 'state' set to zero. That
meant 'raise_softirq_for' was free to schedule it while the
'struct hvm_pirq_dpci'' was still on an per-cpu list.
The end result was list_del being called twice and the second call
corrupting the per-cpu list.

For this to occur one of the CPUs must be in the idle loop executing
softirqs and the interrupt handler in the guest must not
respond to the pending interrupt within 8ms, and we must receive
another interrupt for this device on another CPU.

CPU0:                                  CPU1:

timer_softirq_action
 \- pt_irq_time_out
     state = 0;                        do_IRQ
 [out of timer code, the                raise_softirq
 pirq_dpci is on the CPU0 dpci_list]      [adds the pirq_dpci to CPU1
                                           dpci_list as state == 0]

softirq_dpci:                            softirq_dpci:
        list_del
        [list entries are poisoned]
                                                list_del <= BOOM

The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked
semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was
pt_pirq_softirq_reset) to use cmpxchg. We also expand said function
to reset the '->dom' only on the teardown paths - but not on the
timeouts.

Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/drivers/passthrough/io.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..2039d31 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -57,7 +57,7 @@ enum {
  * This can be called multiple times, but the softirq is only raised once.
  * That is until the STATE_SCHED state has been cleared. The state can be
  * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
- * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
+ * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
  * the softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
@@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 }
 
 /*
- * Reset the pirq_dpci->dom parameter to NULL.
+ * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
+ * reset pirq_dpci->dom parameter to NULL (used for teardown).
  *
  * This function checks the different states to make sure it can do it
  * at the right time. If it unschedules the 'hvm_dirq_assist' from running
  * it also refcounts (which is what the softirq would have done) properly.
  */
-static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
+                                   unsigned int clear)
 {
     struct domain *d = pirq_dpci->dom;
 
@@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
          * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
          * in local variable before it sets STATE_RUN - and therefore will not
          * dereference '->dom' which would crash.
+         *
+         * However, if this is called from 'pt_irq_time_out' we do not want to
+         * clear the '->dom' as we can re-use the 'pirq_dpci' after that and
+         * need '->dom'.
          */
-        pirq_dpci->dom = NULL;
+        if ( clear )
+            pirq_dpci->dom = NULL;
         break;
     }
 }
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
                               &pirq_dpci->flags) )
     {
-        pirq_dpci->state = 0;
+        pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */);
         pirq_dpci->pending = 0;
         pirq_guest_eoi(dpci_pirq(pirq_dpci));
     }
@@ -285,7 +292,7 @@ int pt_irq_create_bind(
                      * to be scheduled but we must deal with the one that may be
                      * in the queue.
                      */
-                    pt_pirq_softirq_reset(pirq_dpci);
+                    pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
                 }
             }
             if ( unlikely(rc) )
@@ -536,9 +543,9 @@ int pt_irq_destroy_bind(
         pirq_dpci->flags = 0;
         /*
          * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
-         * call to pt_pirq_softirq_reset.
+         * call to pt_pirq_softirq_cancel.
          */
-        pt_pirq_softirq_reset(pirq_dpci);
+        pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
 
         pirq_cleanup_check(pirq, d);
     }
@@ -773,8 +780,8 @@ unlock:
 }
 
 /*
- * Note: 'pt_pirq_softirq_reset' can clear the STATE_SCHED before we get to
- * doing it. If that is the case we let 'pt_pirq_softirq_reset' do ref-counting.
+ * Note: 'pt_pirq_softirq_cancel' can clear the STATE_SCHED before we get to
+ * doing it. If that is the case we let 'pt_pirq_softirq_cancel' do ref-counting.
  */
 static void dpci_softirq(void)
 {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-19 22:21 [for xen-4.5 PATCH v2] Fix list corruption in dpci_softirq Konrad Rzeszutek Wilk
  2014-11-19 22:21 ` [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked Konrad Rzeszutek Wilk
@ 2014-11-19 22:21 ` Konrad Rzeszutek Wilk
  2014-11-20 10:55   ` Jan Beulich
  1 sibling, 1 reply; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-19 22:21 UTC (permalink / raw)
  To: xen-devel, linux, jbeulich, andrew.cooper3; +Cc: Konrad Rzeszutek Wilk

When we want to cancel an outstanding 'struct hvm_pirq_dpci' we perform
and cmpxch on the state to set it to zero. That is OK on the teardown
paths as it is guarnateed that the do_IRQ action handler has been removed.
Hence no more interrupts can be scheduled. But with the introduction
of "dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked."
we now utilize the pt_pirq_softirq_cancel when we want to cancel
outstanding operations. However once we cancel them the do_IRQ is
free to schedule them back in - even if said 'struct hvm_pirq_dpci'
is still on the dpci_list.

The code base before this patch could follow this race:

\-timer_softirq_action
	pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 0.
        pirq_dpci is still on dpci_list.
\- dpci_sofitrq
	while (!list_emptry(&our_list))
		list_del, but has not yet done 'entry->next = LIST_POISON1;'
[interrupt happens]
	raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list.
[interrupt is done, back to dpci_softirq]
	finishes the entry->next = LIST_POISON1;
		.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
	ends the loop, exits.

\- dpci_softirq
	while (!list_emtpry)
		list_del, but ->next already has LIST_POISON1 and we blow up.

This patch in combination adds two extra paths:

1) in raise_softirq, we do delay scheduling of dcpi_pirq until STATE_ZOMBIE is cleared.
2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the STATE_ZOMBIE.

Using the example above the code-paths would be now:
\- timer_softirq_action
	pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to STATE_ZOMBIE.
        pirq_dpci is still on dpci_list.
\- dpci_sofitrq
	while (!list_emptry(&our_list))
		list_del, but has not yet done 'entry->next = LIST_POISON1;'
[interrupt happens]
	raise_softirq checks state, it is STATE_ZOMBIE so returns.
[interrupt is done, back to dpci_softirq]
	finishes the entry->next = LIST_POISON1;
		.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
	ends the loop, exits.

Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/drivers/passthrough/io.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 2039d31..1a26973 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -50,20 +50,26 @@ static DEFINE_PER_CPU(struct list_head, dpci_list);
 
 enum {
     STATE_SCHED,
-    STATE_RUN
+    STATE_RUN,
+    STATE_ZOMBIE
 };
 
 /*
  * This can be called multiple times, but the softirq is only raised once.
- * That is until the STATE_SCHED state has been cleared. The state can be
- * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
+ * That is until the STATE_SCHED and STATE_ZOMBIE state has been cleared. The
+ * STATE_SCHED and STATE_ZOMBIE state can be cleared by the 'dpci_softirq'
  * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
- * the softirq had a chance to run).
+ * (when it has executed 'hvm_dirq_assist'). The STATE_SCHED can be cleared
+ * by 'pt_pirq_softirq_cancel' (which will try to clear the state before the
+ * softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
 {
     unsigned long flags;
 
+    if ( test_bit(STATE_ZOMBIE, &pirq_dpci->state) )
+        return;
+
     if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
         return;
 
@@ -85,7 +91,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
  */
 bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 {
-    if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) )
+    if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED) | (1 << STATE_ZOMBIE) ) )
         return 1;
 
     /*
@@ -111,7 +117,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
 
     ASSERT(spin_is_locked(&d->event_lock));
 
-    switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
+    switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 1 << STATE_ZOMBIE ) )
     {
     case (1 << STATE_SCHED):
         /*
@@ -122,6 +128,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
         /* fallthrough. */
     case (1 << STATE_RUN):
     case (1 << STATE_RUN) | (1 << STATE_SCHED):
+    case (1 << STATE_RUN) | (1 << STATE_SCHED) | (1 << STATE_ZOMBIE):
         /*
          * The reason it is OK to reset 'dom' when STATE_RUN bit is set is due
          * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
@@ -786,6 +793,7 @@ unlock:
 static void dpci_softirq(void)
 {
     unsigned int cpu = smp_processor_id();
+    unsigned int reset = 0;
     LIST_HEAD(our_list);
 
     local_irq_disable();
@@ -812,7 +820,15 @@ static void dpci_softirq(void)
             hvm_dirq_assist(d, pirq_dpci);
             put_domain(d);
         }
+        else
+            reset = 1;
+
         clear_bit(STATE_RUN, &pirq_dpci->state);
+        if ( reset )
+        {
+            clear_bit(STATE_ZOMBIE, &pirq_dpci->state);
+            reset = 0;
+        }
     }
 }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
  2014-11-19 22:21 ` [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked Konrad Rzeszutek Wilk
@ 2014-11-20 10:43   ` Jan Beulich
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2014-11-20 10:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: andrew.cooper3, xen-devel, linux

>>> On 19.11.14 at 23:21, <konrad.wilk@oracle.com> wrote:
> @@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
>  }
>  
>  /*
> - * Reset the pirq_dpci->dom parameter to NULL.
> + * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
> + * reset pirq_dpci->dom parameter to NULL (used for teardown).
>   *
>   * This function checks the different states to make sure it can do it
>   * at the right time. If it unschedules the 'hvm_dirq_assist' from running
>   * it also refcounts (which is what the softirq would have done) properly.
>   */
> -static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
> +static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
> +                                   unsigned int clear)

Looks reasonable apart from this wanting to be bool_t, and apart
from the fact that iiuc it still doesn't eliminate the observed crashes
(in which case the fix for that may better be folded into here). I'm
not really convinced that this fix is what you currently have as
patch 2 (as the previously mentioned problem of _reset() [now
_cancel()] clearing STATE_SCHED without removing the list entry
from the list is now becoming even more obvious), but I'll also
comment there.

Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-19 22:21 ` [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq Konrad Rzeszutek Wilk
@ 2014-11-20 10:55   ` Jan Beulich
  2014-11-20 19:51     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2014-11-20 10:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: andrew.cooper3, xen-devel, linux

>>> On 19.11.14 at 23:21, <konrad.wilk@oracle.com> wrote:

Leaving aside the question of whether this is the right approach, in
case it is a couple of comments:

> @@ -85,7 +91,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
>   */
>  bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
>  {
> -    if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) )
> +    if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED) | (1 << STATE_ZOMBIE) ) )

This is becoming unwieldy - perhaps better just "if ( pirq_dpci->state )"?

> @@ -122,6 +128,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
>          /* fallthrough. */
>      case (1 << STATE_RUN):
>      case (1 << STATE_RUN) | (1 << STATE_SCHED):
> +    case (1 << STATE_RUN) | (1 << STATE_SCHED) | (1 << STATE_ZOMBIE):

How would we get into this state? Afaict STATE_ZOMBIE can't be set
at the same time as STATE_SCHED.

> @@ -786,6 +793,7 @@ unlock:
>  static void dpci_softirq(void)
>  {
>      unsigned int cpu = smp_processor_id();
> +    unsigned int reset = 0;

bool_t (and would better be inside the loop, avoiding the pointless
re-init on the last iteration).

But taking it as a whole - aren't we now at risk of losing an interrupt
instance the guest expects, due to raise_softirq_for() bailing on
zombie entries, and dpci_softirq() being the only place where the
zombie state gets cleared?

Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-20 10:55   ` Jan Beulich
@ 2014-11-20 19:51     ` Konrad Rzeszutek Wilk
  2014-11-20 20:18       ` Sander Eikelenboom
  2014-11-21  7:51       ` Jan Beulich
  0 siblings, 2 replies; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-20 19:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, xen-devel, linux

On Thu, Nov 20, 2014 at 11:55:43AM +0100, Jan Beulich wrote:
> >>> On 19.11.14 at 23:21, <konrad.wilk@oracle.com> wrote:
> 
> Leaving aside the question of whether this is the right approach, in
> case it is a couple of comments:
> 
> > @@ -85,7 +91,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
> >   */
> >  bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
> >  {
> > -    if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) )
> > +    if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED) | (1 << STATE_ZOMBIE) ) )
> 
> This is becoming unwieldy - perhaps better just "if ( pirq_dpci->state )"?
> 
> > @@ -122,6 +128,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
> >          /* fallthrough. */
> >      case (1 << STATE_RUN):
> >      case (1 << STATE_RUN) | (1 << STATE_SCHED):
> > +    case (1 << STATE_RUN) | (1 << STATE_SCHED) | (1 << STATE_ZOMBIE):
> 
> How would we get into this state? Afaict STATE_ZOMBIE can't be set
> at the same time as STATE_SCHED.
> 
> > @@ -786,6 +793,7 @@ unlock:
> >  static void dpci_softirq(void)
> >  {
> >      unsigned int cpu = smp_processor_id();
> > +    unsigned int reset = 0;
> 
> bool_t (and would better be inside the loop, avoiding the pointless
> re-init on the last iteration).
> 
> But taking it as a whole - aren't we now at risk of losing an interrupt
> instance the guest expects, due to raise_softirq_for() bailing on
> zombie entries, and dpci_softirq() being the only place where the
> zombie state gets cleared?

Ah crud.

So a simple fix could be to seperate the 'state' to only deal with the
raise_softirq and softirq_dpci. And then add a new (old) 'masked' to
deal between hvm_dirq_assist, pt_irq_guest_eoi and hvm_do_IRQ_dpci.


>From 94a98e20a8ab721a58788919f92e3524a6c6e25c Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 20 Nov 2014 14:28:11 -0500
Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process.

commit f6dd295381f4b6a66acddacf46bca8940586c8d8
"dpci: replace tasklet with softirq" used the 'masked' as an
two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate
between 'raise_softirq_for' and 'dpci_softirq' to determine whether
the 'struct hvm_pirq_dpci' can be re-scheduled.

However it ignored the 'pt_irq_guest_eoi' was not adhering to the proper
dialogue and was not using locked cmpxchg or test_bit operations and
ended setting 'state' set to zero. That meant 'raise_softirq_for' was
free to schedule it while the 'struct hvm_pirq_dpci'' was still on an
per-cpu list causing an list corruption.

The code would trigger the following path causing list corruption:

    \-timer_softirq_action
    	pt_irq_time_out calls pt_pirq_softirq_cancel sets state to 0.
            pirq_dpci is still on dpci_list.
    \- dpci_sofitrq
    	while (!list_emptry(&our_list))
    		list_del, but has not yet done 'entry->next = LIST_POISON1;'
    [interrupt happens]
    	raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list.
    [interrupt is done, back to dpci_softirq]
    	finishes the entry->next = LIST_POISON1;
    		.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
    	ends the loop, exits.

    \- dpci_softirq
    	while (!list_emtpry)
    		list_del, but ->next already has LIST_POISON1 and we blow up.

An alternative solution was proposed (adding STATE_ZOMBIE and making
pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above
issue but had an fatal bug. It would miss interrupts that are to be scheduled!

This patch brings back the 'masked' boolean which is used as an
communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and
'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime
'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes - it clears it.

The 'state' is left as a seperate mechanism to provide an mechanism between
'raise_sofitrq' and 'softirq_dpci' to communicate the state of the
'struct hvm_dirq_pirq'.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/drivers/passthrough/io.c | 5 +++--
 xen/include/xen/hvm/irq.h    | 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..4d8c02f 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -142,7 +142,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
                               &pirq_dpci->flags) )
     {
-        pirq_dpci->state = 0;
+        pirq_dpci->masked = 0;
         pirq_dpci->pending = 0;
         pirq_guest_eoi(dpci_pirq(pirq_dpci));
     }
@@ -610,6 +610,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
          !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         return 0;
 
+    pirq_dpci->masked = 1;
     raise_softirq_for(pirq_dpci);
     return 1;
 }
@@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
     ASSERT(d->arch.hvm_domain.irq.dpci);
 
     spin_lock(&d->event_lock);
-    if ( pirq_dpci->state )
+    if ( test_and_clear_bool(pirq_dpci->masked) )
     {
         struct pirq *pirq = dpci_pirq(pirq_dpci);
         const struct dev_intx_gsi_link *digl;
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 9709397..3996f1f 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -94,6 +94,7 @@ struct hvm_irq_dpci {
 struct hvm_pirq_dpci {
     uint32_t flags;
     unsigned int state;
+    bool_t masked;
     uint16_t pending;
     struct list_head digl_list;
     struct domain *dom;
-- 
1.9.3

> 
> Jan
> 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-20 19:51     ` Konrad Rzeszutek Wilk
@ 2014-11-20 20:18       ` Sander Eikelenboom
  2014-11-20 20:31         ` Konrad Rzeszutek Wilk
  2014-11-21  7:51       ` Jan Beulich
  1 sibling, 1 reply; 19+ messages in thread
From: Sander Eikelenboom @ 2014-11-20 20:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: andrew.cooper3, Jan Beulich, xen-devel


Thursday, November 20, 2014, 8:51:33 PM, you wrote:

> Ah crud.

> So a simple fix could be to seperate the 'state' to only deal with the
> raise_softirq and softirq_dpci. And then add a new (old) 'masked' to
> deal between hvm_dirq_assist, pt_irq_guest_eoi and hvm_do_IRQ_dpci.


> From 94a98e20a8ab721a58788919f92e3524a6c6e25c Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date: Thu, 20 Nov 2014 14:28:11 -0500
> Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process.

> commit f6dd295381f4b6a66acddacf46bca8940586c8d8
> "dpci: replace tasklet with softirq" used the 'masked' as an
> two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate
> between 'raise_softirq_for' and 'dpci_softirq' to determine whether
> the 'struct hvm_pirq_dpci' can be re-scheduled.

<SNIP>

Hi Konrad,

Is this patch supposed to replace both the previous ones ?

--
Sander

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-20 20:18       ` Sander Eikelenboom
@ 2014-11-20 20:31         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-20 20:31 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: andrew.cooper3, Jan Beulich, xen-devel

On Thu, Nov 20, 2014 at 09:18:30PM +0100, Sander Eikelenboom wrote:
> 
> Thursday, November 20, 2014, 8:51:33 PM, you wrote:
> 
> > Ah crud.
> 
> > So a simple fix could be to seperate the 'state' to only deal with the
> > raise_softirq and softirq_dpci. And then add a new (old) 'masked' to
> > deal between hvm_dirq_assist, pt_irq_guest_eoi and hvm_do_IRQ_dpci.
> 
> 
> > From 94a98e20a8ab721a58788919f92e3524a6c6e25c Mon Sep 17 00:00:00 2001
> > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Date: Thu, 20 Nov 2014 14:28:11 -0500
> > Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process.
> 
> > commit f6dd295381f4b6a66acddacf46bca8940586c8d8
> > "dpci: replace tasklet with softirq" used the 'masked' as an
> > two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate
> > between 'raise_softirq_for' and 'dpci_softirq' to determine whether
> > the 'struct hvm_pirq_dpci' can be re-scheduled.
> 
> <SNIP>
> 
> Hi Konrad,
> 
> Is this patch supposed to replace both the previous ones ?

Yes.
> 
> --
> Sander
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-20 19:51     ` Konrad Rzeszutek Wilk
  2014-11-20 20:18       ` Sander Eikelenboom
@ 2014-11-21  7:51       ` Jan Beulich
  2014-11-21 11:50         ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2014-11-21  7:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: andrew.cooper3, xen-devel, linux

>>> On 20.11.14 at 20:51, <konrad.wilk@oracle.com> wrote:
> @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
>      ASSERT(d->arch.hvm_domain.irq.dpci);
>  
>      spin_lock(&d->event_lock);
> -    if ( pirq_dpci->state )
> +    if ( test_and_clear_bool(pirq_dpci->masked) )
>      {
>          struct pirq *pirq = dpci_pirq(pirq_dpci);
>          const struct dev_intx_gsi_link *digl;

So this now guards solely against the timeout enforced EOI? Why do
you no longer need to guard against cancellation (i.e. why isn't this
looking at both ->state and ->masked)?

Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-21  7:51       ` Jan Beulich
@ 2014-11-21 11:50         ` Konrad Rzeszutek Wilk
  2014-11-21 12:03           ` Jan Beulich
  2014-11-21 12:51           ` Sander Eikelenboom
  0 siblings, 2 replies; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-21 11:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, xen-devel, linux

On November 21, 2014 2:51:33 AM EST, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 20.11.14 at 20:51, <konrad.wilk@oracle.com> wrote:
>> @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d,
>struct hvm_pirq_dpci *pirq_dpci)
>>      ASSERT(d->arch.hvm_domain.irq.dpci);
>>  
>>      spin_lock(&d->event_lock);
>> -    if ( pirq_dpci->state )
>> +    if ( test_and_clear_bool(pirq_dpci->masked) )
>>      {
>>          struct pirq *pirq = dpci_pirq(pirq_dpci);
>>          const struct dev_intx_gsi_link *digl;
>
>So this now guards solely against the timeout enforced EOI? Why do
>you no longer need to guard against cancellation (i.e. why isn't this
>looking at both ->state and ->masked)?
>

The previous state check was superfluous as the dpci_softirq would check for the valid STATE_ before calling hvm_dirq_assist (and deal with cancellation).

I actually had an cleanup patch that would have removed the 'if (pirq_dpci->state) ' and move the code for Xen 4.6.

Anyhow waiting to see if Sander was able to test with this patch.

>Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-21 11:50         ` Konrad Rzeszutek Wilk
@ 2014-11-21 12:03           ` Jan Beulich
  2014-11-21 15:14             ` Konrad Rzeszutek Wilk
  2014-11-21 12:51           ` Sander Eikelenboom
  1 sibling, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2014-11-21 12:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: andrew.cooper3, xen-devel, linux

>>> On 21.11.14 at 12:50, <konrad.wilk@oracle.com> wrote:
> On November 21, 2014 2:51:33 AM EST, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 20.11.14 at 20:51, <konrad.wilk@oracle.com> wrote:
>>> @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d,
>>struct hvm_pirq_dpci *pirq_dpci)
>>>      ASSERT(d->arch.hvm_domain.irq.dpci);
>>>  
>>>      spin_lock(&d->event_lock);
>>> -    if ( pirq_dpci->state )
>>> +    if ( test_and_clear_bool(pirq_dpci->masked) )
>>>      {
>>>          struct pirq *pirq = dpci_pirq(pirq_dpci);
>>>          const struct dev_intx_gsi_link *digl;
>>
>>So this now guards solely against the timeout enforced EOI? Why do
>>you no longer need to guard against cancellation (i.e. why isn't this
>>looking at both ->state and ->masked)?
>>
> 
> The previous state check was superfluous as the dpci_softirq would check for 
> the valid STATE_ before calling hvm_dirq_assist (and deal with cancellation).

I thought so first too, but then how is the guarding against
cancellation working now?

Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-21 11:50         ` Konrad Rzeszutek Wilk
  2014-11-21 12:03           ` Jan Beulich
@ 2014-11-21 12:51           ` Sander Eikelenboom
  2014-11-21 15:14             ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 19+ messages in thread
From: Sander Eikelenboom @ 2014-11-21 12:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: andrew.cooper3, Jan Beulich, xen-devel


Friday, November 21, 2014, 12:50:16 PM, you wrote:

> On November 21, 2014 2:51:33 AM EST, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 20.11.14 at 20:51, <konrad.wilk@oracle.com> wrote:
>>> @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d,
>>struct hvm_pirq_dpci *pirq_dpci)
>>>      ASSERT(d->arch.hvm_domain.irq.dpci);
>>>  
>>>      spin_lock(&d->event_lock);
>>> -    if ( pirq_dpci->state )
>>> +    if ( test_and_clear_bool(pirq_dpci->masked) )
>>>      {
>>>          struct pirq *pirq = dpci_pirq(pirq_dpci);
>>>          const struct dev_intx_gsi_link *digl;
>>
>>So this now guards solely against the timeout enforced EOI? Why do
>>you no longer need to guard against cancellation (i.e. why isn't this
>>looking at both ->state and ->masked)?
>>

> The previous state check was superfluous as the dpci_softirq would check for the valid STATE_ before calling hvm_dirq_assist (and deal with cancellation).

> I actually had an cleanup patch that would have removed the 'if (pirq_dpci->state) ' and move the code for Xen 4.6.

> Anyhow waiting to see if Sander was able to test with this patch.

>>Jan

Hi Konrad / Jan,

I have tested it for 3 hours now, no host crash so far (even after applying some 
extra stress to the guest).

--
Sander

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-21 12:03           ` Jan Beulich
@ 2014-11-21 15:14             ` Konrad Rzeszutek Wilk
  2014-11-21 15:55               ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-21 15:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, xen-devel, linux

[-- Attachment #1: Type: text/plain, Size: 10233 bytes --]

On Fri, Nov 21, 2014 at 01:03:43PM +0100, Jan Beulich wrote:
> >>> On 21.11.14 at 12:50, <konrad.wilk@oracle.com> wrote:
> > On November 21, 2014 2:51:33 AM EST, Jan Beulich <JBeulich@suse.com> wrote:
> >>>>> On 20.11.14 at 20:51, <konrad.wilk@oracle.com> wrote:
> >>> @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d,
> >>struct hvm_pirq_dpci *pirq_dpci)
> >>>      ASSERT(d->arch.hvm_domain.irq.dpci);
> >>>  
> >>>      spin_lock(&d->event_lock);
> >>> -    if ( pirq_dpci->state )
> >>> +    if ( test_and_clear_bool(pirq_dpci->masked) )
> >>>      {
> >>>          struct pirq *pirq = dpci_pirq(pirq_dpci);
> >>>          const struct dev_intx_gsi_link *digl;
> >>
> >>So this now guards solely against the timeout enforced EOI? Why do
> >>you no longer need to guard against cancellation (i.e. why isn't this
> >>looking at both ->state and ->masked)?
> >>
> > 
> > The previous state check was superfluous as the dpci_softirq would check for 
> > the valid STATE_ before calling hvm_dirq_assist (and deal with cancellation).
> 
> I thought so first too, but then how is the guarding against
> cancellation working now?

Since there are two forms of cancellation, lets consider each one seperatly:

1). pt_irq_create_bind and pt_irq_destroy_bind. Each of them calling 
    pt_pirq_softirq_reset in the 'error' case or in the normal path of destruction.
    When that happens the action handler for the IRQ is removed the moment we call
    pirq_guest_unbind. As such we only have to deal with the potential outstanding
    pirq_dpci waiting to be serviced. Since we did the 'pt_pirq_softirq_reset'
    we have changed the state from STATE_SCHED to zero. That means dpci_softirq will
    NOT go in:
	if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->state) )               
        {                                                                       
            hvm_dirq_assist(d, pirq_dpci);                                      
            put_domain(d);                                                      
        }                                         
   and not call hvm_dirq_assist. The 'masked' will be set which is OK, as the moment
   the pirq is restored (pt_irq_create_bind), the 'hvm_do_IRQ_dpci' will stamp
   'masked' again and schedule the hvm_pirq_dpic. In that sense that logic is
   unchanged. We could add and 'else pirq_dpci->masked = 0' for clarity in
   'dpci_softirq' but it does not change the logic (as only hvm_dirq_assist cares
   about it and it is not called).

2). The 'pt_irq_guest_eoi' cancelling the 'hvm_dirq_assist' from running. That
   now is added - which f6dd295381f4b6a66acddacf46bca8940586c8d8
    "dpci: replace tasklet with softirq" broke. That is the cancellation between
   IRQ timeout and the hvm_dirq_assist.

3). 'pt_irq_destroy_bind' and an outstanding timer (so two cancellations at once).

   If during that time (say before the 8ms is up) the 'pt_irq_destroy_bind' is
   called it will:

    a) clear all the 'digl_list' entries - so if pt_irq_guest_eoi -
      (which takes the same lock as 'pt_irq_destroy_bind') is invoked at that
      time it won't be able to do much.
    b) kills the timer (no more 'pt_irq_guest_eoi' ),
    c) and clear STATE_SCHED so it will no longer try to run 'hvm_dirq_assist'.

   If during that time an interrupt does happen, depending on where we are in the
   teardown path - it will either send one more event in the guest or just not
   do much.

   (Thought I did spot a bug in that, see #5 below).


 4). The other case is where 'pt_irq_destroy_bind' is called while 'pt_irq_guest_eoi'
  (timeout) is in progress. AT that point nothing can happen as 'pt_irq_time_out' has
  taken the lock, so 'pt_irq_destroy_bind' cannot continue until the lock is released.

  If another interrupt is triggered and it is scheduled in, we are back at the 3)
  scenario.

 5). 'pt_irq_destroy_bind' is called while an 'hvm_dirq_assist' is just about
to service. The 'pt_irq_destroy_bind' takes the lock first and kills the
timer (and it failed at the cmpxchg since the 'state' is in STATE_RUN)  - and the
moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls
'set_timer' hitting an ASSERT.

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 4d8c02f..c5cf7c7 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -539,6 +539,7 @@ int pt_irq_destroy_bind(
          * call to pt_pirq_softirq_reset.
          */
         pt_pirq_softirq_reset(pirq_dpci);
+        pirq_dpci->masked = 0;
 
         pirq_cleanup_check(pirq, d);
     }


which will be make 'hvm_dirq_assist' inhibit calling 'set_timer' as it won't
go past:
    if ( test_and_clear_bool(pirq_dpci->masked) )                               

(which is right after the spin-lock that 'pt_irq_destroy_bind' spins on).

Let me roll in the above code snippet in the patch.

Now looking at the various error cancellations points, we have the PT_IRQ_TYPE_MSI
which we can ignore as we do not use the timer (and hvm_dirq_assist never gets to
call 'set_timer). The PT_IRQ_TYPE_PCI|PT_IRQ_TYPE_MSI_TRANSLATE error path we can
ignore that as 'There is no path for __do_IRQ to schedule softirq as IRQ_GUEST is
not set.' - so it won't ever schedule the 'hvm_dirq_dpci'. The only case we do
have to worry about is during the destruction.

Attached and inline patch:

>From 23cc8d6039c1a6002cbcb7ca4a06b20b818d3534 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 20 Nov 2014 14:28:11 -0500
Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process.

commit f6dd295381f4b6a66acddacf46bca8940586c8d8
"dpci: replace tasklet with softirq" used the 'masked' as an
two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate
between 'raise_softirq_for' and 'dpci_softirq' to determine whether
the 'struct hvm_pirq_dpci' can be re-scheduled.

However it ignored the 'pt_irq_guest_eoi' was not adhering to the proper
dialogue and was not using locked cmpxchg or test_bit operations and
ended setting 'state' set to zero. That meant 'raise_softirq_for' was
free to schedule it while the 'struct hvm_pirq_dpci'' was still on an
per-cpu list causing an list corruption.

The code would trigger the following path causing list corruption:

    \-timer_softirq_action
    	pt_irq_time_out calls pt_pirq_softirq_cancel sets state to 0.
            pirq_dpci is still on dpci_list.
    \- dpci_sofitrq
    	while (!list_emptry(&our_list))
    		list_del, but has not yet done 'entry->next = LIST_POISON1;'
    [interrupt happens]
    	raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list.
    [interrupt is done, back to dpci_softirq]
    	finishes the entry->next = LIST_POISON1;
    		.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
    	ends the loop, exits.

    \- dpci_softirq
    	while (!list_emtpry)
    		list_del, but ->next already has LIST_POISON1 and we blow up.

An alternative solution was proposed (adding STATE_ZOMBIE and making
pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above
issue but had an fatal bug. It would miss interrupts that are to be scheduled!

This patch brings back the 'masked' boolean which is used as an
communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and
'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime
'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes - it clears it.

The 'state' is left as a seperate mechanism to provide an mechanism between
'raise_sofitrq' and 'softirq_dpci' to communicate the state of the
'struct hvm_dirq_pirq'.

However since we have now two seperate machines we have to deal with an
cancellations and outstanding interrupt being serviced: 'pt_irq_destroy_bind'
is called while an 'hvm_dirq_assist' is just about to service.
The 'pt_irq_destroy_bind' takes the lock first and kills the timer - and
the moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls
'set_timer' hitting an ASSERT.

By clearing the 'masked' in the 'pt_irq_destroy_bind' we take care of that
scenario by inhibiting 'hvm_dirq_assist' to call the 'set_timer'.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
v2: Deal with 'masked' in pt_irq_destroy_bind.
---
 xen/drivers/passthrough/io.c | 6 ++++--
 xen/include/xen/hvm/irq.h    | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..c5cf7c7 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -142,7 +142,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
                               &pirq_dpci->flags) )
     {
-        pirq_dpci->state = 0;
+        pirq_dpci->masked = 0;
         pirq_dpci->pending = 0;
         pirq_guest_eoi(dpci_pirq(pirq_dpci));
     }
@@ -539,6 +539,7 @@ int pt_irq_destroy_bind(
          * call to pt_pirq_softirq_reset.
          */
         pt_pirq_softirq_reset(pirq_dpci);
+        pirq_dpci->masked = 0;
 
         pirq_cleanup_check(pirq, d);
     }
@@ -610,6 +611,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
          !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         return 0;
 
+    pirq_dpci->masked = 1;
     raise_softirq_for(pirq_dpci);
     return 1;
 }
@@ -669,7 +671,7 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
     ASSERT(d->arch.hvm_domain.irq.dpci);
 
     spin_lock(&d->event_lock);
-    if ( pirq_dpci->state )
+    if ( test_and_clear_bool(pirq_dpci->masked) )
     {
         struct pirq *pirq = dpci_pirq(pirq_dpci);
         const struct dev_intx_gsi_link *digl;
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 9709397..3996f1f 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -94,6 +94,7 @@ struct hvm_irq_dpci {
 struct hvm_pirq_dpci {
     uint32_t flags;
     unsigned int state;
+    bool_t masked;
     uint16_t pending;
     struct list_head digl_list;
     struct domain *dom;
-- 
1.9.3


[-- Attachment #2: 0001-dpci-Add-masked-as-an-gate-for-hvm_dirq_assist-to-pr.patch --]
[-- Type: text/plain, Size: 4840 bytes --]

>From 23cc8d6039c1a6002cbcb7ca4a06b20b818d3534 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 20 Nov 2014 14:28:11 -0500
Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process.

commit f6dd295381f4b6a66acddacf46bca8940586c8d8
"dpci: replace tasklet with softirq" used the 'masked' as an
two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate
between 'raise_softirq_for' and 'dpci_softirq' to determine whether
the 'struct hvm_pirq_dpci' can be re-scheduled.

However it ignored the 'pt_irq_guest_eoi' was not adhering to the proper
dialogue and was not using locked cmpxchg or test_bit operations and
ended setting 'state' set to zero. That meant 'raise_softirq_for' was
free to schedule it while the 'struct hvm_pirq_dpci'' was still on an
per-cpu list causing an list corruption.

The code would trigger the following path causing list corruption:

    \-timer_softirq_action
    	pt_irq_time_out calls pt_pirq_softirq_cancel sets state to 0.
            pirq_dpci is still on dpci_list.
    \- dpci_sofitrq
    	while (!list_emptry(&our_list))
    		list_del, but has not yet done 'entry->next = LIST_POISON1;'
    [interrupt happens]
    	raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list.
    [interrupt is done, back to dpci_softirq]
    	finishes the entry->next = LIST_POISON1;
    		.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
    	ends the loop, exits.

    \- dpci_softirq
    	while (!list_emtpry)
    		list_del, but ->next already has LIST_POISON1 and we blow up.

An alternative solution was proposed (adding STATE_ZOMBIE and making
pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above
issue but had an fatal bug. It would miss interrupts that are to be scheduled!

This patch brings back the 'masked' boolean which is used as an
communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and
'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime
'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes - it clears it.

The 'state' is left as a seperate mechanism to provide an mechanism between
'raise_sofitrq' and 'softirq_dpci' to communicate the state of the
'struct hvm_dirq_pirq'.

However since we have now two seperate machines we have to deal with an
cancellations and outstanding interrupt being serviced: 'pt_irq_destroy_bind'
is called while an 'hvm_dirq_assist' is just about to service.
The 'pt_irq_destroy_bind' takes the lock first and kills the timer - and
the moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls
'set_timer' hitting an ASSERT.

By clearing the 'masked' in the 'pt_irq_destroy_bind' we take care of that
scenario by inhibiting 'hvm_dirq_assist' to call the 'set_timer'.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
v2: Deal with 'masked' in pt_irq_destroy_bind.
---
 xen/drivers/passthrough/io.c | 6 ++++--
 xen/include/xen/hvm/irq.h    | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..c5cf7c7 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -142,7 +142,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
                               &pirq_dpci->flags) )
     {
-        pirq_dpci->state = 0;
+        pirq_dpci->masked = 0;
         pirq_dpci->pending = 0;
         pirq_guest_eoi(dpci_pirq(pirq_dpci));
     }
@@ -539,6 +539,7 @@ int pt_irq_destroy_bind(
          * call to pt_pirq_softirq_reset.
          */
         pt_pirq_softirq_reset(pirq_dpci);
+        pirq_dpci->masked = 0;
 
         pirq_cleanup_check(pirq, d);
     }
@@ -610,6 +611,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
          !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         return 0;
 
+    pirq_dpci->masked = 1;
     raise_softirq_for(pirq_dpci);
     return 1;
 }
@@ -669,7 +671,7 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
     ASSERT(d->arch.hvm_domain.irq.dpci);
 
     spin_lock(&d->event_lock);
-    if ( pirq_dpci->state )
+    if ( test_and_clear_bool(pirq_dpci->masked) )
     {
         struct pirq *pirq = dpci_pirq(pirq_dpci);
         const struct dev_intx_gsi_link *digl;
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 9709397..3996f1f 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -94,6 +94,7 @@ struct hvm_irq_dpci {
 struct hvm_pirq_dpci {
     uint32_t flags;
     unsigned int state;
+    bool_t masked;
     uint16_t pending;
     struct list_head digl_list;
     struct domain *dom;
-- 
1.9.3


[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-21 12:51           ` Sander Eikelenboom
@ 2014-11-21 15:14             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-21 15:14 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: andrew.cooper3, Jan Beulich, xen-devel

On Fri, Nov 21, 2014 at 01:51:24PM +0100, Sander Eikelenboom wrote:
> 
> Friday, November 21, 2014, 12:50:16 PM, you wrote:
> 
> > On November 21, 2014 2:51:33 AM EST, Jan Beulich <JBeulich@suse.com> wrote:
> >>>>> On 20.11.14 at 20:51, <konrad.wilk@oracle.com> wrote:
> >>> @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d,
> >>struct hvm_pirq_dpci *pirq_dpci)
> >>>      ASSERT(d->arch.hvm_domain.irq.dpci);
> >>>  
> >>>      spin_lock(&d->event_lock);
> >>> -    if ( pirq_dpci->state )
> >>> +    if ( test_and_clear_bool(pirq_dpci->masked) )
> >>>      {
> >>>          struct pirq *pirq = dpci_pirq(pirq_dpci);
> >>>          const struct dev_intx_gsi_link *digl;
> >>
> >>So this now guards solely against the timeout enforced EOI? Why do
> >>you no longer need to guard against cancellation (i.e. why isn't this
> >>looking at both ->state and ->masked)?
> >>
> 
> > The previous state check was superfluous as the dpci_softirq would check for the valid STATE_ before calling hvm_dirq_assist (and deal with cancellation).
> 
> > I actually had an cleanup patch that would have removed the 'if (pirq_dpci->state) ' and move the code for Xen 4.6.
> 
> > Anyhow waiting to see if Sander was able to test with this patch.
> 
> >>Jan
> 
> Hi Konrad / Jan,
> 
> I have tested it for 3 hours now, no host crash so far (even after applying some 
> extra stress to the guest).

Yeey! Thank you for being so flexible and willing to test these patches out!
> 
> --
> Sander
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-21 15:14             ` Konrad Rzeszutek Wilk
@ 2014-11-21 15:55               ` Jan Beulich
  2014-11-21 16:45                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2014-11-21 15:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: andrew.cooper3, xen-devel, linux

>>> On 21.11.14 at 16:14, <konrad.wilk@oracle.com> wrote:
> On Fri, Nov 21, 2014 at 01:03:43PM +0100, Jan Beulich wrote:
>> >>> On 21.11.14 at 12:50, <konrad.wilk@oracle.com> wrote:
>> > On November 21, 2014 2:51:33 AM EST, Jan Beulich <JBeulich@suse.com> wrote:
>> >>>>> On 20.11.14 at 20:51, <konrad.wilk@oracle.com> wrote:
>> >>> @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d,
>> >>struct hvm_pirq_dpci *pirq_dpci)
>> >>>      ASSERT(d->arch.hvm_domain.irq.dpci);
>> >>>  
>> >>>      spin_lock(&d->event_lock);
>> >>> -    if ( pirq_dpci->state )
>> >>> +    if ( test_and_clear_bool(pirq_dpci->masked) )
>> >>>      {
>> >>>          struct pirq *pirq = dpci_pirq(pirq_dpci);
>> >>>          const struct dev_intx_gsi_link *digl;
>> >>
>> >>So this now guards solely against the timeout enforced EOI? Why do
>> >>you no longer need to guard against cancellation (i.e. why isn't this
>> >>looking at both ->state and ->masked)?
>> >>
>> > 
>> > The previous state check was superfluous as the dpci_softirq would check for 
>> > the valid STATE_ before calling hvm_dirq_assist (and deal with cancellation).
>> 
>> I thought so first too, but then how is the guarding against
>> cancellation working now?
> 
> Since there are two forms of cancellation, lets consider each one seperatly:
> 
> 1). pt_irq_create_bind and pt_irq_destroy_bind. Each of them calling 
>     pt_pirq_softirq_reset in the 'error' case or in the normal path of destruction.
>     When that happens the action handler for the IRQ is removed the moment we call
>     pirq_guest_unbind. As such we only have to deal with the potential outstanding
>     pirq_dpci waiting to be serviced. Since we did the 'pt_pirq_softirq_reset'
>     we have changed the state from STATE_SCHED to zero. That means dpci_softirq will
>     NOT go in:
> 	if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->state) )               

Unless the flag got set again in the meantime. Since the event lock
is being acquired right before the line quoted above, _that_ is
what needs closely looking at (and why I was asking about the
deletion of the [at the first glance unnecessary] checking of ->state
in hvm_dirq_assist()).

Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-21 15:55               ` Jan Beulich
@ 2014-11-21 16:45                 ` Konrad Rzeszutek Wilk
  2014-11-24  8:58                   ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-21 16:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, xen-devel, linux

On Fri, Nov 21, 2014 at 03:55:37PM +0000, Jan Beulich wrote:
> >>> On 21.11.14 at 16:14, <konrad.wilk@oracle.com> wrote:
> > On Fri, Nov 21, 2014 at 01:03:43PM +0100, Jan Beulich wrote:
> >> >>> On 21.11.14 at 12:50, <konrad.wilk@oracle.com> wrote:
> >> > On November 21, 2014 2:51:33 AM EST, Jan Beulich <JBeulich@suse.com> wrote:
> >> >>>>> On 20.11.14 at 20:51, <konrad.wilk@oracle.com> wrote:
> >> >>> @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d,
> >> >>struct hvm_pirq_dpci *pirq_dpci)
> >> >>>      ASSERT(d->arch.hvm_domain.irq.dpci);
> >> >>>  
> >> >>>      spin_lock(&d->event_lock);
> >> >>> -    if ( pirq_dpci->state )
> >> >>> +    if ( test_and_clear_bool(pirq_dpci->masked) )
> >> >>>      {
> >> >>>          struct pirq *pirq = dpci_pirq(pirq_dpci);
> >> >>>          const struct dev_intx_gsi_link *digl;
> >> >>
> >> >>So this now guards solely against the timeout enforced EOI? Why do
> >> >>you no longer need to guard against cancellation (i.e. why isn't this
> >> >>looking at both ->state and ->masked)?
> >> >>
> >> > 
> >> > The previous state check was superfluous as the dpci_softirq would check for 
> >> > the valid STATE_ before calling hvm_dirq_assist (and deal with cancellation).
> >> 
> >> I thought so first too, but then how is the guarding against
> >> cancellation working now?
> > 
> > Since there are two forms of cancellation, lets consider each one seperatly:
> > 
> > 1). pt_irq_create_bind and pt_irq_destroy_bind. Each of them calling 
> >     pt_pirq_softirq_reset in the 'error' case or in the normal path of destruction.
> >     When that happens the action handler for the IRQ is removed the moment we call
> >     pirq_guest_unbind. As such we only have to deal with the potential outstanding
> >     pirq_dpci waiting to be serviced. Since we did the 'pt_pirq_softirq_reset'
> >     we have changed the state from STATE_SCHED to zero. That means dpci_softirq will
> >     NOT go in:
> > 	if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->state) )               
> 
> Unless the flag got set again in the meantime. Since the event lock

I am having a hard time seeing when 'state' be set in that scenario.

The 'raise_softirq_for' is called when 'pirq->dpci->flags' has some value. That
gets cleared in 'pt_irq_create_bind' and 'pt_irq_destroy_bind'. Also inside
of those functions we try our best to change the 'state' from STATE_SCHED to zero.

But lets imagine we are in 'hvm_dirq_assist' right about to take the lock,
and another CPU is calling 'pt_irq_create_bind'.

The 'pt_irq_create_bind' will spin on:

                                                                                 
234     /*                                                                          
235      * A crude 'while' loop with us dropping the spinlock and giving            
236      * the softirq_dpci a chance to run.                                        
237      * We MUST check for this condition as the softirq could be scheduled       
238      * and hasn't run yet. Note that this code replaced tasklet_kill which      
239      * would have spun forever and would do the same thing (wait to flush out   
240      * outstanding hvm_dirq_assist calls.                                       
241      */                                                                         
242     if ( pt_pirq_softirq_active(pirq_dpci) )                                    
243     {                                                                           
244         spin_unlock(&d->event_lock);                                            
245         cpu_relax();                                                            
246         goto restart;                                                           

OK, so that is not it.

Lets try another case - 'pt_irq_creates_bind' is in the error path:

   rc = msixtbl_pt_register(d, info, pt_irq_bind->u.msi.gtable);   
279                 if ( unlikely(rc) )                                             
280                 {                                                               
281                     pirq_guest_unbind(d, info);                                 
282                     /*                                                          
283                      * Between 'pirq_guest_bind' and before 'pirq_guest_unbind' 
284                      * an interrupt can be scheduled. No more of them are going 
285                      * to be scheduled but we must deal with the one that may be
286                      * in the queue.                                            
287                      */                                                         
288                     pt_pirq_softirq_reset(pirq_dpci);                  

And right at that moment the softirq is running. It has already set
STATE_RUN and just cleared STATE_SCHED and is executing 'hvm_dirq_assist'.
It is stuck on the spin_lock waiting for that.

The 'pt_irq_create_bind' fails in 'pt_pirq_softirq_reset' to change the 'state'
(as the state is in 'STATE_RUN', not 'STATE_SCHED').  'pt_irq_create_bind' continues
on with setting '->flag=0' and unlocks the lock.

'hvm_dirq_assist' grabs the lock and continues one (and state is STATE_RUN).
Since 'flag = 0' and 'digl_list' is empty, it thunders through the 'hvm_dirq_assist'
not doing anything. Well, it ends up calling the dreaded 'set_timer' - which will hit
another assert as it has never been initialized. Adding in 'mapping = 0' on the
error paths for MSI takes care of that.  That will take care of that - and I just
rolled 'masked =0' in the pt_pirq_softirq_reset function.

The legacy interrupt does not have this window, so it cannot do it.

> is being acquired right before the line quoted above, _that_ is

No. The event lock is acquired in 'hvm_dirq_assist' which would be
checking the 'mapping', not 'state'.

   spin_lock(&d->event_lock);                                                  
   if ( test_and_clear_bool(pirq_dpci->masked) )  

> what needs closely looking at (and why I was asking about the
> deletion of the [at the first glance unnecessary] checking of ->state
> in hvm_dirq_assist()).


>From 90d00db0949a8e796d7f812134753a54b2fe3d51 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 20 Nov 2014 14:28:11 -0500
Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process.

commit f6dd295381f4b6a66acddacf46bca8940586c8d8
"dpci: replace tasklet with softirq" used the 'masked' as an
two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate
between 'raise_softirq_for' and 'dpci_softirq' to determine whether
the 'struct hvm_pirq_dpci' can be re-scheduled.

However it ignored the 'pt_irq_guest_eoi' was not adhering to the proper
dialogue and was not using locked cmpxchg or test_bit operations and
ended setting 'state' set to zero. That meant 'raise_softirq_for' was
free to schedule it while the 'struct hvm_pirq_dpci'' was still on an
per-cpu list causing an list corruption.

The code would trigger the following path causing list corruption:

    \-timer_softirq_action
    	pt_irq_time_out calls pt_pirq_softirq_cancel sets state to 0.
            pirq_dpci is still on dpci_list.
    \- dpci_sofitrq
    	while (!list_emptry(&our_list))
    		list_del, but has not yet done 'entry->next = LIST_POISON1;'
    [interrupt happens]
    	raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list.
    [interrupt is done, back to dpci_softirq]
    	finishes the entry->next = LIST_POISON1;
    		.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
    	ends the loop, exits.

    \- dpci_softirq
    	while (!list_emtpry)
    		list_del, but ->next already has LIST_POISON1 and we blow up.

An alternative solution was proposed (adding STATE_ZOMBIE and making
pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above
issue but had an fatal bug. It would miss interrupts that are to be scheduled!

This patch brings back the 'masked' boolean which is used as an
communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and
'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime
'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes - it clears it.

The 'state' is left as a seperate mechanism to provide an mechanism between
'raise_sofitrq' and 'softirq_dpci' to communicate the state of the
'struct hvm_dirq_pirq'.

However since we have now two seperate machines we have to deal with an
cancellations and outstanding interrupt being serviced: 'pt_irq_destroy_bind'
is called while an 'hvm_dirq_assist' is just about to service.
The 'pt_irq_destroy_bind' takes the lock first and kills the timer - and
the moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls
'set_timer' hitting an ASSERT.

By clearing the 'masked' in the 'pt_irq_destroy_bind' we take care of that
scenario by inhibiting 'hvm_dirq_assist' to call the 'set_timer'.

In the 'pt_irq_create_bind' - in the error cases we could be seeing
an softirq scheduled right away and being serviced (though stuck at
the spinlock).  The 'pt_irq_create_bind' fails in 'pt_pirq_softirq_reset'
to change the 'state' (as the state is in 'STATE_RUN', not 'STATE_SCHED').
'pt_irq_create_bind' continues on with setting '->flag=0' and unlocks the lock.

'hvm_dirq_assist' grabs the lock and continues one. Since 'flag = 0' and
'digl_list' is empty, it thunders through the 'hvm_dirq_assist' not doing
anything until it hits 'set_timer' which is undefined for MSI. Adding
in 'masked=0' for the MSI case fixes that.

The legacy interrupt one does not need it as there is no chance of
do_IRQ being called at that point.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
v2: Deal with 'masked' in pt_irq_destroy_bind.
v3: Deal with 'masked' in pt_irq_create_bind.
---
 xen/drivers/passthrough/io.c | 12 ++++++++++--
 xen/include/xen/hvm/irq.h    |  1 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..ae050df 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -129,6 +129,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
         pirq_dpci->dom = NULL;
         break;
     }
+    /*
+     * Inhibit 'hvm_dirq_assist' from doing anything useful and at worst
+     * calling 'set_timer' which will blow up (as we have called kill_timer
+     * or never initialized it). Note that we hold the lock that
+     * 'hvm_dirq_assist' could be spinning on.
+     */
+    pirq_dpci->masked = 0;
 }
 
 bool_t pt_irq_need_timer(uint32_t flags)
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
                               &pirq_dpci->flags) )
     {
-        pirq_dpci->state = 0;
+        pirq_dpci->masked = 0;
         pirq_dpci->pending = 0;
         pirq_guest_eoi(dpci_pirq(pirq_dpci));
     }
@@ -610,6 +617,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
          !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         return 0;
 
+    pirq_dpci->masked = 1;
     raise_softirq_for(pirq_dpci);
     return 1;
 }
@@ -669,7 +677,7 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
     ASSERT(d->arch.hvm_domain.irq.dpci);
 
     spin_lock(&d->event_lock);
-    if ( pirq_dpci->state )
+    if ( test_and_clear_bool(pirq_dpci->masked) )
     {
         struct pirq *pirq = dpci_pirq(pirq_dpci);
         const struct dev_intx_gsi_link *digl;
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 9709397..3996f1f 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -94,6 +94,7 @@ struct hvm_irq_dpci {
 struct hvm_pirq_dpci {
     uint32_t flags;
     unsigned int state;
+    bool_t masked;
     uint16_t pending;
     struct list_head digl_list;
     struct domain *dom;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-21 16:45                 ` Konrad Rzeszutek Wilk
@ 2014-11-24  8:58                   ` Jan Beulich
  2014-11-24 12:58                     ` Sander Eikelenboom
  2014-11-24 14:40                     ` Is: dpci: Add 'masked' as an gate for hvm_dirq_assist to process. Was:Re: " Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 19+ messages in thread
From: Jan Beulich @ 2014-11-24  8:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: andrew.cooper3, xen-devel, linux

>>> On 21.11.14 at 17:45, <konrad.wilk@oracle.com> wrote:
> From 90d00db0949a8e796d7f812134753a54b2fe3d51 Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date: Thu, 20 Nov 2014 14:28:11 -0500
> Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to 
> process.

So am I right in understanding that this is then the only thing that
needs to go into the tree to fix Sander's problem? No zombie state
anymore?

Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-24  8:58                   ` Jan Beulich
@ 2014-11-24 12:58                     ` Sander Eikelenboom
  2014-11-24 14:40                     ` Is: dpci: Add 'masked' as an gate for hvm_dirq_assist to process. Was:Re: " Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 19+ messages in thread
From: Sander Eikelenboom @ 2014-11-24 12:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, xen-devel


Monday, November 24, 2014, 9:58:05 AM, you wrote:

>>>> On 21.11.14 at 17:45, <konrad.wilk@oracle.com> wrote:
>> From 90d00db0949a8e796d7f812134753a54b2fe3d51 Mon Sep 17 00:00:00 2001
>> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Date: Thu, 20 Nov 2014 14:28:11 -0500
>> Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to 
>> process.

> So am I right in understanding that this is then the only thing that
> needs to go into the tree to fix Sander's problem? No zombie state
> anymore?

> Jan


Hi Jan,

Yes this single patch "works for me"™.

--
Sander


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Is: dpci: Add 'masked' as an gate for hvm_dirq_assist to process. Was:Re: [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
  2014-11-24  8:58                   ` Jan Beulich
  2014-11-24 12:58                     ` Sander Eikelenboom
@ 2014-11-24 14:40                     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-24 14:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, xen-devel, linux

[-- Attachment #1: Type: text/plain, Size: 635 bytes --]

On Mon, Nov 24, 2014 at 08:58:05AM +0000, Jan Beulich wrote:
> >>> On 21.11.14 at 17:45, <konrad.wilk@oracle.com> wrote:
> > From 90d00db0949a8e796d7f812134753a54b2fe3d51 Mon Sep 17 00:00:00 2001
> > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Date: Thu, 20 Nov 2014 14:28:11 -0500
> > Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to 
> > process.
> 
> So am I right in understanding that this is then the only thing that
> needs to go into the tree to fix Sander's problem? No zombie state

Yes.
> anymore?

Correct. Just that single tiny patch. Here it is attached for your
convience.

> 
> Jan
> 

[-- Attachment #2: 0001-dpci-Add-masked-as-an-gate-for-hvm_dirq_assist-to-pr.patch --]
[-- Type: text/plain, Size: 5902 bytes --]

>From 90d00db0949a8e796d7f812134753a54b2fe3d51 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 20 Nov 2014 14:28:11 -0500
Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process.

commit f6dd295381f4b6a66acddacf46bca8940586c8d8
"dpci: replace tasklet with softirq" used the 'masked' as an
two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate
between 'raise_softirq_for' and 'dpci_softirq' to determine whether
the 'struct hvm_pirq_dpci' can be re-scheduled.

However it ignored the 'pt_irq_guest_eoi' was not adhering to the proper
dialogue and was not using locked cmpxchg or test_bit operations and
ended setting 'state' set to zero. That meant 'raise_softirq_for' was
free to schedule it while the 'struct hvm_pirq_dpci'' was still on an
per-cpu list causing an list corruption.

The code would trigger the following path causing list corruption:

    \-timer_softirq_action
    	pt_irq_time_out calls pt_pirq_softirq_cancel sets state to 0.
            pirq_dpci is still on dpci_list.
    \- dpci_sofitrq
    	while (!list_emptry(&our_list))
    		list_del, but has not yet done 'entry->next = LIST_POISON1;'
    [interrupt happens]
    	raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list.
    [interrupt is done, back to dpci_softirq]
    	finishes the entry->next = LIST_POISON1;
    		.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
    	ends the loop, exits.

    \- dpci_softirq
    	while (!list_emtpry)
    		list_del, but ->next already has LIST_POISON1 and we blow up.

An alternative solution was proposed (adding STATE_ZOMBIE and making
pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above
issue but had an fatal bug. It would miss interrupts that are to be scheduled!

This patch brings back the 'masked' boolean which is used as an
communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and
'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime
'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes - it clears it.

The 'state' is left as a seperate mechanism to provide an mechanism between
'raise_sofitrq' and 'softirq_dpci' to communicate the state of the
'struct hvm_dirq_pirq'.

However since we have now two seperate machines we have to deal with an
cancellations and outstanding interrupt being serviced: 'pt_irq_destroy_bind'
is called while an 'hvm_dirq_assist' is just about to service.
The 'pt_irq_destroy_bind' takes the lock first and kills the timer - and
the moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls
'set_timer' hitting an ASSERT.

By clearing the 'masked' in the 'pt_irq_destroy_bind' we take care of that
scenario by inhibiting 'hvm_dirq_assist' to call the 'set_timer'.

In the 'pt_irq_create_bind' - in the error cases we could be seeing
an softirq scheduled right away and being serviced (though stuck at
the spinlock).  The 'pt_irq_create_bind' fails in 'pt_pirq_softirq_reset'
to change the 'state' (as the state is in 'STATE_RUN', not 'STATE_SCHED').
'pt_irq_create_bind' continues on with setting '->flag=0' and unlocks the lock.

'hvm_dirq_assist' grabs the lock and continues one. Since 'flag = 0' and
'digl_list' is empty, it thunders through the 'hvm_dirq_assist' not doing
anything until it hits 'set_timer' which is undefined for MSI. Adding
in 'masked=0' for the MSI case fixes that.

The legacy interrupt one does not need it as there is no chance of
do_IRQ being called at that point.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
v2: Deal with 'masked' in pt_irq_destroy_bind.
v3: Deal with 'masked' in pt_irq_create_bind.
---
 xen/drivers/passthrough/io.c | 12 ++++++++++--
 xen/include/xen/hvm/irq.h    |  1 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..ae050df 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -129,6 +129,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
         pirq_dpci->dom = NULL;
         break;
     }
+    /*
+     * Inhibit 'hvm_dirq_assist' from doing anything useful and at worst
+     * calling 'set_timer' which will blow up (as we have called kill_timer
+     * or never initialized it). Note that we hold the lock that
+     * 'hvm_dirq_assist' could be spinning on.
+     */
+    pirq_dpci->masked = 0;
 }
 
 bool_t pt_irq_need_timer(uint32_t flags)
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
                               &pirq_dpci->flags) )
     {
-        pirq_dpci->state = 0;
+        pirq_dpci->masked = 0;
         pirq_dpci->pending = 0;
         pirq_guest_eoi(dpci_pirq(pirq_dpci));
     }
@@ -610,6 +617,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
          !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         return 0;
 
+    pirq_dpci->masked = 1;
     raise_softirq_for(pirq_dpci);
     return 1;
 }
@@ -669,7 +677,7 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
     ASSERT(d->arch.hvm_domain.irq.dpci);
 
     spin_lock(&d->event_lock);
-    if ( pirq_dpci->state )
+    if ( test_and_clear_bool(pirq_dpci->masked) )
     {
         struct pirq *pirq = dpci_pirq(pirq_dpci);
         const struct dev_intx_gsi_link *digl;
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 9709397..3996f1f 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -94,6 +94,7 @@ struct hvm_irq_dpci {
 struct hvm_pirq_dpci {
     uint32_t flags;
     unsigned int state;
+    bool_t masked;
     uint16_t pending;
     struct list_head digl_list;
     struct domain *dom;
-- 
1.9.3


[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-11-24 14:40 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-19 22:21 [for xen-4.5 PATCH v2] Fix list corruption in dpci_softirq Konrad Rzeszutek Wilk
2014-11-19 22:21 ` [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked Konrad Rzeszutek Wilk
2014-11-20 10:43   ` Jan Beulich
2014-11-19 22:21 ` [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq Konrad Rzeszutek Wilk
2014-11-20 10:55   ` Jan Beulich
2014-11-20 19:51     ` Konrad Rzeszutek Wilk
2014-11-20 20:18       ` Sander Eikelenboom
2014-11-20 20:31         ` Konrad Rzeszutek Wilk
2014-11-21  7:51       ` Jan Beulich
2014-11-21 11:50         ` Konrad Rzeszutek Wilk
2014-11-21 12:03           ` Jan Beulich
2014-11-21 15:14             ` Konrad Rzeszutek Wilk
2014-11-21 15:55               ` Jan Beulich
2014-11-21 16:45                 ` Konrad Rzeszutek Wilk
2014-11-24  8:58                   ` Jan Beulich
2014-11-24 12:58                     ` Sander Eikelenboom
2014-11-24 14:40                     ` Is: dpci: Add 'masked' as an gate for hvm_dirq_assist to process. Was:Re: " Konrad Rzeszutek Wilk
2014-11-21 12:51           ` Sander Eikelenboom
2014-11-21 15:14             ` Konrad Rzeszutek Wilk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.