* [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.
@ 2014-11-19 17:31 Konrad Rzeszutek Wilk
2014-11-19 17:31 ` [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked Konrad Rzeszutek Wilk
2014-11-19 18:54 ` [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Sander Eikelenboom
0 siblings, 2 replies; 5+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-19 17:31 UTC (permalink / raw)
To: xen-devel, JBeulich, linux
Hey,
This patch should fix the issue that Sander had seen. The full details
are in the patch itself. Sander, if you could - please test origin/staging
with this patch to make sure it does fix the issue.
xen/drivers/passthrough/io.c | 27 +++++++++++++++++----------
Konrad Rzeszutek Wilk (1):
dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
1 file changed, 17 insertions(+), 10 deletions(-)
^ permalink raw reply [flat|nested] 5+ messages in thread
* [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
2014-11-19 17:31 [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Konrad Rzeszutek Wilk
@ 2014-11-19 17:31 ` Konrad Rzeszutek Wilk
2014-11-19 18:54 ` [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Sander Eikelenboom
1 sibling, 0 replies; 5+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-19 17:31 UTC (permalink / raw)
To: xen-devel, JBeulich, linux; +Cc: Konrad Rzeszutek Wilk
If we pass in INTx type devices to a guest on an over-subscribed
machine - and in an over-worked guest - we can cause the
pirq_dpci->softirq_list to become corrupted.
The reason for this is that the 'pt_irq_guest_eoi' ends up
setting the 'state' to zero value. However the 'state' value
(STATE_SCHED, STATE_RUN) is used to communicate between
'raise_softirq_for' and 'dpci_softirq' to determine whether the
'struct hvm_pirq_dpci' can be re-scheduled. We are ignoring the
teardown path for simplicity for right now. The 'pt_irq_guest_eoi' was
not adhering to the proper dialogue and was not using locked cmpxchg or
test_bit operations and ended setting 'state' set to zero. That
meant 'raise_softirq_for' was free to schedule it while the
'struct hvm_pirq_dpci'' was still on an per-cpu list.
The end result was list_del being called twice and the second call
corrupting the per-cpu list.
For this to occur one of the CPUs must be in the idle loop executing
softirqs and the interrupt handler in the guest must not
respond to the pending interrupt within 8ms, and we must receive
another interrupt for this device on another CPU.
CPU0: CPU1:
timer_softirq_action
\- pt_irq_time_out
state = 0; do_IRQ
[out of timer code, the raise_softirq
pirq_dpci is on the CPU0 dpci_list] [adds the pirq_dpci to CPU1
dpci_list as state == 0]
softirq_dpci: softirq_dpci:
list_del
[list entries are poisoned]
list_del <= BOOM
The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked
semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was
pt_pirq_softirq_reset) to use cmpxchg. We also expand said function
to reset the '->dom' only on the teardown paths - but not on the
timeouts.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
xen/drivers/passthrough/io.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..2039d31 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -57,7 +57,7 @@ enum {
* This can be called multiple times, but the softirq is only raised once.
* That is until the STATE_SCHED state has been cleared. The state can be
* cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
- * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
+ * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
* the softirq had a chance to run).
*/
static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
@@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
}
/*
- * Reset the pirq_dpci->dom parameter to NULL.
+ * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
+ * reset pirq_dpci->dom parameter to NULL (used for teardown).
*
* This function checks the different states to make sure it can do it
* at the right time. If it unschedules the 'hvm_dirq_assist' from running
* it also refcounts (which is what the softirq would have done) properly.
*/
-static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
+ unsigned int clear)
{
struct domain *d = pirq_dpci->dom;
@@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
* to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
* in local variable before it sets STATE_RUN - and therefore will not
* dereference '->dom' which would crash.
+ *
+ * However, if this is called from 'pt_irq_time_out' we do not want to
+ * clear the '->dom' as we can re-use the 'pirq_dpci' after that and
+ * need '->dom'.
*/
- pirq_dpci->dom = NULL;
+ if ( clear )
+ pirq_dpci->dom = NULL;
break;
}
}
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
&pirq_dpci->flags) )
{
- pirq_dpci->state = 0;
+ pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */);
pirq_dpci->pending = 0;
pirq_guest_eoi(dpci_pirq(pirq_dpci));
}
@@ -285,7 +292,7 @@ int pt_irq_create_bind(
* to be scheduled but we must deal with the one that may be
* in the queue.
*/
- pt_pirq_softirq_reset(pirq_dpci);
+ pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
}
}
if ( unlikely(rc) )
@@ -536,9 +543,9 @@ int pt_irq_destroy_bind(
pirq_dpci->flags = 0;
/*
* See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
- * call to pt_pirq_softirq_reset.
+ * call to pt_pirq_softirq_cancel.
*/
- pt_pirq_softirq_reset(pirq_dpci);
+ pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
pirq_cleanup_check(pirq, d);
}
@@ -773,8 +780,8 @@ unlock:
}
/*
- * Note: 'pt_pirq_softirq_reset' can clear the STATE_SCHED before we get to
- * doing it. If that is the case we let 'pt_pirq_softirq_reset' do ref-counting.
+ * Note: 'pt_pirq_softirq_cancel' can clear the STATE_SCHED before we get to
+ * doing it. If that is the case we let 'pt_pirq_softirq_cancel' do ref-counting.
*/
static void dpci_softirq(void)
{
--
1.9.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.
2014-11-19 17:31 [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Konrad Rzeszutek Wilk
2014-11-19 17:31 ` [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked Konrad Rzeszutek Wilk
@ 2014-11-19 18:54 ` Sander Eikelenboom
2014-11-19 19:22 ` Andrew Cooper
[not found] ` <20141119190131.GA15580@laptop.dumpdata.com>
1 sibling, 2 replies; 5+ messages in thread
From: Sander Eikelenboom @ 2014-11-19 18:54 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: xen-devel, JBeulich
Wednesday, November 19, 2014, 6:31:39 PM, you wrote:
> Hey,
> This patch should fix the issue that Sander had seen. The full details
> are in the patch itself. Sander, if you could - please test origin/staging
> with this patch to make sure it does fix the issue.
> xen/drivers/passthrough/io.c | 27 +++++++++++++++++----------
> Konrad Rzeszutek Wilk (1):
> dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
> 1 file changed, 17 insertions(+), 10 deletions(-)
Hi Konrad,
Hmm just tested with a freshly cloned tree .. unfortunately it blew up again.
(i must admit i also re-enabled stuff i had disabled in debugging like, cpuidle, cpufreq).
(XEN) [2014-11-19 18:41:25.999] ----[ Xen-4.5.0-rc x86_64 debug=y Not tainted ]----
(XEN) [2014-11-19 18:41:25.999] CPU: 5
(XEN) [2014-11-19 18:41:25.999] RIP: e008:[<ffff82d0801490ac>] dpci_softirq+0x9c/0x23d
(XEN) [2014-11-19 18:41:25.999] RFLAGS: 0000000000010283 CONTEXT: hypervisor
(XEN) [2014-11-19 18:41:25.999] rax: 0100100100100100 rbx: ffff8303bb688d90 rcx: 0000000000000001
(XEN) [2014-11-19 18:41:25.999] rdx: ffff83054ef18000 rsi: 0000000000000002 rdi: ffff83050b29e0b8
(XEN) [2014-11-19 18:41:25.999] rbp: ffff83054ef1feb0 rsp: ffff83054ef1fe50 r8: ffff8303bb688d60
(XEN) [2014-11-19 18:41:25.999] r9: 000001d5f62fff63 r10: 00000000deadbeef r11: 0000000000000246
(XEN) [2014-11-19 18:41:25.999] r12: ffff8303bb688d38 r13: ffff83050b29e000 r14: ffff8303bb688d28
(XEN) [2014-11-19 18:41:25.999] r15: ffff8303bb688d28 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) [2014-11-19 18:41:25.999] cr3: 000000050b2c7000 cr2: ffffffffff600400
(XEN) [2014-11-19 18:41:25.999] ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) [2014-11-19 18:41:25.999] Xen stack trace from rsp=ffff83054ef1fe50:
(XEN) [2014-11-19 18:41:25.999] 0000000000000c23 ffff83050b29e0b8 ffff8303bb688d38 ffff83054ef1fe70
(XEN) [2014-11-19 18:41:25.999] ffff8303bb688d90 ffff8303bb688d90 000000fb00000000 ffff82d080300200
(XEN) [2014-11-19 18:41:25.999] ffff82d0802fff80 ffffffffffffffff ffff83054ef18000 0000000000000002
(XEN) [2014-11-19 18:41:25.999] ffff83054ef1fee0 ffff82d08012be31 ffff83054ef18000 ffff83009fd2d000
(XEN) [2014-11-19 18:41:25.999] 00000000ffffffff ffff83054ef28068 ffff83054ef1fef0 ffff82d08012be89
(XEN) [2014-11-19 18:41:25.999] ffff83054ef1ff10 ffff82d0801633e5 ffff82d08012be89 ffff83009ff8b000
(XEN) [2014-11-19 18:41:25.999] ffff83054ef1fde8 ffff880059bf8000 ffff880059bf8000 0000000000000000
(XEN) [2014-11-19 18:41:25.999] 0000000000000000 ffff880059bfbeb0 ffffffff822f3ec0 0000000000000246
(XEN) [2014-11-19 18:41:25.999] 0000000000000001 0000000000000000 0000000000000000 0000000000000000
(XEN) [2014-11-19 18:41:25.999] ffffffff810013aa ffff880059bde480 00000000deadbeef 00000000deadbeef
(XEN) [2014-11-19 18:41:25.999] 0000010000000000 ffffffff810013aa 000000000000e033 0000000000000246
(XEN) [2014-11-19 18:41:25.999] ffff880059bfbe98 000000000000e02b 1862060042c8beef 224d41480704beef
(XEN) [2014-11-19 18:41:25.999] 99171042639bbeef 74c88180108cbeef c0dc604c00000005 ffff83009ff8b000
(XEN) [2014-11-19 18:41:26.000] 00000034cebff280 ca836183a4020303
(XEN) [2014-11-19 18:41:26.000] Xen call trace:
(XEN) [2014-11-19 18:41:26.000] [<ffff82d0801490ac>] dpci_softirq+0x9c/0x23d
(XEN) [2014-11-19 18:41:26.000] [<ffff82d08012be31>] __do_softirq+0x81/0x8c
(XEN) [2014-11-19 18:41:26.000] [<ffff82d08012be89>] do_softirq+0x13/0x15
(XEN) [2014-11-19 18:41:26.000] [<ffff82d0801633e5>] idle_loop+0x5e/0x6e
(XEN) [2014-11-19 18:41:26.000]
(XEN) [2014-11-19 18:41:26.778]
(XEN) [2014-11-19 18:41:26.787] ****************************************
(XEN) [2014-11-19 18:41:26.806] Panic on CPU 5:
(XEN) [2014-11-19 18:41:26.819] GENERAL PROTECTION FAULT
(XEN) [2014-11-19 18:41:26.834] [error_code=0000]
(XEN) [2014-11-19 18:41:26.847] ****************************************
(XEN) [2014-11-19 18:41:26.867]
(XEN) [2014-11-19 18:41:26.876] Reboot in five seconds...
(XEN) [2014-11-19 18:41:26.891] APIC error on CPU0: 00(08)
(XEN) [2014-11-19 18:41:26.906] APIC error on CPU0: 08(08)
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.
2014-11-19 18:54 ` [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Sander Eikelenboom
@ 2014-11-19 19:22 ` Andrew Cooper
[not found] ` <20141119190131.GA15580@laptop.dumpdata.com>
1 sibling, 0 replies; 5+ messages in thread
From: Andrew Cooper @ 2014-11-19 19:22 UTC (permalink / raw)
To: Sander Eikelenboom, Konrad Rzeszutek Wilk; +Cc: xen-devel, JBeulich
On 19/11/2014 18:54, Sander Eikelenboom wrote:
> Wednesday, November 19, 2014, 6:31:39 PM, you wrote:
>
>> Hey,
>> This patch should fix the issue that Sander had seen. The full details
>> are in the patch itself. Sander, if you could - please test origin/staging
>> with this patch to make sure it does fix the issue.
>
>> xen/drivers/passthrough/io.c | 27 +++++++++++++++++----------
>> Konrad Rzeszutek Wilk (1):
>> dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
>> 1 file changed, 17 insertions(+), 10 deletions(-)
>
> Hi Konrad,
>
> Hmm just tested with a freshly cloned tree .. unfortunately it blew up again.
> (i must admit i also re-enabled stuff i had disabled in debugging like, cpuidle, cpufreq).
>
> (XEN) [2014-11-19 18:41:25.999] ----[ Xen-4.5.0-rc x86_64 debug=y Not tainted ]----
> (XEN) [2014-11-19 18:41:25.999] CPU: 5
> (XEN) [2014-11-19 18:41:25.999] RIP: e008:[<ffff82d0801490ac>] dpci_softirq+0x9c/0x23d
> (XEN) [2014-11-19 18:41:25.999] RFLAGS: 0000000000010283 CONTEXT: hypervisor
> (XEN) [2014-11-19 18:41:25.999] rax: 0100100100100100 rbx: ffff8303bb688d90 rcx: 0000000000000001
> (XEN) [2014-11-19 18:41:25.999] rdx: ffff83054ef18000 rsi: 0000000000000002 rdi: ffff83050b29e0b8
> (XEN) [2014-11-19 18:41:25.999] rbp: ffff83054ef1feb0 rsp: ffff83054ef1fe50 r8: ffff8303bb688d60
> (XEN) [2014-11-19 18:41:25.999] r9: 000001d5f62fff63 r10: 00000000deadbeef r11: 0000000000000246
> (XEN) [2014-11-19 18:41:25.999] r12: ffff8303bb688d38 r13: ffff83050b29e000 r14: ffff8303bb688d28
> (XEN) [2014-11-19 18:41:25.999] r15: ffff8303bb688d28 cr0: 000000008005003b cr4: 00000000000006f0
> (XEN) [2014-11-19 18:41:25.999] cr3: 000000050b2c7000 cr2: ffffffffff600400
> (XEN) [2014-11-19 18:41:25.999] ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) [2014-11-19 18:41:25.999] Xen stack trace from rsp=ffff83054ef1fe50:
> (XEN) [2014-11-19 18:41:25.999] 0000000000000c23 ffff83050b29e0b8 ffff8303bb688d38 ffff83054ef1fe70
> (XEN) [2014-11-19 18:41:25.999] ffff8303bb688d90 ffff8303bb688d90 000000fb00000000 ffff82d080300200
> (XEN) [2014-11-19 18:41:25.999] ffff82d0802fff80 ffffffffffffffff ffff83054ef18000 0000000000000002
> (XEN) [2014-11-19 18:41:25.999] ffff83054ef1fee0 ffff82d08012be31 ffff83054ef18000 ffff83009fd2d000
> (XEN) [2014-11-19 18:41:25.999] 00000000ffffffff ffff83054ef28068 ffff83054ef1fef0 ffff82d08012be89
> (XEN) [2014-11-19 18:41:25.999] ffff83054ef1ff10 ffff82d0801633e5 ffff82d08012be89 ffff83009ff8b000
> (XEN) [2014-11-19 18:41:25.999] ffff83054ef1fde8 ffff880059bf8000 ffff880059bf8000 0000000000000000
> (XEN) [2014-11-19 18:41:25.999] 0000000000000000 ffff880059bfbeb0 ffffffff822f3ec0 0000000000000246
> (XEN) [2014-11-19 18:41:25.999] 0000000000000001 0000000000000000 0000000000000000 0000000000000000
> (XEN) [2014-11-19 18:41:25.999] ffffffff810013aa ffff880059bde480 00000000deadbeef 00000000deadbeef
> (XEN) [2014-11-19 18:41:25.999] 0000010000000000 ffffffff810013aa 000000000000e033 0000000000000246
> (XEN) [2014-11-19 18:41:25.999] ffff880059bfbe98 000000000000e02b 1862060042c8beef 224d41480704beef
> (XEN) [2014-11-19 18:41:25.999] 99171042639bbeef 74c88180108cbeef c0dc604c00000005 ffff83009ff8b000
> (XEN) [2014-11-19 18:41:26.000] 00000034cebff280 ca836183a4020303
> (XEN) [2014-11-19 18:41:26.000] Xen call trace:
> (XEN) [2014-11-19 18:41:26.000] [<ffff82d0801490ac>] dpci_softirq+0x9c/0x23d
> (XEN) [2014-11-19 18:41:26.000] [<ffff82d08012be31>] __do_softirq+0x81/0x8c
> (XEN) [2014-11-19 18:41:26.000] [<ffff82d08012be89>] do_softirq+0x13/0x15
> (XEN) [2014-11-19 18:41:26.000] [<ffff82d0801633e5>] idle_loop+0x5e/0x6e
> (XEN) [2014-11-19 18:41:26.000]
> (XEN) [2014-11-19 18:41:26.778]
> (XEN) [2014-11-19 18:41:26.787] ****************************************
> (XEN) [2014-11-19 18:41:26.806] Panic on CPU 5:
> (XEN) [2014-11-19 18:41:26.819] GENERAL PROTECTION FAULT
> (XEN) [2014-11-19 18:41:26.834] [error_code=0000]
> (XEN) [2014-11-19 18:41:26.847] ****************************************
> (XEN) [2014-11-19 18:41:26.867]
> (XEN) [2014-11-19 18:41:26.876] Reboot in five seconds...
> (XEN) [2014-11-19 18:41:26.891] APIC error on CPU0: 00(08)
> (XEN) [2014-11-19 18:41:26.906] APIC error on CPU0: 08(08)
For the avoidance of any confusion, this is still LIST_POISON1 (see
%rax), but now a #GP fault following c/s 404227138 (now with 100% less
chance of dereferencing into guest-controlled virtual address space)
~Andrew
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.
[not found] ` <1173563549.20141119201735@eikelenboom.it>
@ 2014-11-19 22:16 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 5+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-19 22:16 UTC (permalink / raw)
To: Sander Eikelenboom, xen-devel, jbeulich, andrew.cooper3
On Wed, Nov 19, 2014 at 08:17:35PM +0100, Sander Eikelenboom wrote:
>
> Wednesday, November 19, 2014, 8:01:31 PM, you wrote:
>
> > On Wed, Nov 19, 2014 at 07:54:39PM +0100, Sander Eikelenboom wrote:
> >>
> >> Wednesday, November 19, 2014, 6:31:39 PM, you wrote:
> >>
> >> > Hey,
> >>
> >> > This patch should fix the issue that Sander had seen. The full details
> >> > are in the patch itself. Sander, if you could - please test origin/staging
> >> > with this patch to make sure it does fix the issue.
> >>
> >>
> >> > xen/drivers/passthrough/io.c | 27 +++++++++++++++++----------
> >>
> >> > Konrad Rzeszutek Wilk (1):
> >> > dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
> >>
> >> > 1 file changed, 17 insertions(+), 10 deletions(-)
> >>
> >>
> >> Hi Konrad,
> >>
> >> Hmm just tested with a freshly cloned tree .. unfortunately it blew up again.
> >> (i must admit i also re-enabled stuff i had disabled in debugging like, cpuidle, cpufreq).
>
> > Argh.
>
> > Could you also try the first patch the STATE_ZOMBIE one?
>
> Building now ..
(Attached and inline)
Sander mentioned to me over IRC that with the STATE_ZOMBIE patch things work peachy for him.
The patch in combination with the previous adds two extra paths:
1) in raise_softirq, we do delay scheduling of dcpi_pirq until STATE_ZOMBIE is cleared.
2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the STATE_ZOMBIE.
Lets follow the case without the zombie patch and with the zombie patch:
w/o zombie:
timer_softirq_action
pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 0.
pirq_dpci is still on dpci_list.
dpci_sofitrq
while (!list_emptry(&our_list))
list_del, but has not yet done 'entry->next = LIST_POISON1;'
[interrupt happens]
raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list.
[interrupt is done, back to dpci_softirq]
finishes the entry->next = LIST_POISON1;
.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
ends the loop, exits.
dpci_softirq
while (!list_emtpry)
list_del, but ->next already has LIST_POISON1 and we blow up.
w/ zombie:
timer_softirq_action
pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to STATE_ZOMBIE.
pirq_dpci is still on dpci_list.
dpci_sofitrq
while (!list_emptry(&our_list))
list_del, but has not yet done 'entry->next = LIST_POISON1;'
[interrupt happens]
raise_softirq checks state, it is STATE_ZOMBIE so returns.
[interrupt is done, back to dpci_softirq]
finishes the entry->next = LIST_POISON1;
.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
ends the loop, exits.
So it seems that the STATE_ZOMBIE is needed, but for a different reason that
Jan initially thought of:
>From c89a97f695fda245f5fcb16ddb36d3df7f6f28b9 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 14 Nov 2014 12:15:26 -0500
Subject: [PATCH] dpci: Add ZOMBIE state to allow the softirq to finish with
the dpci_pirq.
When we want to cancel an outstanding 'struct hvm_pirq_dpci' we perform
and cmpxch on the state to set it to zero. That is OK on the teardown
paths as it is guarnateed that the do_IRQ action handler has been removed.
Hence no more interrupts can be scheduled. But with the introduction
of "dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked."
we now utilize the pt_pirq_softirq_cancel when we want to cancel
outstanding operations. However once we cancel them the do_IRQ is
free to schedule them back in - even if said 'struct hvm_pirq_dpci'
is still on the dpci_list.
The code base before this patch could follow this race:
\-timer_softirq_action
pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 0.
pirq_dpci is still on dpci_list.
\- dpci_sofitrq
while (!list_emptry(&our_list))
list_del, but has not yet done 'entry->next = LIST_POISON1;'
[interrupt happens]
raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list.
[interrupt is done, back to dpci_softirq]
finishes the entry->next = LIST_POISON1;
.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
ends the loop, exits.
\- dpci_softirq
while (!list_emtpry)
list_del, but ->next already has LIST_POISON1 and we blow up.
This patch in combination adds two extra paths:
1) in raise_softirq, we do delay scheduling of dcpi_pirq until STATE_ZOMBIE is cleared.
2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the STATE_ZOMBIE.
Using the example above the code-paths would be now:
\- timer_softirq_action
pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to STATE_ZOMBIE.
pirq_dpci is still on dpci_list.
\- dpci_sofitrq
while (!list_emptry(&our_list))
list_del, but has not yet done 'entry->next = LIST_POISON1;'
[interrupt happens]
raise_softirq checks state, it is STATE_ZOMBIE so returns.
[interrupt is done, back to dpci_softirq]
finishes the entry->next = LIST_POISON1;
.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
ends the loop, exits.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
xen/drivers/passthrough/io.c
---
xen/drivers/passthrough/io.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 2039d31..1a26973 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -50,20 +50,26 @@ static DEFINE_PER_CPU(struct list_head, dpci_list);
enum {
STATE_SCHED,
- STATE_RUN
+ STATE_RUN,
+ STATE_ZOMBIE
};
/*
* This can be called multiple times, but the softirq is only raised once.
- * That is until the STATE_SCHED state has been cleared. The state can be
- * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
+ * That is until the STATE_SCHED and STATE_ZOMBIE state has been cleared. The
+ * STATE_SCHED and STATE_ZOMBIE state can be cleared by the 'dpci_softirq'
* or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
- * the softirq had a chance to run).
+ * (when it has executed 'hvm_dirq_assist'). The STATE_SCHED can be cleared
+ * by 'pt_pirq_softirq_cancel' (which will try to clear the state before the
+ * softirq had a chance to run).
*/
static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
{
unsigned long flags;
+ if ( test_bit(STATE_ZOMBIE, &pirq_dpci->state) )
+ return;
+
if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
return;
@@ -85,7 +91,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
*/
bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
{
- if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) )
+ if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED) | (1 << STATE_ZOMBIE) ) )
return 1;
/*
@@ -111,7 +117,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
ASSERT(spin_is_locked(&d->event_lock));
- switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
+ switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 1 << STATE_ZOMBIE ) )
{
case (1 << STATE_SCHED):
/*
@@ -122,6 +128,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
/* fallthrough. */
case (1 << STATE_RUN):
case (1 << STATE_RUN) | (1 << STATE_SCHED):
+ case (1 << STATE_RUN) | (1 << STATE_SCHED) | (1 << STATE_ZOMBIE):
/*
* The reason it is OK to reset 'dom' when STATE_RUN bit is set is due
* to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
@@ -786,6 +793,7 @@ unlock:
static void dpci_softirq(void)
{
unsigned int cpu = smp_processor_id();
+ unsigned int reset = 0;
LIST_HEAD(our_list);
local_irq_disable();
@@ -812,7 +820,15 @@ static void dpci_softirq(void)
hvm_dirq_assist(d, pirq_dpci);
put_domain(d);
}
+ else
+ reset = 1;
+
clear_bit(STATE_RUN, &pirq_dpci->state);
+ if ( reset )
+ {
+ clear_bit(STATE_ZOMBIE, &pirq_dpci->state);
+ reset = 0;
+ }
}
}
--
1.9.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-11-19 22:16 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-19 17:31 [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Konrad Rzeszutek Wilk
2014-11-19 17:31 ` [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked Konrad Rzeszutek Wilk
2014-11-19 18:54 ` [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Sander Eikelenboom
2014-11-19 19:22 ` Andrew Cooper
[not found] ` <20141119190131.GA15580@laptop.dumpdata.com>
[not found] ` <1173563549.20141119201735@eikelenboom.it>
2014-11-19 22:16 ` Konrad Rzeszutek Wilk
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.