* [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.
@ 2014-11-19 17:31 Konrad Rzeszutek Wilk
2014-11-19 17:31 ` [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked Konrad Rzeszutek Wilk
2014-11-19 18:54 ` [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Sander Eikelenboom
0 siblings, 2 replies; 5+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-19 17:31 UTC (permalink / raw)
To: xen-devel, JBeulich, linux
Hey,
This patch should fix the issue that Sander had seen. The full details
are in the patch itself. Sander, if you could - please test origin/staging
with this patch to make sure it does fix the issue.
xen/drivers/passthrough/io.c | 27 +++++++++++++++++----------
Konrad Rzeszutek Wilk (1):
dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
1 file changed, 17 insertions(+), 10 deletions(-)
^ permalink raw reply [flat|nested] 5+ messages in thread* [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked. 2014-11-19 17:31 [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Konrad Rzeszutek Wilk @ 2014-11-19 17:31 ` Konrad Rzeszutek Wilk 2014-11-19 18:54 ` [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Sander Eikelenboom 1 sibling, 0 replies; 5+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-11-19 17:31 UTC (permalink / raw) To: xen-devel, JBeulich, linux; +Cc: Konrad Rzeszutek Wilk If we pass in INTx type devices to a guest on an over-subscribed machine - and in an over-worked guest - we can cause the pirq_dpci->softirq_list to become corrupted. The reason for this is that the 'pt_irq_guest_eoi' ends up setting the 'state' to zero value. However the 'state' value (STATE_SCHED, STATE_RUN) is used to communicate between 'raise_softirq_for' and 'dpci_softirq' to determine whether the 'struct hvm_pirq_dpci' can be re-scheduled. We are ignoring the teardown path for simplicity for right now. The 'pt_irq_guest_eoi' was not adhering to the proper dialogue and was not using locked cmpxchg or test_bit operations and ended setting 'state' set to zero. That meant 'raise_softirq_for' was free to schedule it while the 'struct hvm_pirq_dpci'' was still on an per-cpu list. The end result was list_del being called twice and the second call corrupting the per-cpu list. For this to occur one of the CPUs must be in the idle loop executing softirqs and the interrupt handler in the guest must not respond to the pending interrupt within 8ms, and we must receive another interrupt for this device on another CPU. CPU0: CPU1: timer_softirq_action \- pt_irq_time_out state = 0; do_IRQ [out of timer code, the raise_softirq pirq_dpci is on the CPU0 dpci_list] [adds the pirq_dpci to CPU1 dpci_list as state == 0] softirq_dpci: softirq_dpci: list_del [list entries are poisoned] list_del <= BOOM The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was pt_pirq_softirq_reset) to use cmpxchg. We also expand said function to reset the '->dom' only on the teardown paths - but not on the timeouts. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- xen/drivers/passthrough/io.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index efc66dc..2039d31 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -57,7 +57,7 @@ enum { * This can be called multiple times, but the softirq is only raised once. * That is until the STATE_SCHED state has been cleared. The state can be * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), - * or by 'pt_pirq_softirq_reset' (which will try to clear the state before + * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before * the softirq had a chance to run). */ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) @@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci) } /* - * Reset the pirq_dpci->dom parameter to NULL. + * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set, + * reset pirq_dpci->dom parameter to NULL (used for teardown). * * This function checks the different states to make sure it can do it * at the right time. If it unschedules the 'hvm_dirq_assist' from running * it also refcounts (which is what the softirq would have done) properly. */ -static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) +static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci, + unsigned int clear) { struct domain *d = pirq_dpci->dom; @@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom' * in local variable before it sets STATE_RUN - and therefore will not * dereference '->dom' which would crash. + * + * However, if this is called from 'pt_irq_time_out' we do not want to + * clear the '->dom' as we can re-use the 'pirq_dpci' after that and + * need '->dom'. */ - pirq_dpci->dom = NULL; + if ( clear ) + pirq_dpci->dom = NULL; break; } } @@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci, if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT, &pirq_dpci->flags) ) { - pirq_dpci->state = 0; + pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */); pirq_dpci->pending = 0; pirq_guest_eoi(dpci_pirq(pirq_dpci)); } @@ -285,7 +292,7 @@ int pt_irq_create_bind( * to be scheduled but we must deal with the one that may be * in the queue. */ - pt_pirq_softirq_reset(pirq_dpci); + pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */); } } if ( unlikely(rc) ) @@ -536,9 +543,9 @@ int pt_irq_destroy_bind( pirq_dpci->flags = 0; /* * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the - * call to pt_pirq_softirq_reset. + * call to pt_pirq_softirq_cancel. */ - pt_pirq_softirq_reset(pirq_dpci); + pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */); pirq_cleanup_check(pirq, d); } @@ -773,8 +780,8 @@ unlock: } /* - * Note: 'pt_pirq_softirq_reset' can clear the STATE_SCHED before we get to - * doing it. If that is the case we let 'pt_pirq_softirq_reset' do ref-counting. + * Note: 'pt_pirq_softirq_cancel' can clear the STATE_SCHED before we get to + * doing it. If that is the case we let 'pt_pirq_softirq_cancel' do ref-counting. */ static void dpci_softirq(void) { -- 1.9.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq. 2014-11-19 17:31 [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Konrad Rzeszutek Wilk 2014-11-19 17:31 ` [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked Konrad Rzeszutek Wilk @ 2014-11-19 18:54 ` Sander Eikelenboom 2014-11-19 19:22 ` Andrew Cooper [not found] ` <20141119190131.GA15580@laptop.dumpdata.com> 1 sibling, 2 replies; 5+ messages in thread From: Sander Eikelenboom @ 2014-11-19 18:54 UTC (permalink / raw) To: Konrad Rzeszutek Wilk; +Cc: xen-devel, JBeulich Wednesday, November 19, 2014, 6:31:39 PM, you wrote: > Hey, > This patch should fix the issue that Sander had seen. The full details > are in the patch itself. Sander, if you could - please test origin/staging > with this patch to make sure it does fix the issue. > xen/drivers/passthrough/io.c | 27 +++++++++++++++++---------- > Konrad Rzeszutek Wilk (1): > dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked. > 1 file changed, 17 insertions(+), 10 deletions(-) Hi Konrad, Hmm just tested with a freshly cloned tree .. unfortunately it blew up again. (i must admit i also re-enabled stuff i had disabled in debugging like, cpuidle, cpufreq). (XEN) [2014-11-19 18:41:25.999] ----[ Xen-4.5.0-rc x86_64 debug=y Not tainted ]---- (XEN) [2014-11-19 18:41:25.999] CPU: 5 (XEN) [2014-11-19 18:41:25.999] RIP: e008:[<ffff82d0801490ac>] dpci_softirq+0x9c/0x23d (XEN) [2014-11-19 18:41:25.999] RFLAGS: 0000000000010283 CONTEXT: hypervisor (XEN) [2014-11-19 18:41:25.999] rax: 0100100100100100 rbx: ffff8303bb688d90 rcx: 0000000000000001 (XEN) [2014-11-19 18:41:25.999] rdx: ffff83054ef18000 rsi: 0000000000000002 rdi: ffff83050b29e0b8 (XEN) [2014-11-19 18:41:25.999] rbp: ffff83054ef1feb0 rsp: ffff83054ef1fe50 r8: ffff8303bb688d60 (XEN) [2014-11-19 18:41:25.999] r9: 000001d5f62fff63 r10: 00000000deadbeef r11: 0000000000000246 (XEN) [2014-11-19 18:41:25.999] r12: ffff8303bb688d38 r13: ffff83050b29e000 r14: ffff8303bb688d28 (XEN) [2014-11-19 18:41:25.999] r15: ffff8303bb688d28 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) [2014-11-19 18:41:25.999] cr3: 000000050b2c7000 cr2: ffffffffff600400 (XEN) [2014-11-19 18:41:25.999] ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) [2014-11-19 18:41:25.999] Xen stack trace from rsp=ffff83054ef1fe50: (XEN) [2014-11-19 18:41:25.999] 0000000000000c23 ffff83050b29e0b8 ffff8303bb688d38 ffff83054ef1fe70 (XEN) [2014-11-19 18:41:25.999] ffff8303bb688d90 ffff8303bb688d90 000000fb00000000 ffff82d080300200 (XEN) [2014-11-19 18:41:25.999] ffff82d0802fff80 ffffffffffffffff ffff83054ef18000 0000000000000002 (XEN) [2014-11-19 18:41:25.999] ffff83054ef1fee0 ffff82d08012be31 ffff83054ef18000 ffff83009fd2d000 (XEN) [2014-11-19 18:41:25.999] 00000000ffffffff ffff83054ef28068 ffff83054ef1fef0 ffff82d08012be89 (XEN) [2014-11-19 18:41:25.999] ffff83054ef1ff10 ffff82d0801633e5 ffff82d08012be89 ffff83009ff8b000 (XEN) [2014-11-19 18:41:25.999] ffff83054ef1fde8 ffff880059bf8000 ffff880059bf8000 0000000000000000 (XEN) [2014-11-19 18:41:25.999] 0000000000000000 ffff880059bfbeb0 ffffffff822f3ec0 0000000000000246 (XEN) [2014-11-19 18:41:25.999] 0000000000000001 0000000000000000 0000000000000000 0000000000000000 (XEN) [2014-11-19 18:41:25.999] ffffffff810013aa ffff880059bde480 00000000deadbeef 00000000deadbeef (XEN) [2014-11-19 18:41:25.999] 0000010000000000 ffffffff810013aa 000000000000e033 0000000000000246 (XEN) [2014-11-19 18:41:25.999] ffff880059bfbe98 000000000000e02b 1862060042c8beef 224d41480704beef (XEN) [2014-11-19 18:41:25.999] 99171042639bbeef 74c88180108cbeef c0dc604c00000005 ffff83009ff8b000 (XEN) [2014-11-19 18:41:26.000] 00000034cebff280 ca836183a4020303 (XEN) [2014-11-19 18:41:26.000] Xen call trace: (XEN) [2014-11-19 18:41:26.000] [<ffff82d0801490ac>] dpci_softirq+0x9c/0x23d (XEN) [2014-11-19 18:41:26.000] [<ffff82d08012be31>] __do_softirq+0x81/0x8c (XEN) [2014-11-19 18:41:26.000] [<ffff82d08012be89>] do_softirq+0x13/0x15 (XEN) [2014-11-19 18:41:26.000] [<ffff82d0801633e5>] idle_loop+0x5e/0x6e (XEN) [2014-11-19 18:41:26.000] (XEN) [2014-11-19 18:41:26.778] (XEN) [2014-11-19 18:41:26.787] **************************************** (XEN) [2014-11-19 18:41:26.806] Panic on CPU 5: (XEN) [2014-11-19 18:41:26.819] GENERAL PROTECTION FAULT (XEN) [2014-11-19 18:41:26.834] [error_code=0000] (XEN) [2014-11-19 18:41:26.847] **************************************** (XEN) [2014-11-19 18:41:26.867] (XEN) [2014-11-19 18:41:26.876] Reboot in five seconds... (XEN) [2014-11-19 18:41:26.891] APIC error on CPU0: 00(08) (XEN) [2014-11-19 18:41:26.906] APIC error on CPU0: 08(08) ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq. 2014-11-19 18:54 ` [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Sander Eikelenboom @ 2014-11-19 19:22 ` Andrew Cooper [not found] ` <20141119190131.GA15580@laptop.dumpdata.com> 1 sibling, 0 replies; 5+ messages in thread From: Andrew Cooper @ 2014-11-19 19:22 UTC (permalink / raw) To: Sander Eikelenboom, Konrad Rzeszutek Wilk; +Cc: xen-devel, JBeulich On 19/11/2014 18:54, Sander Eikelenboom wrote: > Wednesday, November 19, 2014, 6:31:39 PM, you wrote: > >> Hey, >> This patch should fix the issue that Sander had seen. The full details >> are in the patch itself. Sander, if you could - please test origin/staging >> with this patch to make sure it does fix the issue. > >> xen/drivers/passthrough/io.c | 27 +++++++++++++++++---------- >> Konrad Rzeszutek Wilk (1): >> dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked. >> 1 file changed, 17 insertions(+), 10 deletions(-) > > Hi Konrad, > > Hmm just tested with a freshly cloned tree .. unfortunately it blew up again. > (i must admit i also re-enabled stuff i had disabled in debugging like, cpuidle, cpufreq). > > (XEN) [2014-11-19 18:41:25.999] ----[ Xen-4.5.0-rc x86_64 debug=y Not tainted ]---- > (XEN) [2014-11-19 18:41:25.999] CPU: 5 > (XEN) [2014-11-19 18:41:25.999] RIP: e008:[<ffff82d0801490ac>] dpci_softirq+0x9c/0x23d > (XEN) [2014-11-19 18:41:25.999] RFLAGS: 0000000000010283 CONTEXT: hypervisor > (XEN) [2014-11-19 18:41:25.999] rax: 0100100100100100 rbx: ffff8303bb688d90 rcx: 0000000000000001 > (XEN) [2014-11-19 18:41:25.999] rdx: ffff83054ef18000 rsi: 0000000000000002 rdi: ffff83050b29e0b8 > (XEN) [2014-11-19 18:41:25.999] rbp: ffff83054ef1feb0 rsp: ffff83054ef1fe50 r8: ffff8303bb688d60 > (XEN) [2014-11-19 18:41:25.999] r9: 000001d5f62fff63 r10: 00000000deadbeef r11: 0000000000000246 > (XEN) [2014-11-19 18:41:25.999] r12: ffff8303bb688d38 r13: ffff83050b29e000 r14: ffff8303bb688d28 > (XEN) [2014-11-19 18:41:25.999] r15: ffff8303bb688d28 cr0: 000000008005003b cr4: 00000000000006f0 > (XEN) [2014-11-19 18:41:25.999] cr3: 000000050b2c7000 cr2: ffffffffff600400 > (XEN) [2014-11-19 18:41:25.999] ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 > (XEN) [2014-11-19 18:41:25.999] Xen stack trace from rsp=ffff83054ef1fe50: > (XEN) [2014-11-19 18:41:25.999] 0000000000000c23 ffff83050b29e0b8 ffff8303bb688d38 ffff83054ef1fe70 > (XEN) [2014-11-19 18:41:25.999] ffff8303bb688d90 ffff8303bb688d90 000000fb00000000 ffff82d080300200 > (XEN) [2014-11-19 18:41:25.999] ffff82d0802fff80 ffffffffffffffff ffff83054ef18000 0000000000000002 > (XEN) [2014-11-19 18:41:25.999] ffff83054ef1fee0 ffff82d08012be31 ffff83054ef18000 ffff83009fd2d000 > (XEN) [2014-11-19 18:41:25.999] 00000000ffffffff ffff83054ef28068 ffff83054ef1fef0 ffff82d08012be89 > (XEN) [2014-11-19 18:41:25.999] ffff83054ef1ff10 ffff82d0801633e5 ffff82d08012be89 ffff83009ff8b000 > (XEN) [2014-11-19 18:41:25.999] ffff83054ef1fde8 ffff880059bf8000 ffff880059bf8000 0000000000000000 > (XEN) [2014-11-19 18:41:25.999] 0000000000000000 ffff880059bfbeb0 ffffffff822f3ec0 0000000000000246 > (XEN) [2014-11-19 18:41:25.999] 0000000000000001 0000000000000000 0000000000000000 0000000000000000 > (XEN) [2014-11-19 18:41:25.999] ffffffff810013aa ffff880059bde480 00000000deadbeef 00000000deadbeef > (XEN) [2014-11-19 18:41:25.999] 0000010000000000 ffffffff810013aa 000000000000e033 0000000000000246 > (XEN) [2014-11-19 18:41:25.999] ffff880059bfbe98 000000000000e02b 1862060042c8beef 224d41480704beef > (XEN) [2014-11-19 18:41:25.999] 99171042639bbeef 74c88180108cbeef c0dc604c00000005 ffff83009ff8b000 > (XEN) [2014-11-19 18:41:26.000] 00000034cebff280 ca836183a4020303 > (XEN) [2014-11-19 18:41:26.000] Xen call trace: > (XEN) [2014-11-19 18:41:26.000] [<ffff82d0801490ac>] dpci_softirq+0x9c/0x23d > (XEN) [2014-11-19 18:41:26.000] [<ffff82d08012be31>] __do_softirq+0x81/0x8c > (XEN) [2014-11-19 18:41:26.000] [<ffff82d08012be89>] do_softirq+0x13/0x15 > (XEN) [2014-11-19 18:41:26.000] [<ffff82d0801633e5>] idle_loop+0x5e/0x6e > (XEN) [2014-11-19 18:41:26.000] > (XEN) [2014-11-19 18:41:26.778] > (XEN) [2014-11-19 18:41:26.787] **************************************** > (XEN) [2014-11-19 18:41:26.806] Panic on CPU 5: > (XEN) [2014-11-19 18:41:26.819] GENERAL PROTECTION FAULT > (XEN) [2014-11-19 18:41:26.834] [error_code=0000] > (XEN) [2014-11-19 18:41:26.847] **************************************** > (XEN) [2014-11-19 18:41:26.867] > (XEN) [2014-11-19 18:41:26.876] Reboot in five seconds... > (XEN) [2014-11-19 18:41:26.891] APIC error on CPU0: 00(08) > (XEN) [2014-11-19 18:41:26.906] APIC error on CPU0: 08(08) For the avoidance of any confusion, this is still LIST_POISON1 (see %rax), but now a #GP fault following c/s 404227138 (now with 100% less chance of dereferencing into guest-controlled virtual address space) ~Andrew ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <20141119190131.GA15580@laptop.dumpdata.com>]
[parent not found: <1173563549.20141119201735@eikelenboom.it>]
* Re: [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq. [not found] ` <1173563549.20141119201735@eikelenboom.it> @ 2014-11-19 22:16 ` Konrad Rzeszutek Wilk 0 siblings, 0 replies; 5+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-11-19 22:16 UTC (permalink / raw) To: Sander Eikelenboom, xen-devel, jbeulich, andrew.cooper3 On Wed, Nov 19, 2014 at 08:17:35PM +0100, Sander Eikelenboom wrote: > > Wednesday, November 19, 2014, 8:01:31 PM, you wrote: > > > On Wed, Nov 19, 2014 at 07:54:39PM +0100, Sander Eikelenboom wrote: > >> > >> Wednesday, November 19, 2014, 6:31:39 PM, you wrote: > >> > >> > Hey, > >> > >> > This patch should fix the issue that Sander had seen. The full details > >> > are in the patch itself. Sander, if you could - please test origin/staging > >> > with this patch to make sure it does fix the issue. > >> > >> > >> > xen/drivers/passthrough/io.c | 27 +++++++++++++++++---------- > >> > >> > Konrad Rzeszutek Wilk (1): > >> > dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked. > >> > >> > 1 file changed, 17 insertions(+), 10 deletions(-) > >> > >> > >> Hi Konrad, > >> > >> Hmm just tested with a freshly cloned tree .. unfortunately it blew up again. > >> (i must admit i also re-enabled stuff i had disabled in debugging like, cpuidle, cpufreq). > > > Argh. > > > Could you also try the first patch the STATE_ZOMBIE one? > > Building now .. (Attached and inline) Sander mentioned to me over IRC that with the STATE_ZOMBIE patch things work peachy for him. The patch in combination with the previous adds two extra paths: 1) in raise_softirq, we do delay scheduling of dcpi_pirq until STATE_ZOMBIE is cleared. 2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the STATE_ZOMBIE. Lets follow the case without the zombie patch and with the zombie patch: w/o zombie: timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 0. pirq_dpci is still on dpci_list. dpci_sofitrq while (!list_emptry(&our_list)) list_del, but has not yet done 'entry->next = LIST_POISON1;' [interrupt happens] raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list. [interrupt is done, back to dpci_softirq] finishes the entry->next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. dpci_softirq while (!list_emtpry) list_del, but ->next already has LIST_POISON1 and we blow up. w/ zombie: timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to STATE_ZOMBIE. pirq_dpci is still on dpci_list. dpci_sofitrq while (!list_emptry(&our_list)) list_del, but has not yet done 'entry->next = LIST_POISON1;' [interrupt happens] raise_softirq checks state, it is STATE_ZOMBIE so returns. [interrupt is done, back to dpci_softirq] finishes the entry->next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. So it seems that the STATE_ZOMBIE is needed, but for a different reason that Jan initially thought of: >From c89a97f695fda245f5fcb16ddb36d3df7f6f28b9 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Date: Fri, 14 Nov 2014 12:15:26 -0500 Subject: [PATCH] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq. When we want to cancel an outstanding 'struct hvm_pirq_dpci' we perform and cmpxch on the state to set it to zero. That is OK on the teardown paths as it is guarnateed that the do_IRQ action handler has been removed. Hence no more interrupts can be scheduled. But with the introduction of "dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked." we now utilize the pt_pirq_softirq_cancel when we want to cancel outstanding operations. However once we cancel them the do_IRQ is free to schedule them back in - even if said 'struct hvm_pirq_dpci' is still on the dpci_list. The code base before this patch could follow this race: \-timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 0. pirq_dpci is still on dpci_list. \- dpci_sofitrq while (!list_emptry(&our_list)) list_del, but has not yet done 'entry->next = LIST_POISON1;' [interrupt happens] raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list. [interrupt is done, back to dpci_softirq] finishes the entry->next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. \- dpci_softirq while (!list_emtpry) list_del, but ->next already has LIST_POISON1 and we blow up. This patch in combination adds two extra paths: 1) in raise_softirq, we do delay scheduling of dcpi_pirq until STATE_ZOMBIE is cleared. 2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the STATE_ZOMBIE. Using the example above the code-paths would be now: \- timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to STATE_ZOMBIE. pirq_dpci is still on dpci_list. \- dpci_sofitrq while (!list_emptry(&our_list)) list_del, but has not yet done 'entry->next = LIST_POISON1;' [interrupt happens] raise_softirq checks state, it is STATE_ZOMBIE so returns. [interrupt is done, back to dpci_softirq] finishes the entry->next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Conflicts: xen/drivers/passthrough/io.c --- xen/drivers/passthrough/io.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index 2039d31..1a26973 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -50,20 +50,26 @@ static DEFINE_PER_CPU(struct list_head, dpci_list); enum { STATE_SCHED, - STATE_RUN + STATE_RUN, + STATE_ZOMBIE }; /* * This can be called multiple times, but the softirq is only raised once. - * That is until the STATE_SCHED state has been cleared. The state can be - * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), + * That is until the STATE_SCHED and STATE_ZOMBIE state has been cleared. The + * STATE_SCHED and STATE_ZOMBIE state can be cleared by the 'dpci_softirq' * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before - * the softirq had a chance to run). + * (when it has executed 'hvm_dirq_assist'). The STATE_SCHED can be cleared + * by 'pt_pirq_softirq_cancel' (which will try to clear the state before the + * softirq had a chance to run). */ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) { unsigned long flags; + if ( test_bit(STATE_ZOMBIE, &pirq_dpci->state) ) + return; + if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) ) return; @@ -85,7 +91,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) */ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci) { - if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) ) + if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED) | (1 << STATE_ZOMBIE) ) ) return 1; /* @@ -111,7 +117,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci, ASSERT(spin_is_locked(&d->event_lock)); - switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) ) + switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 1 << STATE_ZOMBIE ) ) { case (1 << STATE_SCHED): /* @@ -122,6 +128,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci, /* fallthrough. */ case (1 << STATE_RUN): case (1 << STATE_RUN) | (1 << STATE_SCHED): + case (1 << STATE_RUN) | (1 << STATE_SCHED) | (1 << STATE_ZOMBIE): /* * The reason it is OK to reset 'dom' when STATE_RUN bit is set is due * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom' @@ -786,6 +793,7 @@ unlock: static void dpci_softirq(void) { unsigned int cpu = smp_processor_id(); + unsigned int reset = 0; LIST_HEAD(our_list); local_irq_disable(); @@ -812,7 +820,15 @@ static void dpci_softirq(void) hvm_dirq_assist(d, pirq_dpci); put_domain(d); } + else + reset = 1; + clear_bit(STATE_RUN, &pirq_dpci->state); + if ( reset ) + { + clear_bit(STATE_ZOMBIE, &pirq_dpci->state); + reset = 0; + } } } -- 1.9.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-11-19 22:16 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-19 17:31 [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Konrad Rzeszutek Wilk
2014-11-19 17:31 ` [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked Konrad Rzeszutek Wilk
2014-11-19 18:54 ` [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq Sander Eikelenboom
2014-11-19 19:22 ` Andrew Cooper
[not found] ` <20141119190131.GA15580@laptop.dumpdata.com>
[not found] ` <1173563549.20141119201735@eikelenboom.it>
2014-11-19 22:16 ` Konrad Rzeszutek Wilk
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.