[PATCH v2] xen: credit1: fix tickling when it happens from a remote pCPU

* [PATCH v2] xen: credit1: fix tickling when it happens from a remote pCPU
@ 2015-09-25  7:46 Dario Faggioli
  2015-09-25  8:10 ` Dario Faggioli
  2015-09-29 13:47 ` George Dunlap
  0 siblings, 2 replies; 4+ messages in thread
From: Dario Faggioli @ 2015-09-25  7:46 UTC
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Jan Beulich

especially if that is also from a different cpupool than the
processor of the vCPU that triggered the tickling.

In fact, it is possible that we get as far as calling vcpu_unblock()-->
vcpu_wake()-->csched_vcpu_wake()-->__runq_tickle() for the vCPU 'vc',
but all while running on a pCPU that is different from 'vc->processor'.

For instance, this can happen when an HVM domain runs in a cpupool,
with a different scheduler than the default one, and issues IOREQs
to Dom0, running in Pool-0 with the default scheduler.
In fact, right in this case, the following crash can be observed:

(XEN) ----[ Xen-4.7-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    7
(XEN) RIP:    e008:[<ffff82d0801230de>] __runq_tickle+0x18f/0x430
(XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor (d1v0)
(XEN) rax: 0000000000000001   rbx: ffff8303184fee00   rcx: 0000000000000000
(XEN) ... ... ...
(XEN) Xen stack trace from rsp=ffff83031fa57a08:
(XEN)    ffff82d0801fe664 ffff82d08033c820 0000000100000002 0000000a00000001
(XEN)    0000000000006831 0000000000000000 0000000000000000 0000000000000000
(XEN) ... ... ...
(XEN) Xen call trace:
(XEN)    [<ffff82d0801230de>] __runq_tickle+0x18f/0x430
(XEN)    [<ffff82d08012348a>] csched_vcpu_wake+0x10b/0x110
(XEN)    [<ffff82d08012b421>] vcpu_wake+0x20a/0x3ce
(XEN)    [<ffff82d08012b91c>] vcpu_unblock+0x4b/0x4e
(XEN)    [<ffff82d080167bd0>] vcpu_kick+0x17/0x61
(XEN)    [<ffff82d080167c46>] vcpu_mark_events_pending+0x2c/0x2f
(XEN)    [<ffff82d08010ac35>] evtchn_fifo_set_pending+0x381/0x3f6
(XEN)    [<ffff82d08010a0f6>] notify_via_xen_event_channel+0xc9/0xd6
(XEN)    [<ffff82d0801c29ed>] hvm_send_ioreq+0x3e9/0x441
(XEN)    [<ffff82d0801bba7d>] hvmemul_do_io+0x23f/0x2d2
(XEN)    [<ffff82d0801bbb43>] hvmemul_do_io_buffer+0x33/0x64
(XEN)    [<ffff82d0801bc92b>] hvmemul_do_pio_buffer+0x35/0x37
(XEN)    [<ffff82d0801cc49f>] handle_pio+0x58/0x14c
(XEN)    [<ffff82d0801eabcb>] vmx_vmexit_handler+0x16b3/0x1bea
(XEN)    [<ffff82d0801efd21>] vmx_asm_vmexit_handler+0x41/0xc0

In this case, pCPU 7 is not in Pool-0, while the (Dom0's) vCPU being
woken is. pCPU 7's pool has a different scheduler than Credit, and yet
it is from pCPU 7 that we are waking Dom0's vCPU. Therefore, the
current code tries to access csched_balance_mask for pCPU 7, but that
is not defined there, hence the Oops.

(Note that, in case the two pools run the same scheduler we see no
Oops, but things are still conceptually wrong.)
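
A minimal model of the two failure modes described above, as a sketch
only. The array sched_priv[], the variable running_cpu (standing in for
smp_processor_id()) and the helpers setup_pools() and old_mask_owner()
are all hypothetical names, and the assumption that a non-Credit pCPU
has a NULL slot is a simplification made for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8

/* Toy stand-in for Credit's per-pCPU data; the real struct has more fields. */
struct csched_pcpu {
    uint64_t balance_mask;          /* scratch mask, one bit per pCPU */
};

static struct csched_pcpu pcpu_data[NR_CPUS];

/*
 * Assumption of this model: the slot is NULL for a pCPU whose pool does
 * not run Credit.
 */
static struct csched_pcpu *sched_priv[NR_CPUS];

/* Hypothetical stand-in for smp_processor_id(). */
static unsigned int running_cpu;

/* Mark pCPUs 0-3 as Credit pCPUs, and pCPU 7 too if requested. */
static void setup_pools(int cpu7_runs_credit)
{
    unsigned int c;

    for ( c = 0; c < NR_CPUS; c++ )
        sched_priv[c] = (c < 4) ? &pcpu_data[c] : NULL;
    if ( cpu7_runs_credit )
        sched_priv[7] = &pcpu_data[7];
}

/* Pre-patch behaviour: the lookup is keyed on the pCPU running the code. */
static struct csched_pcpu *old_mask_owner(void)
{
    return sched_priv[running_cpu];
}
```

With setup_pools(0) and running_cpu = 7, old_mask_owner() returns NULL,
which corresponds to the Oops. With setup_pools(1) it returns pCPU 7's
data even though the vCPU being woken belongs to, say, pCPU 0, so the
scratch mask in use is not the one covered by the runqueue lock we hold.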

Cure things by making the csched_balance_mask macro accept a
parameter for fetching a specific pCPU's mask (instead of always
using smp_processor_id()).
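
As a rough illustration of the parameterised form, here is a
self-contained toy version. Only the two macro names come from the
patch; the uint64_t mask (the real code uses cpumasks), sched_priv[],
credit_init() and no_suitable_idlers() are assumptions made up for this
sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8

/* Toy per-pCPU data: a plain bitmask replaces the real cpumask. */
struct csched_pcpu {
    uint64_t balance_mask;
};

static struct csched_pcpu pcpu_data[NR_CPUS];
static struct csched_pcpu *sched_priv[NR_CPUS];   /* NULL if not Credit */

#define CSCHED_PCPU(c)          (sched_priv[(c)])
/* Same shape as the patched macro: the caller names the pCPU. */
#define csched_balance_mask(c)  (CSCHED_PCPU(c)->balance_mask)

/* Hypothetical setup: only pCPUs 0-3 are handled by Credit. */
static void credit_init(void)
{
    unsigned int c;

    for ( c = 0; c < 4; c++ )
        sched_priv[c] = &pcpu_data[c];
}

/*
 * Loosely mirrors the idlers check in __runq_tickle(): intersect the
 * vCPU's affinity with the idle pCPUs, using the scratch mask of 'cpu'
 * (the pCPU being tickled), wherever this code happens to run.
 */
static int no_suitable_idlers(unsigned int cpu, uint64_t affinity,
                              uint64_t idle_mask)
{
    assert(CSCHED_PCPU(cpu) != NULL);
    csched_balance_mask(cpu) = affinity;
    csched_balance_mask(cpu) &= idle_mask;
    return csched_balance_mask(cpu) == 0;
}
```

Since the mask is selected by the 'cpu' argument, the result no longer
depends on which pCPU executes the wakeup, and the mask being written is
the one belonging to the runqueue whose lock is held.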

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
Changes from v1:
 * get rid of the old macro and always use the new one,
   as suggested during review (Juergen)
---
 xen/common/sched_credit.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index a1945ac..57967c1 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -171,10 +171,10 @@ struct csched_pcpu {
  * Convenience macro for accessing the per-PCPU cpumask we need for
  * implementing the two steps (soft and hard affinity) balancing logic.
  * It is stored in csched_pcpu so that serialization is not an issue,
- * as there is a csched_pcpu for each PCPU and we always hold the
- * runqueue spin-lock when using this.
+ * as there is a csched_pcpu for each PCPU, and we always hold the
+ * runqueue lock for the proper PCPU when using this.
  */
-#define csched_balance_mask (CSCHED_PCPU(smp_processor_id())->balance_mask)
+#define csched_balance_mask(c) (CSCHED_PCPU(c)->balance_mask)
 
 /*
  * Virtual CPU
@@ -412,9 +412,10 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
 
             /* Are there idlers suitable for new (for this balance step)? */
             csched_balance_cpumask(new->vcpu, balance_step,
-                                   csched_balance_mask);
-            cpumask_and(csched_balance_mask, csched_balance_mask, &idle_mask);
-            new_idlers_empty = cpumask_empty(csched_balance_mask);
+                                   csched_balance_mask(cpu));
+            cpumask_and(csched_balance_mask(cpu),
+                        csched_balance_mask(cpu), &idle_mask);
+            new_idlers_empty = cpumask_empty(csched_balance_mask(cpu));
 
             /*
              * Let's not be too harsh! If there aren't idlers suitable
@@ -1491,8 +1492,9 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
                  && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) )
                 continue;
 
-            csched_balance_cpumask(vc, balance_step, csched_balance_mask);
-            if ( __csched_vcpu_is_migrateable(vc, cpu, csched_balance_mask) )
+            csched_balance_cpumask(vc, balance_step, csched_balance_mask(cpu));
+            if ( __csched_vcpu_is_migrateable(vc, cpu,
+                                              csched_balance_mask(cpu)) )
             {
                 /* We got a candidate. Grab it! */
                 TRACE_3D(TRC_CSCHED_STOLEN_VCPU, peer_cpu,

* Re: [PATCH v2] xen: credit1: fix tickling when it happens from a remote pCPU
  2015-09-25  7:46 [PATCH v2] xen: credit1: fix tickling when it happens from a remote pCPU Dario Faggioli
@ 2015-09-25  8:10 ` Dario Faggioli
  2015-09-25 11:05   ` Wei Liu
  2015-09-29 13:47 ` George Dunlap
  1 sibling, 1 reply; 4+ messages in thread
From: Dario Faggioli @ 2015-09-25  8:10 UTC
  To: Wei Liu; +Cc: Juergen Gross, George Dunlap, Jan Beulich, xen-devel


Hey Wei,

On Fri, 2015-09-25 at 09:46 +0200, Dario Faggioli wrote:

> For instance, this can happen when an HVM domain runs in a cpupool,
> with a different scheduler than the default one, and issues IOREQs
> to Dom0, running in Pool-0 with the default scheduler.
> In fact, right in this case, the following crash can be observed:

> (XEN) ----[ Xen-4.7-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    7
> (XEN) RIP:    e008:[<ffff82d0801230de>] __runq_tickle+0x18f/0x430
> (XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor (d1v0)
> (XEN) rax: 0000000000000001   rbx: ffff8303184fee00   rcx: 0000000000000000
> (XEN) ... ... ...
> (XEN) Xen stack trace from rsp=ffff83031fa57a08:
> (XEN)    ffff82d0801fe664 ffff82d08033c820 0000000100000002 0000000a00000001
> (XEN)    0000000000006831 0000000000000000 0000000000000000 0000000000000000
> (XEN) ... ... ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0801230de>] __runq_tickle+0x18f/0x430
> (XEN)    [<ffff82d08012348a>] csched_vcpu_wake+0x10b/0x110
> (XEN)    [<ffff82d08012b421>] vcpu_wake+0x20a/0x3ce
> (XEN)    [<ffff82d08012b91c>] vcpu_unblock+0x4b/0x4e
> (XEN)    [<ffff82d080167bd0>] vcpu_kick+0x17/0x61
> (XEN)    [<ffff82d080167c46>] vcpu_mark_events_pending+0x2c/0x2f
> (XEN)    [<ffff82d08010ac35>] evtchn_fifo_set_pending+0x381/0x3f6
> (XEN)    [<ffff82d08010a0f6>] notify_via_xen_event_channel+0xc9/0xd6
> (XEN)    [<ffff82d0801c29ed>] hvm_send_ioreq+0x3e9/0x441
> (XEN)    [<ffff82d0801bba7d>] hvmemul_do_io+0x23f/0x2d2
> (XEN)    [<ffff82d0801bbb43>] hvmemul_do_io_buffer+0x33/0x64
> (XEN)    [<ffff82d0801bc92b>] hvmemul_do_pio_buffer+0x35/0x37
> (XEN)    [<ffff82d0801cc49f>] handle_pio+0x58/0x14c
> (XEN)    [<ffff82d0801eabcb>] vmx_vmexit_handler+0x16b3/0x1bea
> (XEN)    [<ffff82d0801efd21>] vmx_asm_vmexit_handler+0x41/0xc0
> 
This patch is a bugfix, so I think it should be included in 4.6.

I'm able to trigger the above Oops pretty reliably by running an HVM
guest in a cpupool that has a scheduler different than Credit (by,
e.g., starting some load in the guest itself, or even just simply doing
`xl shutdown' on it).

It may not be the most common of configurations, but I think it's
worth it.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH v2] xen: credit1: fix tickling when it happens from a remote pCPU
  2015-09-25  8:10 ` Dario Faggioli
@ 2015-09-25 11:05   ` Wei Liu
  0 siblings, 0 replies; 4+ messages in thread
From: Wei Liu @ 2015-09-25 11:05 UTC
  To: Dario Faggioli
  Cc: Juergen Gross, George Dunlap, Wei Liu, Jan Beulich, xen-devel

On Fri, Sep 25, 2015 at 10:10:45AM +0200, Dario Faggioli wrote:
> Hey Wei,
> 
> On Fri, 2015-09-25 at 09:46 +0200, Dario Faggioli wrote:
> 
> > For instance, this can happen when an HVM domain runs in a cpupool,
> > with a different scheduler than the default one, and issues IOREQs
> > to Dom0, running in Pool-0 with the default scheduler.
> > In fact, right in this case, the following crash can be observed:
> 
> > (XEN) ----[ Xen-4.7-unstable  x86_64  debug=y  Tainted:    C ]----
> > (XEN) CPU:    7
> > (XEN) RIP:    e008:[<ffff82d0801230de>] __runq_tickle+0x18f/0x430
> > (XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor (d1v0)
> > (XEN) rax: 0000000000000001   rbx: ffff8303184fee00   rcx: 0000000000000000
> > (XEN) ... ... ...
> > (XEN) Xen stack trace from rsp=ffff83031fa57a08:
> > (XEN)    ffff82d0801fe664 ffff82d08033c820 0000000100000002 0000000a00000001
> > (XEN)    0000000000006831 0000000000000000 0000000000000000 0000000000000000
> > (XEN) ... ... ...
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d0801230de>] __runq_tickle+0x18f/0x430
> > (XEN)    [<ffff82d08012348a>] csched_vcpu_wake+0x10b/0x110
> > (XEN)    [<ffff82d08012b421>] vcpu_wake+0x20a/0x3ce
> > (XEN)    [<ffff82d08012b91c>] vcpu_unblock+0x4b/0x4e
> > (XEN)    [<ffff82d080167bd0>] vcpu_kick+0x17/0x61
> > (XEN)    [<ffff82d080167c46>] vcpu_mark_events_pending+0x2c/0x2f
> > (XEN)    [<ffff82d08010ac35>] evtchn_fifo_set_pending+0x381/0x3f6
> > (XEN)    [<ffff82d08010a0f6>] notify_via_xen_event_channel+0xc9/0xd6
> > (XEN)    [<ffff82d0801c29ed>] hvm_send_ioreq+0x3e9/0x441
> > (XEN)    [<ffff82d0801bba7d>] hvmemul_do_io+0x23f/0x2d2
> > (XEN)    [<ffff82d0801bbb43>] hvmemul_do_io_buffer+0x33/0x64
> > (XEN)    [<ffff82d0801bc92b>] hvmemul_do_pio_buffer+0x35/0x37
> > (XEN)    [<ffff82d0801cc49f>] handle_pio+0x58/0x14c
> > (XEN)    [<ffff82d0801eabcb>] vmx_vmexit_handler+0x16b3/0x1bea
> > (XEN)    [<ffff82d0801efd21>] vmx_asm_vmexit_handler+0x41/0xc0
> > 
> This patch is a bugfix, so I think it should be included in 4.6.
> 
> I'm able to trigger the above Oops pretty reliably by running an HVM
> guest in a cpupool that has a scheduler different than Credit (by,
> e.g., starting some load in the guest itself, or even just simply doing
> `xl shutdown' on it).
> 
> It may not be the most common of configurations, but I think it's
> worth it.
> 

Release-acked-by: Wei Liu <wei.liu2@citrix.com>

> Thanks and Regards,
> Dario
> -- 
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
> 

* Re: [PATCH v2] xen: credit1: fix tickling when it happens from a remote pCPU
  2015-09-25  7:46 [PATCH v2] xen: credit1: fix tickling when it happens from a remote pCPU Dario Faggioli
  2015-09-25  8:10 ` Dario Faggioli
@ 2015-09-29 13:47 ` George Dunlap
  1 sibling, 0 replies; 4+ messages in thread
From: George Dunlap @ 2015-09-29 13:47 UTC
  To: Dario Faggioli, xen-devel; +Cc: Juergen Gross, George Dunlap, Jan Beulich

On 25/09/15 08:46, Dario Faggioli wrote:
> especially if that is also from a different cpupool than the
> processor of the vCPU that triggered the tickling.
> 
> In fact, it is possible that we get as far as calling vcpu_unblock()-->
> vcpu_wake()-->csched_vcpu_wake()-->__runq_tickle() for the vCPU 'vc',
> but all while running on a pCPU that is different from 'vc->processor'.
> 
> For instance, this can happen when an HVM domain runs in a cpupool,
> with a different scheduler than the default one, and issues IOREQs
> to Dom0, running in Pool-0 with the default scheduler.
> In fact, right in this case, the following crash can be observed:
> 
> (XEN) ----[ Xen-4.7-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    7
> (XEN) RIP:    e008:[<ffff82d0801230de>] __runq_tickle+0x18f/0x430
> (XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor (d1v0)
> (XEN) rax: 0000000000000001   rbx: ffff8303184fee00   rcx: 0000000000000000
> (XEN) ... ... ...
> (XEN) Xen stack trace from rsp=ffff83031fa57a08:
> (XEN)    ffff82d0801fe664 ffff82d08033c820 0000000100000002 0000000a00000001
> (XEN)    0000000000006831 0000000000000000 0000000000000000 0000000000000000
> (XEN) ... ... ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0801230de>] __runq_tickle+0x18f/0x430
> (XEN)    [<ffff82d08012348a>] csched_vcpu_wake+0x10b/0x110
> (XEN)    [<ffff82d08012b421>] vcpu_wake+0x20a/0x3ce
> (XEN)    [<ffff82d08012b91c>] vcpu_unblock+0x4b/0x4e
> (XEN)    [<ffff82d080167bd0>] vcpu_kick+0x17/0x61
> (XEN)    [<ffff82d080167c46>] vcpu_mark_events_pending+0x2c/0x2f
> (XEN)    [<ffff82d08010ac35>] evtchn_fifo_set_pending+0x381/0x3f6
> (XEN)    [<ffff82d08010a0f6>] notify_via_xen_event_channel+0xc9/0xd6
> (XEN)    [<ffff82d0801c29ed>] hvm_send_ioreq+0x3e9/0x441
> (XEN)    [<ffff82d0801bba7d>] hvmemul_do_io+0x23f/0x2d2
> (XEN)    [<ffff82d0801bbb43>] hvmemul_do_io_buffer+0x33/0x64
> (XEN)    [<ffff82d0801bc92b>] hvmemul_do_pio_buffer+0x35/0x37
> (XEN)    [<ffff82d0801cc49f>] handle_pio+0x58/0x14c
> (XEN)    [<ffff82d0801eabcb>] vmx_vmexit_handler+0x16b3/0x1bea
> (XEN)    [<ffff82d0801efd21>] vmx_asm_vmexit_handler+0x41/0xc0
> 
> In this case, pCPU 7 is not in Pool-0, while the (Dom0's) vCPU being
> woken is. pCPU 7's pool has a different scheduler than Credit, and yet
> it is from pCPU 7 that we are waking Dom0's vCPU. Therefore, the
> current code tries to access csched_balance_mask for pCPU 7, but that
> is not defined there, hence the Oops.
> 
> (Note that, in case the two pools run the same scheduler we see no
> Oops, but things are still conceptually wrong.)
> 
> Cure things by making the csched_balance_mask macro accept a
> parameter for fetching a specific pCPU's mask (instead of always
> using smp_processor_id()).
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>

Looks good!

Reviewed-by: George Dunlap <george.dunlap@citrix.com>
