From mboxrd@z Thu Jan  1 00:00:00 1970
From: Juergen Gross <jgross@suse.com>
Subject: Re: [PATCH] xen: credit1: fix tickling when it happens
 from a remote pCPU
Date: Thu, 24 Sep 2015 08:44:56 +0200
Message-ID: <56039BE8.1070209@suse.com>
References: <20150924043127.10326.62756.stgit@Solace.station>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <jgross@suse.com>) id 1Zf0H8-00083Y-48
	for xen-devel@lists.xenproject.org; Thu, 24 Sep 2015 06:45:02 +0000
In-Reply-To: <20150924043127.10326.62756.stgit@Solace.station>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dario Faggioli <dario.faggioli@citrix.com>, xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>
List-Id: xen-devel@lists.xenproject.org

On 09/24/2015 06:31 AM, Dario Faggioli wrote:
> especially if that is also from a different cpupool than the
> processor of the vCPU that triggered the tickling.
>
> In fact, it is possible that we get as far as calling vcpu_unblock()-->
> vcpu_wake()-->csched_vcpu_wake()-->__runq_tickle() for the vCPU 'vc',
> but all while running on a pCPU that is different from 'vc->processor'.
>
> For instance, this can happen when an HVM domain runs in a cpupool,
> with a different scheduler than the default one, and issues IOREQs
> to Dom0, running in Pool-0 with the default scheduler.
> In fact, right in this case, the following crash can be observed:
>
> (XEN) ----[ Xen-4.7-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    7
> (XEN) RIP:    e008:[<ffff82d0801230de>] __runq_tickle+0x18f/0x430
> (XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor (d1v0)
> (XEN) rax: 0000000000000001   rbx: ffff8303184fee00   rcx: 0000000000000000
> (XEN) ... ... ...
> (XEN) Xen stack trace from rsp=ffff83031fa57a08:
> (XEN)    ffff82d0801fe664 ffff82d08033c820 0000000100000002 0000000a00000001
> (XEN)    0000000000006831 0000000000000000 0000000000000000 0000000000000000
> (XEN) ... ... ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0801230de>] __runq_tickle+0x18f/0x430
> (XEN)    [<ffff82d08012348a>] csched_vcpu_wake+0x10b/0x110
> (XEN)    [<ffff82d08012b421>] vcpu_wake+0x20a/0x3ce
> (XEN)    [<ffff82d08012b91c>] vcpu_unblock+0x4b/0x4e
> (XEN)    [<ffff82d080167bd0>] vcpu_kick+0x17/0x61
> (XEN)    [<ffff82d080167c46>] vcpu_mark_events_pending+0x2c/0x2f
> (XEN)    [<ffff82d08010ac35>] evtchn_fifo_set_pending+0x381/0x3f6
> (XEN)    [<ffff82d08010a0f6>] notify_via_xen_event_channel+0xc9/0xd6
> (XEN)    [<ffff82d0801c29ed>] hvm_send_ioreq+0x3e9/0x441
> (XEN)    [<ffff82d0801bba7d>] hvmemul_do_io+0x23f/0x2d2
> (XEN)    [<ffff82d0801bbb43>] hvmemul_do_io_buffer+0x33/0x64
> (XEN)    [<ffff82d0801bc92b>] hvmemul_do_pio_buffer+0x35/0x37
> (XEN)    [<ffff82d0801cc49f>] handle_pio+0x58/0x14c
> (XEN)    [<ffff82d0801eabcb>] vmx_vmexit_handler+0x16b3/0x1bea
> (XEN)    [<ffff82d0801efd21>] vmx_asm_vmexit_handler+0x41/0xc0
>
> In this case, pCPU 7 is not in Pool-0, while the (Dom0's) vCPU being
> woken is. pCPU's 7 pool has a different scheduler than credit, but it
> is, however, right from pCPU 7 that we are waking the Dom0's vCPUs.
> Therefore, the current code tries to access csched_balance_mask for
> pCPU 7, but that is not defined, and hence the Oops.
>
> (Note that, in case the two pools run the same scheduler we see no
> Oops, but things are still conceptually wrong.)
>
> Cure things by providing a second macro allowing to fetch the scratch
> mask of a specific pCPU (instead than always using smp_processor_id()),
> and use that one in __runq_tickle(), with such pCPU equal to the
> processor of the vCPU being woken.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: Juergen Gross <jgross@suse.com>

regardless whether you address my suggestion below or not.

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
>   xen/common/sched_credit.c |   15 +++++++++------
>   1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index a1945ac..f58c3d9 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -168,14 +168,16 @@ struct csched_pcpu {
>   };
>
>   /*
> - * Convenience macro for accessing the per-PCPU cpumask we need for
> + * Convenience macros for accessing the per-PCPU cpumask we need for
>    * implementing the two steps (soft and hard affinity) balancing logic.
>    * It is stored in csched_pcpu so that serialization is not an issue,
> - * as there is a csched_pcpu for each PCPU and we always hold the
> - * runqueue spin-lock when using this.
> + * as there is a csched_pcpu for each PCPU, and we always hold the
> + * runqueue lock for the proper PCPU when using these.
>    */
>   #define csched_balance_mask (CSCHED_PCPU(smp_processor_id())->balance_mask)

Wouldn't it make sense to get rid of that macro? After your patch it is
used only in csched_runq_steal() which is called with cpu being always
smp_processor_id(). You'd eliminate a possible source of future error
and avoid multiple evaluation of smp_processor_id().


Juergen

>
> +#define csched_balance_mask_cpu(c) (CSCHED_PCPU(c)->balance_mask)
> +
>   /*
>    * Virtual CPU
>    */
> @@ -412,9 +414,10 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
>
>               /* Are there idlers suitable for new (for this balance step)? */
>               csched_balance_cpumask(new->vcpu, balance_step,
> -                                   csched_balance_mask);
> -            cpumask_and(csched_balance_mask, csched_balance_mask, &idle_mask);
> -            new_idlers_empty = cpumask_empty(csched_balance_mask);
> +                                   csched_balance_mask_cpu(cpu));
> +            cpumask_and(csched_balance_mask_cpu(cpu),
> +                        csched_balance_mask_cpu(cpu), &idle_mask);
> +            new_idlers_empty = cpumask_empty(csched_balance_mask_cpu(cpu));
>
>               /*
>                * Let's not be too harsh! If there aren't idlers suitable
>
>