From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: [PATCH 8/9] xen: sched: allow for choosing credit2 runqueues configuration at boot
Date: Thu, 1 Oct 2015 07:48:11 +0200
Message-ID: <560CC91B.80308@suse.com>
References: <20150929164726.17589.96920.stgit@Solace.station> <20150929165625.17589.17838.stgit@Solace.station>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
	by lists.xen.org with esmtp (Exim 4.72) id 1ZhWj1-00021C-9E
	for xen-devel@lists.xenproject.org; Thu, 01 Oct 2015 05:48:15 +0000
In-Reply-To: <20150929165625.17589.17838.stgit@Solace.station>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dario Faggioli, xen-devel@lists.xenproject.org
Cc: George Dunlap, Uma Sharma
List-Id: xen-devel@lists.xenproject.org

On 09/29/2015 06:56 PM, Dario Faggioli wrote:
> In fact, credit2 uses CPU topology to decide how to arrange
> its internal runqueues. Before this change, only 'one runqueue
> per socket' was allowed. However, experiments have shown that,
> for instance, having one runqueue per physical core improves
> performance, especially when hyperthreading is available.
>
> In general, it makes sense to allow users to pick one runqueue
> arrangement at boot time, so that:
>  - more experiments can be easily performed to even better
>    assess and improve performance;
>  - one can select the best configuration for their specific
>    use case and/or hardware.
>
> This patch enables the above.
>
> Note that, for correctly arranging runqueues to be per-core,
> just checking cpu_to_core() on the host CPUs is not enough.
> In fact, cores (and hyperthreads) on different sockets can
> have the same core (and thread) IDs! We, therefore, need to
> check whether the full topology of two CPUs matches, for
> them to be put in the same runqueue.
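
That full-topology check is the subtle part, so here is how I read it.
I'd expect the matching to be structured roughly like the sketch below
(for discussion only: it uses the existing cpu_to_socket() and
cpu_to_core() helpers, but cpus_share_runqueue() and the OPT_RUNQUEUE_*
values are made-up names, as the quoted hunk below stops before the code
doing the actual comparison):

/*
 * Sketch only, not the patch's actual code: should cpua and cpub be
 * assigned to the same runqueue?
 */
static bool_t cpus_share_runqueue(unsigned int cpua, unsigned int cpub)
{
    /*
     * Core (and thread) IDs are unique only within one socket, so
     * sockets must match before a core ID comparison means anything.
     */
    if ( cpu_to_socket(cpua) != cpu_to_socket(cpub) )
        return 0;

    /* Per-socket runqueues (the 'socket' default): same socket suffices. */
    if ( opt_runqueue == OPT_RUNQUEUE_SOCKET )
        return 1;

    /* Per-core runqueues ('core'): same socket AND same core ID. */
    return cpu_to_core(cpua) == cpu_to_core(cpub);
}

Structured like that, a per-numa-node variant (more on this below) would
just be one more check, using cpu_to_node().
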
> Note also that the default for credit2 has, so far, been per-socket
> runqueues (although that was not actually functional). This patch
> leaves things that way, to avoid mixing policy and technical changes.

I think you should think about a way to make this parameter a
per-cpupool one instead of a system-global one. As this will require
some extra work regarding the tools interface, I'd be absolutely fine
with adding this at a later time, but you should keep that in mind when
setting this up now.

>
> Signed-off-by: Dario Faggioli
> Signed-off-by: Uma Sharma
> ---
> Cc: George Dunlap
> Cc: Uma Sharma
> ---
>  docs/misc/xen-command-line.markdown |   11 +++++++
>  xen/common/sched_credit2.c          |   57 ++++++++++++++++++++++++++++++++---
>  2 files changed, 63 insertions(+), 5 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index a2e427c..71315b8 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -467,6 +467,17 @@ combination with the `low_crashinfo` command line option.
>  ### credit2\_load\_window\_shift
>  > `= <integer>`
>
> +### credit2\_runqueue
> +> `= socket | core`
> +
> +> Default: `socket`
> +
> +Specify how host CPUs are arranged in runqueues. Runqueues are kept
> +balanced with respect to the load generated by the vCPUs running on
> +them. Smaller runqueues (as in with `core`) means more accurate load
> +balancing (for instance, it will deal better with hyperthreading),
> +but also more overhead.
> +
>  ### dbgp
>  > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 38f382e..025626f 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -82,10 +82,6 @@
>   * Credits are "reset" when the next vcpu in the runqueue is less than
>   * or equal to zero. At that point, everyone's credits are "clipped"
>   * to a small value, and a fixed credit is added to everyone.
> - *
> - * The plan is for all cores that share an L2 will share the same
> - * runqueue. At the moment, there is one global runqueue for all
> - * cores.
>   */
>
>  /*
> @@ -194,6 +190,41 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>
>  /*
> + * Runqueue organization.
> + *
> + * The various cpus are to be assigned each one to a runqueue, and we
> + * want that to happen basing on topology. At the moment, it is possible
> + * to choose to arrange runqueues to be:
> + *
> + * - per-core: meaning that there will be one runqueue per each physical
> + *             core of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'core';
> + *
> + * - per-socket: meaning that there will be one runqueue per each physical
> + *               socket (AKA package, which often, but not always, also
> + *               matches a NUMA node) of the host; This will happen if
> + *               the opt_runqueue parameter is set to 'socket';

Wouldn't it be a nice idea to add "per-numa-node" as well? This would
make a difference for systems with:

- multiple sockets per numa-node
- multiple numa-nodes per socket

It might even be a good idea to be able to have only one runqueue in
small cpupools (again, this will apply only in case you have a
per-cpupool setting instead of a global one).


Juergen