From mboxrd@z Thu Jan  1 00:00:00 1970
From: Juergen Gross <jgross@suse.com>
Subject: Re: [PATCH 8/9] xen: sched: allow for choosing credit2
 runqueues configuration at boot
Date: Thu, 1 Oct 2015 09:46:41 +0200
Message-ID: <560CE4E1.2020405@suse.com>
References: <20150929164726.17589.96920.stgit@Solace.station>
	<20150929165625.17589.17838.stgit@Solace.station>
	<560CC91B.80308@suse.com> <1443684219.3276.175.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <jgross@suse.com>) id 1ZhYZf-0006B5-Ei
	for xen-devel@lists.xenproject.org; Thu, 01 Oct 2015 07:46:43 +0000
In-Reply-To: <1443684219.3276.175.camel@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dario Faggioli <dario.faggioli@citrix.com>, xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>, Uma Sharma <uma.sharma523@gmail.com>
List-Id: xen-devel@lists.xenproject.org

On 10/01/2015 09:23 AM, Dario Faggioli wrote:
> On Thu, 2015-10-01 at 07:48 +0200, Juergen Gross wrote:
>> On 09/29/2015 06:56 PM, Dario Faggioli wrote:
>>> In fact, credit2 uses CPU topology to decide how to arrange
>>> its internal runqueues. Before this change, only 'one runqueue
>>> per socket' was allowed. However, experiments have shown that,
>>> for instance, having one runqueue per physical core improves
>>> performance, especially in case hyperthreading is available.
>>>
>>> In general, it makes sense to allow users to pick one runqueue
>>> arrangement at boot time, so that:
>>>    - more experiments can be easily performed to even better
>>>      assess and improve performance;
>>>    - one can select the best configuration for his specific
>>>      use case and/or hardware.
>>>
>>> This patch enables the above.
>>>
>>> Note that, for correctly arranging runqueues to be per-core,
>>> just checking cpu_to_core() on the host CPUs is not enough.
>>> In fact, cores (and hyperthreads) on different sockets, can
>>> have the same core (and thread) IDs! We, therefore, need to
>>> check whether the full topology of two CPUs matches, for
>>> them to be put in the same runqueue.
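
A minimal sketch of the check described above, assuming a hypothetical
per-CPU topology record. The names cpu_topo, runq_mode and same_runqueue
are illustrative only and are not taken from the patch or from Xen. The
point is that two CPUs share a per-core runqueue only when both the socket
ID and the core ID match, since core IDs can repeat across sockets:

```c
#include <stdbool.h>

/* Hypothetical topology record; field names are assumptions, not Xen's. */
struct cpu_topo {
    unsigned int socket; /* physical socket (package) ID */
    unsigned int core;   /* core ID, only unique within one socket */
};

/* Hypothetical runqueue arrangements, mirroring 'core' and 'socket'. */
enum runq_mode { RUNQ_CORE, RUNQ_SOCKET };

/*
 * Return true if CPUs a and b belong in the same runqueue under the
 * given arrangement. Comparing core IDs alone is not enough: the
 * socket must match first, then (for per-core) the core as well.
 */
static bool same_runqueue(enum runq_mode mode,
                          const struct cpu_topo *a,
                          const struct cpu_topo *b)
{
    if (a->socket != b->socket)
        return false;
    return mode == RUNQ_SOCKET || a->core == b->core;
}
```

With this sketch, core 1 on socket 0 and core 1 on socket 1 end up in
different runqueues in both arrangements, even though their core IDs are
equal.
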
>>>
>>> Note also that the default (although not functional) for
>>> credit2, until now, has been per-socket runqueue. This patch
>>> leaves things that way, to avoid mixing policy and technical
>>> changes.
>>
>> I think you should think about a way to make this parameter a per
>> cpupool one instead of a system global one.
>>
> Believe it or not, I thought about this already, and yes, it is in my
> plans to make this per-cpupool. However...
>
>> As this will require some
>> extra work regarding the tools interface I'd be absolutely fine with
>> adding this at a later time, but you should have that in mind when
>> setting this up now.
>>
> ...yes, that was phase II in my mind as well.
>
> So (sorry, but just to make sure I understand), since you said you're
> fine with it coming later, are you also fine with this patch, or do you
> think some adjustments are necessary, right here, right now, because of
> that future plan?

No, I'm fine.

>
>>> --- a/xen/common/sched_credit2.c
>>> +++ b/xen/common/sched_credit2.c
>>> @@ -82,10 +82,6 @@
>
>>> @@ -194,6 +190,41 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
>>>    integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>>>
>>>    /*
>>> + * Runqueue organization.
>>> + *
>>> + * The various cpus are to be assigned each one to a runqueue, and we
>>> + * want that to happen basing on topology. At the moment, it is possible
>>> + * to choose to arrange runqueues to be:
>>> + *
>>> + * - per-core: meaning that there will be one runqueue per each physical
>>> + *             core of the host. This will happen if the opt_runqueue
>>> + *             parameter is set to 'core';
>>> + *
>>> + * - per-socket: meaning that there will be one runqueue per each physical
>>> + *               socket (AKA package, which often, but not always, also
>>> + *               matches a NUMA node) of the host; This will happen if
>>> + *               the opt_runqueue parameter is set to 'socket';
>>
>> Wouldn't it be a nice idea to add "per-numa-node" as well?
>>
> I think it is.
>
>> This would make a difference for systems with:
>>
>> - multiple sockets per numa-node
>> - multiple numa-nodes per socket
>>
> Yep.
>
>> It might even be a good idea to be able to have only one runqueue in
>> small cpupools (again, this will apply only in case you have a per
>> cpupool setting instead of a global one).
>>
> And I agree on this too.
>
> TBH, I had considered these too, and I was thinking to make them happen
> in phase II as well. However, they're simple enough to be implemented
> now (as in, in v2 of this series), so I think I'll do that.

Thanks.


Juergen