Re: [RFC 0/1] sched/fair: Consider asymmetric scheduler groups in load balancer

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Tobias Huschle <huschle@linux.ibm.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, sshegde@linux.vnet.ibm.com,
	srikar@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [RFC 0/1] sched/fair: Consider asymmetric scheduler groups in load balancer
Date: Tue, 04 Jul 2023 11:11:00 +0200	[thread overview]
Message-ID: <4c28b46b59bcc083956757074d1fe059@linux.ibm.com> (raw)
In-Reply-To: <26fe6dc1-33c5-b825-c019-b346e8bedc0a@arm.com>

On 2023-05-16 18:35, Dietmar Eggemann wrote:
> On 15/05/2023 13:46, Tobias Huschle wrote:
>> The current load balancer implementation implies that scheduler 
>> groups,
>> within the same scheduler domain, all host the same number of CPUs.
>> 
>> This appears to be valid for non-s390 architectures. Nevertheless, 
>> s390
>> can actually have scheduler groups of unequal size.
> 
> Arm (classical) big.Little had this for years before we switched to 
> flat
> scheduling (only MC sched domain) over CPU capacity boundaries for Arm
> DynamIQ.
> 
> Arm64 Juno platform in mainline:
> 
> root@juno:~# cat 
> /sys/devices/system/cpu/cpu*/topology/cluster_cpus_list
> 0,3-5
> 1-2
> 1-2
> 0,3-5
> 0,3-5
> 0,3-5
> 
> root@juno:~# cat /proc/schedstat | grep ^domain | awk '{print $1, $2}'
> 
> domain0 39 <--
> domain1 3f
> domain0 06 <--
> domain1 3f
> domain0 06
> domain1 3f
> domain0 39
> domain1 3f
> domain0 39
> domain1 3f
> domain0 39
> domain1 3f
> 
> root@juno:~# cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
> MC
> DIE
> 
> But we don't have SMT on the mobile processors.
> 
> It looks like you are only interested to get group_weight dependency
> into this 'prefer_sibling' condition of find_busiest_group()?
> 
Sorry, looks like your reply hit some bad filter of my mail program.
Let me answer, although it's a bit late.

Yes, I would like to get the group_weight into the prefer_sibling path.
Unfortunately, we cannot go for a flat hierarchy as the s390 hardware
allows to have CPUs to be pretty far apart (cache-wise), which means,
the load balancer should avoid to move tasks back and forth between
those CPUs if possible.

We can't remove SD_PREFER_SIBLING either, as this would cause the load
balancer to aim for having the same number of idle CPUs in all groups,
which is a problem as well in asymmetric groups, for example:

With SD_PREFER_SIBLING, aiming for same number of non-idle CPUs
00 01 02 03 04 05 06 07 08 09 10 11  || 12 13 14 15
                 x     x     x     x      x  x  x  x

Without SD_PREFER_SIBLING, aiming for the same number of idle CPUs
00 01 02 03 04 05 06 07 08 09 10 11  || 12 13 14 15
     x  x  x     x  x     x     x  x


Hence the idea to add the group_weight to the prefer_sibling path.

I was wondering if this would be the right place to address this issue
or if I should go down another route.

> We in (classical) big.LITTLE (sd flag SD_ASYM_CPUCAPACITY) remove
> SD_PREFER_SIBLING from sd->child so we don't run this condition.
> 
>> The current scheduler behavior causes some s390 configs to use SMT
>> while some cores are still idle, leading to a performance degredation
>> under certain levels of workload.
>> 
>> Please refer to the patch's commit message for more details and an
>> example. This patch is a proposal on how to integrate the size of
>> scheduler groups into the decision process.
>> 
>> This patch is the most basic approach to address this issue and does
>> not claim to be perfect as-is.
>> 
>> Other ideas that also proved to address the problem but are more
>> complex but also potentially more precise:
>>   1. On scheduler group building, count the number of CPUs within each
>>      group that are first in their sibling mask. This represents the
>>      number of CPUs that can be used before running into SMT. This
>>      should be slightly more accurate than using the full group weight
>>      if the number of available SMT threads per core varies.
>>   2. Introduce a new scheduler group classification (smt_busy) in
>>      between of fully_busy and has_spare. This classification would
>>      indicate that a group still has spare capacity, but will run
>>      into SMT when using that capacity. This would make the load
>>      balancer prefer groups with fully idle CPUs over ones that are
>>      about to run into SMT.
>> 
>> Feedback would be greatly appreciated.
>> 
>> Tobias Huschle (1):
>>   sched/fair: Consider asymmetric scheduler groups in load balancer
>> 
>>  kernel/sched/fair.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>

next prev parent reply	other threads:[~2023-07-04  9:11 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-15 11:46 [RFC 0/1] sched/fair: Consider asymmetric scheduler groups in load balancer Tobias Huschle
2023-05-15 11:46 ` [RFC 1/1] " Tobias Huschle
2023-05-16 13:36   ` Vincent Guittot
2023-06-05  8:07     ` Tobias Huschle
2023-07-05  7:52       ` Vincent Guittot
2023-07-07  7:44         ` Tobias Huschle
2023-07-07 14:33           ` Shrikanth Hegde
2023-07-07 15:59             ` Tobias Huschle
2023-07-07 16:26               ` Shrikanth Hegde
2023-07-04 13:40   ` Peter Zijlstra
2023-07-07  7:44     ` Tobias Huschle
2023-07-06 17:19   ` Shrikanth Hegde
2023-07-07  7:45     ` Tobias Huschle
2023-05-16 16:35 ` [RFC 0/1] " Dietmar Eggemann
2023-07-04  9:11   ` Tobias Huschle [this message]
2023-07-06 11:11     ` Dietmar Eggemann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4c28b46b59bcc083956757074d1fe059@linux.ibm.com \
    --to=huschle@linux.ibm.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=sshegde@linux.vnet.ibm.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox