From: Vincent Guittot <vincent.guittot@linaro.org>
To: Tobias Huschle <huschle@linux.ibm.com>
Cc: juri.lelli@redhat.com, vschneid@redhat.com,
srikar@linux.vnet.ibm.com, peterz@infradead.org,
sshegde@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org, rostedt@goodmis.org,
bsegall@google.com, mingo@redhat.com, mgorman@suse.de,
bristot@redhat.com, dietmar.eggemann@arm.com
Subject: Re: [RFC 1/1] sched/fair: Consider asymmetric scheduler groups in load balancer
Date: Tue, 16 May 2023 15:36:19 +0200 [thread overview]
Message-ID: <CAKfTPtC9050oY2EikUTAXTL8pAui3L+Sr4DBS0T-TccGNaA2hw@mail.gmail.com> (raw)
In-Reply-To: <20230515114601.12737-2-huschle@linux.ibm.com>
On Mon, 15 May 2023 at 13:46, Tobias Huschle <huschle@linux.ibm.com> wrote:
>
> The current load balancer implementation implies that scheduler groups
> within the same domain all host the same number of CPUs. This is
> reflected in the condition that a scheduler group which is load
> balancing and is classified as having spare capacity should pull work
> from the busiest group if the local group runs fewer processes than
> the busiest one. This implies that the two groups should run the same
> number of processes, which is problematic if the groups are not of
> the same size.
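
For reference, a simplified form of the check being described (the exact
context is visible in the diff below): a local group with spare capacity is
told to pull as long as

    busiest->sum_nr_running > local->sum_nr_running + 1

i.e. the two groups are expected to converge on the same task count,
regardless of how many CPUs each group actually has.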
>
> The assumption that scheduler groups within the same scheduler domain
> host the same number of CPUs appears to be true for non-s390
> architectures. Nevertheless, s390 can have scheduler groups of unequal
> size.
>
> This introduces a performance degradation in the following scenario:
>
> Consider a system with 8 CPUs: 6 CPUs are located on one CPU socket
> and the remaining 2 are located on another socket:
>
> Socket -----1----- -2-
> CPU 1 2 3 4 5 6 7 8
>
> Placing some workload (x = one task) yields the following
> scenarios:
>
> The first 5 tasks are distributed evenly across the two groups.
>
> Socket -----1----- -2-
> CPU 1 2 3 4 5 6 7 8
> x x x x x
>
> Adding a 6th task yields the following distribution:
>
> Socket -----1----- -2-
> CPU 1 2 3 4 5 6 7 8
> SMT1 x x x x x
> SMT2 x
Your description is a bit confusing to me. What you name CPU above
should be named Core, shouldn't it?
Could you share your scheduler topology with us?
>
> The task is added to the 2nd scheduler group, as the scheduler assumes
> that scheduler groups are of the same size and should therefore also
> host the same number of tasks. This pushes CPU 7 into using its SMT
> thread, which comes with a performance penalty. This means that in
> the window of 6-8 tasks, load balancing is done suboptimally, because
> SMT is used although there is no reason to do so: fully idle CPUs are
> still available.
>
> Taking the weight of the scheduler groups into account ensures that
> a load balancing CPU within a smaller group will not try to pull tasks
> from a bigger group while the bigger group still has idle CPUs
> available.
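
With the patched, weighted check the same hypothetical counts no longer force
a pull:

    busiest->sum_nr_running * local->group_weight >
            local->sum_nr_running * busiest->group_weight + 1
    /* 5 * 2 = 10  is not greater than  2 * 6 + 1 = 13 */

so the smaller group stops pulling once its tasks-per-CPU ratio has caught up
with that of the bigger group.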
>
> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com>
> ---
> kernel/sched/fair.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 48b6f0ca13ac..b1307d7e4065 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10426,7 +10426,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> * group's child domain.
> */
> if (sds.prefer_sibling && local->group_type == group_has_spare &&
> - busiest->sum_nr_running > local->sum_nr_running + 1)
> + busiest->sum_nr_running * local->group_weight >
> + local->sum_nr_running * busiest->group_weight + 1)
This is the prefer_sibling path. Could it be that you should disable
prefer_sibling between your sockets for such a topology? The default
path compares the number of idle CPUs when groups have spare capacity.
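
For context on that default path: when prefer_sibling is not set and the
local group has spare capacity, the imbalance is sized from the idle-CPU
difference rather than from raw task counts, roughly along these lines
(paraphrased, not the exact fair.c code):

    /* paraphrased: even out the number of idle CPUs between the groups */
    env->imbalance = max_t(long, 0, local->idle_cpus - busiest->idle_cpus);

which naturally accounts for groups of different sizes.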
> goto force_balance;
>
> if (busiest->group_type != group_overloaded) {
> --
> 2.34.1
>
Thread overview: 16+ messages
2023-05-15 11:46 [RFC 0/1] sched/fair: Consider asymmetric scheduler groups in load balancer Tobias Huschle
2023-05-15 11:46 ` [RFC 1/1] " Tobias Huschle
2023-05-16 13:36 ` Vincent Guittot [this message]
2023-06-05 8:07 ` Tobias Huschle
2023-07-05 7:52 ` Vincent Guittot
2023-07-07 7:44 ` Tobias Huschle
2023-07-07 14:33 ` Shrikanth Hegde
2023-07-07 15:59 ` Tobias Huschle
2023-07-07 16:26 ` Shrikanth Hegde
2023-07-04 13:40 ` Peter Zijlstra
2023-07-07 7:44 ` Tobias Huschle
2023-07-06 17:19 ` Shrikanth Hegde
2023-07-07 7:45 ` Tobias Huschle
2023-05-16 16:35 ` [RFC 0/1] " Dietmar Eggemann
2023-07-04 9:11 ` Tobias Huschle
2023-07-06 11:11 ` Dietmar Eggemann