From: Tim Chen <tim.c.chen@linux.intel.com>
To: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>
Subject: Re: [PATCH] sched/fair: Enable group_asym_packing in find_idlest_group
Date: Tue, 09 Jan 2024 16:58:27 -0800
Message-ID: <a100b38341e13afbb5f8753b731c9e469e704667.camel@linux.intel.com>
In-Reply-To: <ea049b25-ba49-4790-8b79-05078adbfc77@linux.vnet.ibm.com>
On Thu, 2024-01-04 at 21:20 +0530, Shrikanth Hegde wrote:
> On 10/18/23 9:20 PM, Srikar Dronamraju wrote:
>
> Hi Srikar,
>
> > Current scheduler code doesn't handle SD_ASYM_PACKING in the
> > find_idlest_cpu path. On a few architectures, such as PowerPC, the cache
> > is per core. Moving threads across cores may cause cache misses.
> >
> > While asym_packing can be enabled above the SMT level, enabling asym_packing
> > across cores could result in poorer performance due to cache misses.
> > However, if the initial task placement via find_idlest_cpu takes
> > asym_packing into consideration, the scheduler can avoid asym_packing
> > migrations. This will result in fewer migrations, better packing and
> > better overall performance.
> >
>
> This would handle the asym packing case when finding an idle CPU for a newly
> woken task, thereby reducing the number of migrations if the task is placed
> correctly in the first place. I think that's helpful.
>
> Currently, Intel cluster and PowerVM shared LPARs are the two cases where ASYM PACKING
> is enabled at a domain higher than SMT. Is that correct, or is there any other topology?
>
> +tim
>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> > kernel/sched/fair.c | 33 ++++++++++++++++++++++++++++++---
> > 1 file changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index cb225921bbca..7164f79a3d13 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9931,11 +9931,13 @@ static int idle_cpu_without(int cpu, struct task_struct *p)
> > * @group: sched_group whose statistics are to be updated.
> > * @sgs: variable to hold the statistics for this group.
> > * @p: The task for which we look for the idlest group/CPU.
> > + * @this_cpu: current cpu
> > */
> > static inline void update_sg_wakeup_stats(struct sched_domain *sd,
> > struct sched_group *group,
> > struct sg_lb_stats *sgs,
> > - struct task_struct *p)
> > + struct task_struct *p,
> > + int this_cpu)
> > {
> > int i, nr_running;
> >
> > @@ -9972,6 +9974,11 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
> >
> > }
> >
> > + if (sd->flags & SD_ASYM_PACKING && sgs->sum_h_nr_running &&
> > + sched_asym_prefer(group->asym_prefer_cpu, this_cpu)) {
> > + sgs->group_asym_packing = 1;
I disagree with the above criteria for doing asym_packing.

I think asym packing only makes sense if the group has an idle CPU
available that is preferred over this_cpu, and the group has fewer
tasks than CPUs. Using group->asym_prefer_cpu is inappropriate, as
that most preferred CPU may be busy. You should be migrating the task
from this_cpu to the highest-priority idle CPU identified.

If the group is fully busy or overloaded, we should stick with the original
logic of picking the most lightly loaded group and not use asym_packing.

You may want to note down the highest-priority (most preferred) idle CPU
in each group, if the group has more than one CPU, so that two groups
that both have idle CPUs can be compared.
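
To make the criterion above concrete, here is a rough userspace sketch, not
kernel code. struct cpu_state, struct group_state, best_idle_cpu() and
asym_packing_candidate() are hypothetical names made up for illustration, and
"prio" stands in for whatever arch_asym_cpu_priority() would return, with a
higher value meaning more preferred.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for scheduler state (not kernel types). */
struct cpu_state {
	int prio;	/* asym packing priority, higher is more preferred */
	bool idle;
};

struct group_state {
	const struct cpu_state *cpus;
	int nr_cpus;
	int nr_tasks;	/* runnable tasks in the group */
};

/* Return the index of the highest-priority idle CPU in the group, or -1. */
static int best_idle_cpu(const struct group_state *g)
{
	int best = -1;

	assert(g->nr_cpus > 0);
	for (int i = 0; i < g->nr_cpus; i++) {
		if (!g->cpus[i].idle)
			continue;
		if (best < 0 || g->cpus[i].prio > g->cpus[best].prio)
			best = i;
	}
	return best;
}

/*
 * A group qualifies for asym packing only if it has fewer tasks than CPUs
 * and its best *idle* CPU is preferred over this_cpu. A busy CPU, even the
 * most preferred one in the group, does not count.
 */
static bool asym_packing_candidate(const struct group_state *g, int this_cpu_prio)
{
	int best;

	if (g->nr_tasks >= g->nr_cpus)
		return false;	/* fully busy or overloaded: use the normal logic */

	best = best_idle_cpu(g);
	return best >= 0 && g->cpus[best].prio > this_cpu_prio;
}
```

The index returned by best_idle_cpu() is also what could be recorded per group
so that two groups with idle CPUs can be compared on their best idle CPU
rather than on asym_prefer_cpu.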
Tim
> > + }
> > +
>
>
> I think there is a corner case here that could be handled. Please correct me if I
> am wrong.
>
> Assume there are four sched groups: sg1, sg2, sg3 and sg4. Asym packing is enabled at sd.
> sg1 and sg3 have one task each, and a new task is being created, so find_idlest_cpu is
> called for this new task.
>
> Because of the sgs->sum_h_nr_running check, sg1 and sg3 will have group_asym_packing, while
> sg2 and sg4 will have group_has_spare. update_pick_idlest will choose the lowest type,
> i.e. group_has_spare. The tie would be between sg2 and sg4. Because of asym packing (at least
> true for the powerpc shared LPAR case), sg4 will have lower utilization than sg2, and hence
> sg4 will be picked as the idlest group. On the next load balance, sg2 will pull the task from
> sg4 due to asym packing.
>
> Could the additional migration be avoided if we omit the sum_h_nr_running check?
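
The four-group scenario above can be walked through with a small userspace
model. This is only a sketch under stated assumptions: struct sg_model,
classify() and pick_idlest_model() are hypothetical names, only the two group
types that matter here are modeled, the utilization numbers used with it are
made up, and the tie-break among group_has_spare groups (more idle CPUs first,
then lower utilization) is a simplification of what update_pick_idlest does.

```c
#include <assert.h>
#include <stdbool.h>

/* Only the two types relevant to the scenario; a lower value is picked first. */
enum group_type_model { MODEL_HAS_SPARE = 0, MODEL_ASYM_PACKING = 1 };

/* Hypothetical per-group summary, loosely mirroring a few sg_lb_stats fields. */
struct sg_model {
	int nr_running;			/* like sum_h_nr_running */
	int idle_cpus;
	unsigned long util;		/* like group_util */
	bool preferred_over_this_cpu;	/* assumed sched_asym_prefer() result */
};

/* Classification with the sum_h_nr_running check from the patch. */
static enum group_type_model classify(const struct sg_model *g)
{
	if (g->nr_running > 0 && g->preferred_over_this_cpu)
		return MODEL_ASYM_PACKING;
	return MODEL_HAS_SPARE;
}

/* Pick the lowest type; among has_spare groups prefer more idle CPUs, then lower util. */
static int pick_idlest_model(const struct sg_model *g, int n)
{
	int best = 0;

	assert(n > 0);
	for (int i = 1; i < n; i++) {
		enum group_type_model ti = classify(&g[i]);
		enum group_type_model tb = classify(&g[best]);

		if (ti < tb) {
			best = i;
			continue;
		}
		if (ti > tb || ti != MODEL_HAS_SPARE)
			continue;	/* asym_packing tie-break is not modeled */
		if (g[i].idle_cpus > g[best].idle_cpus ||
		    (g[i].idle_cpus == g[best].idle_cpus && g[i].util < g[best].util))
			best = i;
	}
	return best;
}
```

With sg1 and sg3 each running one task and sg2 and sg4 fully idle, the model
picks whichever of sg2 and sg4 has the lower utilization. If that is sg4, a
later asym packing load balance may have to move the task again.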
>
>
> > sgs->group_capacity = group->sgc->capacity;
> >
> > sgs->group_weight = group->group_weight;
> > @@ -10012,8 +10019,17 @@ static bool update_pick_idlest(struct sched_group *idlest,
> > return false;
> > break;
> >
> > - case group_imbalanced:
> > case group_asym_packing:
> > + if (sched_asym_prefer(group->asym_prefer_cpu, idlest->asym_prefer_cpu)) {
> > + int busy_cpus = idlest_sgs->group_weight - idlest_sgs->idle_cpus;
> > +
> > + busy_cpus -= (sgs->group_weight - sgs->idle_cpus);
> > + if (busy_cpus >= 0)
> > + return true;
>
>
> Wouldn't using idle_cpus be simpler? Something like:
>
> if (sgs->idle_cpus - idlest_sgs->idle_cpus > 0)
> return true;
>
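
For reference, the two checks being discussed are not identical. The sketch
below puts them side by side in userspace C; struct sgs_model and both helper
names are hypothetical, and only the two fields used by the comparison are
modeled. When both groups have the same group_weight, the busy-CPU form reduces
to "candidate idle_cpus >= idlest idle_cpus", so it differs from the strict
idle_cpus form on ties. With different group weights the two can disagree in
either direction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the per-group statistics used by both checks. */
struct sgs_model {
	int group_weight;	/* number of CPUs in the group */
	int idle_cpus;
};

/* Form used in the patch: candidate has no more busy CPUs than the current idlest. */
static bool prefer_by_busy_cpus(const struct sgs_model *idlest,
				const struct sgs_model *cand)
{
	int busy_cpus;

	assert(idlest->idle_cpus <= idlest->group_weight);
	assert(cand->idle_cpus <= cand->group_weight);

	busy_cpus = idlest->group_weight - idlest->idle_cpus;
	busy_cpus -= cand->group_weight - cand->idle_cpus;
	return busy_cpus >= 0;
}

/* Suggested form: candidate has strictly more idle CPUs than the current idlest. */
static bool prefer_by_idle_cpus(const struct sgs_model *idlest,
				const struct sgs_model *cand)
{
	return cand->idle_cpus - idlest->idle_cpus > 0;
}
```

Whether the tie case and the unequal-weight case matter for the topologies in
question is something the patch description would need to spell out.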
> > + }
> > + return false;
> > +
> > + case group_imbalanced:
> > case group_smt_balance:
> > /* Those types are not used in the slow wakeup path */
> > return false;
> > @@ -10080,7 +10096,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
> > sgs = &tmp_sgs;
> > }
> >
> > - update_sg_wakeup_stats(sd, group, sgs, p);
> > + update_sg_wakeup_stats(sd, group, sgs, p, this_cpu);
> >
> > if (!local_group && update_pick_idlest(idlest, &idlest_sgs, group, sgs)) {
> > idlest = group;
> > @@ -10112,6 +10128,17 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
> > if (local_sgs.group_type > idlest_sgs.group_type)
> > return idlest;
> >
> > + if (idlest_sgs.group_type == group_asym_packing) {
> > + if (sched_asym_prefer(idlest->asym_prefer_cpu, local->asym_prefer_cpu)) {
> > + int busy_cpus = local_sgs.group_weight - local_sgs.idle_cpus;
> > +
> > + busy_cpus -= (idlest_sgs.group_weight - idlest_sgs.idle_cpus);
> > + if (busy_cpus >= 0)
> > + return idlest;
> > + }
> > + return NULL;
> > + }
>
> Same comment about using idle_cpus here.
>
> > +
> > switch (local_sgs.group_type) {
> > case group_overloaded:
> > case group_fully_busy:
>
Thread overview: 4+ messages
2023-10-18 15:50 [PATCH] sched/fair: Enable group_asym_packing in find_idlest_group Srikar Dronamraju
2023-12-15 4:10 ` Srikar Dronamraju
2024-01-04 15:50 ` Shrikanth Hegde
2024-01-10 0:58 ` Tim Chen [this message]