From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
LKML <linux-kernel@vger.kernel.org>,
Mel Gorman <mgorman@techsingularity.net>,
Rik van Riel <riel@surriel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Valentin Schneider <valentin.schneider@arm.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Michael Neuling <mikey@neuling.org>,
Gautham R Shenoy <ego@linux.vnet.ibm.com>,
Parth Shah <parth@linux.ibm.com>
Subject: Re: [PATCH] sched/fair: Prefer idle CPU to cache affinity
Date: Wed, 10 Mar 2021 11:22:41 +0530 [thread overview]
Message-ID: <20210310055241.GO2028034@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAKfTPtA2XSmqt1L2X9WvdtdA5eqNYuhSws8jDOr1HA1xqXWfDQ@mail.gmail.com>
* Vincent Guittot <vincent.guittot@linaro.org> [2021-03-08 14:52:39]:
> On Fri, 26 Feb 2021 at 17:41, Srikar Dronamraju
> <srikar@linux.vnet.ibm.com> wrote:
> >
Thanks Vincent for your review comments.
> > +static int prefer_idler_llc(int this_cpu, int prev_cpu, int sync)
> > +{
> > + struct sched_domain_shared *tsds, *psds;
> > + int pnr_busy, pllc_size, tnr_busy, tllc_size, diff;
> > +
> > + tsds = rcu_dereference(per_cpu(sd_llc_shared, this_cpu));
> > + tnr_busy = atomic_read(&tsds->nr_busy_cpus);
> > + tllc_size = per_cpu(sd_llc_size, this_cpu);
> > +
> > + psds = rcu_dereference(per_cpu(sd_llc_shared, prev_cpu));
> > + pnr_busy = atomic_read(&psds->nr_busy_cpus);
> > + pllc_size = per_cpu(sd_llc_size, prev_cpu);
> > +
> > + /* No need to compare, if both LLCs are fully loaded */
> > + if (pnr_busy == pllc_size && tnr_busy == pllc_size)
> > + return nr_cpumask_bits;
> > +
> > + if (sched_feat(WA_WAKER) && tnr_busy < tllc_size)
> > + return this_cpu;
>
> Why have you chosen to favor this_cpu instead of prev_cpu unlike for wake_idle ?
At this point, we know the waker running on this_cpu and wakee which was
running on prev_cpu are affine to each other and this_cpu and prev_cpu dont
share cache. I chose to move them close to each other to benefit from the
cache sharing. Based on feedback from Peter and Rik, I made the check more
conservative i.e tnr_busy <= tllc_size/smt_weight (where smt_weight is the
cpumask weight of smt domain for this_cpu) i.e if we have a free core in
this llc domain, chose this_cpu. select_idle_sibling() should pick an idle
cpu/core/smt within the llc domain for this_cpu.
Do you feel, this may not be the correct option?
We are also experimenting with another option, were we call prefer_idler_cpu
after wa_weight. I.e
1. if wake_affine_weight choses this_cpu but llc in prev_cpu has an idle
smt/CPU but there are no idle smt/CPU in this_cpu, then chose idle smt/CPU
in prev_cpu
2. if wake_affine_weight choses nr_cpumask(aka prev_cpu) but llc in this_cpu
has an idle smt/CPU but there are no idle smt/CPU in prev_cpu, then chose
idle smt/CPU in this_cpu
> > +
> > + /* For better wakeup latency, prefer idler LLC to cache affinity */
> > + diff = tnr_busy * pllc_size - sync - pnr_busy * tllc_size;
> > + if (!diff)
> > + return nr_cpumask_bits;
> > + if (diff < 0)
> > + return this_cpu;
> > +
> > + return prev_cpu;
> > +}
> > +
> > static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> > int this_cpu, int prev_cpu, int sync)
> > {
> > @@ -5877,6 +5907,10 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> > if (sched_feat(WA_IDLE))
> > target = wake_affine_idle(this_cpu, prev_cpu, sync);
> >
> > + if (sched_feat(WA_IDLER_LLC) && target == nr_cpumask_bits &&
> > + !cpus_share_cache(this_cpu, prev_cpu))
> > + target = prefer_idler_llc(this_cpu, prev_cpu, sync);
>
> could you use the same naming convention as others function ?
> wake_affine_llc as an example
I guess you meant s/prefer_idler_llc/wake_affine_llc/
Sure. I can modify.
>
> > +
> > if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
> > target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
> >
> > @@ -5884,8 +5918,11 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> > if (target == nr_cpumask_bits)
> > return prev_cpu;
> >
> > - schedstat_inc(sd->ttwu_move_affine);
> > - schedstat_inc(p->se.statistics.nr_wakeups_affine);
> > + if (target == this_cpu) {
>
> How is this condition related to $subject ?
Before this change, wake_affine_weight and wake_affine_idle would either
return this_cpu or nr_cpumask_bits. Just before this check, we check if
target is nr_cpumask_bits and return prev_cpu. So the stats were only
incremented when target was this_cpu.
However with prefer_idler_llc, we may return this_cpu, prev_cpu or
nr_cpumask_bits. Now we only to update stats when we have chosen to migrate
the task to this_cpu. Hence I had this check.
If we use the slightly lazier approach which is check for wa_weight first
before wa_idler_llc, then we may not need this change at all.
--
Thanks and Regards
Srikar Dronamraju
next prev parent reply other threads:[~2021-03-10 5:54 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-26 16:40 [PATCH] sched/fair: Prefer idle CPU to cache affinity Srikar Dronamraju
2021-02-27 19:56 ` Rik van Riel
2021-03-01 13:37 ` Srikar Dronamraju
2021-03-01 15:44 ` Peter Zijlstra
2021-03-01 17:06 ` Srikar Dronamraju
2021-03-01 17:18 ` Peter Zijlstra
2021-03-02 7:39 ` Srikar Dronamraju
2021-03-02 9:10 ` Peter Zijlstra
2021-03-02 10:05 ` Srikar Dronamraju
2021-03-01 15:40 ` Peter Zijlstra
2021-03-01 17:08 ` Srikar Dronamraju
2021-03-02 9:53 ` Dietmar Eggemann
2021-03-02 10:04 ` Srikar Dronamraju
2021-03-08 13:52 ` Vincent Guittot
2021-03-10 5:52 ` Srikar Dronamraju [this message]
2021-03-10 15:37 ` Vincent Guittot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210310055241.GO2028034@linux.vnet.ibm.com \
--to=srikar@linux.vnet.ibm.com \
--cc=dietmar.eggemann@arm.com \
--cc=ego@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@techsingularity.net \
--cc=mikey@neuling.org \
--cc=mingo@kernel.org \
--cc=mpe@ellerman.id.au \
--cc=parth@linux.ibm.com \
--cc=peterz@infradead.org \
--cc=riel@surriel.com \
--cc=tglx@linutronix.de \
--cc=valentin.schneider@arm.com \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox