From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Li, Aubrey" <aubrey.li@linux.intel.com>,
mingo@redhat.com, juri.lelli@redhat.com,
vincent.guittot@linaro.org, valentin.schneider@arm.com,
qais.yousef@arm.com, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com,
tim.c.chen@linux.intel.com, linux-kernel@vger.kernel.org,
Mel Gorman <mgorman@suse.de>, Jiang Biao <benbjiang@gmail.com>
Subject: Re: [RFC PATCH v7] sched/fair: select idle cpu from idle cpumask for task wakeup
Date: Fri, 11 Dec 2020 20:43:37 +0000 [thread overview]
Message-ID: <20201211204337.GX3371@techsingularity.net> (raw)
In-Reply-To: <20201211174442.GU3040@hirez.programming.kicks-ass.net>
On Fri, Dec 11, 2020 at 06:44:42PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 10, 2020 at 12:58:33PM +0000, Mel Gorman wrote:
> > The prerequisite patch to make that approach work was rejected though
> > as on its own, it's not very helpful and Vincent didn't like that the
> > load_balance_mask was abused to make it effective.
>
> So last time I poked at all this, I found that using more masks was a
> performance issue as well due to added cache misses.
>
> Anyway, I finally found some time to look at all this, and while reading
> over the whole SiS crud to refresh the brain came up with the below...
>
> It's still naf, but at least it gets rid of a bunch of duplicate
> scanning and LoC decreases, so it should be awesome. Of course, as
> always, benchmarks will totally ruin everything, we'll see, I started
> some.
>
> It goes on top of Mel's first two patches (which is about as far as I
> got)...
>
It's not free of bugs but it's interesting! The positive aspects are that
it does a single scan, initialises select_idle_mask once and caches an
idle candidate while searching for an idle core, but in a far cleaner
fashion than what I did.
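For anyone else following along, my reading of the resulting single pass
is roughly shaped like the below. This is a paraphrase rather than the
patch itself and the non-SMT leg in particular is from memory, so treat
it as a sketch only:

	/* Scan restricted to the LLC domain and the task's affinity */
	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

	for_each_cpu_wrap(cpu, cpus, target) {
		if (smt) {
			/* Scan the whole core, cache one idle CPU as a fallback */
			i = __select_idle_core(cpu, cpus, &idle_cpu);
			if ((unsigned)i < nr_cpumask_bits)
				return i;
		} else {
			/* Plain idle CPU scan, throttled by nr from SIS_PROP */
			if (!--nr)
				return -1;
			if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) {
				idle_cpu = cpu;
				break;
			}
		}
	}

	return idle_cpu;

So one wrapped walk of the LLC either finds a fully idle core or falls
back to whatever idle CPU was cached along the way.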
One bug is in __select_idle_core() though. It's scanning the SMT mask,
not select_idle_mask, so it can return an idle candidate that is not in
p->cpus_ptr.
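To spell out the fix I mean (it is folded into the diff at the bottom),
p needs to be passed in so the affinity can be checked before caching
the candidate, something like:

		/* Only cache a fallback CPU the task is allowed to run on */
		if (idle_cpu && cpumask_test_cpu(cpu, p->cpus_ptr))
			*idle_cpu = cpu;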
I queued tests anyway to see what it looks like.
There are some other potential caveats.
This is a single pass, so when test_idle_cores() is true, __select_idle_core
is used to check all the siblings even if the core is not idle. That
could have been cut short if __select_idle_core checked whether *idle_cpu
was still -1 and terminated the SMT scan once an idle candidate had
already been found.
The second downside is related. If test_idle_cores() was true but no idle
CPU is found, then __select_idle_core has been called enough to scan
the entire domain. In this corner case, the new code does *more* work
because the old code would have failed select_idle_core() quickly and
then select_idle_cpu() would be throttled by SIS_PROP. I think this will
only be noticeable in the heavily overloaded case, but if the corner case
hits often enough then the new code will be slower than the old code for
the over-saturated case (i.e. hackbench with lots of groups).
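As a reminder of why the old path was cheap there, SIS_PROP caps the
linear scan based on the ratio of estimated idle time to the average
scan cost. Roughly (quoting from memory, so the details may be slightly
off):

	if (sched_feat(SIS_PROP)) {
		u64 avg_cost, avg_idle, span_avg;

		/*
		 * Large fuzz factor on the idle time estimate; the scan is
		 * then bounded to roughly span_avg / avg_cost CPUs with a
		 * floor of 4, so a saturated machine bails out quickly.
		 */
		avg_idle = this_rq()->avg_idle / 512;
		avg_cost = this_sd->avg_scan_cost + 1;

		span_avg = sd->span_weight * avg_idle;
		if (span_avg > 4*avg_cost)
			nr = div_u64(span_avg, avg_cost);
		else
			nr = 4;
	}

When the domain is saturated, avg_idle collapses and nr drops to the
floor, which is why the old select_idle_cpu() pass was nearly free in
that corner case while the per-core scan is not throttled at all.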
The third potential downside is that the SMT sibling is not guaranteed to
be checked due to SIS_PROP throttling whereas, in the old code, it would
have been checked by select_idle_smt(). That might result in premature
stacking of runnable tasks on the same CPU. Similarly, as __select_idle_core
may find multiple idle candidates, it will not necessarily pick the target's
SMT sibling if it is idle, as select_idle_smt would have.
That said, I am skeptical that select_idle_smt() matters all that often.
If select_idle_cpu() has been throttled heavily then the machine is
either over-saturated or was over-saturated in the very recent past. In
that situation, the chances of the SMT siblings being free are slim.
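For reference, the select_idle_smt() fallback being lost is only a
trivial walk of the target's siblings, along these lines (again
paraphrased from memory rather than quoted):

static int select_idle_smt(struct task_struct *p, int target)
{
	int cpu;

	if (!static_branch_likely(&sched_smt_present))
		return -1;

	for_each_cpu(cpu, cpu_smt_mask(target)) {
		/* Respect the task's affinity */
		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
			continue;
		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
			return cpu;
	}

	return -1;
}

It is cheap, so losing it is mostly a concern for the premature stacking
case and, as above, I doubt that case matters much while SIS_PROP is
throttling hard.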
Each of the downsides could potentially be addressed with this untested
mess:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 637f610c7059..260592d143d8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6061,7 +6061,7 @@ void __update_idle_core(struct rq *rq)
 	rcu_read_unlock();
 }
 
-static int __select_idle_core(int core, struct cpumask *cpus, int *idle_cpu)
+static int __select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 {
 	bool idle = true;
 	int cpu;
@@ -6070,9 +6070,11 @@ static int __select_idle_core(int core, struct cpumask *cpus, int *idle_cpu)
 		schedstat_inc(this_rq()->sis_scanned);
 		if (!available_idle_cpu(cpu)) {
 			idle = false;
-			continue;
+			if (*idle_cpu == -1)
+				continue;
+			break;
 		}
-		if (idle_cpu)
+		if (*idle_cpu == -1 && cpumask_test_cpu(cpu, p->cpus_ptr))
 			*idle_cpu = cpu;
 	}
 
@@ -6094,7 +6096,7 @@ static inline bool test_idle_cores(int cpu, bool def)
 	return def;
 }
 
-static inline int __select_idle_core(int core, struct cpumask *cpus, int *idle_cpu)
+static inline int __select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 {
 	return -1;
 }
@@ -6142,7 +6144,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	for_each_cpu_wrap(cpu, cpus, target) {
 		if (smt) {
-			i = __select_idle_core(cpu, cpus, &idle_cpu);
+			i = __select_idle_core(p, cpu, cpus, &idle_cpu);
 			if ((unsigned)i < nr_cpumask_bits)
 				return i;
 
--
Mel Gorman
SUSE Labs