From: Rik van Riel <riel@redhat.com>
To: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
linaro-kernel@lists.linaro.org
Subject: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states
Date: Thu, 2 Oct 2014 13:15:48 -0400 [thread overview]
Message-ID: <20141002131548.6cd377d5@cuia.bos.redhat.com> (raw)
In-Reply-To: <alpine.LFD.2.11.1409301904150.5311@knanqh.ubzr>
On Tue, 30 Sep 2014 19:15:00 -0400 (EDT)
Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 30 Sep 2014, Rik van Riel wrote:
> > The main thing it does not cover is already running tasks that
> > get woken up again, since select_idle_sibling() covers everything
> > except for newly forked and newly executed tasks.
>
> True. Now that you bring this up, I remember that Peter mentioned it as
> well.
>
> > I am looking at adding similar logic to select_idle_sibling()
>
> OK thanks.
This patch is ugly. I have not bothered cleaning it up, because it
causes a regression with hackbench. Apparently for hackbench (and
potentially other sync wakeups), locality is more important than
idleness.
We may need to add a third clause before the search, something
along the lines of the following, to ensure target gets selected
if neither target nor i is idle and the wakeup is synchronous:

	if (sync_wakeup && cpu_rq(target)->nr_running == 1)
		return target;
I still need to run tests with other workloads, too.
Another consideration is that the search cost with this patch
is potentially much higher. I suspect we may want to simply
propagate the load on each sched_group up the tree hierarchically,
with delta accounting, propagating the info upwards only when
the delta is significant, like is done in __update_tg_runnable_avg.
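As a rough user-space sketch of that delta-accounting idea (all names
here are made up for illustration; this is not the kernel's
__update_tg_runnable_avg code), each group would cache the last value
it pushed to its parent and only propagate when the change is large
enough to matter:

```c
/*
 * Illustrative sketch only: "struct group" and "propagate_load" are
 * invented names, not kernel API. Each group remembers the last load
 * value it contributed to its parent (contrib) and re-propagates only
 * when the delta exceeds a threshold, here 1/64 of the old contribution.
 */
#include <assert.h>
#include <stdlib.h>

struct group {
	struct group *parent;
	long load;	/* current aggregate load of this group */
	long contrib;	/* last value propagated to the parent */
};

static void propagate_load(struct group *g, long new_load)
{
	long delta;

	g->load = new_load;
	delta = g->load - g->contrib;

	/* Insignificant change: update locally, skip the parent walk. */
	if (labs(delta) <= labs(g->contrib) / 64)
		return;

	g->contrib = g->load;
	if (g->parent)
		propagate_load(g->parent, g->parent->load + delta);
}
```

With a threshold like this, a load change of 1 on a group already
contributing 128 never reaches the parent, so frequent small wakeups
stop rippling up the whole tree.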
---8<---
Subject: sched,idle: teach select_idle_sibling about idle states
Change select_idle_sibling to take cpu idle exit latency into
account. First preference is to select the cpu with the lowest
exit latency from a completely idle sched_group (core) inside
the LLC domain; if none is available, we pick the idle CPU with
the lowest exit latency in any sched_group.
This increases the total search time of select_idle_sibling;
we may want to look into propagating load info up the sched_group
tree in some way. That information would also be useful to prevent
the wake_affine logic from causing a load imbalance between
sched_groups.
It is not clear when locality (from staying on the old CPU) beats
a lower idle exit latency. Knowing whether a CPU loses its cache
contents in a given idle state would help with that, but with
multiple CPUs bound together in the same physical CPU core, the
hardware often does not do what we tell it anyway...
Signed-off-by: Rik van Riel <riel@redhat.com>
---
kernel/sched/fair.c | 47 +++++++++++++++++++++++++++++++++++++++++------
1 file changed, 41 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 10a5a28..12540cd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4465,41 +4465,76 @@ static int select_idle_sibling(struct task_struct *p, int target)
 {
 	struct sched_domain *sd;
 	struct sched_group *sg;
+	unsigned int min_exit_latency_thread = UINT_MAX;
+	unsigned int min_exit_latency_core = UINT_MAX;
+	int shallowest_idle_thread = -1;
+	int shallowest_idle_core = -1;
 	int i = task_cpu(p);
 
+	/* target always has some code running and is not in an idle state */
 	if (idle_cpu(target))
 		return target;
 
 	/*
 	 * If the prevous cpu is cache affine and idle, don't be stupid.
+	 * XXX: does i's exit latency exceed sysctl_sched_migration_cost?
 	 */
 	if (i != target && cpus_share_cache(i, target) && idle_cpu(i))
 		return i;
 
 	/*
 	 * Otherwise, iterate the domains and find an elegible idle cpu.
+	 * First preference is finding a totally idle core with a thread
+	 * in a shallow idle state; second preference is whatever idle
+	 * thread has the shallowest idle state anywhere.
 	 */
 	sd = rcu_dereference(per_cpu(sd_llc, target));
 	for_each_lower_domain(sd) {
 		sg = sd->groups;
 		do {
+			unsigned int min_sg_exit_latency = UINT_MAX;
+			int shallowest_sg_idle_thread = -1;
+			bool all_idle = true;
+
 			if (!cpumask_intersects(sched_group_cpus(sg),
 						tsk_cpus_allowed(p)))
 				goto next;
 
 			for_each_cpu(i, sched_group_cpus(sg)) {
-				if (i == target || !idle_cpu(i))
-					goto next;
+				struct rq *rq;
+				struct cpuidle_state *idle;
+
+				if (i == target || !idle_cpu(i)) {
+					all_idle = false;
+					continue;
+				}
+
+				rq = cpu_rq(i);
+				idle = idle_get_state(rq);
+
+				if (idle && idle->exit_latency < min_sg_exit_latency) {
+					min_sg_exit_latency = idle->exit_latency;
+					shallowest_sg_idle_thread = i;
+				}
+			}
+
+			if (all_idle && min_sg_exit_latency < min_exit_latency_core) {
+				shallowest_idle_core = shallowest_sg_idle_thread;
+				min_exit_latency_core = min_sg_exit_latency;
+			} else if (min_sg_exit_latency < min_exit_latency_thread) {
+				shallowest_idle_thread = shallowest_sg_idle_thread;
+				min_exit_latency_thread = min_sg_exit_latency;
 			}
 
-			target = cpumask_first_and(sched_group_cpus(sg),
-					tsk_cpus_allowed(p));
-			goto done;
 next:
 			sg = sg->next;
 		} while (sg != sd->groups);
 	}
-done:
+
+	if (shallowest_idle_core >= 0)
+		target = shallowest_idle_core;
+	else if (shallowest_idle_thread >= 0)
+		target = shallowest_idle_thread;
+
 	return target;
 }
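Outside the kernel, the selection policy in the patch can be sketched
as a small stand-alone program; struct sim_cpu, core_all_idle and
pick_cpu below are illustrative names, not kernel API, and a "core"
here stands in for a sched_group:

```c
/*
 * Stand-alone sketch of the policy: prefer the shallowest-latency idle
 * CPU inside a completely idle core; otherwise take the shallowest idle
 * CPU anywhere; otherwise keep the original target. A core containing
 * the (busy) target never counts as completely idle, mirroring the
 * all_idle handling in the patch.
 */
#include <assert.h>
#include <limits.h>

struct sim_cpu {
	int core;			/* core (sched_group) of this CPU */
	int idle;			/* 1 if the CPU is idle */
	unsigned int exit_latency;	/* idle-state exit latency, usecs */
};

static int core_all_idle(const struct sim_cpu *cpus, int n,
			 int core, int target)
{
	for (int i = 0; i < n; i++)
		if (cpus[i].core == core && (i == target || !cpus[i].idle))
			return 0;
	return 1;
}

static int pick_cpu(const struct sim_cpu *cpus, int n, int target)
{
	unsigned int min_core = UINT_MAX, min_thread = UINT_MAX;
	int best_core = -1, best_thread = -1;

	for (int i = 0; i < n; i++) {
		if (i == target || !cpus[i].idle)
			continue;

		if (core_all_idle(cpus, n, cpus[i].core, target)) {
			if (cpus[i].exit_latency < min_core) {
				min_core = cpus[i].exit_latency;
				best_core = i;
			}
		} else if (cpus[i].exit_latency < min_thread) {
			min_thread = cpus[i].exit_latency;
			best_thread = i;
		}
	}

	if (best_core >= 0)
		return best_core;
	if (best_thread >= 0)
		return best_thread;
	return target;
}
```

Note the consequence: a CPU in a fully idle core wins even when a
sibling of the busy target is idle in a shallower state, which is
exactly the trade-off against locality discussed above.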