All of lore.kernel.org
 help / color / mirror / Atom feed
From: tip-bot for Suresh Siddha <suresh.b.siddha@intel.com>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@redhat.com,
	a.p.zijlstra@chello.nl, suresh.b.siddha@intel.com,
	tglx@linutronix.de, mingo@elte.hu
Subject: [tip:sched/core] sched: Fix select_idle_sibling() logic in select_task_rq_fair()
Date: Fri, 23 Apr 2010 10:50:33 GMT	[thread overview]
Message-ID: <tip-99bd5e2f245d8cd17d040c82d40becdb3efd9b69@git.kernel.org> (raw)
In-Reply-To: <1270079265.7835.8.camel@sbs-t61.sc.intel.com>

Commit-ID:  99bd5e2f245d8cd17d040c82d40becdb3efd9b69
Gitweb:     http://git.kernel.org/tip/99bd5e2f245d8cd17d040c82d40becdb3efd9b69
Author:     Suresh Siddha <suresh.b.siddha@intel.com>
AuthorDate: Wed, 31 Mar 2010 16:47:45 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 23 Apr 2010 11:02:02 +0200

sched: Fix select_idle_sibling() logic in select_task_rq_fair()

Issues in the current select_idle_sibling() logic in select_task_rq_fair()
in the context of a task wake-up:

a) Once we select the idle sibling, we use that domain (spanning the cpu that
   the task is currently woken-up and the idle sibling that we found) in our
   wake_affine() decisions. This domain is completely different from the
   domain(we are supposed to use) that spans the cpu that the task currently
   woken-up and the cpu where the task previously ran.

b) We do select_idle_sibling() check only for the cpu that the task is
   currently woken-up on. If select_task_rq_fair() selects the previously run
   cpu for waking the task, doing a select_idle_sibling() check
   for that cpu also helps and we don't do this currently.

c) In the scenarios where the cpu that the task is woken-up is busy but
   with its HT siblings are idle, we are selecting the task be woken-up
   on the idle HT sibling instead of a core that it previously ran
   and currently completely idle. i.e., we are not taking decisions based on
   wake_affine() but directly selecting an idle sibling that can cause
   an imbalance at the SMT/MC level which will be later corrected by the
   periodic load balancer.

Fix this by first going through the load imbalance calculations using
wake_affine() and once we make a decision of woken-up cpu vs previously-ran cpu,
then choose a possible idle sibling for waking up the task on.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1270079265.7835.8.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   82 +++++++++++++++++++++++++--------------------------
 1 files changed, 40 insertions(+), 42 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 0a413c7..cbd8b8a 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1375,29 +1375,48 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 /*
  * Try and locate an idle CPU in the sched_domain.
  */
-static int
-select_idle_sibling(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_sibling(struct task_struct *p, int target)
 {
 	int cpu = smp_processor_id();
 	int prev_cpu = task_cpu(p);
+	struct sched_domain *sd;
 	int i;
 
 	/*
-	 * If this domain spans both cpu and prev_cpu (see the SD_WAKE_AFFINE
-	 * test in select_task_rq_fair) and the prev_cpu is idle then that's
-	 * always a better target than the current cpu.
+	 * If the task is going to be woken-up on this cpu and if it is
+	 * already idle, then it is the right target.
+	 */
+	if (target == cpu && idle_cpu(cpu))
+		return cpu;
+
+	/*
+	 * If the task is going to be woken-up on the cpu where it previously
+	 * ran and if it is currently idle, then it the right target.
 	 */
-	if (target == cpu && !cpu_rq(prev_cpu)->cfs.nr_running)
+	if (target == prev_cpu && idle_cpu(prev_cpu))
 		return prev_cpu;
 
 	/*
-	 * Otherwise, iterate the domain and find an elegible idle cpu.
+	 * Otherwise, iterate the domains and find an elegible idle cpu.
 	 */
-	for_each_cpu_and(i, sched_domain_span(sd), &p->cpus_allowed) {
-		if (!cpu_rq(i)->cfs.nr_running) {
-			target = i;
+	for_each_domain(target, sd) {
+		if (!(sd->flags & SD_SHARE_PKG_RESOURCES))
 			break;
+
+		for_each_cpu_and(i, sched_domain_span(sd), &p->cpus_allowed) {
+			if (idle_cpu(i)) {
+				target = i;
+				break;
+			}
 		}
+
+		/*
+		 * Lets stop looking for an idle sibling when we reached
+		 * the domain that spans the current cpu and prev_cpu.
+		 */
+		if (cpumask_test_cpu(cpu, sched_domain_span(sd)) &&
+		    cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
+			break;
 	}
 
 	return target;
@@ -1421,7 +1440,7 @@ select_task_rq_fair(struct rq *rq, struct task_struct *p, int sd_flag, int wake_
 	int cpu = smp_processor_id();
 	int prev_cpu = task_cpu(p);
 	int new_cpu = cpu;
-	int want_affine = 0, cpu_idle = !current->pid;
+	int want_affine = 0;
 	int want_sd = 1;
 	int sync = wake_flags & WF_SYNC;
 
@@ -1460,36 +1479,13 @@ select_task_rq_fair(struct rq *rq, struct task_struct *p, int sd_flag, int wake_
 		}
 
 		/*
-		 * While iterating the domains looking for a spanning
-		 * WAKE_AFFINE domain, adjust the affine target to any idle cpu
-		 * in cache sharing domains along the way.
+		 * If both cpu and prev_cpu are part of this domain,
+		 * cpu is a valid SD_WAKE_AFFINE target.
 		 */
-		if (want_affine) {
-			int target = -1;
-
-			/*
-			 * If both cpu and prev_cpu are part of this domain,
-			 * cpu is a valid SD_WAKE_AFFINE target.
-			 */
-			if (cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
-				target = cpu;
-
-			/*
-			 * If there's an idle sibling in this domain, make that
-			 * the wake_affine target instead of the current cpu.
-			 */
-			if (!cpu_idle && tmp->flags & SD_SHARE_PKG_RESOURCES)
-				target = select_idle_sibling(p, tmp, target);
-
-			if (target >= 0) {
-				if (tmp->flags & SD_WAKE_AFFINE) {
-					affine_sd = tmp;
-					want_affine = 0;
-					if (target != cpu)
-						cpu_idle = 1;
-				}
-				cpu = target;
-			}
+		if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
+		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
+			affine_sd = tmp;
+			want_affine = 0;
 		}
 
 		if (!want_sd && !want_affine)
@@ -1520,8 +1516,10 @@ select_task_rq_fair(struct rq *rq, struct task_struct *p, int sd_flag, int wake_
 #endif
 
 	if (affine_sd) {
-		if (cpu_idle || cpu == prev_cpu || wake_affine(affine_sd, p, sync))
-			return cpu;
+		if (cpu == prev_cpu || wake_affine(affine_sd, p, sync))
+			return select_idle_sibling(p, cpu);
+		else
+			return select_idle_sibling(p, prev_cpu);
 	}
 
 	while (sd) {

      parent reply	other threads:[~2010-04-23 10:51 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-08 22:19 [patch v2 1/2] sched: check for prev_cpu == this_cpu before calling wake_affine() Suresh Siddha
2010-03-08 22:19 ` [patch v2 2/2] sched: fix select_idle_sibling() logic in select_task_rq_fair() Suresh Siddha
2010-03-31 10:25 ` [patch v2 1/2] sched: check for prev_cpu == this_cpu before calling wake_affine() Peter Zijlstra
2010-03-31 23:47   ` Suresh Siddha
2010-04-01  5:32     ` Mike Galbraith
2010-04-01 21:04       ` Suresh Siddha
2010-04-02  6:20         ` Mike Galbraith
2010-04-02 17:05           ` Suresh Siddha
2010-04-02 19:43             ` Mike Galbraith
2010-04-14 20:45           ` Suresh Siddha
2010-04-15  5:17             ` Mike Galbraith
2010-04-20  8:46     ` Peter Zijlstra
2010-04-20  8:55       ` Peter Zijlstra
2010-04-20 17:03         ` Suresh Siddha
2010-04-23 10:50     ` tip-bot for Suresh Siddha [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=tip-99bd5e2f245d8cd17d040c82d40becdb3efd9b69@git.kernel.org \
    --to=suresh.b.siddha@intel.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.