From: tip-bot for Mel Gorman <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: matt@codeblueprint.co.uk, hpa@zytor.com,
	torvalds@linux-foundation.org, mingo@kernel.org,
	linux-kernel@vger.kernel.org, peterz@infradead.org,
	efault@gmx.de, mgorman@techsingularity.net, tglx@linutronix.de
Subject: [tip:sched/urgent] sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS
Date: Tue, 6 Feb 2018 03:56:54 -0800
Message-ID: <tip-32e839dda3ba576943365f0f5817ce5c843137dc@git.kernel.org>
In-Reply-To: <20180130104555.4125-5-mgorman@techsingularity.net>

Commit-ID:  32e839dda3ba576943365f0f5817ce5c843137dc
Gitweb:     https://git.kernel.org/tip/32e839dda3ba576943365f0f5817ce5c843137dc
Author:     Mel Gorman <mgorman@techsingularity.net>
AuthorDate: Tue, 30 Jan 2018 10:45:55 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 6 Feb 2018 10:20:37 +0100

sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS

The select_idle_sibling() (SIS) rewrite in commit:

  10e2f1acd010 ("sched/core: Rewrite and improve select_idle_siblings()")

... replaced a domain iteration with a search that, broadly speaking,
does a wrapped walk of the scheduler domain sharing a last-level-cache.

While this had a number of improvements, one consequence is that two tasks
that share a waker/wakee relationship push each other around a socket. Even
though two tasks may be active, all cores are evenly used. This is great from
a search perspective and spreads a load across individual cores, but it has
adverse consequences for cpufreq. As each CPU has relatively low utilisation,
cpufreq may decide the utilisation is too low to use a higher P-state and
overall computation throughput suffers.

While individual cpufreq and cpuidle drivers may compensate by artificially
boosting P-state (at c0) or avoiding lower C-states (during idle), it does
not help if hardware-based cpufreq (e.g. HWP) is used.

This patch tracks a recently used CPU based on what CPU a task was running
on when it last was a waker, or a CPU it was recently using when the task is
a wakee. During SIS, the recently used CPU is used as a target if it's still
allowed by the task and is idle.
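
As a rough illustration of the candidate order described above, the
following is a minimal userspace sketch, not the kernel code. All toy_*
names, the bitmask representation of the allowed CPUs and the -1 return
value (meaning "fall through to the full LLC search") are assumptions made
for the example only.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 8

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
struct toy_task {
	int recent_used_cpu;
	unsigned int cpus_allowed;	/* bit n set => CPU n is allowed */
};

static bool toy_idle[TOY_NR_CPUS];	/* true if the CPU is idle */
static int toy_llc_id[TOY_NR_CPUS];	/* CPUs with equal ids share an LLC */

static bool toy_share_cache(int a, int b)
{
	return toy_llc_id[a] == toy_llc_id[b];
}

/*
 * Try target, then prev, then the recently used CPU.
 * Returns the chosen CPU, or -1 when none of the cheap candidates is
 * usable and a wider search would be needed.
 */
static int toy_fast_path(struct toy_task *p, int prev, int target)
{
	int recent = p->recent_used_cpu;

	assert(prev >= 0 && prev < TOY_NR_CPUS);
	assert(target >= 0 && target < TOY_NR_CPUS);
	assert(recent >= 0 && recent < TOY_NR_CPUS);

	if (toy_idle[target])
		return target;

	if (prev != target && toy_share_cache(prev, target) && toy_idle[prev])
		return prev;

	if (recent != prev && recent != target &&
	    toy_share_cache(recent, target) && toy_idle[recent] &&
	    (p->cpus_allowed & (1u << recent))) {
		/* prev becomes the candidate for the next wakeup */
		p->recent_used_cpu = prev;
		return recent;
	}

	return -1;
}
```

In this sketch the recently used CPU is only consulted after target and
prev have both been rejected, which mirrors the order of the checks in the
patch below.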

The benefit may be non-obvious so consider an example of two tasks
communicating back and forth. Task A may be an application doing IO where
task B is a kworker or kthread like journald. Task A may issue IO, wake
B and B wakes up A on completion.  With the existing scheme this may look
like the following (potentially different IDs if SMT is in use but a similar
principle applies).

 A (cpu 0)	wake	B (wakes on cpu 1)
 B (cpu 1)	wake	A (wakes on cpu 2)
 A (cpu 2)	wake	B (wakes on cpu 3)
 etc.

A careful reader may wonder why CPU 0 was not idle when B wakes A the
first time. It is simply because A can be rescheduled to another CPU, so
the pattern is that prev == target when B tries to wake up A and the
information about CPU 0 has been lost.

With this patch, the pattern is more likely to be:

 A (cpu 0)	wake	B (wakes on cpu 1)
 B (cpu 1)	wake	A (wakes on cpu 0)
 A (cpu 0)	wake	B (wakes on cpu 1)
 etc.

i.e. two communicating tasks are more likely to use just two cores instead
of all available cores sharing a LLC.
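
The two wakeup patterns above can be reproduced with a small toy model.
This is only a sketch under strong assumptions: the waker is assumed to
go idle right after each wakeup, the "wrapped walk" is reduced to picking
the CPU after the waker's, and toy_cpus_touched() and TOY_NR_CPUS are
hypothetical names that do not exist in the kernel.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

/*
 * Tasks A (index 0) and B (index 1) wake each other n_wakeups times.
 * Returns how many distinct CPUs were used. With use_recent set, the
 * wakee returns to the CPU it last ran on; otherwise it is placed on the
 * CPU after the waker's, wrapping around.
 */
static int toy_cpus_touched(int n_wakeups, int use_recent)
{
	int cpu[2] = { 0, -1 };		/* A starts on CPU 0, B has not run yet */
	unsigned int used = 1u << 0;	/* bitmask of CPUs used so far */
	int waker = 0;
	int count = 0;
	int i;

	assert(n_wakeups >= 0);

	for (i = 0; i < n_wakeups; i++) {
		int wakee = !waker;
		int next;

		if (use_recent && cpu[wakee] >= 0 && cpu[wakee] != cpu[waker])
			next = cpu[wakee];	/* reuse the recently used CPU */
		else
			next = (cpu[waker] + 1) % TOY_NR_CPUS;	/* wrapped walk */

		cpu[wakee] = next;
		used |= 1u << next;
		waker = wakee;
	}

	/* Count the set bits in the mask. */
	for (i = 0; i < TOY_NR_CPUS; i++)
		if (used & (1u << i))
			count++;

	return count;
}
```

Under these assumptions, eight wakeups touch all four CPUs without the
recently used CPU and only two CPUs with it.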

The most dramatic speedup was noticed on dbench using the XFS filesystem on
UMA as clients interact heavily with workqueues in that configuration. Note
that a similar speedup is not observed on ext4 as the wakeup pattern
is different:

                          4.15.0-rc9             4.15.0-rc9
                           waprev-v1        biasancestor-v1
 Hmean      1      287.54 (   0.00%)      817.01 ( 184.14%)
 Hmean      2     1268.12 (   0.00%)     1781.24 (  40.46%)
 Hmean      4     1739.68 (   0.00%)     1594.47 (  -8.35%)
 Hmean      8     2464.12 (   0.00%)     2479.56 (   0.63%)
 Hmean     64     1455.57 (   0.00%)     1434.68 (  -1.44%)

The results can be less dramatic on NUMA where automatic balancing interferes
with the test. It's also known that network benchmarks running on localhost
benefit quite a bit from this patch (roughly 10% on netperf RR for UDP
and TCP depending on the machine). Hackbench also sees small improvements
(6-11% depending on machine and thread count). The Facebook schbench was also
tested but in most cases showed little or no difference in wakeup latencies.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180130104555.4125-5-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched.h |  8 ++++++++
 kernel/sched/core.c   |  1 +
 kernel/sched/fair.c   | 22 ++++++++++++++++++++--
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 166144c..92744e3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -555,6 +555,14 @@ struct task_struct {
 	unsigned long			wakee_flip_decay_ts;
 	struct task_struct		*last_wakee;
 
+	/*
+	 * recent_used_cpu is initially set as the last CPU used by a task
+	 * that wakes affine another task. Waker/wakee relationships can
+	 * push tasks around a CPU where each wakeup moves to the next one.
+	 * Tracking a recently used CPU allows a quick search for a recently
+	 * used CPU that may be idle.
+	 */
+	int				recent_used_cpu;
 	int				wake_cpu;
 #endif
 	int				on_rq;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b40540e..36f113a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2461,6 +2461,7 @@ void wake_up_new_task(struct task_struct *p)
 	 * Use __set_task_cpu() to avoid calling sched_class::migrate_task_rq,
 	 * as we're not fully set-up yet.
 	 */
+	p->recent_used_cpu = task_cpu(p);
 	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
 	rq = __task_rq_lock(p, &rf);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index db45b35..5eb3ffc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6197,7 +6197,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 static int select_idle_sibling(struct task_struct *p, int prev, int target)
 {
 	struct sched_domain *sd;
-	int i;
+	int i, recent_used_cpu;
 
 	if (idle_cpu(target))
 		return target;
@@ -6208,6 +6208,21 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (prev != target && cpus_share_cache(prev, target) && idle_cpu(prev))
 		return prev;
 
+	/* Check a recently used CPU as a potential idle candidate */
+	recent_used_cpu = p->recent_used_cpu;
+	if (recent_used_cpu != prev &&
+	    recent_used_cpu != target &&
+	    cpus_share_cache(recent_used_cpu, target) &&
+	    idle_cpu(recent_used_cpu) &&
+	    cpumask_test_cpu(p->recent_used_cpu, &p->cpus_allowed)) {
+		/*
+		 * Replace recent_used_cpu with prev as it is a potential
+		 * candidate for the next wake.
+		 */
+		p->recent_used_cpu = prev;
+		return recent_used_cpu;
+	}
+
 	sd = rcu_dereference(per_cpu(sd_llc, target));
 	if (!sd)
 		return target;
@@ -6375,9 +6390,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 
 	if (!sd) {
 pick_cpu:
-		if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
+		if (sd_flag & SD_BALANCE_WAKE) { /* XXX always ? */
 			new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
 
+			if (want_affine)
+				current->recent_used_cpu = cpu;
+		}
 	} else {
 		new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
 	}
