[PATCH 3/9] sched/core: Implement new approach to scale select_idle_cpu()

public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@techsingularity.net>
To: Linux-Stable <stable@vger.kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Subject: [PATCH 3/9] sched/core: Implement new approach to scale select_idle_cpu()
Date: Mon, 10 Jul 2017 13:37:46 +0100	[thread overview]
Message-ID: <20170710123752.7563-4-mgorman@techsingularity.net> (raw)
In-Reply-To: <20170710123752.7563-1-mgorman@techsingularity.net>

From: Peter Zijlstra <peterz@infradead.org>

commit 1ad3aaf3fcd2444406628a19a9b9e0922b95e2d4 upstream.

Hackbench recently suffered a bunch of pain, first by commit:

  4c77b18cf8b7 ("sched/fair: Make select_idle_cpu() more aggressive")

and then by commit:

  c743f0a5c50f ("sched/fair, cpumask: Export for_each_cpu_wrap()")

which fixed a bug in the initial for_each_cpu_wrap() implementation
that made select_idle_cpu() even more expensive. The bug was that it
would skip over CPUs when bits were consequtive in the bitmask.

This however gave me an idea to fix select_idle_cpu(); where the old
scheme was a cliff-edge throttle on idle scanning, this introduces a
more gradual approach. Instead of stopping to scan entirely, we limit
how many CPUs we scan.

Initial benchmarks show that it mostly recovers hackbench while not
hurting anything else, except Mason's schbench, but not as bad as the
old thing.

It also appears to recover the tbench high-end, which also suffered like
hackbench.

Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: kitsunyan <kitsunyan@inbox.ru>
Cc: linux-kernel@vger.kernel.org
Cc: lvenanci@redhat.com
Cc: riel@redhat.com
Cc: xiaolong.ye@intel.com
Link: http://lkml.kernel.org/r/20170517105350.hk5m4h4jb6dfr65a@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c     | 21 ++++++++++++++++-----
 kernel/sched/features.h |  1 +
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c9c1818d3caf..19e2469f7aea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5772,27 +5772,38 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
 static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	struct sched_domain *this_sd;
-	u64 avg_cost, avg_idle = this_rq()->avg_idle;
+	u64 avg_cost, avg_idle;
 	u64 time, cost;
 	s64 delta;
-	int cpu;
+	int cpu, nr = INT_MAX;
 
 	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
 	if (!this_sd)
 		return -1;
 
-	avg_cost = this_sd->avg_scan_cost;
-
 	/*
 	 * Due to large variance we need a large fuzz factor; hackbench in
 	 * particularly is sensitive here.
 	 */
-	if (sched_feat(SIS_AVG_CPU) && (avg_idle / 512) < avg_cost)
+	avg_idle = this_rq()->avg_idle / 512;
+	avg_cost = this_sd->avg_scan_cost + 1;
+
+	if (sched_feat(SIS_AVG_CPU) && avg_idle < avg_cost)
 		return -1;
 
+	if (sched_feat(SIS_PROP)) {
+		u64 span_avg = sd->span_weight * avg_idle;
+		if (span_avg > 4*avg_cost)
+			nr = div_u64(span_avg, avg_cost);
+		else
+			nr = 4;
+	}
+
 	time = local_clock();
 
 	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
+		if (!--nr)
+			return -1;
 		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
 			continue;
 		if (idle_cpu(cpu))
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 11192e0cb122..ce7b4b6ac733 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -55,6 +55,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
  * When doing wakeups, attempt to limit superfluous scans of the LLC domain.
  */
 SCHED_FEAT(SIS_AVG_CPU, false)
+SCHED_FEAT(SIS_PROP, true)
 
 /*
  * Issue a WARN when we do multiple update_rq_clock() calls
-- 
2.13.1

next prev parent reply	other threads:[~2017-07-10 12:43 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-10 12:37 [PATCH 0/9] Performance-related backports for 4.12 Mel Gorman
2017-07-10 12:37 ` [PATCH 1/9] x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings Mel Gorman
2017-07-10 12:37 ` [PATCH 2/9] sched/fair, cpumask: Export for_each_cpu_wrap() Mel Gorman
2017-07-10 12:37 ` Mel Gorman [this message]
2017-07-10 12:37 ` [PATCH 4/9] sched/numa: Use down_read_trylock() for the mmap_sem Mel Gorman
2017-07-10 12:37 ` [PATCH 5/9] sched/numa: Override part of migrate_degrades_locality() when idle balancing Mel Gorman
2017-07-10 12:37 ` [PATCH 6/9] sched/fair: Simplify wake_affine() for the single socket case Mel Gorman
2017-07-10 12:37 ` [PATCH 7/9] sched/numa: Implement NUMA node level wake_affine() Mel Gorman
2017-07-10 12:37 ` [PATCH 8/9] sched/fair: Remove effective_load() Mel Gorman
2017-07-10 12:37 ` [PATCH 9/9] sched/numa: Hide numa_wake_affine() from UP build Mel Gorman
2017-07-10 15:25 ` [PATCH 0/9] Performance-related backports for 4.12 Greg KH
2017-07-10 15:32   ` Mel Gorman

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:c9c1818d3ca dfblob:19e2469f7ae dfblob:11192e0cb12
dfblob:ce7b4b6ac73 )
 OR (
bs:"[PATCH 3/9] sched/core: Implement new approach to scale select_idle_cpu()" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170710123752.7563-4-mgorman@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox