From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.linuxfoundation.org ([140.211.169.12]:59866 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753772AbdGJPYr (ORCPT ); Mon, 10 Jul 2017 11:24:47 -0400 Subject: Patch "sched/core: Implement new approach to scale select_idle_cpu()" has been added to the 4.12-stable tree To: peterz@infradead.org, clm@fb.com, efault@gmx.de, gregkh@linuxfoundation.org, kitsunyan@inbox.ru, matt@codeblueprint.co.uk, mgorman@techsingularity.net, mingo@kernel.org, tglx@linutronix.de, torvalds@linux-foundation.org Cc: , From: Date: Mon, 10 Jul 2017 17:24:43 +0200 Message-ID: <14997002831016@kroah.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ANSI_X3.4-1968 Content-Transfer-Encoding: 8bit Sender: stable-owner@vger.kernel.org List-ID: This is a note to let you know that I've just added the patch titled sched/core: Implement new approach to scale select_idle_cpu() to the 4.12-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: sched-core-implement-new-approach-to-scale-select_idle_cpu.patch and it can be found in the queue-4.12 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let know about it. >>From 1ad3aaf3fcd2444406628a19a9b9e0922b95e2d4 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 17 May 2017 12:53:50 +0200 Subject: sched/core: Implement new approach to scale select_idle_cpu() From: Peter Zijlstra commit 1ad3aaf3fcd2444406628a19a9b9e0922b95e2d4 upstream. Hackbench recently suffered a bunch of pain, first by commit: 4c77b18cf8b7 ("sched/fair: Make select_idle_cpu() more aggressive") and then by commit: c743f0a5c50f ("sched/fair, cpumask: Export for_each_cpu_wrap()") which fixed a bug in the initial for_each_cpu_wrap() implementation that made select_idle_cpu() even more expensive. The bug was that it would skip over CPUs when bits were consequtive in the bitmask. This however gave me an idea to fix select_idle_cpu(); where the old scheme was a cliff-edge throttle on idle scanning, this introduces a more gradual approach. Instead of stopping to scan entirely, we limit how many CPUs we scan. Initial benchmarks show that it mostly recovers hackbench while not hurting anything else, except Mason's schbench, but not as bad as the old thing. It also appears to recover the tbench high-end, which also suffered like hackbench. Tested-by: Matt Fleming Signed-off-by: Peter Zijlstra (Intel) Cc: Chris Mason Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: hpa@zytor.com Cc: kitsunyan Cc: linux-kernel@vger.kernel.org Cc: lvenanci@redhat.com Cc: riel@redhat.com Cc: xiaolong.ye@intel.com Link: http://lkml.kernel.org/r/20170517105350.hk5m4h4jb6dfr65a@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar Signed-off-by: Mel Gorman Signed-off-by: Greg Kroah-Hartman --- kernel/sched/fair.c | 21 ++++++++++++++++----- kernel/sched/features.h | 1 + 2 files changed, 17 insertions(+), 5 deletions(-) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5772,27 +5772,38 @@ static inline int select_idle_smt(struct static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target) { struct sched_domain *this_sd; - u64 avg_cost, avg_idle = this_rq()->avg_idle; + u64 avg_cost, avg_idle; u64 time, cost; s64 delta; - int cpu; + int cpu, nr = INT_MAX; this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc)); if (!this_sd) return -1; - avg_cost = this_sd->avg_scan_cost; - /* * Due to large variance we need a large fuzz factor; hackbench in * particularly is sensitive here. */ - if (sched_feat(SIS_AVG_CPU) && (avg_idle / 512) < avg_cost) + avg_idle = this_rq()->avg_idle / 512; + avg_cost = this_sd->avg_scan_cost + 1; + + if (sched_feat(SIS_AVG_CPU) && avg_idle < avg_cost) return -1; + if (sched_feat(SIS_PROP)) { + u64 span_avg = sd->span_weight * avg_idle; + if (span_avg > 4*avg_cost) + nr = div_u64(span_avg, avg_cost); + else + nr = 4; + } + time = local_clock(); for_each_cpu_wrap(cpu, sched_domain_span(sd), target) { + if (!--nr) + return -1; if (!cpumask_test_cpu(cpu, &p->cpus_allowed)) continue; if (idle_cpu(cpu)) --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -55,6 +55,7 @@ SCHED_FEAT(TTWU_QUEUE, true) * When doing wakeups, attempt to limit superfluous scans of the LLC domain. */ SCHED_FEAT(SIS_AVG_CPU, false) +SCHED_FEAT(SIS_PROP, true) /* * Issue a WARN when we do multiple update_rq_clock() calls Patches currently in stable-queue which might be from peterz@infradead.org are queue-4.12/sched-numa-hide-numa_wake_affine-from-up-build.patch queue-4.12/sched-numa-use-down_read_trylock-for-the-mmap_sem.patch queue-4.12/x86-uaccess-optimize-copy_user_enhanced_fast_string-for-short-strings.patch queue-4.12/sched-core-implement-new-approach-to-scale-select_idle_cpu.patch queue-4.12/sched-fair-remove-effective_load.patch queue-4.12/sched-numa-implement-numa-node-level-wake_affine.patch queue-4.12/sched-fair-simplify-wake_affine-for-the-single-socket-case.patch queue-4.12/sched-fair-cpumask-export-for_each_cpu_wrap.patch queue-4.12/sched-numa-override-part-of-migrate_degrades_locality-when-idle-balancing.patch