[PATCH 2/2] sched/fair: Increase wakeup_gran if current task has not executed the minimum granularity

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	Barry Song <song.bao.hua@hisilicon.com>,
	Mike Galbraith <efault@gmx.de>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: [PATCH 2/2] sched/fair: Increase wakeup_gran if current task has not executed the minimum granularity
Date: Thu, 21 Oct 2021 15:56:03 +0100	[thread overview]
Message-ID: <20211021145603.5313-3-mgorman@techsingularity.net> (raw)
In-Reply-To: <20211021145603.5313-1-mgorman@techsingularity.net>

Commit 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
moved the kernel.sched_wakeup_granularity_ns sysctl under debugfs.
One of the reasons why this sysctl may be used may be for "optimising
for throughput", particularly when overloaded. The tool TuneD sometimes
alters this for two profiles e.g. "mssql" and "throughput-performance". At
least version 2.9 does but it changed in master where it also will poke
at debugfs instead. This patch aims to reduce the motivation to tweak
sysctl_sched_wakeup_granularity by increasing sched_wakeup_granularity
if the running task runtime has not exceeded sysctl_sched_min_granularity.

During task migration or wakeup, a decision is made on whether
to preempt the current task or not. To limit over-scheduled,
sysctl_sched_wakeup_granularity delays the preemption to allow at least 1ms
of runtime before preempting. However, when a domain is heavily overloaded
(e.g. hackbench), the degree of over-scheduling is still severe. This is
problematic as time is wasted rescheduling tasks that could instead be
used by userspace tasks.

However, care must be taken. Even if a system is overloaded, there may
be high priority threads that must still be able to run. Mike Galbraith
explained the constraints as follows;

        CFS came about because the O1 scheduler was unfair to the
        point it had starvation problems. People pretty much across the
        board agreed that a fair scheduler was a much way better way
        to go, and CFS was born.  It didn't originally have the sleep
        credit business, but had to grow it to become _short term_ fair.
        Ingo cut the sleep credit in half because of overscheduling, and
        that has worked out pretty well all told.. but now you're pushing
        it more in the unfair direction, all the way to extremely unfair
        for anything and everything very light.

        Fairness isn't the holy grail mind you, and at some point, giving
        up on short term fairness certainly isn't crazy, as proven by your
        hackbench numbers and other numbers we've seen over the years,
        but taking bites out of the 'CF' in the CFS that was born to be a
        corner-case killer is.. worrisome.  The other shoe will drop.. it
        always does :)

This patch increases the wakeup granularity if the current task has not
reached its minimum preemption granularity. The current task may still
be preempted but the difference in runtime must be higher.

hackbench-process-pipes
                          5.15.0-rc3             5.15.0-rc3
               sched-wakeeflips-v1r1sched-scalewakegran-v3r2
Amean     1        0.3890 (   0.00%)      0.3823 (   1.71%)
Amean     4        0.5217 (   0.00%)      0.4867 (   6.71%)
Amean     7        0.5387 (   0.00%)      0.5053 (   6.19%)
Amean     12       0.5443 (   0.00%)      0.5450 (  -0.12%)
Amean     21       0.6487 (   0.00%)      0.6807 (  -4.93%)
Amean     30       0.8033 (   0.00%)      0.7107 *  11.54%*
Amean     48       1.2400 (   0.00%)      1.0447 *  15.75%*
Amean     79       1.8200 (   0.00%)      1.6033 *  11.90%*
Amean     110      2.5820 (   0.00%)      2.0763 *  19.58%*
Amean     141      3.2203 (   0.00%)      2.5313 *  21.40%*
Amean     172      3.8200 (   0.00%)      3.1163 *  18.42%*
Amean     203      4.3357 (   0.00%)      3.5560 *  17.98%*
Amean     234      4.8047 (   0.00%)      3.8913 *  19.01%*
Amean     265      5.1243 (   0.00%)      4.2293 *  17.47%*
Amean     296      5.5940 (   0.00%)      4.5357 *  18.92%*

                  5.15.0-rc3  5.15.0-rc3
         sched-wakeeflips-v1r1 sched-scalewakegran-v3r2
Duration User        2567.27     2034.17
Duration System     21098.79    17137.08
Duration Elapsed      136.49      120.2

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c     | 17 +++++++++++++++--
 kernel/sched/features.h |  2 ++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d00af3b97d8f..dee108470297 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7052,10 +7052,23 @@ balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 }
 #endif /* CONFIG_SMP */
 
-static unsigned long wakeup_gran(struct sched_entity *se)
+static unsigned long
+wakeup_gran(struct sched_entity *curr, struct sched_entity *se)
 {
 	unsigned long gran = sysctl_sched_wakeup_granularity;
 
+	if (sched_feat(SCALE_WAKEUP_GRAN)) {
+		unsigned long delta_exec;
+
+		/*
+		 * Increase the wakeup granularity if curr's runtime
+		 * is less than the minimum preemption granularity.
+		 */
+		delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+		if (delta_exec < sysctl_sched_min_granularity)
+			gran += sysctl_sched_min_granularity;
+	}
+
 	/*
 	 * Since its curr running now, convert the gran from real-time
 	 * to virtual-time in his units.
@@ -7094,7 +7107,7 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
 	if (vdiff <= 0)
 		return -1;
 
-	gran = wakeup_gran(se);
+	gran = wakeup_gran(curr, se);
 	if (vdiff > gran)
 		return 1;
 
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 7f8dace0964c..611591355ffd 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -95,3 +95,5 @@ SCHED_FEAT(LATENCY_WARN, false)
 
 SCHED_FEAT(ALT_PERIOD, true)
 SCHED_FEAT(BASE_SLICE, true)
+
+SCHED_FEAT(SCALE_WAKEUP_GRAN, true)
-- 
2.31.1

next prev parent reply	other threads:[~2021-10-21 14:56 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-21 14:56 [PATCH 0/2] Reduce stacking and overscheduling Mel Gorman
2021-10-21 14:56 ` [PATCH 1/2] sched/fair: Couple wakee flips with heavy wakers Mel Gorman
2021-10-22 10:26   ` Mike Galbraith
2021-10-22 11:05     ` Mel Gorman
2021-10-22 12:00       ` Mike Galbraith
2021-10-25  6:35       ` Mike Galbraith
2021-10-26  8:18         ` Mel Gorman
2021-10-26 10:15           ` Mike Galbraith
2021-10-26 10:41             ` Mike Galbraith
2021-10-26 11:57               ` Mel Gorman
2021-10-26 12:13                 ` Mike Galbraith
2021-10-27  2:09                   ` Mike Galbraith
2021-10-27  9:00                     ` Mel Gorman
2021-10-27 10:18                       ` Mike Galbraith
2021-11-09 11:56   ` Peter Zijlstra
2021-11-09 12:55     ` Mike Galbraith
2021-10-21 14:56 ` Mel Gorman [this message]
  -- strict thread matches above, loose matches on Subject: below --
2021-10-28  9:48 [PATCH v4 0/2] Reduce stacking and overscheduling Mel Gorman
2021-10-28  9:48 ` [PATCH 2/2] sched/fair: Increase wakeup_gran if current task has not executed the minimum granularity Mel Gorman
2021-10-29 16:07   ` Vincent Guittot

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:d00af3b97d8 dfblob:dee10847029 dfblob:7f8dace0964
dfblob:611591355ff )
 OR (
bs:"[PATCH 2/2] sched/fair: Increase wakeup_gran if current task has not executed the minimum granularity" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211021145603.5313-3-mgorman@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=aubrey.li@linux.intel.com \
    --cc=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=song.bao.hua@hisilicon.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox