[PATCH] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting

stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Matt Fleming <matt@codeblueprint.co.uk>
To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Matt Fleming <matt@codeblueprint.co.uk>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	stable@vger.kernel.org,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
Date: Wed,  8 Feb 2017 13:29:24 +0000	[thread overview]
Message-ID: <20170208132924.3038-1-matt@codeblueprint.co.uk> (raw)

The calculation for the next sample window when exiting NOH_HZ idle
does not handle the fact that we may not have reached the next sample
window yet, i.e. that we came out of idle between sample windows.

If we wake from NO_HZ idle after the pending this_rq->calc_load_update
window time when we want idle but before the next sample window, we
will add an unnecessary LOAD_FREQ delay to the load average
accounting, delaying any update for potentially ~9seconds.

This can result in huge spikes in the load average values due to
per-cpu uninterruptible task counts being out of sync when accumulated
across all CPUs.

It's safe to update the per-cpu active count if we wake between sample
windows because any load that we left in 'calc_load_idle' will have
been zero'd when the idle load was folded in calc_global_load().

This issue is easy to reproduce before,

  commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")

just by forking short-lived process pipelines built from ps(1) and
grep(1) in a loop. I'm unable to reproduce the spikes after that
commit, but the bug still seems to be present from code review.

Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: <stable@vger.kernel.org> # v3.5+
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/loadavg.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index a2d6eb71f06b..a7a6f3646970 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -199,6 +199,7 @@ void calc_load_enter_idle(void)
 void calc_load_exit_idle(void)
 {
 	struct rq *this_rq = this_rq();
+	unsigned long next_window;

 	/*
 	 * If we're still before the sample window, we're done.
@@ -210,10 +211,16 @@ void calc_load_exit_idle(void)
 	 * We woke inside or after the sample window, this means we're already
 	 * accounted through the nohz accounting, so skip the entire deal and
 	 * sync up for the next window.
+	 *
+	 * The next window is 'calc_load_update' if we haven't reached it yet,
+	 * and 'calc_load_update + 10' if we're inside the current window.
 	 */
-	this_rq->calc_load_update = calc_load_update;
-	if (time_before(jiffies, this_rq->calc_load_update + 10))
-		this_rq->calc_load_update += LOAD_FREQ;
+	next_window = calc_load_update;
+
+	if (time_in_range_open(jiffies, next_window, next_window + 10)
+		next_window += LOAD_FREQ;
+
+	this_rq->calc_load_update = next_window;
 }

 static long calc_load_fold_idle(void)
-- 
2.10.0

next             reply	other threads:[~2017-02-08 13:29 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-08 13:29 Matt Fleming [this message]
2017-02-08 16:04 ` [PATCH] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting kbuild test robot
2017-02-08 16:07 ` kbuild test robot
2017-02-09 15:07 ` Peter Zijlstra
2017-02-15 15:30   ` Frederic Weisbecker
2017-02-15 15:12 ` Frederic Weisbecker
2017-02-15 16:16   ` Matt Fleming
2017-02-15 16:44     ` Frederic Weisbecker

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:a2d6eb71f06 dfblob:a7a6f364697 )
 OR (
bs:"[PATCH] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170208132924.3038-1-matt@codeblueprint.co.uk \
    --to=matt@codeblueprint.co.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=umgwanakikbuti@gmail.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).