From: Peter Zijlstra <peterz@infradead.org>
To: Aman Gupta <aman@tmm1.net>
Cc: "Lesław Kopeć" <leslaw.kopec@nasza-klasa.pl>,
linux-kernel@vger.kernel.org,
"Chase Douglas" <chase.douglas@canonical.com>,
"Damien Wyart" <damien.wyart@free.fr>,
"Kyle McMartin" <kyle@redhat.com>,
"Venkatesh Pallipadi" <venki@google.com>,
"Jonathan Nieder" <jrnieder@gmail.com>,
"Doug Smythies" <dsmythies@telus.net>
Subject: Re: Inconsistent load average on tickless kernels
Date: Tue, 06 Mar 2012 00:25:03 +0100
Message-ID: <1330989903.11248.261.camel@twins>
In-Reply-To: <CAK=uwuxvzGwes4ffy9EXVReG95U3HQaKmPPKgFnHzo_MwaO_NA@mail.gmail.com>
Doug actually spotted the problem and reported it off-list.
The patch below appears to sort out the issue, but I haven't been able to
test the very long idle path simply because x86 doesn't go idle that
long.
I tried writing hpet64 support so we could idle that long, killed all
kinds of stupid kernel threads (watchdogs mostly) that keep waking up,
and got a brick.
Clearly I need to try again... but I thought I'd at least share this
stuff.
---
Subject: sched: Fix nohz load accounting -- again!
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 01 Mar 2012 15:04:46 +0100
Various people reported nohz load tracking still being wrecked, but Doug
spotted the actual problem. We fold the nohz remainder in too soon,
causing us to lose samples and under-account.
So instead of playing catch-up up-front, always do a single load-fold
with whatever state we encounter and only then fold the nohz remainder
and play catch-up.
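To make the new ordering concrete, here is a minimal user-space sketch (not kernel code) that only counts how many LOAD_FREQ periods a single tick ages under the patched flow: one regular fold first, then the nohz catch-up if we are still behind. periods_aged() and before() are hypothetical helpers, HZ=1000 is an assumption, and both the idle fold and the avenrun[] arithmetic are left out.

```c
#include <stdbool.h>

#define HZ        1000UL          /* assumption: CONFIG_HZ=1000 */
#define LOAD_FREQ (5 * HZ + 1)    /* 5 s sampling interval plus one tick */

/* Wrap-safe "a is before b", in the spirit of the kernel's time_before(). */
static bool before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Hypothetical model of the patched calc_global_load() + calc_global_nohz():
 * returns how many LOAD_FREQ periods get aged on the tick at 'jiffies' and
 * advances *update, which stands in for calc_load_update.
 */
static unsigned long periods_aged(unsigned long jiffies, unsigned long *update)
{
	unsigned long delta, n;

	if (before(jiffies, *update + 10))
		return 0;               /* no sample due yet */

	/* 1) One regular fold with whatever state we found. */
	*update += LOAD_FREQ;

	/* 2) Fold the nohz remainder; done if that one period was enough. */
	if (before(jiffies, *update + 10))
		return 1;

	/* 3) Catch-up: age every period we are still behind in one go. */
	delta = jiffies - *update - 10;
	n = 1 + delta / LOAD_FREQ;
	*update += n * LOAD_FREQ;
	return 1 + n;
}
```

For example, starting from calc_load_update = 1000, a tick arriving 4*LOAD_FREQ + 22 ticks after the first sample was due ages five periods (one regular fold plus a catch-up of four) and leaves the next update at 1000 + 5*LOAD_FREQ.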
Reported-by: Doug Smythies <dsmythies@telus.net>
Reported-by: Lesław Kopeć <leslaw.kopec@nasza-klasa.pl>
Reported-by: Aman Gupta <aman@tmm1.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched/core.c | 53 +++++++++++++++++++++++++--------------------------
1 files changed, 26 insertions(+), 27 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b83e8d0..6ffde97 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2266,13 +2266,10 @@ calc_load_n(unsigned long load, unsigned long exp,
* Once we've updated the global active value, we need to apply the exponential
* weights adjusted to the number of cycles missed.
*/
-static void calc_global_nohz(unsigned long ticks)
+static void calc_global_nohz(void)
{
long delta, active, n;
- if (time_before(jiffies, calc_load_update))
- return;
-
/*
* If we crossed a calc_load_update boundary, make sure to fold
* any pending idle changes, the respective CPUs might have
@@ -2284,31 +2281,25 @@ static void calc_global_nohz(unsigned long ticks)
atomic_long_add(delta, &calc_load_tasks);
/*
- * If we were idle for multiple load cycles, apply them.
+ * It could be the one fold was all it took, we're done!
*/
- if (ticks >= LOAD_FREQ) {
- n = ticks / LOAD_FREQ;
+ if (time_before(jiffies, calc_load_update + 10))
+ return;
- active = atomic_long_read(&calc_load_tasks);
- active = active > 0 ? active * FIXED_1 : 0;
+ /*
+ * Catch-up, fold however many we are behind still
+ */
+ delta = jiffies - calc_load_update - 10;
+ n = 1 + (delta / LOAD_FREQ);
- avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
- avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
- avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
+ active = atomic_long_read(&calc_load_tasks);
+ active = active > 0 ? active * FIXED_1 : 0;
- calc_load_update += n * LOAD_FREQ;
- }
+ avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
+ avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
+ avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
- /*
- * Its possible the remainder of the above division also crosses
- * a LOAD_FREQ period, the regular check in calc_global_load()
- * which comes after this will take care of that.
- *
- * Consider us being 11 ticks before a cycle completion, and us
- * sleeping for 4*LOAD_FREQ + 22 ticks, then the above code will
- * age us 4 cycles, and the test in calc_global_load() will
- * pick up the final one.
- */
+ calc_load_update += n * LOAD_FREQ;
}
#else
void calc_load_account_idle(struct rq *this_rq)
@@ -2320,7 +2311,7 @@ static inline long calc_load_fold_idle(void)
return 0;
}
-static void calc_global_nohz(unsigned long ticks)
+static void calc_global_nohz(void)
{
}
#endif
@@ -2348,8 +2339,6 @@ void calc_global_load(unsigned long ticks)
{
long active;
- calc_global_nohz(ticks);
-
if (time_before(jiffies, calc_load_update + 10))
return;
@@ -2361,6 +2350,16 @@ void calc_global_load(unsigned long ticks)
avenrun[2] = calc_load(avenrun[2], EXP_15, active);
calc_load_update += LOAD_FREQ;
+
+ /*
+ * Account one period with whatever state we found before
+ * folding in the nohz state and ageing the entire idle period.
+ *
+ * This avoids losing a sample when we go idle between
+ * calc_load_account_active() (10 ticks ago) and now and thus
+ * under-accounting.
+ */
+ calc_global_nohz();
}
/*
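The catch-up path ages avenrun[] by n periods in one step through calc_load_n(), whose body is not part of this diff. The sketch below approximates that fixed-point arithmetic in user space, modelled on calc_load() and fixed_power_int() from kernels of this period; the constants match the kernel's FSHIFT/EXP_* values, but treat it as an illustration rather than a copy of the kernel source.

```c
#define FSHIFT  11                 /* bits of fixed-point precision */
#define FIXED_1 (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1   1884               /* 1/exp(5s/1min) in fixed point */
#define EXP_5   2014               /* 1/exp(5s/5min) */
#define EXP_15  2037               /* 1/exp(5s/15min) */

/* One period of decay: load = load * e + active * (1 - e), fixed point. */
static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
	load *= exp;
	load += active * (FIXED_1 - exp);
	return load >> FSHIFT;
}

/* x^n in fixed point by repeated squaring, rounding each multiply. */
static unsigned long
fixed_power_int(unsigned long x, unsigned int frac_bits, unsigned int n)
{
	unsigned long result = 1UL << frac_bits;

	while (n) {
		if (n & 1) {
			result *= x;
			result += 1UL << (frac_bits - 1);
			result >>= frac_bits;
		}
		n >>= 1;
		if (!n)
			break;
		x *= x;
		x += 1UL << (frac_bits - 1);
		x >>= frac_bits;
	}
	return result;
}

/* Age 'load' by n periods at once, assuming 'active' stayed constant. */
static unsigned long
calc_load_n(unsigned long load, unsigned long exp,
	    unsigned long active, unsigned int n)
{
	return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
}
```

Because e^n is computed once (with rounding), calc_load_n() only approximates n successive calc_load() calls in general; for n = 2 with no active tasks, both give 1733 starting from FIXED_1.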