public inbox for linux-kernel@vger.kernel.org
* [PATCH] sched: loadavg 0.00, 0.01, 0.05 on idle
@ 2016-01-21  9:23 Vik Heyndrickx
  2016-01-21 15:28 ` Peter Zijlstra
  2016-01-21 18:54 ` [tip:sched/urgent] sched: Fix non-zero idle loadavg tip-bot for Vik Heyndrickx
  0 siblings, 2 replies; 5+ messages in thread
From: Vik Heyndrickx @ 2016-01-21  9:23 UTC
  To: a.p.zijlstra; +Cc: linux-kernel

Systems show a minimum load average of 0.00, 0.01, 0.05 even when they
have no load at all.

Uptime and /proc/loadavg on all systems with kernels released during the
last five years, up to and including kernel version 4.4, show minimum
5- and 15-minute loadavg values of 0.01 and 0.05 respectively. These
should be 0.00 on idle systems, but the way the kernel calculates the
value prevents it from dropping below those minimums. Likewise, though
less noticeably, a fully loaded system with no processes waiting shows
a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95 (multiplied by the number
of cores).

Removing the single line of code that rounds the internally kept load
value, which effectively returns calc_load to its earlier state, fixes
the display problem completely.

The modified code was tested on nohz=off and nohz kernels, on vanilla
kernel 4.4 and on the CentOS 7.1 kernel 3.10.0-327, on single-, dual-,
and eight-core systems, and on virtual hosts and bare hardware. No
unwanted effects were observed, and the problems the patch was intended
to fix were indeed gone.

The following patch is for kernel version 4.x. In kernel 3.x, the
affected code was in core.c instead of loadavg.c.

Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net>

--- linux-4.4-org/kernel/sched/loadavg.c 2016-01-21 09:11:15 +0100
+++ linux-4.4/kernel/sched/loadavg.c 2016-01-21 09:11:31 +0100
@@ -101,7 +101,6 @@ calc_load(unsigned long load, unsigned l
  {
         load *= exp;
         load += active * (FIXED_1 - exp);
-       load += 1UL << (FSHIFT - 1);
         return load >> FSHIFT;
  }


* Re: [PATCH] sched: loadavg 0.00, 0.01, 0.05 on idle
  2016-01-21  9:23 [PATCH] sched: loadavg 0.00, 0.01, 0.05 on idle Vik Heyndrickx
@ 2016-01-21 15:28 ` Peter Zijlstra
  2016-01-21 18:38   ` Doug Smythies
  2016-01-21 18:54 ` [tip:sched/urgent] sched: Fix non-zero idle loadavg tip-bot for Vik Heyndrickx
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2016-01-21 15:28 UTC
  To: Vik Heyndrickx; +Cc: linux-kernel, Doug Smythies

On Thu, Jan 21, 2016 at 10:23:25AM +0100, Vik Heyndrickx wrote:
> Systems show a minimal load average of 0.00, 0.01, 0.05 even when they have
> no load at all.

Thanks, I've edited the patch Changelog to include a few extra details
you mentioned in our previous correspondence.

See below. Please let me know if you're OK with this.

---
Subject: sched: Fix non-zero idle loadavg
From: Vik Heyndrickx <vik.heyndrickx@veribox.net>
Date: Thu, 21 Jan 2016 10:23:25 +0100

Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
have no load at all.

Uptime and /proc/loadavg on all systems with kernels released during the
last five years, up to and including kernel version 4.4, show minimum
5- and 15-minute loadavg values of 0.01 and 0.05 respectively. These
should be 0.00 on idle systems, but the way the kernel calculates the
value prevents it from dropping below those minimums.

Likewise, though less noticeably, a fully loaded system with no
processes waiting shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
(multiplied by the number of cores).

Removing the single line of code that rounds the internally kept load
value, which effectively returns calc_load to its earlier state, fixes
the display problem completely.

Once the (old) load becomes 93 or higher, it mathematically can never
get lower than 93, even when the active (load) remains 0 forever.
This results in the strange 0.00, 0.01, 0.05 uptime values on idle
systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.

It is not correct to add a 0.5 rounding (=1024/2048) here, since the
result from this function is fed back into the next iteration again,
so the result of that +0.5 rounding value then gets multiplied by
(2048-2037), and then rounded again, so there is a virtual "ghost"
load created, next to the old and active load terms.

The modified code was tested on nohz=off and nohz kernels, on vanilla
kernel 4.4 and on the CentOS 7.1 kernel 3.10.0-327, on single-, dual-,
and eight-core systems, and on virtual hosts and bare hardware. No
unwanted effects were observed, and the problems the patch was intended
to fix were indeed gone.

Fixes: 0f004f5a696a ("sched: Cure more NO_HZ load average woes")
Cc: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net>
[Changelog edits]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/56A0A38D.4040900@veribox.net
---
 kernel/sched/loadavg.c |    1 -
 1 file changed, 1 deletion(-)

--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -101,7 +101,6 @@ calc_load(unsigned long load, unsigned l
 {
 	load *= exp;
 	load += active * (FIXED_1 - exp);
-	load += 1UL << (FSHIFT - 1);
 	return load >> FSHIFT;
 }
 


* RE: [PATCH] sched: loadavg 0.00, 0.01, 0.05 on idle
  2016-01-21 15:28 ` Peter Zijlstra
@ 2016-01-21 18:38   ` Doug Smythies
  2016-01-22  0:43     ` Vik Heyndrickx
  0 siblings, 1 reply; 5+ messages in thread
From: Doug Smythies @ 2016-01-21 18:38 UTC
  To: 'Peter Zijlstra', 'Vik Heyndrickx'
  Cc: linux-kernel, Doug Smythies

On 2016.01.21 07:29 Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 10:23:25AM +0100, Vik Heyndrickx wrote:
>> Systems show a minimal load average of 0.00, 0.01, 0.05 even when they have
>> no load at all.
>> ---
>> Subject: sched: Fix non-zero idle loadavg
>> From: Vik Heyndrickx <vik.heyndrickx@veribox.net>
>> Date: Thu, 21 Jan 2016 10:23:25 +0100

>> Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
>> have no load at all.

>> By removing the single code line that performed a rounding on the
>> internally kept load value, effectively returning this function
>> calc_load to its state it had before, the visualization problem is
>> completely fixed.

Yes, but it introduces a systematic error, rather than the current
balanced error. Thus it doubles the maximum error due to the finite
number of bits used in the math.

>> Once the (old) load becomes 93 or higher, it mathematically can never
>> get lower than 93, even when the active (load) remains 0 forever.
>> This results in the strange 0.00, 0.01, 0.05 uptime values on idle
>> systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.

As I mentioned on the bug report [1], this is a consequence
of carrying a finite number of bits with such a strong
IIR (Infinite Impulse Response) filter coefficient.

>> It is not correct to add a 0.5 rounding (=1024/2048) here, since the
>> result from this function is fed back into the next iteration again,
>> so the result of that +0.5 rounding value then gets multiplied by
>> (2048-2037), and then rounded again, so there is a virtual "ghost"
>> load created, next to the old and active load terms.

If you do not round then you get a doubling of problems on the load
increasing side of things. Consider an old load value of 1862 (90.92%),
regardless of how it got there, and a new load value of 2048 (100%)
from here onwards. With this proposed change, the 15 minute math becomes:

new = (old * 2037 + load * (2048 - 2037)) / 2048
new = (1862 * 2037 + 2048 * (2048 - 2037)) / 2048
new = 1862 (3815422 / 2048 = 1862.999..., truncated by the integer shift)

So, the 100% load will always be shown as 91% (double the old limit).

I have been running this proposed code with 100% load on CPU 7 for a couple
of hours now, and the 15 minute load average is stuck at 0.91.

Myself, I would not take out the rounding, but I defer to Peter.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=45001


* [tip:sched/urgent] sched: Fix non-zero idle loadavg
  2016-01-21  9:23 [PATCH] sched: loadavg 0.00, 0.01, 0.05 on idle Vik Heyndrickx
  2016-01-21 15:28 ` Peter Zijlstra
@ 2016-01-21 18:54 ` tip-bot for Vik Heyndrickx
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot for Vik Heyndrickx @ 2016-01-21 18:54 UTC
  To: linux-tip-commits
  Cc: efault, tglx, linux-kernel, hpa, vik.heyndrickx, torvalds,
	dsmythies, mingo, peterz

Commit-ID:  1f9649ef6aa1bac53fb478d9e641b22d67f8423c
Gitweb:     http://git.kernel.org/tip/1f9649ef6aa1bac53fb478d9e641b22d67f8423c
Author:     Vik Heyndrickx <vik.heyndrickx@veribox.net>
AuthorDate: Thu, 21 Jan 2016 10:23:25 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 21 Jan 2016 18:55:23 +0100

sched: Fix non-zero idle loadavg

Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
have no load at all.

Uptime and /proc/loadavg on all systems with kernels released during the
last five years, up to and including kernel version 4.4, show minimum
5- and 15-minute loadavg values of 0.01 and 0.05 respectively. These
should be 0.00 on idle systems, but the way the kernel calculates the
value prevents it from dropping below those minimums.

Likewise, though less noticeably, a fully loaded system with no
processes waiting shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
(multiplied by the number of cores).

Removing the single line of code that rounds the internally kept load
value, which effectively returns calc_load to its earlier state, fixes
the display problem completely.

Once the (old) load becomes 93 or higher, it mathematically can never
get lower than 93, even when the active (load) remains 0 forever.
This results in the strange 0.00, 0.01, 0.05 uptime values on idle
systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.

It is not correct to add a 0.5 rounding (=1024/2048) here, since the
result from this function is fed back into the next iteration again,
so the result of that +0.5 rounding value then gets multiplied by
(2048-2037), and then rounded again, so there is a virtual "ghost"
load created, next to the old and active load terms.

The modified code was tested on nohz=off and nohz kernels, on vanilla
kernel 4.4 and on the CentOS 7.1 kernel 3.10.0-327, on single-, dual-,
and eight-core systems, and on virtual hosts and bare hardware. No
unwanted effects were observed, and the problems the patch was intended
to fix were indeed gone.

Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net>
[ Changelog edits ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Doug Smythies <dsmythies@telus.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 0f004f5a696a ("sched: Cure more NO_HZ load average woes")
Link: http://lkml.kernel.org/r/56A0A38D.4040900@veribox.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/loadavg.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index ef71590..eb83b93 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -101,7 +101,6 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active)
 {
 	load *= exp;
 	load += active * (FIXED_1 - exp);
-	load += 1UL << (FSHIFT - 1);
 	return load >> FSHIFT;
 }
 


* Re: [PATCH] sched: loadavg 0.00, 0.01, 0.05 on idle
  2016-01-21 18:38   ` Doug Smythies
@ 2016-01-22  0:43     ` Vik Heyndrickx
  0 siblings, 0 replies; 5+ messages in thread
From: Vik Heyndrickx @ 2016-01-22  0:43 UTC
  To: Doug Smythies, 'Peter Zijlstra'; +Cc: linux-kernel

On 21/01/2016 19:38, Doug Smythies wrote:
> new = (old * 2037 + load * (2048 - 2037)) / 2048
> new = (1862 * 2037 + 2048 * (2048 - 2037)) / 2048
> new = 1862
>
> So, the 100% load will always be shown as 91% (double the old limit).

The math seems sound, but the fact is that the load on all my test
machines now drops to 0.00/0.00/0.00 when idle, and rises to, e.g.,
8.00/8.00/8.00 on my octa-core machine at full load.

I used mprime -t to cause a full load on all cores.

The load term can never drop below 0, but it can and will exceed 2048
unless nothing else is running, which is likely why 8.00 is actually
reached for the 5- and 15-minute values despite the math above.

I would not worry too much about the accuracy, because, with or without 
the rounding, it takes more than three quarters of an hour to reach 100% 
15-minute loadavg from idle, or the other way around 0% from full load.

> I have been running this proposed code with 100% load on CPU 7 for a couple
> of hours now, and the 15 minute load average is stuck at 0.91.

Possible in theory, but any instantaneous load above 100% will raise
that 0.91 further.
I think I can easily change the calc_load algorithm further to get the
best of both worlds, but the 15-minute loadavg will still take 45
minutes to converge.

After 40 minutes, and just to be sure, I checked again, and my buildhost 
now has the following load: 8.00, 8.02, 7.94.

-- 
Vik

