From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760095AbcAUSrS (ORCPT <rfc822;w@1wt.eu>);
	Thu, 21 Jan 2016 13:47:18 -0500
Received: from cmta7.telus.net ([209.171.16.80]:33426 "EHLO cmta7.telus.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1759793AbcAUSrA (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Thu, 21 Jan 2016 13:47:00 -0500
X-Greylist: delayed 492 seconds by postgrey-1.27 at vger.kernel.org; Thu, 21 Jan 2016 13:47:00 EST
X-Authority-Analysis: v=2.1 cv=fqshHwMf c=1 sm=2 tr=0
 a=zJWegnE7BH9C0Gl4FFgQyA==:117 a=zJWegnE7BH9C0Gl4FFgQyA==:17
 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=aatUQebYAAAA:8
 a=Pyq9K9CWowscuQLKlpiwfMBGOR0=:19 a=kj9zAlcOel0A:10 a=egogFJTRAAAA:8
 a=VwQbUJbxAAAA:8 a=bm_musDndCPpHZw70_IA:9 a=ZCHKB15I_ztvxE5o:21
 a=-SSp0-fJ8OauOydo:21 a=CjuIK1q_8ugA:10
X-Telus-Outbound-IP: 173.180.45.4
From: "Doug Smythies" <dsmythies@telus.net>
To: "'Peter Zijlstra'" <peterz@infradead.org>,
        "'Vik Heyndrickx'" <vik.heyndrickx@veribox.net>
Cc: <linux-kernel@vger.kernel.org>, "Doug Smythies" <dsmythies@telus.net>
References: <56A0A38D.4040900@veribox.net> <20160121152859.GP6356@twins.programming.kicks-ass.net>
In-Reply-To: <20160121152859.GP6356@twins.programming.kicks-ass.net>
Subject: RE: [PATCH] sched: loadavg 0.00, 0.01, 0.05 on idle
Date: Thu, 21 Jan 2016 10:38:43 -0800
Message-ID: <002901d1547a$f5309480$df91bd80$@net>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: AdFUYHXGnDH9ziEORmePTv8XWg49XAACulIw
Content-Language: en-ca
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016.01.21 07:29 Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 10:23:25AM +0100, Vik Heyndrickx wrote:
>> Systems show a minimal load average of 0.00, 0.01, 0.05 even when they have
>> no load at all.
>> ---
>> Subject: sched: Fix non-zero idle loadavg
>> From: Vik Heyndrickx <vik.heyndrickx@veribox.net>
>> Date: Thu, 21 Jan 2016 10:23:25 +0100

>> Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
>> have no load at all.

>> By removing the single code line that performed a rounding on the
>> internally kept load value, effectively returning this function
>> calc_load to its state it had before, the visualization problem is
>> completely fixed.

Yes, but it introduces a systematic error in place of the current
balanced error. Thus it doubles the maximum error due to the finite
number of bits used in the math.

>> Once the (old) load becomes 93 or higher, it mathematically can never
>> get lower than 93, even when the active (load) remains 0 forever.
>> This results in the strange 0.00, 0.01, 0.05 uptime values on idle
>> systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.

As I mentioned on the bug report [1], this is a consequence
of carrying a finite number of bits with such a strong
IIR (Infinite Impulse Response) filter coefficient.
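
To make the finite-bits effect concrete, here is a minimal Python sketch
of the 15 minute update with the rounding term included. It assumes the
fixed-point values used in this thread (2048 for a load of 1.00, 2037 as
the 15 minute coefficient); the function and variable names are only
illustrative and are not taken from the kernel source.

```python
FSHIFT = 11
FIXED_1 = 1 << FSHIFT      # 2048 represents a load of 1.00
EXP_15 = 2037              # 15 minute coefficient quoted in this thread


def calc_load_rounded(old, active):
    """One update step including the +0.5 (1024/2048) rounding term."""
    total = old * EXP_15 + active * (FIXED_1 - EXP_15)
    total += FIXED_1 >> 1  # the rounding term under discussion
    return total >> FSHIFT


def settle(load, active, limit=100000):
    """Repeat the update until the value stops changing."""
    for _ in range(limit):
        nxt = calc_load_rounded(load, active)
        if nxt == load:
            break
        load = nxt
    return load


# Start from a load of 1.00, then stay completely idle.
idle_floor = settle(FIXED_1, 0)
print(idle_floor, idle_floor / FIXED_1)
```

In this model the value decays from 2048 and then stalls at 93, because
93 * 2037 + 1024 is still just above 93 * 2048.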

>> It is not correct to add a 0.5 rounding (=1024/2048) here, since the
>> result from this function is fed back into the next iteration again,
>> so the result of that +0.5 rounding value then gets multiplied by
>> (2048-2037), and then rounded again, so there is a virtual "ghost"
>> load created, next to the old and active load terms.

If you do not round, then you get a doubling of the problem on the
load-increasing side of things. Consider an old load value of 1862 (90.92%),
regardless of how it got there, and a new load value of 2048 (100%)
from here onwards. With this proposed change, the 15 minute math becomes:

new = (old * 2037 + load * (2048 - 2037)) / 2048
new = (1862 * 2037 + 2048 * (2048 - 2037)) / 2048
new = 3815422 / 2048 = 1862.999..., which truncates to 1862

So, a sustained 100% load will always be shown as 91% (double the old
error limit).
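
The arithmetic above can be checked with a short Python sketch that
compares truncation with rounding on the load-increasing side. It assumes
the same values as the worked example (2048 for 100%, 2037 as the 15
minute coefficient); the helper names are made up for illustration.

```python
FSHIFT = 11
FIXED_1 = 1 << FSHIFT      # 2048 represents 100% load
EXP_15 = 2037              # 15 minute coefficient from the example


def step(old, active, rounding):
    """One 15 minute update, with or without the +0.5 rounding term."""
    total = old * EXP_15 + active * (FIXED_1 - EXP_15)
    if rounding:
        total += FIXED_1 >> 1
    return total >> FSHIFT


def settle(load, active, rounding, limit=100000):
    """Repeat the update until the value stops changing."""
    for _ in range(limit):
        nxt = step(load, active, rounding)
        if nxt == load:
            break
        load = nxt
    return load


# Start idle, then apply a constant 100% load.
ceiling_truncated = settle(0, FIXED_1, rounding=False)
ceiling_rounded = settle(0, FIXED_1, rounding=True)
print(ceiling_truncated, ceiling_rounded)
```

In this model the truncating version stalls at 1862 (about 90.9%) and the
rounding version at 1955 (about 95.5%), so the worst-case error on the
rising side goes from roughly 4.5% to roughly 9.1%.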

I have been running this proposed code with 100% load on CPU 7 for a couple
of hours now, and the 15 minute load average is stuck at 0.91.

Myself, I would not take out the rounding, but I defer to Peter.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=45001