From: "Doug Smythies"
To: "'Peter Zijlstra'", "'Vik Heyndrickx'"
Cc: "Doug Smythies"
Subject: RE: [PATCH] sched: loadavg 0.00, 0.01, 0.05 on idle
Date: Thu, 21 Jan 2016 10:38:43 -0800
Message-ID: <002901d1547a$f5309480$df91bd80$@net>
In-Reply-To: <20160121152859.GP6356@twins.programming.kicks-ass.net>
References: <56A0A38D.4040900@veribox.net> <20160121152859.GP6356@twins.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016.01.21 07:29 Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 10:23:25AM +0100, Vik Heyndrickx wrote:
>> Systems show a minimal load average of 0.00, 0.01, 0.05 even when they have
>> no load at all.
>> ---
>> Subject: sched: Fix non-zero idle loadavg
>> From: Vik Heyndrickx
>> Date: Thu, 21 Jan 2016 10:23:25 +0100

>> Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
>> have no load at all.

>> By removing the single code line that performed a rounding on the
>> internally kept load value, effectively returning this function
>> calc_load to its state it had before, the visualization problem is
>> completely fixed.

Yes, but it introduces a systematic error, rather than the current
balanced error. Thus it doubles the maximum error due to the finite
number of bits used in the math.

>> Once the (old) load becomes 93 or higher, it mathematically can never
>> get lower than 93, even when the active (load) remains 0 forever.
>> This results in the strange 0.00, 0.01, 0.05 uptime values on idle
>> systems. Note: 93/2048 = 0.0454..., which rounds up to 0.05.

As I mentioned on the bug report [1], this is a consequence of carrying
a finite number of bits with such a strong IIR (Infinite Impulse
Response) filter coefficient.

>> It is not correct to add a 0.5 rounding (=1024/2048) here, since the
>> result from this function is fed back into the next iteration again,
>> so the result of that +0.5 rounding value then gets multiplied by
>> (2048-2037), and then rounded again, so there is a virtual "ghost"
>> load created, next to the old and active load terms.

If you do not round, then you get a doubling of the problem on the
load-increasing side of things. Consider an old load value of 1862
(90.92%), regardless of how it got there, and a new load value of 2048
(100%) from here onwards. With this proposed change, the 15 minute math
becomes:

  new = (old * 2037 + load * (2048 - 2037)) / 2048
  new = (1862 * 2037 + 2048 * (2048 - 2037)) / 2048
  new = 3815422 / 2048 = 1862.999..., which truncates to
  new = 1862

So, the 100% load will always be shown as 91% (double the old limit).

I have been running this proposed code with 100% load on CPU 7 for a
couple of hours now, and the 15 minute load average is stuck at 0.91.
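For anyone who wants to check both stuck points without waiting hours, here is a minimal sketch of the 15 minute update in Python. It is only a model of the arithmetic quoted above, not the kernel code: the `rounding` flag and the `settle` helper are names made up for this illustration, and Python's `>>` on non-negative integers stands in for the truncating division by 2048.

```python
FSHIFT = 11                 # bits of fractional precision
FIXED_1 = 1 << FSHIFT       # 2048 represents a load of 1.00
EXP_15 = 2037               # 15 minute coefficient quoted in this thread


def calc_load(old, active, rounding):
    """One fixed-point update step (a model, not the kernel function)."""
    new = old * EXP_15 + active * (FIXED_1 - EXP_15)
    if rounding:
        new += 1 << (FSHIFT - 1)   # the +0.5 (1024/2048) rounding under discussion
    return new >> FSHIFT           # truncating divide by 2048


def settle(start, active, rounding):
    """Hypothetical helper: iterate until the value stops changing."""
    load = start
    while True:
        nxt = calc_load(load, active, rounding)
        if nxt == load:
            return load
        load = nxt


# Idle system (active = 0), starting from a load of 1.00
idle_rounded = settle(FIXED_1, 0, rounding=True)
idle_truncated = settle(FIXED_1, 0, rounding=False)

# Fully loaded system (active = 2048), starting from a load of 0.00
busy_rounded = settle(0, FIXED_1, rounding=True)
busy_truncated = settle(0, FIXED_1, rounding=False)

print(idle_rounded, idle_truncated, busy_rounded, busy_truncated)
```

Under these assumptions the rounded version stalls 93 counts short of the target in both directions (93 when idle, 1955 at full load), while the truncated version reaches 0 when idle but stalls 186 counts short at full load (1862), which is the doubling described above.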
Myself, I would not take out the rounding, but I defer to Peter.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=45001
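As a side note, the full 0.00, 0.01, 0.05 triple from the subject line can be reproduced with the same kind of model. This sketch rests on assumptions that are not stated in this thread: the 1 and 5 minute coefficients (1884 and 2014) and the display offset of FIXED_1/200 are taken from my recollection of the kernel source, and `idle_floor` and `format_load` are hypothetical helper names.

```python
FSHIFT = 11
FIXED_1 = 1 << FSHIFT   # 2048 represents a load of 1.00

# 15 minute value is from this thread; the 1 and 5 minute values are
# assumed from the kernel headers (EXP_1, EXP_5).
EXPS = {"1min": 1884, "5min": 2014, "15min": 2037}


def calc_load(old, exp, active):
    """One update step with the +0.5 rounding (a model, not kernel code)."""
    return (old * exp + active * (FIXED_1 - exp) + (1 << (FSHIFT - 1))) >> FSHIFT


def idle_floor(exp):
    """Hypothetical helper: decay from 1.00 with zero active load until stuck."""
    load = FIXED_1
    while True:
        nxt = calc_load(load, exp, 0)
        if nxt == load:
            return load
        load = nxt


def format_load(raw):
    """Format a fixed-point value as N.NN, with the assumed display offset."""
    raw += FIXED_1 // 200                            # assumed offset of 0.005
    whole = raw >> FSHIFT
    frac = ((raw & (FIXED_1 - 1)) * 100) >> FSHIFT   # two decimal digits
    return f"{whole}.{frac:02d}"


floors = {name: idle_floor(exp) for name, exp in EXPS.items()}
shown = [format_load(floors[name]) for name in ("1min", "5min", "15min")]
print(floors, shown)
```

With those assumed constants, the idle floors come out as 6, 30 and 93 counts, which display as 0.00, 0.01 and 0.05.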