From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S262979AbUBZUXs (ORCPT ); Thu, 26 Feb 2004 15:23:48 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S262971AbUBZUU0 (ORCPT ); Thu, 26 Feb 2004 15:20:26 -0500
Received: from sccrmhc11.comcast.net ([204.127.202.55]:8078 "EHLO sccrmhc11.comcast.net")
	by vger.kernel.org with ESMTP id S262843AbUBZURK (ORCPT );
	Thu, 26 Feb 2004 15:17:10 -0500
Subject: Re: [RFC][PATCH] O(1) Entitlement Based Scheduler
From: Albert Cahalan
To: Peter Williams
Cc: linux-kernel mailing list , johnl@aurema.com
In-Reply-To: <403D8FE6.2010905@aurema.com>
References: <1077766232.10393.992.camel@cube> <403D8FE6.2010905@aurema.com>
Content-Type: text/plain
Organization:
Message-Id: <1077818221.2255.3.camel@cube>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.4
Date: 26 Feb 2004 12:57:02 -0500
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2004-02-26 at 01:19, Peter Williams wrote:
> Albert Cahalan wrote:
>> John Lee writes:
>>> The usage rates for each task are estimated using Kalman
>>> filter techniques, the estimates being similar to those
>>> obtained by taking a running average over twice the filter
>>> _response half life_ (see below). However, Kalman filter
>>> values are cheaper to compute and don't require the
>>> maintenance of historical usage data.
>>
>>
>> Linux dearly needs this. Please separate out this part
>> of the patch and send it in.
>
> This information can be determined from the SleepAVG: field in the
> /proc//status and /proc//task//status files by
> subtracting the value there from 100.

This doesn't seem to be the case. For example, a fork()
causes the value to be adjusted in both child and parent.
Also, perhaps the name is wrong, but I'd think SleepAVG
has more to do with the average length of a sleep. It sure
isn't documented. (time constant?
type of decay?)

There's also a need for whole-process stats and
cumulative (sum of exited children) stats. %CPU
can go as high as 51200%.

> Without our patch this value is a
> directly calculated estimate of the task's sleep rate which is
> available because it is used by the O(1) scheduler's heuristics. With our
> patches, it is calculated from our estimate of the task's usage because
> we dispensed with the sleep average calculations as they are no longer
> needed. We decided to still report sleep average in the status file
> because we were reluctant to alter the contents of such files in case we
> broke user space programs.

Generally this is a good move, though I don't expect
anything to be using SleepAVG at the moment.

>> Right now, Linux does not report the recent CPU usage
>> of a process. The UNIX standard requires that "ps"
>> report this; right now ps substitutes CPU usage over
>> the whole lifetime of a process.
>>
>> Both per-task and per-process (tid and tgid) numbers
>> are needed. Both percent and permill (1/1000) units
>> get reported, so don't convert to integer percent.
>
> I think a modification to fs/proc/array.c to make this field a per
> million rather than a percent value would satisfy your needs. It would
> be a very small change but there would be concerns about breaking
> programs that rely on it being a percentage.

Nothing can rely on it existing at all, so a name
change would solve the problem of apps getting confused.

BTW, permill is not per-million, it is per-thousand.
Per-million or per-billion would be fine as long as it
doesn't overflow.
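
To put numbers on the units and the overflow concern, here is a
rough sketch. It is not taken from the patch or from procps; the
helper names (usage_permill, percent_to_unit, fits_u32) and the
choice of a 32-bit field are assumptions made purely for
illustration. It takes the 51200% ceiling mentioned above and
checks which units would still fit in 32 bits.

```c
#include <stdint.h>

/* Scale factors for the units discussed in this thread. */
#define PER_MILL     1000ULL        /* per-thousand (permill) */
#define PER_MILLION  1000000ULL
#define PER_BILLION  1000000000ULL

/*
 * Hypothetical helper: CPU usage over an interval in permill.
 * Both arguments are tick deltas over the same interval; the
 * caller is assumed to keep cpu_ticks * 1000 within 64 bits.
 */
static uint64_t usage_permill(uint64_t cpu_ticks, uint64_t wall_ticks)
{
	if (wall_ticks == 0)
		return 0;	/* no time elapsed, report zero usage */
	return cpu_ticks * PER_MILL / wall_ticks;
}

/* Hypothetical helper: convert a percent figure to units of 1/scale. */
static uint64_t percent_to_unit(uint64_t percent, uint64_t scale)
{
	return percent * scale / 100;
}

/* Would the value fit in an (assumed) unsigned 32-bit field? */
static int fits_u32(uint64_t v)
{
	return v <= UINT32_MAX;
}
```

With these, 1 tick of CPU over 3 ticks of wall time comes out as
333 permill, where integer percent would have truncated to 33.
At the 51200% ceiling, permill gives 512000 and per-million gives
512000000, both of which fit in 32 bits; per-billion gives
512000000000, which does not, so it would need a 64-bit value.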