From mboxrd@z Thu Jan  1 00:00:00 1970
From: Con Kolivas <kernel@kolivas.org>
Subject: Re: Stolen and degraded time and schedulers
Date: Thu, 15 Mar 2007 08:36:07 +1100
Message-ID: <200703150836.08670.kernel@kolivas.org>
References: <45F6D1D0.6080905@goop.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Return-path: <virtualization-bounces@lists.osdl.org>
In-Reply-To: <45F6D1D0.6080905@goop.org>
Content-Disposition: inline
List-Unsubscribe: <https://lists.osdl.org/mailman/listinfo/virtualization>,
	<mailto:virtualization-request@lists.osdl.org?subject=unsubscribe>
List-Archive: <http://lists.osdl.org/pipermail/virtualization>
List-Post: <mailto:virtualization@lists.osdl.org>
List-Help: <mailto:virtualization-request@lists.osdl.org?subject=help>
List-Subscribe: <https://lists.osdl.org/mailman/listinfo/virtualization>,
	<mailto:virtualization-request@lists.osdl.org?subject=subscribe>
Sender: virtualization-bounces@lists.osdl.org
Errors-To: virtualization-bounces@lists.osdl.org
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: cpufreq@lists.linux.org.uk, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Chris Wright <chrisw@sous-sol.org>, Virtualization Mailing List <virtualization@lists.osdl.org>, john stultz <johnstul@us.ibm.com>, Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>
List-Id: virtualization@lists.linuxfoundation.org

On Wednesday 14 March 2007 03:31, Jeremy Fitzhardinge wrote:
> The current Linux scheduler makes one big assumption: that 1ms of CPU
> time is the same as any other 1ms of CPU time, and that therefore a
> process makes the same amount of progress regardless of which particular
> ms of time it gets.
>
> This assumption is wrong now, and will become more wrong as
> virtualization gets more widely used.
>
> It's wrong now, because it fails to take into account several kinds
> of missing time:
>
>    1. interrupts - time spent in an ISR is accounted to the current
>       process, even though it gets no direct benefit
>    2. SMM - time is completely lost from the kernel
>    3. slow CPUs - 1ms of 600MHz CPU is less useful than 1ms of 2.4GHz CPU
>
> The first two - time lost to interrupts - are a well known problem, and
> are generally considered to be a non issue.  If you're losing a
> significant amount of time to interrupts, you probably have bigger
> problems.  (Or maybe not?)
>
> The third is not something I've seen discussed before, but it seems like
> it could be a significant problem today.  Certainly, I've noticed it
> myself: an interactive program decides to do something CPU-intensive
> (like start an animation), and it chugs until the conservative governor
> brings the CPU up to speed.  Certainly some of this is because it's just
> plain CPU-starved, but I think another factor is that it gets penalized
> for running on a slow CPU: 1ms is not 1ms.  And for power reasons you
> want to encourage processes to run on slow CPUs rather than penalize them.
>
> Virtualization just exacerbates this.  If you have a busy machine
> running multiple virtual CPUs, then each VCPU may only get a small
> proportion of the total amount of available CPU time.  If the kernel's
> scheduler asserts that "you were just scheduled for 1ms, therefore you
> made 1ms of progress", then many timeslices will effectively end up
> being 1ms of 0MHz CPU - because the VCPU wasn't scheduled and the real
> CPU was doing something else.
>
>
> So how to deal with this?  Basically we need a clock which measures "CPU
> work units", and have the scheduler use this clock.
>
> A "CPU work unit" clock has these properties:
>
>     * inherently per-CPU (from the kernel's perspective, so it would be
>       per-VCPU in a virtual machine)
>     * monotonic - you can't do negative work
>     * measured in "work units"
>
> A "work unit" is probably most simply expressed in cycles - you assume a
> cycle of CPU time is equivalent in terms of work done to any other
> cycle.  This means that 1 cycle at 600MHz is equivalent to 1 cycle at
> 2.4GHz - but of course the 2.4GHz processor gets 4 times as many in any
> real time interval.  (This is the instance where the worst kind of tsc -
> varying speed which stops on idle - is actually exactly what you want.)
>
> You could also measure "work units" in terms of normalized time units:
> if the fastest CPU on the machine is 2.4GHz, then 1ms of real time is 1ms
> of work units on that CPU, but 250us on the 600MHz CPU.
>
> It doesn't really matter what the unit is, so long as it is used
> consistently to measure how much progress all processes made.
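
A minimal sketch of the normalized-time variant quoted above, for
illustration only. The name work_units_us() and its parameters are
hypothetical, not an existing kernel interface, and the sketch assumes the
CPU frequency stays constant over the measured interval.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale a real-time interval (in microseconds) by the
 * ratio of the current CPU frequency to the fastest frequency on the
 * machine. A stolen interval can be modelled as cur_khz == 0.
 */
static uint64_t work_units_us(uint64_t elapsed_us, uint32_t cur_khz,
			      uint32_t max_khz)
{
	if (max_khz == 0)
		return 0;	/* no reference frequency: report no work */

	return elapsed_us * cur_khz / max_khz;
}
```

With these assumptions, 1000us at 600MHz on a machine whose fastest CPU
runs at 2.4GHz comes out as 250us of work, matching the figures in the
quoted text.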

I think you're looking for a complex solution to a problem that doesn't exist.
The job of the process scheduler is to meter out the available cpu resources.
It cannot make up cycles for a slow cpu or one that is throttled. If the
problem is happening due to throttling it should be fixed by altering the
throttle. The example you describe with the conservative governor is as easy
to fix as changing to the ondemand governor. Differential power cpus on an
SMP machine should be managed by SMP balancing choices based on power groups.
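
A rough sketch of the governor change, assuming the cpufreq sysfs interface
is available and the ondemand governor is built in or loaded. The helper
name set_governor() is hypothetical; writing the real sysfs file normally
requires root.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a governor name to a cpufreq sysfs file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int set_governor(const char *path, const char *governor)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fputs(governor, f) >= 0;
	if (fclose(f) != 0)	/* write errors may only show up on close */
		ok = 0;

	return ok ? 0 : -1;
}
```

On a typical system the call would look like
set_governor("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
"ondemand"), repeated for each CPU that has its own policy.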

It would be fine to implement some other accounting of this definition of time
for other purposes but not for process scheduler decisions per se.

Sorry to chime in late.  My physical condition prevents me spending any
extended period of time at the computer so I've tried to be succinct with my
comments and may not be able to reply again.

-- 
-ck