From mboxrd@z Thu Jan  1 00:00:00 1970
From: George Dunlap <george.dunlap@eu.citrix.com>
Subject: Re: credit2 question
Date: Thu, 24 Jan 2013 10:56:19 +0000
Message-ID: <51011353.1020709@eu.citrix.com>
References: <5100F37702000078000B8FC2@nat28.tlf.novell.com>
	<5101039E.4060504@eu.citrix.com>
	<5101159F02000078000B90DC@nat28.tlf.novell.com>
	<510107E7.7050701@eu.citrix.com>
	<51011E2802000078000B910E@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <51011E2802000078000B910E@nat28.tlf.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

On 24/01/13 10:42, Jan Beulich wrote:
>>>> On 24.01.13 at 11:07, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> On 24/01/13 10:06, Jan Beulich wrote:
>>>>>> On 24.01.13 at 10:49, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>> On 24/01/13 07:40, Jan Beulich wrote:
>>>>> George,
>>>>>
>>>>> I'm puzzled by the second c2t() invocation in
>>>>> csched_runtime(): Why is the difference of credits being passed
>>>>> here? Doesn't that guarantee time > ntime in all but unusual cases
>>>>> (i.e. unless svc->credit is non-positive), and in particular
>>>>> allow for negative ntime?
>>>> Ah, right -- yes, if the other guy's credit is positive, "ntime" is
>>>> guaranteed to be lower.  Since c2t() involves integer division, it would
>>>> definitely be good to get rid of the extra call if we can.
>>>>
>>>> My general principle is to make the code clear and easily readable
>>>> first, and then do optimization afterwards -- in this case I just never
>>>> came back and did the optimization step.
>>> Oh, I wasn't thinking of just the optimization. It seemed wrong to
>>> me to do the subtraction there in the first place: "time" is being
>>> calculated from a plain value, so why would "ntime" be calculated
>>> from a delta?
>> Ah, right -- so the idea here was to run until snext->credit was equal
>> to svc->credit.  That's why the delta.
> Which then means that under normal circumstances you would
> always only run each vCPU for CSCHED_MIN_TIMER, which
> seems quite odd.

Only when both domains are burning CPU -- which in my tests was very
rarely the case, even for very busy domains.  This is particularly true
of HVM domains, where regular MMIO accesses throw everything a bit off
kilter. :-)

> Wouldn't it be more fair to do e.g.
>
>              if ( time > ntime )
>                  time = (time + ntime) / 2;
>
> since otherwise, when the timer expires, the two vCPUs have
> equal credit, whereas you would generally expect a vCPU that
> just finished running to have lower credit than the next one to
> run?

The divide-by-two approach would give either fluctuating runtimes or
always MAX_TIMER.  Suppose we have "burner" vcpus v1 and v2 (and we
didn't have the 'max'): just after the "reset", everyone's credit is
around 10ms, so this would cause v1 to run for (10+(10-10))/2 = 5ms,
then v2 to run for (10+(10-5))/2 = 7.5ms (!), then v1 to run for
(5+(5-2.5))/2 = 3.75ms, &c.

We could do the average after applying MAX, but then you just get a 
"steady state" for burners of (MIN+MAX)/2.

In any case, scheduling is like economics: apparently minor changes
can have really big effects, so you need to be very careful about
changing things without doing experiments first; and I don't really
have time for that at the moment.  The current algorithm is known to be
at least "not awful", so I think we should just stick with it until
someone can take a proper look at potential alternatives. :-)

> But as you rightly said earlier, avoiding the c2t() in cases where we
> can tell up front that "time" would end up below CSCHED_MIN_TIMER
> (particularly zero or negative) would be desirable. I'd prefer to
> leave doing that to you though.

I've already started. :-)

  -George