From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dave Winchell <dwinchell@virtualiron.com>
Subject: Re: [PATCH] Fix hvm guest time to be more accurate
Date: Mon, 29 Oct 2007 15:55:08 -0400
Message-ID: <47263A9C.3050004@virtualiron.com>
References: <C34BC908.1795D%Keir.Fraser@cl.cam.ac.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <C34BC908.1795D%Keir.Fraser@cl.cam.ac.uk>
List-Unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
Cc: haitao.shan@intel.com, Dave Winchell <dwinchell@virtualiron.com>, xen-devel <xen-devel@lists.xensource.com>, "Dong,
	Eddie" <eddie.dong@intel.com>, Ben Guthro <bguthro@virtualiron.com>
List-Id: xen-devel@lists.xenproject.org

Keir,

I think it's a good idea to have other modes.
However, I don't believe that the mode checked in to the staging
tree will keep good time for a 64-bit Linux guest, if that was what was
intended.

Here's why:
The guest running under the new option gets a clock interrupt
after being de-scheduled for a while. It calculates missed_ticks
and bumps jiffies by missed_ticks. Jiffies is now correct.
Then, with the new mode as submitted, the guest will get missed_ticks
additional interrupts. For each, the guest will add 1 to jiffies.
The guest is now missed_ticks * clock_period ahead of where it should be.

Under the old/other option, the guest tsc is continuous after a de-scheduled
period, and thus the missed_ticks calculation in the guest results in zero.
Then missed_ticks interrupts are delivered and jiffies is correct.
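
A minimal sketch of the double counting described above, assuming a simplified guest tick handler that adds one jiffy per interrupt plus any ticks it infers were lost from the TSC. The function name `run_gap` and its parameters are hypothetical and are not taken from Xen or Linux; time is measured in whole clock periods.

```python
def run_gap(gap_ticks, tsc_continuous):
    """Return guest jiffies accumulated across a de-scheduled gap.

    gap_ticks      -- clock periods of real time that elapsed during the gap
    tsc_continuous -- True models the old mode (guest TSC shows no jump);
                      False models a constant TSC offset (guest sees the jump)
    Both modes assume the hypervisor delivers one interrupt per elapsed period.
    """
    jiffies = 0
    tsc = 0        # guest-visible TSC, in clock periods
    last_tsc = 0   # TSC value recorded at the previous interrupt
    for i in range(gap_ticks):
        if tsc_continuous:
            tsc += 1              # guest time advances one period per interrupt
        elif i == 0:
            tsc += gap_ticks      # first interrupt sees the whole gap at once
        # else: backlog delivered back to back, TSC barely moves
        elapsed = tsc - last_tsc
        lost = max(0, elapsed - 1)   # guest's own missed-tick compensation
        jiffies += 1 + lost
        last_tsc = tsc
    return jiffies


if __name__ == "__main__":
    print("old mode:", run_gap(10, tsc_continuous=True))
    print("constant offset, full backlog:", run_gap(10, tsc_continuous=False))
```

In this model a gap of 10 periods yields 10 jiffies in the old mode and 19 with a constant offset plus the full interrupt backlog, so the guest ends up roughly missed_ticks periods ahead.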

I just ran a test with two 64bit Linux guests, one Red Hat and one Sles,
under load.  The hypervisor has constant tsc offset per the code submitted
to the staging tree.  In each 5 sec period the guest gained 6-10 seconds
against ntp time, an error of almost 200%.

[root@vs079 ~]# while :; do ntpdate -q 0.us.pool.ntp.org; sleep 5; done
server 8.15.10.42, stratum 2, offset -0.061007, delay 0.04959
29 Oct 15:21:21 ntpdate[3892]: adjust time server 8.15.10.42 offset -0.061007 sec
server 8.15.10.42, stratum 2, offset -0.077763, delay 0.07129
29 Oct 15:21:28 ntpdate[3894]: adjust time server 8.15.10.42 offset -0.077763 sec
server 8.15.10.42, stratum 2, offset -1.733141, delay 0.20813

(load started here.)

29 Oct 15:21:35 ntpdate[3968]: step time server 8.15.10.42 offset -1.733141 sec
server 8.15.10.42, stratum 2, offset -9.648700, delay 0.04861
29 Oct 15:21:54 ntpdate[4002]: step time server 8.15.10.42 offset -9.648700 sec
server 8.15.10.42, stratum 2, offset -22.872883, delay 0.05319
29 Oct 15:22:21 ntpdate[4027]: step time server 8.15.10.42 offset -22.872883 sec
server 8.15.10.42, stratum 2, offset -29.036008, delay 0.19337
29 Oct 15:22:38 ntpdate[4039]: step time server 8.15.10.42 offset -29.036008 sec
server 8.15.10.42, stratum 2, offset -34.880845, delay 0.04944
29 Oct 15:22:46 ntpdate[4058]: step time server 8.15.10.42 offset -34.880845 sec
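
One way to read the log above is to look at how much the offset grows between consecutive queries. Since `ntpdate -q` only queries and does not set the clock, the reported offsets should accumulate, and the difference between successive samples approximates the time gained per loop iteration. The sketch below is only an illustration of that arithmetic; the helper names `offsets` and `deltas` are hypothetical, and the log lines are the post-load `server` lines copied from the output above.

```python
import re

# The "server" lines reported after the load was started.
LOG = """\
server 8.15.10.42, stratum 2, offset -1.733141, delay 0.20813
server 8.15.10.42, stratum 2, offset -9.648700, delay 0.04861
server 8.15.10.42, stratum 2, offset -22.872883, delay 0.05319
server 8.15.10.42, stratum 2, offset -29.036008, delay 0.19337
server 8.15.10.42, stratum 2, offset -34.880845, delay 0.04944
"""


def offsets(log):
    """Extract the offset (in seconds) from each 'server' line."""
    pattern = r"^server .*offset (-?\d+\.\d+), delay"
    return [float(m) for m in re.findall(pattern, log, re.M)]


def deltas(values):
    """Change in offset between consecutive samples."""
    return [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    for d in deltas(offsets(LOG)):
        # A more negative offset means the local clock moved further ahead.
        print(f"gained {-d:.3f} s since previous query")
```
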


With these three changes to the constant tsc offset policy in staging,
the error compared to ntp is about .02% under this load.

 > 1. Since you are in missed_ticks(), why not increase the threshold
 >     to 10 sec?
 >
 > 2. In missed_ticks() you should only increment pending_intr_nr by
 >     missed_ticks calculated when pt_support_time_frozen(domain).
 >
 > 3. You might as well fix this one too since it's what we discussed and
 >     is so related to constant tsc offset:
 >       In pt_timer_fn, if !pt_support_time_frozen(domain) then
 >       pending_intr_nr should end up with a maximum value of one.
 >
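
A rough model of changes 2 and 3 above, written as a sketch rather than as the actual vpt.c code. The class `PeriodicTimerModel` and its method are hypothetical; `time_frozen` stands in for the result of pt_support_time_frozen(domain), and change 1 (the threshold) is not modelled.

```python
class PeriodicTimerModel:
    """Toy model of the pending-interrupt accounting for one periodic timer."""

    def __init__(self, time_frozen):
        # Stand-in for pt_support_time_frozen(domain); an assumption of this sketch.
        self.time_frozen = time_frozen
        self.pending_intr_nr = 0

    def timer_fired(self, missed_ticks):
        """Account for one timer expiry that also detected missed_ticks."""
        if self.time_frozen:
            # Change 2: the missed ticks are only added in the frozen-time mode,
            # where the guest relies on receiving every interrupt.
            self.pending_intr_nr += 1 + missed_ticks
        else:
            # Change 3: with a constant tsc offset the guest compensates by
            # itself, so at most one interrupt is ever left pending.
            self.pending_intr_nr = min(self.pending_intr_nr + 1, 1)


if __name__ == "__main__":
    frozen = PeriodicTimerModel(time_frozen=True)
    frozen.timer_fired(missed_ticks=9)
    print("frozen-time mode pending:", frozen.pending_intr_nr)

    constant = PeriodicTimerModel(time_frozen=False)
    constant.timer_fired(missed_ticks=9)
    constant.timer_fired(missed_ticks=0)
    print("constant-offset mode pending:", constant.pending_intr_nr)
```

With the constant offset, the guest then receives a single interrupt after the gap and its own missed-tick calculation accounts for the rest.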

So, I think these changes are necessary for a 64bit Linux policy. If you
agree, should they go in as fixes to the constant tsc offset policy in
staging now or as a new policy?

thanks,
Dave


Keir Fraser wrote:

>I thought the point of the mode in Haitao's patch was to still deliver the
>'right' number of pending interrupts, but not stall the guest TSC while
>delivering them? That's what I checked in as c/s 16237 (in staging tree). If
>we want other modes too they can be added to the enumeration that c/s
>defines.
>
> -- Keir
>
>On 29/10/07 15:00, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
>
>  
>
>>Eddie, Haitao:
>>
>>The patch looks good with the following comments.
>>
>>1. Since you are in missed_ticks(), why not increase the threshold
>>    to 10 sec?
>>
>>2. In missed_ticks() you should only increment pending_intr_nr by
>>    missed_ticks calculated when pt_support_time_frozen(domain).
>>
>>3. You might as well fix this one too since it's what we discussed and is so
>>    related to constant tsc offset:
>>      In pt_timer_fn, if !pt_support_time_frozen(domain) then
>>      pending_intr_nr should end up with a maximum value of one.
>>
>>regards,
>>Dave
>>
>>
>>Dong, Eddie wrote:
>>
>>    
>>
>>>Dave Winchell wrote:
>>> 
>>>
>>>      
>>>
>>>>Eddie,
>>>>
>>>>I implemented #2B and ran a three hour test
>>>>with sles9-64 and rh4u4-64 guests. Each guest had 8 vcpus
>>>>and the box was Intel with 2 physical processors.
>>>>The guests were running large loads.
>>>>Clock was pit. This is my usual test setup, except that I just
>>>>as often used AMD nodes with more processors.
>>>>
>>>>The time error was .02%, good enough for ntpd.
>>>>
>>>>The implementation keeps a constant guest tsc offset.
>>>>There is no pending_nr cancellation.
>>>>When the vpt.c timer expires, it only increments pending_nr
>>>>if its value is zero.
>>>>Missed_ticks() is still calculated, but only to update the new
>>>>timeout value. There is no adjustment to the tsc offset
>>>>(set_guest_time())
>>>>at clock interrupt delivery time nor at re-scheduling time.
>>>>
>>>>So, I like this method better than the pending_nr subtract.
>>>>I'm going to work on this some more and, if all goes well,
>>>>propose a new code submission soon.
>>>>I'll put some kind of policy switch in too, which we can discuss
>>>>and modify, but it will be along the lines of what we discussed below.
>>>>
>>>>Thanks for your input!
>>>>
>>>>-Dave
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>Haitao Shan may have posted his patch; can you check whether anything
>>>is missing?
>>>thx,eddie