Re: [PATCH] Fix hvm guest time to be more accurate

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dave Winchell <dwinchell@virtualiron.com>
To: Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
Cc: haitao.shan@intel.com, Dave Winchell <dwinchell@virtualiron.com>,
	xen-devel <xen-devel@lists.xensource.com>,
	"Dong, Eddie" <eddie.dong@intel.com>,
	Ben Guthro <bguthro@virtualiron.com>
Subject: Re: [PATCH] Fix hvm guest time to be more accurate
Date: Mon, 29 Oct 2007 15:55:08 -0400	[thread overview]
Message-ID: <47263A9C.3050004@virtualiron.com> (raw)
In-Reply-To: <C34BC908.1795D%Keir.Fraser@cl.cam.ac.uk>

Keir,

I think its a good idea to have other modes.
However, I don't believe that the mode checked in to the staging
tree will keep good time for a 64 bit Linux guest, if that was what was 
intended.

Here's why:
The guest running under the new option gets a clock interrupt
after being de-scheduled for a while. It calculates missed_ticks
and bumps jiffies by missed_ticks. Jiffies is now correct.
Then, with the new mode as submitted, the guest will get missed_ticks
additional interrupts. For each, the guest will add 1 to jiffies.
The guest is now missed_ticks * clock_period ahead of where it should be.

Under the old/other option, the guest tsc is continuous after a de-scheduled
period, and thus the missed_ticks calculation in the guest results in zero.
Then missed_ticks interrupts are delivered and jiffies is correct.

I just ran a test with two 64bit Linux guests, one Red Hat and one Sles,
under load.  The hypervisor has constant tsc offset per the code 
submitted to
the staging tree.  In each 5 sec period the guest gained 6-10 seconds 
against
ntp time, an error of almost 200%.

[root@vs079 ~]# while :; do ntpdate -q 0.us.pool.ntp.org; sleep 5; done
server 8.15.10.42, stratum 2, offset -0.061007, delay 0.04959
29 Oct 15:21:21 ntpdate[3892]: adjust time server 8.15.10.42 offset 
-0.061007 sec
server 8.15.10.42, stratum 2, offset -0.077763, delay 0.07129
29 Oct 15:21:28 ntpdate[3894]: adjust time server 8.15.10.42 offset 
-0.077763 sec
server 8.15.10.42, stratum 2, offset -1.733141, delay 0.20813

(load started here.)

29 Oct 15:21:35 ntpdate[3968]: step time server 8.15.10.42 offset 
-1.733141 sec
server 8.15.10.42, stratum 2, offset -9.648700, delay 0.04861
29 Oct 15:21:54 ntpdate[4002]: step time server 8.15.10.42 offset 
-9.648700 sec
server 8.15.10.42, stratum 2, offset -22.872883, delay 0.05319
29 Oct 15:22:21 ntpdate[4027]: step time server 8.15.10.42 offset 
-22.872883 sec
server 8.15.10.42, stratum 2, offset -29.036008, delay 0.19337
29 Oct 15:22:38 ntpdate[4039]: step time server 8.15.10.42 offset 
-29.036008 sec
server 8.15.10.42, stratum 2, offset -34.880845, delay 0.04944
29 Oct 15:22:46 ntpdate[4058]: step time server 8.15.10.42 offset 
-34.880845 sec



With these three changes to the constant tsc offset policy in staging,
the error compared to ntp is about .02% under this load.

 > 1. Since you are in missed_ticks(), why not increase the threshold
 >     to 10 sec?
 >
 > 2. In missed_ticks() you should only increment pending_intr_nr by
 > missed_ticks
 >     calculated when  pt_support_time_frozen(domain).
 >
 > 3. You might as well fix this one too since its what we discussed and 
is so
 >     related to constant tsc offset:
 >       In pt_timer_fn, if !pt_support_time_frozen(domain) then
 >       pending_intr_nr should end up with a maximum value of one.
 >

So, I think these changes are necessary for a 64bit Linux policy. If you 
agree, should they go in
as fixes to the constant tsc offset policy in staging now or as a new 
policy?

thanks,
Dave



Keir Fraser wrote:

>I thought the point of the mode in Haitao's patch was to still deliver the
>'right' number of pending interrupts, but not stall the guest TSC while
>delivering them? That's what I checked in as c/s 16237 (in staging tree). If
>we want other modes too they can be added to the enumeration that c/s
>defines.
>
> -- Keir
>
>On 29/10/07 15:00, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
>
>  
>
>>Eddie, Haitao:
>>
>>The patch looks good with the following comments.
>>
>>1. Since you are in missed_ticks(), why not increase the threshold
>>    to 10 sec?
>>
>>2. In missed_ticks() you should only increment pending_intr_nr by
>>missed_ticks
>>    calculated when  pt_support_time_frozen(domain).
>>
>>3. You might as well fix this one too since its what we discussed and is so
>>    related to constant tsc offset:
>>      In pt_timer_fn, if !pt_support_time_frozen(domain) then
>>      pending_intr_nr should end up with a maximum value of one.
>>
>>regards,
>>Dave
>>
>>
>>Dong, Eddie wrote:
>>
>>    
>>
>>>Dave Winchell wrote:
>>> 
>>>
>>>      
>>>
>>>>Eddie,
>>>>
>>>>I implemented #2B and ran a three hour test
>>>>with sles9-64 and rh4u4-64 guests. Each guest had 8 vcpus
>>>>and the box was Intel with 2 physical processors.
>>>>The guests were running large loads.
>>>>Clock was pit. This is my usual test setup, except that I just
>>>>as often used AMD nodes with more processors.
>>>>
>>>>The time error was .02%, good enough for ntpd.
>>>>
>>>>The implementation keeps a constant guest tsc offset.
>>>>There is no pending_nr cancellation.
>>>>When the vpt.c timer expires, it only increments pending_nr
>>>>if its value is zero.
>>>>Missed_ticks() is still calculated, but only to update the new
>>>>timeout value. There is no adjustment to the tsc offset
>>>>(set_guest_time())
>>>>at clock interrupt delivery time nor at re-scheduling time.
>>>>
>>>>So, I like this method better than the pending_nr subtract.
>>>>I'm going to work on this some more and, if all goes well,
>>>>propose a new code submission soon.
>>>>I'll put some kind of policy switch in too, which we can discuss
>>>>and modify, but it will be along the lines of what we discussed below.
>>>>
>>>>Thanks for your input!
>>>>
>>>>-Dave
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>Haitao Shai may posted his patch, can u check if there are something
>>>missed?
>>>thx,eddie
>>> 
>>>
>>>      
>>>
>>_______________________________________________
>>Xen-devel mailing list
>>Xen-devel@lists.xensource.com
>>http://lists.xensource.com/xen-devel
>>    
>>
>
>
>  
>

next prev parent reply	other threads:[~2007-10-29 19:55 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-10-24 21:15 [PATCH] Fix hvm guest time to be more accurate Ben Guthro
2007-10-25  5:52 ` Dong, Eddie
2007-10-25 14:45   ` Dave Winchell
2007-10-26  6:48     ` Dong, Eddie
2007-10-26 13:56       ` Dave Winchell
2007-10-26 18:18         ` Dave Winchell
2007-10-29  9:58           ` Dong, Eddie
2007-10-29 15:00             ` Dave Winchell
2007-10-29 17:29               ` Keir Fraser
2007-10-29 19:55                 ` Dave Winchell [this message]
2007-10-29 20:40                   ` Keir Fraser
2007-10-29 20:44                     ` Dave Winchell
2007-10-30 11:45                 ` Dong, Eddie
2007-10-29  9:57         ` Dong, Eddie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47263A9C.3050004@virtualiron.com \
    --to=dwinchell@virtualiron.com \
    --cc=Keir.Fraser@cl.cam.ac.uk \
    --cc=bguthro@virtualiron.com \
    --cc=eddie.dong@intel.com \
    --cc=haitao.shan@intel.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.