From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Winchell
Subject: Re: [PATCH] Fix hvm guest time to be more accurate
Date: Mon, 29 Oct 2007 15:55:08 -0400
Message-ID: <47263A9C.3050004@virtualiron.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: haitao.shan@intel.com, Dave Winchell, xen-devel, "Dong, Eddie", Ben Guthro
List-Id: xen-devel@lists.xenproject.org

Keir,

I think it's a good idea to have other modes. However, I don't believe
that the mode checked in to the staging tree will keep good time for a
64-bit Linux guest, if that was what was intended.

Here's why: The guest running under the new option gets a clock
interrupt after being de-scheduled for a while. It calculates
missed_ticks and bumps jiffies by missed_ticks. Jiffies is now correct.
Then, with the new mode as submitted, the guest will get missed_ticks
additional interrupts. For each, the guest will add 1 to jiffies. The
guest is now missed_ticks * clock_period ahead of where it should be.

Under the old/other option, the guest tsc is continuous after a
de-scheduled period, and thus the missed_ticks calculation in the guest
results in zero. Then missed_ticks interrupts are delivered and jiffies
is correct.

I just ran a test with two 64-bit Linux guests, one Red Hat and one
SLES, under load. The hypervisor has constant tsc offset per the code
submitted to the staging tree. In each 5 sec period the guest gained
6-10 seconds against ntp time, an error of almost 200%.

[root@vs079 ~]# while :; do ntpdate -q 0.us.pool.ntp.org; sleep 5; done
server 8.15.10.42, stratum 2, offset -0.061007, delay 0.04959
29 Oct 15:21:21 ntpdate[3892]: adjust time server 8.15.10.42 offset -0.061007 sec
server 8.15.10.42, stratum 2, offset -0.077763, delay 0.07129
29 Oct 15:21:28 ntpdate[3894]: adjust time server 8.15.10.42 offset -0.077763 sec
server 8.15.10.42, stratum 2, offset -1.733141, delay 0.20813    (load started here.)
29 Oct 15:21:35 ntpdate[3968]: step time server 8.15.10.42 offset -1.733141 sec
server 8.15.10.42, stratum 2, offset -9.648700, delay 0.04861
29 Oct 15:21:54 ntpdate[4002]: step time server 8.15.10.42 offset -9.648700 sec
server 8.15.10.42, stratum 2, offset -22.872883, delay 0.05319
29 Oct 15:22:21 ntpdate[4027]: step time server 8.15.10.42 offset -22.872883 sec
server 8.15.10.42, stratum 2, offset -29.036008, delay 0.19337
29 Oct 15:22:38 ntpdate[4039]: step time server 8.15.10.42 offset -29.036008 sec
server 8.15.10.42, stratum 2, offset -34.880845, delay 0.04944
29 Oct 15:22:46 ntpdate[4058]: step time server 8.15.10.42 offset -34.880845 sec

With these three changes to the constant tsc offset policy in staging,
the error compared to ntp is about .02% under this load.

> 1. Since you are in missed_ticks(), why not increase the threshold
>    to 10 sec?
>
> 2. In missed_ticks() you should only increment pending_intr_nr by
>    missed_ticks calculated when pt_support_time_frozen(domain).
>
> 3. You might as well fix this one too since its what we discussed and is so
>    related to constant tsc offset:
>    In pt_timer_fn, if !pt_support_time_frozen(domain) then
>    pending_intr_nr should end up with a maximum value of one.

So, I think these changes are necessary for a 64bit Linux policy.
If you agree, should they go in as fixes to the constant tsc offset
policy in staging now or as a new policy?

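The double counting described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Xen or Linux code: the policy names, struct guest, guest_timer_irq() and jiffies_after_gap() are all hypothetical, time is measured in whole timer periods, and the guest handler is assumed to add one tick per interrupt plus whatever its TSC reading says was lost.

```c
#include <stdint.h>

/* Hypothetical delivery policies; these names are illustrative, not Xen's. */
enum policy {
    FROZEN_DELIVER_ALL,       /* guest TSC stalls while descheduled; all ticks delivered */
    CONST_OFFSET_DELIVER_ALL, /* guest TSC keeps running; all ticks delivered */
    CONST_OFFSET_DELIVER_ONE  /* guest TSC keeps running; at most one tick pending */
};

struct guest {
    uint64_t jiffies;   /* guest tick counter */
    uint64_t last_seen; /* guest TSC at the previous interrupt, in timer periods */
};

/* Assumed guest handler: one tick per interrupt plus ticks the TSC says were lost. */
static void guest_timer_irq(struct guest *g, uint64_t now)
{
    uint64_t elapsed = now - g->last_seen;
    uint64_t lost = elapsed > 1 ? elapsed - 1 : 0;

    g->jiffies += 1 + lost;
    g->last_seen = now;
}

/* Guest jiffies after the guest was descheduled for `gap` timer periods. */
uint64_t jiffies_after_gap(enum policy p, uint64_t gap)
{
    struct guest g = { 0, 0 };
    uint64_t now = 0;
    uint64_t i;

    switch (p) {
    case FROZEN_DELIVER_ALL:
        /* Guest time advances one period per delivered interrupt. */
        for (i = 0; i < gap; i++)
            guest_timer_irq(&g, ++now);
        break;
    case CONST_OFFSET_DELIVER_ALL:
        /* Guest TSC already shows the whole gap; interrupts arrive back to back. */
        now = gap;
        for (i = 0; i < gap; i++)
            guest_timer_irq(&g, now);
        break;
    case CONST_OFFSET_DELIVER_ONE:
        /* Only a single pending interrupt survives the gap. */
        now = gap;
        if (gap > 0)
            guest_timer_irq(&g, now);
        break;
    }
    return g.jiffies;
}
```

In this model, calling jiffies_after_gap() with a gap of 100 periods yields 100 for the frozen-time policy and for the constant-offset policy capped at one pending interrupt, but 199 for the constant-offset policy that delivers every missed tick, so the guest clock runs ahead by roughly the length of the gap.
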
thanks,
Dave

Keir Fraser wrote:

>I thought the point of the mode in Haitao's patch was to still deliver the
>'right' number of pending interrupts, but not stall the guest TSC while
>delivering them? That's what I checked in as c/s 16237 (in staging tree). If
>we want other modes too they can be added to the enumeration that c/s
>defines.
>
> -- Keir
>
>On 29/10/07 15:00, "Dave Winchell" wrote:
>
>>Eddie, Haitao:
>>
>>The patch looks good with the following comments.
>>
>>1. Since you are in missed_ticks(), why not increase the threshold
>>   to 10 sec?
>>
>>2. In missed_ticks() you should only increment pending_intr_nr by
>>   missed_ticks calculated when pt_support_time_frozen(domain).
>>
>>3. You might as well fix this one too since its what we discussed and is so
>>   related to constant tsc offset:
>>   In pt_timer_fn, if !pt_support_time_frozen(domain) then
>>   pending_intr_nr should end up with a maximum value of one.
>>
>>regards,
>>Dave
>>
>>Dong, Eddie wrote:
>>
>>>Dave Winchell wrote:
>>>
>>>>Eddie,
>>>>
>>>>I implemented #2B and ran a three hour test
>>>>with sles9-64 and rh4u4-64 guests. Each guest had 8 vcpus
>>>>and the box was Intel with 2 physical processors.
>>>>The guests were running large loads.
>>>>Clock was pit. This is my usual test setup, except that I just
>>>>as often used AMD nodes with more processors.
>>>>
>>>>The time error was .02%, good enough for ntpd.
>>>>
>>>>The implementation keeps a constant guest tsc offset.
>>>>There is no pending_nr cancellation.
>>>>When the vpt.c timer expires, it only increments pending_nr
>>>>if its value is zero.
>>>>Missed_ticks() is still calculated, but only to update the new
>>>>timeout value. There is no adjustment to the tsc offset
>>>>(set_guest_time())
>>>>at clock interrupt delivery time nor at re-scheduling time.
>>>>
>>>>So, I like this method better than the pending_nr subtract.
>>>>I'm going to work on this some more and, if all goes well,
>>>>propose a new code submission soon.
>>>>I'll put some kind of policy switch in too, which we can discuss
>>>>and modify, but it will be along the lines of what we discussed below.
>>>>
>>>>Thanks for your input!
>>>>
>>>>-Dave
>>>>
>>>Haitao Shai may posted his patch, can u check if there are something
>>>missed?
>>>thx,eddie
>>>
>>_______________________________________________
>>Xen-devel mailing list
>>Xen-devel@lists.xensource.com
>>http://lists.xensource.com/xen-devel