From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dave Winchell Subject: Re: [PATCH] Add a timer mode that disables pending missed ticks Date: Thu, 01 Nov 2007 17:14:50 -0400 Message-ID: <472A41CA.7080405@virtualiron.com> References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------010105000402010406090500" Return-path: In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Keir Fraser Cc: "Dong, Eddie" , Dave Winchell , xen-devel@lists.xensource.com, "Shan, Haitao" , "Jiang, Yunhong" List-Id: xen-devel@lists.xenproject.org This is a multi-part message in MIME format. --------------010105000402010406090500 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi Keir, Haitao, Eddie: Here is the patch for timekeeping. There are three components. 1. Increase missed ticks threshold to 100 seconds. Per the comment in the patch, I have seen scheduling delays of 5 seconds in a test lasting 30 minutes. The result is an immediate error of 5 seconds in guest time that persists. 2. In pt_timer_fn for no_missed_tick_accounting option, only increment pt->pending_intr_nr if its found to be zero. This deals with the case where the guest has interrupts disabled for longer than a tick period. 3. Call pt_process_missed_ticks() unconditionally and place the test for no_missed_tick_accounting inside pt_process_missed_ticks(). This returns the calculation for the next timer expiration to the original method. The original method is important as it sets up a theoretical time space at t=0 where expirations occur only at n*period, where n is any integer. This, in turn, removes rounding errors. Accuracy tests: A. 32bit Linux guest (Item '1' above relevant) Test: 2 64bit Linux guests and 1 32bit Linux guest stacked. 8 vcpus for each guest. 2 physical processors on the node. All vcpus heavily loaded with work. Test duration: 3 hours. Result: clock error < .02%. (Before this patch guest time was 16 seconds slow in a 3.5 minute test.) B. 64bit Linux guest (Items '2' and '3' above relevant) Test: 2 64bit Linux guests. 8 vcpus for each guest. 2 physical processors on the node. All vcpus heavily loaded with work. Test duration: 3 hours. Result: clock error .02%. (Before this patch clock error was 1.3%. The contributions of items '2' and '3' above to the improvement in accuracy are .4% and .9% respectively. Of course, these tests are without ntpd running in the guest. Ntpd requires a local clock error of < .05%. With the .02% accuracy, one can used ntpd to synchronize the clock. thanks, Dave Keir Fraser wrote: >I'm going to apply Haitao's no_missed_ticks_fix.patch. If you think fixes >are needed apart from that, please provide a unified diff. > > -- Keir > >On 30/10/07 21:16, "Dave Winchell" wrote: > > > >>Keir, >> >>Here are my comments on your change. >>I've attached an updated vpt.c with the changes. >> >>1. For the no_missed_tick_accounting method, we still need the update >> to pt->scheduled taking into account the time that has elapsed when >> missed ticks are calculated. missed_ticks (pt_process_missed_ticks) >> is called from pt_restore_timer() and pt_timer_fn() and, thus, its >> easiest to put the check for no_missed_tick_accounting method in >> missed_ticks() itself. >> >>2. In pt_timer_fn you don't want to increment pending_intr_nr beyond 1 >> for no_missed_tick_accounting option. >> >>thanks, >>Dave >> >> >>Keir Fraser wrote: >> >> >> >>>Applied as c/s 16274. Please take a look and make sure the mode works >>>as you expect. >>> >>> -- Keir >>> >>>On 30/10/07 14:28, "Shan, Haitao" wrote: >>> >>> Hi, Keir, >>> >>> This patch adds a new timer mode, in which no missed ticks is >>> calculated. This can be used with latest x86_64 linux guest, since >>> it can pick up missed ticks themselves. >>> >>> <> >>> >>> Best Regards >>> Haitao Shan >>> >>> ------------------------------------------------------------------------ >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel >>> >>> >>> >>> >>*** vpt.c.new.c 2007-10-30 16:45:26.000000000 -0400 >>--- vpt.c 2007-10-30 15:30:57.000000000 -0400 >>*************** >>*** 57,73 **** >> return; >> >> missed_ticks = missed_ticks / (s_time_t) pt->period + 1; >>! >>! if( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) { >>! if ( missed_ticks > 1000 ) >>! { >>! /* TODO: Adjust guest time together */ >>! pt->pending_intr_nr++; >>! } >>! else >>! { >>! pt->pending_intr_nr += missed_ticks; >>! } >> } >> >> pt->scheduled += missed_ticks * pt->period; >>--- 57,70 ---- >> return; >> >> missed_ticks = missed_ticks / (s_time_t) pt->period + 1; >>! if ( missed_ticks > 1000 ) >>! { >>! /* TODO: Adjust guest time together */ >>! pt->pending_intr_nr++; >>! } >>! else >>! { >>! pt->pending_intr_nr += missed_ticks; >> } >> >> pt->scheduled += missed_ticks * pt->period; >>*************** >>*** 120,126 **** >> >> list_for_each_entry ( pt, head, list ) >> { >>! pt_process_missed_ticks(pt); >> set_timer(&pt->timer, pt->scheduled); >> } >> >>--- 117,124 ---- >> >> list_for_each_entry ( pt, head, list ) >> { >>! if ( !mode_is(v->domain, no_missed_tick_accounting) ) >>! pt_process_missed_ticks(pt); >> set_timer(&pt->timer, pt->scheduled); >> } >> >>*************** >>*** 135,151 **** >> >> pt_lock(pt); >> >>! if (mode_is(pt->vcpu->domain, no_missed_tick_accounting)) { >>! if(!pt->pending_intr_nr) >>! pt->pending_intr_nr++; >>! } >>! else >>! pt->pending_intr_nr++; >> >> if ( !pt->one_shot ) >> { >> pt->scheduled += pt->period; >>! pt_process_missed_ticks(pt); >> set_timer(&pt->timer, pt->scheduled); >> } >> >>--- 133,152 ---- >> >> pt_lock(pt); >> >>! pt->pending_intr_nr++; >> >> if ( !pt->one_shot ) >> { >> pt->scheduled += pt->period; >>! if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) >>! { >>! pt_process_missed_ticks(pt); >>! } >>! else if ( (NOW() - pt->scheduled) >= 0 ) >>! { >>! pt->pending_intr_nr++; >>! pt->scheduled = NOW() + pt->period; >>! } >> set_timer(&pt->timer, pt->scheduled); >> } >> >>/* >> * vpt.c: Virtual Platform Timer >> * >> * Copyright (c) 2006, Xiaowei Yang, Intel Corporation. >> * >> * This program is free software; you can redistribute it and/or modify it >> * under the terms and conditions of the GNU General Public License, >> * version 2, as published by the Free Software Foundation. >> * >> * This program is distributed in the hope it will be useful, but WITHOUT >> * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or >> * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for >> * more details. >> * >> * You should have received a copy of the GNU General Public License along >>with >> * this program; if not, write to the Free Software Foundation, Inc., 59 >>Temple >> * Place - Suite 330, Boston, MA 02111-1307 USA. >> * >> */ >> >>#include >>#include >>#include >>#include >> >>#define mode_is(d, name) \ >> ((d)->arch.hvm_domain.params[HVM_PARAM_TIMER_MODE] == HVMPTM_##name) >> >>static void pt_lock(struct periodic_time *pt) >>{ >> struct vcpu *v; >> >> for ( ; ; ) >> { >> v = pt->vcpu; >> spin_lock(&v->arch.hvm_vcpu.tm_lock); >> if ( likely(pt->vcpu == v) ) >> break; >> spin_unlock(&v->arch.hvm_vcpu.tm_lock); >> } >>} >> >>static void pt_unlock(struct periodic_time *pt) >>{ >> spin_unlock(&pt->vcpu->arch.hvm_vcpu.tm_lock); >>} >> >>static void pt_process_missed_ticks(struct periodic_time *pt) >>{ >> s_time_t missed_ticks; >> >> if ( pt->one_shot ) >> return; >> >> missed_ticks = NOW() - pt->scheduled; >> if ( missed_ticks <= 0 ) >> return; >> >> missed_ticks = missed_ticks / (s_time_t) pt->period + 1; >> >> if( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) { >>if ( missed_ticks > 1000 ) >> { >>/* TODO: Adjust guest time together */ >>pt->pending_intr_nr++; >> } >>else >> { >>pt->pending_intr_nr += missed_ticks; >> } >> } >> >> pt->scheduled += missed_ticks * pt->period; >>} >> >>static void pt_freeze_time(struct vcpu *v) >>{ >> if ( !mode_is(v->domain, delay_for_missed_ticks) ) >> return; >> >> v->arch.hvm_vcpu.guest_time = hvm_get_guest_time(v); >>} >> >>static void pt_thaw_time(struct vcpu *v) >>{ >> if ( !mode_is(v->domain, delay_for_missed_ticks) ) >> return; >> >> if ( v->arch.hvm_vcpu.guest_time == 0 ) >> return; >> >> hvm_set_guest_time(v, v->arch.hvm_vcpu.guest_time); >> v->arch.hvm_vcpu.guest_time = 0; >>} >> >>void pt_save_timer(struct vcpu *v) >>{ >> struct list_head *head = &v->arch.hvm_vcpu.tm_list; >> struct periodic_time *pt; >> >> if ( test_bit(_VPF_blocked, &v->pause_flags) ) >> return; >> >> spin_lock(&v->arch.hvm_vcpu.tm_lock); >> >> list_for_each_entry ( pt, head, list ) >> stop_timer(&pt->timer); >> >> pt_freeze_time(v); >> >> spin_unlock(&v->arch.hvm_vcpu.tm_lock); >>} >> >>void pt_restore_timer(struct vcpu *v) >>{ >> struct list_head *head = &v->arch.hvm_vcpu.tm_list; >> struct periodic_time *pt; >> >> spin_lock(&v->arch.hvm_vcpu.tm_lock); >> >> list_for_each_entry ( pt, head, list ) >> { >>pt_process_missed_ticks(pt); >> set_timer(&pt->timer, pt->scheduled); >> } >> >> pt_thaw_time(v); >> >> spin_unlock(&v->arch.hvm_vcpu.tm_lock); >>} >> >>static void pt_timer_fn(void *data) >>{ >> struct periodic_time *pt = data; >> >> pt_lock(pt); >> >> if (mode_is(pt->vcpu->domain, no_missed_tick_accounting)) { >>if(!pt->pending_intr_nr) >> pt->pending_intr_nr++; >> } >> else >>pt->pending_intr_nr++; >> >> if ( !pt->one_shot ) >> { >> pt->scheduled += pt->period; >>pt_process_missed_ticks(pt); >> set_timer(&pt->timer, pt->scheduled); >> } >> >> vcpu_kick(pt->vcpu); >> >> pt_unlock(pt); >>} >> >>void pt_update_irq(struct vcpu *v) >>{ >> struct list_head *head = &v->arch.hvm_vcpu.tm_list; >> struct periodic_time *pt; >> uint64_t max_lag = -1ULL; >> int irq = -1; >> >> spin_lock(&v->arch.hvm_vcpu.tm_lock); >> >> list_for_each_entry ( pt, head, list ) >> { >> if ( !is_isa_irq_masked(v, pt->irq) && pt->pending_intr_nr && >> ((pt->last_plt_gtime + pt->period_cycles) < max_lag) ) >> { >> max_lag = pt->last_plt_gtime + pt->period_cycles; >> irq = pt->irq; >> } >> } >> >> spin_unlock(&v->arch.hvm_vcpu.tm_lock); >> >> if ( is_lvtt(v, irq) ) >> { >> vlapic_set_irq(vcpu_vlapic(v), irq, 0); >> } >> else if ( irq >= 0 ) >> { >> hvm_isa_irq_deassert(v->domain, irq); >> hvm_isa_irq_assert(v->domain, irq); >> } >>} >> >>static struct periodic_time *is_pt_irq( >> struct vcpu *v, struct hvm_intack intack) >>{ >> struct list_head *head = &v->arch.hvm_vcpu.tm_list; >> struct periodic_time *pt; >> struct RTCState *rtc = &v->domain->arch.hvm_domain.pl_time.vrtc; >> int vector; >> >> list_for_each_entry ( pt, head, list ) >> { >> if ( !pt->pending_intr_nr ) >> continue; >> >> if ( is_lvtt(v, pt->irq) ) >> { >> if ( pt->irq != intack.vector ) >> continue; >> return pt; >> } >> >> vector = get_isa_irq_vector(v, pt->irq, intack.source); >> >> /* RTC irq need special care */ >> if ( (intack.vector != vector) || >> ((pt->irq == 8) && !is_rtc_periodic_irq(rtc)) ) >> continue; >> >> return pt; >> } >> >> return NULL; >>} >> >>void pt_intr_post(struct vcpu *v, struct hvm_intack intack) >>{ >> struct periodic_time *pt; >> time_cb *cb; >> void *cb_priv; >> >> spin_lock(&v->arch.hvm_vcpu.tm_lock); >> >> pt = is_pt_irq(v, intack); >> if ( pt == NULL ) >> { >> spin_unlock(&v->arch.hvm_vcpu.tm_lock); >> return; >> } >> >> if ( pt->one_shot ) >> { >> pt->enabled = 0; >> list_del(&pt->list); >> } >> else >> { >> pt->pending_intr_nr--; >> if ( mode_is(v->domain, no_missed_tick_accounting) ) >> pt->last_plt_gtime = hvm_get_guest_time(v); >> else >> pt->last_plt_gtime += pt->period_cycles; >> } >> >> if ( mode_is(v->domain, delay_for_missed_ticks) && >> (hvm_get_guest_time(v) < pt->last_plt_gtime) ) >> hvm_set_guest_time(v, pt->last_plt_gtime); >> >> cb = pt->cb; >> cb_priv = pt->priv; >> >> spin_unlock(&v->arch.hvm_vcpu.tm_lock); >> >> if ( cb != NULL ) >> cb(v, cb_priv); >>} >> >>void pt_reset(struct vcpu *v) >>{ >> struct list_head *head = &v->arch.hvm_vcpu.tm_list; >> struct periodic_time *pt; >> >> spin_lock(&v->arch.hvm_vcpu.tm_lock); >> >> list_for_each_entry ( pt, head, list ) >> { >> pt->pending_intr_nr = 0; >> pt->last_plt_gtime = hvm_get_guest_time(pt->vcpu); >> pt->scheduled = NOW() + pt->period; >> set_timer(&pt->timer, pt->scheduled); >> } >> >> spin_unlock(&v->arch.hvm_vcpu.tm_lock); >>} >> >>void pt_migrate(struct vcpu *v) >>{ >> struct list_head *head = &v->arch.hvm_vcpu.tm_list; >> struct periodic_time *pt; >> >> spin_lock(&v->arch.hvm_vcpu.tm_lock); >> >> list_for_each_entry ( pt, head, list ) >> migrate_timer(&pt->timer, v->processor); >> >> spin_unlock(&v->arch.hvm_vcpu.tm_lock); >>} >> >>void create_periodic_time( >> struct vcpu *v, struct periodic_time *pt, uint64_t period, >> uint8_t irq, char one_shot, time_cb *cb, void *data) >>{ >> destroy_periodic_time(pt); >> >> spin_lock(&v->arch.hvm_vcpu.tm_lock); >> >> pt->enabled = 1; >> pt->pending_intr_nr = 0; >> >> /* Periodic timer must be at least 0.9ms. */ >> if ( (period < 900000) && !one_shot ) >> { >> gdprintk(XENLOG_WARNING, >> "HVM_PlatformTime: program too small period %"PRIu64"\n", >> period); >> period = 900000; >> } >> >> pt->period = period; >> pt->vcpu = v; >> pt->last_plt_gtime = hvm_get_guest_time(pt->vcpu); >> pt->irq = irq; >> pt->period_cycles = (u64)period * cpu_khz / 1000000L; >> pt->one_shot = one_shot; >> pt->scheduled = NOW() + period; >> /* >> * Offset LAPIC ticks from other timer ticks. Otherwise guests which use >> * LAPIC ticks for process accounting can see long sequences of process >> * ticks incorrectly accounted to interrupt processing. >> */ >> if ( is_lvtt(v, irq) ) >> pt->scheduled += period >> 1; >> pt->cb = cb; >> pt->priv = data; >> >> list_add(&pt->list, &v->arch.hvm_vcpu.tm_list); >> >> init_timer(&pt->timer, pt_timer_fn, pt, v->processor); >> set_timer(&pt->timer, pt->scheduled); >> >> spin_unlock(&v->arch.hvm_vcpu.tm_lock); >>} >> >>void destroy_periodic_time(struct periodic_time *pt) >>{ >> if ( !pt->enabled ) >> return; >> >> pt_lock(pt); >> pt->enabled = 0; >> list_del(&pt->list); >> pt_unlock(pt); >> >> /* >> * pt_timer_fn() can run until this kill_timer() returns. We must do this >> * outside pt_lock() otherwise we can deadlock with pt_timer_fn(). >> */ >> kill_timer(&pt->timer); >>} >> >> > > > > --------------010105000402010406090500 Content-Type: text/x-patch; name="vpt.xen.un.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="vpt.xen.un.patch" diff -r 2717128cbdd1 xen/arch/x86/hvm/vpt.c --- a/xen/arch/x86/hvm/vpt.c Wed Oct 31 10:07:42 2007 +0000 +++ b/xen/arch/x86/hvm/vpt.c Thu Nov 01 16:19:06 2007 -0400 @@ -57,14 +57,23 @@ static void pt_process_missed_ticks(stru return; missed_ticks = missed_ticks / (s_time_t) pt->period + 1; - if ( missed_ticks > 1000 ) - { - /* TODO: Adjust guest time together */ - pt->pending_intr_nr++; - } - else - { - pt->pending_intr_nr += missed_ticks; + + if( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) { + /* When many guests are stacked on a node and the total number of vcpus + * running loads is 10 timed the number of physical processors, + * scheduling delays of 5 seconds are common. So set the threshold at + * 100 seconds to be safe. It would be reasonable to remove this + * check against threshold completely. + */ + if ( missed_ticks > 100000 ) + { + /* TODO: Adjust guest time together */ + pt->pending_intr_nr++; + } + else + { + pt->pending_intr_nr += missed_ticks; + } } pt->scheduled += missed_ticks * pt->period; @@ -117,15 +126,7 @@ void pt_restore_timer(struct vcpu *v) list_for_each_entry ( pt, head, list ) { - if ( !mode_is(v->domain, no_missed_tick_accounting) ) - { - pt_process_missed_ticks(pt); - } - else if ( (NOW() - pt->scheduled) >= 0 ) - { - pt->pending_intr_nr++; - pt->scheduled = NOW() + pt->period; - } + pt_process_missed_ticks(pt); set_timer(&pt->timer, pt->scheduled); } @@ -140,13 +141,14 @@ static void pt_timer_fn(void *data) pt_lock(pt); - pt->pending_intr_nr++; + if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) || !pt->pending_intr_nr ) { + pt->pending_intr_nr++; + } if ( !pt->one_shot ) { pt->scheduled += pt->period; - if ( !mode_is(pt->vcpu->domain, no_missed_tick_accounting) ) - pt_process_missed_ticks(pt); + pt_process_missed_ticks(pt); set_timer(&pt->timer, pt->scheduled); } --------------010105000402010406090500 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel --------------010105000402010406090500--