Hi Keir, Thanks for applying the fixes in the last submit. In moving the test for no_missed_tick_accounting into pt_process_missed_ticks() you forgot to add the scheduling part. This patch adds the scheduling. It also defines two options for scheduling, PT_SCHEDULE_SYNC and PT_SCHEDULE_ASYNC. The default is PT_SCHEDULE_SYNC. I am not providing any means for selecting between these two options. My purpose is to have something to discuss and you can do as you like with this patch or tell me what you want. We have been discussing the difference between SYNC and ASYNC scheduling. I found the reason the code checked in as of 10.31 had an error of .23%. In the following fragment from pt_restore_timer() if ( !mode_is(v->domain, no_missed_tick_accounting) ) { pt_process_missed_ticks(pt); } else if ( (NOW() - pt->scheduled) >= 0 ) { pt->pending_intr_nr++; pt->scheduled = NOW() + pt->period; } set_timer(&pt->timer, pt->scheduled); the pt->pending_intr_nr++; line is the problem because if the value of pt->pending_intr_nr is non-zero before the increment, then you get extra timer injections. We can either test for zero, set unconditionally to 1, or leave the line out altogether and let the timeout take care of it. I have chosen the later in the patch. The result with this fixed, and with the patch submitted here is as follows with the usual test I have been describing. For both SYNC and ASYNC methods, with a sles9sp3-64 and rh4u4-64 guest running at the same time, the errors were less than .028%. I have run some tests with single guests and big loads and found sles to be in error by .1% with the ASYNC method. All of the SYNC tests have been under .03%. But my tests are short, usually 10 to 20 minutes. And there is quite a bit of variability. Looking at the timer code for the two guests, they look identical to me. In my review of the Linux code, I see no reason why the interrupt deliveries need to be "SYNC". This has been your intuition all along. So, I leave the choice up to you. thanks, Dave Dave Winchell wrote: > oops, you're right. I missed the ( (NOW() - pt->scheduled) >= 0 ). > > This means I will have to come up with another explanation. > > -Dave > > > Keir Fraser wrote: > >> Thanks for the explanation. I think you are mis-describing the current >> behaviour of vpt.c though (If I understand it correctly myself, which >> I may >> not!). As I understand it, we do *not* unconditionally deliver a tick >> and >> reset the time space when a vcpu is scheduled onto a cpu. We only do >> that if >> (NOW() - pt->scheduled) >= 0 -- that is if more than a tick period has >> passed. Otherwise we wait to deliver the next tick exactly one period >> after >> the previous tick was delivered. The remainder of your explanation >> seems to >> be predicated on the (impossible?) case that we deliver a tick less >> than one >> period after the previous one. >> >> Am I confused? :-) Also, what version of Linux are we talking about >> here, >> and what periodic timer, and x86/64 or i386? There are lots of different >> Linux timer-handling logics out there in the wild! >> >> -- Keir >> >> On 2/11/07 15:51, "Dave Winchell" wrote: >> >> >> >>> Let D be the time that the clock vcpu is descheduled and P be >>> the clock period. When D < P, I think there is an issue. >>> >>> The reason is that Linux's offset calculation, which affects the >>> last clock interrupt tsc that is recorded, doesn't kick in if the >>> time since the last interrupt is less than P. In this case it sets >>> offset to zero. Linux records the tsc of the current (last) clock >>> interrupt >>> as (current tsc - offset). >>> >>> Interrupt delivery method AS delivers a clock interrupt >>> at context switch (in) time. When D < P, Linux records the >>> interrupt delivery time as current tsc and bumps jiffies. >>> This results in a gain of time equal to P-D over wall time. >>> >>> For method S this never happens because the interrupt isn't >>> delivered until the next boundary. >>> >> >> >> >> >> >