From: "Dan Magenheimer"
Subject: RE: [PATCH 0/2] Improve hpet accuracy
Date: Thu, 12 Jun 2008 22:47:36 -0600
Message-ID: <20080612224736875.00000057128@djm-pc>
Reply-To: "dan.magenheimer@oracle.com"
To: Dave Winchell, Keir Fraser, xen-devel
Cc: Ben Guthro
List-Id: xen-devel@lists.xenproject.org

Hi Dave --

Hmmm... in my earlier runs with rhel5u1-64, I had apic=0 (yes, apic, not acpi). Changing it to apic=1 gives excellent results (< 0.01% even with overcommit). Changing it back to apic=0 gives the same fairly poor results: 0.08% with no overcommit and 0.16% (and climbing) with overcommit. Note that this is all with vcpus=1.

How odd... I vaguely recall from some research a couple of months ago that hpet is read MORE than once per tick on the boot processor. I can't find the table I compiled from that research, but I did find this in an email I sent you:

"You probably know this already, but an n-way 2.6 Linux kernel reads hpet (n+1)*1000 times/second. Let's take five 2-way guests as an example; that comes to 15000 hpet reads/second...."

I wondered what was different between apic=1 and apic=0. Using:

  # grep -E 'LOC|timer' /proc/interrupts; sleep 10; \
    grep -E 'LOC|timer' /proc/interrupts

you can see that there are always 1000 LOC/sec. But with apic=1 there are also about 350 IO-APIC-edge timer interrupts/sec, and with apic=0 there are 1000 XT-PIC timer interrupts/sec. I suspect that the latter (XT-PIC-timer) is messing up your policy and the former (edge-timer) is not.
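The interrupt-rate comparison above can also be done on two saved snapshots instead of by eye. The Python sketch below is hypothetical: `parse_interrupts`, `interrupt_rates`, and `hpet_reads_per_second` are illustrative names, not tools from Xen or the hpet patch. It assumes the usual `/proc/interrupts` layout (a `CPU0 CPU1 ...` header, then `label: <count per CPU> <description>` rows), roughly mirrors the `LOC|timer` filter, and also encodes the (n+1)*1000 rule of thumb quoted above. The snapshot numbers are made up.

```python
# Hypothetical helpers; not part of Xen or the hpet patch.


def parse_interrupts(text):
    """Map each /proc/interrupts row label (e.g. "0", "LOC") to (total count, description).

    Assumes a "CPU0 CPU1 ..." header, then rows of "label: <one count per CPU> <description>".
    """
    rows = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the CPU header line has no colon
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        while fields and fields[0].isdigit():
            counts.append(int(fields.pop(0)))  # one column per CPU
        if counts:
            rows[label.strip()] = (sum(counts), " ".join(fields))
    return rows


def interrupt_rates(before, after, seconds, patterns=("LOC", "timer")):
    """Per-second rate of each row whose label or description contains a pattern."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    rates = {}
    for label, (count, desc) in new.items():
        text = f"{label} {desc}".strip()
        if label in old and any(p in text for p in patterns):
            rates[text] = (count - old[label][0]) / seconds
    return rates


def hpet_reads_per_second(vcpus, hz=1000):
    """Rule of thumb quoted above: an n-way 2.6 guest reads hpet (n+1)*hz times/s."""
    return (vcpus + 1) * hz


# Made-up snapshots taken 10 s apart on a 1-vcpu guest booted with apic=0.
before = """           CPU0
  0:     100000    XT-PIC  timer
LOC:     200000
"""
after = """           CPU0
  0:     110000    XT-PIC  timer
LOC:     210000
"""
rates = interrupt_rates(before, after, 10)
print(rates)  # {'0 XT-PIC timer': 1000.0, 'LOC': 1000.0}
print(5 * hpet_reads_per_second(2))  # 15000: the five 2-way guests example
```

Summing across CPU columns keeps the sketch usable for multi-vcpu guests, although per-CPU rates may be more informative when hunting for the boot processor's extra reads.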
Dan

-----Original Message-----
From: Dave Winchell [mailto:dwinchell@virtualiron.com]
Sent: Thursday, June 12, 2008 4:49 PM
To: dan.magenheimer@oracle.com; Keir Fraser; xen-devel
Cc: Ben Guthro; Dave Winchell
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

Dan,

You shouldn't be getting higher than .05%. I'd like to figure out what is wrong. I'm running the same guest you are, with heavy loads and the physical processors overcommitted by 3:1, and I'm seeing .027% error on rh5u1-64 after an hour.

Can you type ^a^a^a at the console, then type 'Z' a couple of times about 10 seconds apart, and send me the output? Do this while a domain that is keeping poor time is running.

You should take drift measurements over a period of at least 20 minutes, preferably longer.

Also, can you send me a tarball of your sources from the xen directory?

thanks,
Dave

-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
Sent: Thu 6/12/2008 6:05 PM
To: Dave Winchell; Keir Fraser; xen-devel
Cc: Ben Guthro
Subject: Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

(Going back on list.)

OK, so looking at the updated patch, hpet_avoid=1 is actually working and just reporting incorrectly, correct? With el5u1-64-hvm, hpet_avoid=1, and timer_mode=4, skew is under -0.04% and falling. With hpet_avoid=0, it looks about the same. However, in both cases skew seems to start creeping up again when I put load on, then falls again when I remove the load, even with sched-credit capping CPU usage. Odd! This implies to me that the activity in the other domains IS affecting skew on the domain under test. (Keir, any comments on the hypothesis attached below?)

Another theoretical oddity: if you are always delivering timer ticks "late", the guest should receive fewer than the nominal 1000 ticks/sec. So why is guest time actually running faster than an external source?
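The skew figures traded above, Dave's advice to measure for at least 20 minutes, and Dan's "late ticks" oddity all reduce to a little arithmetic. The Python sketch below is hypothetical (the function names and the 50 ms offset-reading error are assumptions, not values from the thread). `tick_counting_skew_percent` shows that a clock which advances only by the ticks it actually receives can only run slow when ticks go missing, which is why a fast guest clock is surprising; `skew_resolution_percent` shows why short measurement windows cannot resolve errors as small as 0.05%.

```python
# Hypothetical helpers for reasoning about clock skew; not from the hpet patch.


def skew_percent(guest_elapsed_s, ref_elapsed_s):
    """Percent by which guest time ran fast (+) or slow (-) against a reference clock."""
    return (guest_elapsed_s - ref_elapsed_s) / ref_elapsed_s * 100.0


def tick_counting_skew_percent(ticks_received, seconds, hz=1000):
    """Skew of a clock that advances by exactly 1/hz s per tick it actually receives."""
    return (ticks_received / (hz * seconds) - 1.0) * 100.0


def skew_resolution_percent(offset_error_s, window_s):
    """Worst-case skew error when the guest-minus-reference offset is read
    with +/- offset_error_s uncertainty at each end of the window."""
    return 2.0 * offset_error_s / window_s * 100.0


# Illustrative: 0.972 s gained over one hour is a +0.027% skew.
print(round(skew_percent(3600.972, 3600.0), 4))  # 0.027
# Losing 400 of 1,000,000 ticks gives a clock that runs 0.04% SLOW, not fast.
print(round(tick_counting_skew_percent(999_600, 1000), 4))  # -0.04
# With an assumed 50 ms offset-reading error, a 1-minute run cannot resolve
# 0.05%, while a 20-minute run can.
print(round(skew_resolution_percent(0.05, 60), 4))  # 0.1667
print(round(skew_resolution_percent(0.05, 1200), 4))  # 0.0083
```

The resolution term shrinks in proportion to the window length, so doubling the measurement time halves the uncertainty contributed by reading the two clocks.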
(In my mind, going faster is much worse than going slower because if ntpd or a human moves time backwards to compensate for a clock going faster, "make" and other programs can get very confused.)

Dan

> -----Original Message-----
> From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> Sent: Thursday, June 12, 2008 3:13 PM
> To: 'Dave Winchell'
> Subject: RE: xen hpet patch
>
>
> One more thought while waiting for compile and reboot:
>
> Am I right that all of the policies are correcting for when
> a domain "A" is out-of-context? There's nothing in any other
> domain "B" that can account for any timer loss/gain in domain
> "A". The only reason we are running other domains is to ensure
> that domain "A" is sometimes out-of-context, and the more
> it is out-of-context, the more likely we will observe
> a problem, correct?
>
> If this is true, it doesn't matter what workload is run
> in the non-A domains... as long as it is loading the
> CPU(s), thus ensuring that domain A is sometimes not
> scheduled on any CPU.
>
> And if all this is true, we may not need to run other
> domains at all... running "xm sched-credit -d A -c 50"
> should result in domain A being out-of-context at least
> half the time.
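Dan's hypothesis can be illustrated with a deliberately crude toy model. The Python sketch below is hypothetical: `delivered_ticks`, `skew_from_ticks`, and the "drop"/"catchup" policy names are illustrative and are not Xen's timer_mode settings, the 30 ms scheduling slice is an assumption, and the model ignores any lost-tick compensation done by the guest itself. A single domain is descheduled for part of each slice; 1 kHz ticks that fire while it is descheduled are either dropped or held and delivered on resume. No other domain appears in the model at all: only the fraction of time the domain spends descheduled affects the result, which is the point of the hypothesis.

```python
# Toy model only; policy names and slice length are hypothetical.


def delivered_ticks(seconds, cap, slice_ms=30, policy="drop"):
    """Count 1 kHz ticks a capped toy domain sees in `seconds` of real time.

    In each slice_ms slice the domain is first descheduled, then runs for
    round(cap * slice_ms) ms. policy="drop" loses ticks that fire while it
    is descheduled; policy="catchup" holds them and delivers them on resume.
    """
    if policy not in ("drop", "catchup"):
        raise ValueError(f"unknown policy: {policy}")
    run_ms = round(cap * slice_ms)
    delivered = pending = 0
    for ms in range(int(seconds * 1000)):  # one tick per ms at 1000 Hz
        running = ms % slice_ms >= slice_ms - run_ms
        if running:
            delivered += 1 + pending  # current tick plus any held-back ticks
            pending = 0
        elif policy == "catchup":
            pending += 1  # remember the tick for later delivery
    return delivered


def skew_from_ticks(ticks, seconds, hz=1000):
    """Percent skew of a clock that advances 1/hz s per delivered tick."""
    return (ticks / (hz * seconds) - 1.0) * 100.0


for policy in ("drop", "catchup"):
    ticks = delivered_ticks(9, cap=0.5, policy=policy)
    print(policy, ticks, skew_from_ticks(ticks, 9))
# drop 4500 -50.0
# catchup 9000 0.0
```

Real guests and the policies in the hpet patch are far subtler than this; the model only shows that, under the hypothesis, a cap such as "-c 50" exercises the same out-of-context behavior as loading other domains.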