Re: [PATCH 0/2] Improve hpet accuracy

From: Dave Winchell <dwinchell@virtualiron.com>
To: dan.magenheimer@oracle.com, Keir Fraser <keir.fraser@eu.citrix.com>
Cc: Dave Winchell <dwinchell@virtualiron.com>,
	xen-devel <xen-devel@lists.xensource.com>,
	Ben Guthro <bguthro@virtualiron.com>
Subject: Re: [PATCH 0/2] Improve hpet accuracy
Date: Fri, 13 Jun 2008 10:58:08 -0400
Message-ID: <48528B00.40208@virtualiron.com>
In-Reply-To: <B99564216C25704085A82B41C46DD3427B0610@exchange.katana.local>

Dan, Keir,

In an overnight (17.5 hrs) test with three guests,
8 vcpus each, on 8 physical cpus, all under
usex b48 loads, I noted the following errors:

rh4u664 -.72 sec (.0012%)
rhas5u164 -10.2 sec (.016%)
sles10u164 -9.3 sec (.015%)

The number for rh4u664 is what I am used to seeing on this
platform. The other two are roughly 10 times worse, but still good
enough for ntp. They are worse because the guest clock code for hpet
in rhas5u164 reads the cmp (comparator) register to calculate
interrupt delay.
I mentioned before on this list that one of the beauties of hpet
was the fine hpet code in the guest (rh4u664), which does not use the
delay computation; in my mind, that computation is unnecessary and
adds error. Well, in rhas5u164, and I assume in sles10u164, the delay
computation is back, and so is the associated error.
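
To make the delay computation concrete, consider a tick handler of
this kind. In this sketch it assumes the comparator has already been
advanced to the next deadline, so the tick that just fired was due at
cmp - period. It then treats how far the main counter has moved past
that point as delivery delay. The Python below is a simplified model
under that assumption. The 32-bit counter, the illustrative names and
the ~14.318 MHz HPET with a 1000 Hz tick are all assumptions; it is
not the RHEL or SLES source. It also shows, hypothetically, how a
comparator value that does not match the tick actually delivered turns
directly into error in the computed delay. A guest that only counts
ticks never sees that error.

```python
COUNTER_MASK = 0xFFFFFFFF  # assumed 32-bit HPET main counter
PERIOD = 14318             # assumed: ~14.318 MHz HPET, 1000 Hz guest tick


def tick_delay(counter: int, cmp: int, period: int = PERIOD) -> int:
    """Counter ticks elapsed since the interrupt that just fired was due.

    Sketch assumption: the comparator already holds the *next* deadline,
    so the current tick was due at cmp - period. Arithmetic is modulo
    2**32, so a counter wrap between the two values is handled.
    """
    due = (cmp - period) & COUNTER_MASK
    return (counter - due) & COUNTER_MASK


# Comparator advanced to the next deadline; counter read 500 counts late.
print(tick_delay(counter=1_000_500, cmp=1_000_000 + PERIOD))  # 500

# Hypothetical mismatch: the emulated comparator still holds the deadline
# that just fired, so the same counter read overstates the delay by a
# whole period.
print(tick_delay(counter=1_000_500, cmp=1_000_000))  # 14818
```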

The cmp register is also the reason for the hesitations on boot.
I'll have more to say on this later.

thanks,
Dave



Dave Winchell wrote:

> Hi Dan,
>
> I'm glad you're able to reproduce my results.
> Are you still seeing the boot time hang up?
> Is this the reason for vcpus=1?
>
> > you can see that there are always 1000 LOC/sec.  But
> > with apic=1 there are also about 350 IO-APIC-edge-timer/sec
> > and with apic=0 there are 1000 XT-PIC-timer/sec.
>
> > I suspect that the latter of these (XT-PIC-timer) is
> > messing up your policy and the former (edge-timer) is not.
>
> Thanks for this data. Your analysis is correct, I think.
> I wrote the interrupt routing and callback code for the
> IOAPIC edge triggered interrupts. The PIC path does not
> have the callbacks. With no callbacks, it always looks to
> the routing code in hpet.c like it's been longer than a period
> since the last one, as the end-of-interrupt time stamp is zero.
> Thus, you get an interrupt on each timeout, or 1000 interrupts/sec.
> 350/sec is a typical rate when the algorithm for missed ticks is
> doing its thing. I'll put this on the bug list, unless no one
> cares about apic=0.
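
Here is a toy model of the check described above. It shows why a
missing EOI callback pins the rate at one interrupt per timeout. This
is a hypothetical Python sketch, not the hpet.c code and not its
missed-ticks algorithm, and it does not try to reproduce the ~350/sec
figure. A tick is injected only if at least one period has passed
since the last end-of-interrupt time stamp. That stamp is updated only
when a callback delivers the guest's EOI. The names and the fixed
guest ack delay are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

PERIOD_NS = 1_000_000  # 1 ms timeout, i.e. 1000 timeouts/sec


@dataclass
class TimerState:
    period_ns: int
    last_eoi_ns: int = 0  # written only by the EOI callback; stays 0 without one

    def should_inject(self, now_ns: int) -> bool:
        # Inject only if at least a period has passed since the guest's last EOI.
        return now_ns - self.last_eoi_ns >= self.period_ns

    def on_eoi(self, eoi_ns: int) -> None:
        self.last_eoi_ns = eoi_ns


def count_injections(timeouts: int, period_ns: int, ack_delay_ns: int,
                     eoi_callback: bool) -> int:
    """Simulate consecutive timer expiries and return the ticks injected.

    The guest is assumed to EOI each injected tick ack_delay_ns later;
    ack_delay_ns < period_ns keeps at most one EOI outstanding.
    """
    if not 0 <= ack_delay_ns < period_ns:
        raise ValueError("ack_delay_ns must be in [0, period_ns)")
    state = TimerState(period_ns)
    pending_eoi: Optional[int] = None
    injected = 0
    for k in range(1, timeouts + 1):
        now = k * period_ns
        # Hand the guest's EOI to the routing code once it has happened.
        if pending_eoi is not None and pending_eoi <= now:
            state.on_eoi(pending_eoi)
            pending_eoi = None
        if state.should_inject(now):
            injected += 1
            if eoi_callback:
                pending_eoi = now + ack_delay_ns
    return injected


# One second of timeouts with the guest acking half a period late.
print(count_injections(1000, PERIOD_NS, PERIOD_NS // 2, eoi_callback=False))  # 1000
print(count_injections(1000, PERIOD_NS, PERIOD_NS // 2, eoi_callback=True))   # 500
```

With eoi_callback=False the time stamp never moves off zero, so every
timeout injects. With the callback in place, a late EOI suppresses the
next injection. The policy in the patch also accounts for missed
ticks, which this model omits.
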
>
> thanks,
> Dave
>
>
> -----Original Message-----
> From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> Sent: Fri 6/13/2008 12:47 AM
> To: Dave Winchell; Keir Fraser; xen-devel
> Cc: Ben Guthro
> Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
>
> Hi Dave --
>
> Hmmm... in my earlier runs with rhel5u1-64, I had apic=0
> (yes apic, not acpi).  Changing it to apic=1 gives excellent
> results (< 0.01% even with overcommit).  Changing it back
> to apic=0 has the same fairly bad results, 0.08% with no
> overcommit and 0.16% (and climbing) with overcommit.
> Note that this is all with vcpus=1.
>
> How odd...
>
> I vaguely recalled from some research a couple of months ago
> that hpet is read MORE than once/tick on the boot processor.
> I can't seem to find the table I compiled from that research,
> but I did find this in an email I sent to you:
>
> "You probably know this already but an n-way 2.6 Linux
> kernel reads hpet (n+1)*1000 times/second.  Let's take
> five 2-way guests as an example; that comes to 15000
> hpet reads/second...."
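
Applied to a whole host, that rule of thumb adds up quickly. Below is a
small Python sketch of the arithmetic. It assumes the 1000 in the
formula is the guests' 1000 Hz tick rate, and the function name is
illustrative.

```python
def hpet_reads_per_second(vcpus: int, tick_hz: int = 1000) -> int:
    """Quoted rule of thumb: an n-way 2.6 guest reads HPET (n+1)*tick_hz times/sec."""
    return (vcpus + 1) * tick_hz


guests = [2, 2, 2, 2, 2]  # five 2-way guests
print(sum(hpet_reads_per_second(n) for n in guests))  # 15000
```
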
>
> I wondered what was different between apic=1 and apic=0. Using:
>
> # grep -E 'LOC|timer' /proc/interrupts; sleep 10; \
>      grep -E 'LOC|timer' /proc/interrupts
>
> you can see that there are always 1000 LOC/sec.  But
> with apic=1 there are also about 350 IO-APIC-edge-timer/sec
> and with apic=0 there are 1000 XT-PIC-timer/sec.
>
> I suspect that the latter of these (XT-PIC-timer) is
> messing up your policy and the former (edge-timer) is not.
>
> Dan
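
The before/after comparison can also be scripted. Read
/proc/interrupts twice, a known interval apart, and divide each
counter's delta by the interval. Below is a minimal Python sketch. It
assumes the usual 2.6-era layout: an IRQ label, one count column per
CPU, then the controller and device name. Counts are summed across
CPUs, and the function names are illustrative.

```python
def parse_interrupts(text: str) -> dict:
    """Map each IRQ label ('0', 'LOC', ...) to (total count, description)."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue  # skip the CPU header line and blank lines
        label = fields[0][:-1]
        total, i = 0, 1
        while i < len(fields) and fields[i].isdigit():
            total += int(fields[i])  # one column per CPU
            i += 1
        counts[label] = (total, " ".join(fields[i:]))
    return counts


def interrupt_rates(before: str, after: str, seconds: float,
                    patterns=("LOC", "timer")) -> dict:
    """Per-second rates for IRQs whose label or description matches a pattern."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    rates = {}
    for label, (count, desc) in new.items():
        if label in old and any(p in label or p in desc for p in patterns):
            rates[f"{label} {desc}".strip()] = (count - old[label][0]) / seconds
    return rates
```

On a live guest, pass the contents of /proc/interrupts read before and
after time.sleep(10), with seconds=10.
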
>
> -----Original Message-----
> From: Dave Winchell [mailto:dwinchell@virtualiron.com]
> Sent: Thursday, June 12, 2008 4:49 PM
> To: dan.magenheimer@oracle.com; Keir Fraser; xen-devel
> Cc: Ben Guthro; Dave Winchell
> Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
>
>
> Dan,
>
> You shouldn't be getting higher than .05%.
> I'd like to figure out what is wrong. I'm running the same guest
> you are with heavy loads and the physical processors overcommitted
> by 3:1. And I'm seeing .027% error on rh5u1-64 after an hour.
>
> Can you type ^a^a^a at the console and then
> type 'Z' a couple of times about 10 seconds apart and send
> me the output? Do this when you have a domain
> running that is keeping poor time.
>
> You should take drift measurements over a period
> of time that is at least 20 minutes, preferably longer.
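
Below is a minimal sketch of such a measurement in Python. It assumes
paired readings of the guest clock and an external reference (for
example an NTP-synchronized machine), taken at the start and end of
the run. The 20-minute floor is the one suggested above, and the names
are illustrative.

```python
MIN_INTERVAL_S = 20 * 60  # at least 20 minutes; longer is better


def skew_percent(guest_start: float, ref_start: float,
                 guest_end: float, ref_end: float) -> float:
    """Guest clock error over the interval as a percentage of reference time.

    Positive means the guest clock ran fast relative to the reference.
    """
    ref_elapsed = ref_end - ref_start
    if ref_elapsed < MIN_INTERVAL_S:
        raise ValueError(f"interval of {ref_elapsed:.0f} s is under {MIN_INTERVAL_S} s")
    guest_elapsed = guest_end - guest_start
    return 100.0 * (guest_elapsed - ref_elapsed) / ref_elapsed


# One hour against the reference, guest clock gained 0.972 s: about +0.027%.
print(f"{skew_percent(0.0, 0.0, 3600.972, 3600.0):+.3f}%")
```
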
>
> Also, can you send me a tarball of your sources from
> the xen directory?
>
>
> thanks,
> Dave
>
>
>
>
> -----Original Message-----
> From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> Sent: Thu 6/12/2008 6:05 PM
> To: Dave Winchell; Keir Fraser; xen-devel
> Cc: Ben Guthro
> Subject: Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
>
> (Going back on list.)
>
> OK, so looking at the updated patch, hpet_avoid=1 is actually
> working, just reporting wrong, correct?
>
> With el5u1-64-hvm and hpet_avoid=1 and timer_mode=4, skew
> is under -0.04% and falling.  With hpet_avoid=0, it looks
> about the same.  However both cases seem to start creeping
> up again when I put load on, then fall again when I remove
> the load -- even with sched-credit capping cpu usage.  Odd!
> This implies to me that the activity in the other domains
> IS affecting skew on the domain-under-test. (Keir, any
> comments on the hypothesis attached below?)
>
> Another theoretical oddity... if you are always delivering
> timer ticks "late", fewer than the nominal 1000 ticks/sec
> should be received.  So then why is guest time actually
> going faster than an external source?
>
> (In my mind, going faster is much worse than going slower
> because if ntpd or a human moves time backwards to compensate
> for a clock going faster, "make" and other programs can
> get very confused.)
>
> Dan
>
> > -----Original Message-----
> > From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> > Sent: Thursday, June 12, 2008 3:13 PM
> > To: 'Dave Winchell'
> > Subject: RE: xen hpet patch
> >
> >
> > One more thought while waiting for compile and reboot:
> >
> > Am I right that all of the policies are correcting for when
> > a domain "A" is out-of-context?  There's nothing in any other
> > domain "B" that can account for any timer loss/gain in domain
> > "A".  The only reason we are running other domains is to ensure
> > that domain "A" is sometimes out-of-context, and the more
> > it is out-of-context, the more likely we will observe
> > a problem, correct?
> >
> > If this is true, it doesn't matter what workload is run
> > in the non-A domains... as long as it is loading the
> > CPU(s), thus ensuring that domain A is sometimes not
> > scheduled on any CPU.
> >
> > And if all this is true, we may not need to run other
> > domains at all... running "xm sched-credit -d A -c 50"
> > should result in domain A being out-of-context at least
> > half the time.
>
>
