From: "David S. Ahern" <daahern@cisco.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Glauber Costa <gcosta@redhat.com>, kvm-devel <kvm@vger.kernel.org>
Subject: Re: PIT/ntp/timekeeping [was Re: kvm guest loops_per_jiffy miscalibration under host load]
Date: Tue, 29 Jul 2008 11:29:30 -0600
Message-ID: <488F537A.9030202@cisco.com>
In-Reply-To: <488F3FF4.3070804@cisco.com>
[-- Attachment #1: Type: text/plain, Size: 5203 bytes --]
Forgot to add something in my last response: Another time-based oddity
I'm seeing in multi-processor guests is the microseconds value changing
as the process is moved between the vcpus.
The attached code shows what I mean. In a RHEL3 VM with 2 vcpus,
start the program with an argument of 990000 (to get a wakeup every ~1
sec). Once started, lock it to a vcpu. You'll see nice, consistent output like:
1217351975.261974
1217351976.262292
1217351977.262608
1217351978.262929
1217351979.263243
1217351980.263563
1217351981.263940
Then switch the affinity to the other vcpu. The microseconds value jumps
(here by roughly half a second):
1217351982.796132
1217351983.797411
1217351984.797719
1217351985.798041
1217351986.798368
1217351987.798788
1217351988.799025
Toggling the affinity or letting the process roam between the 2
processors causes the microseconds to jump. This means that data logged
using the microseconds value will show time jumping back and forth.
As I understand it, the root cause is that the value returned by
gettimeofday is updated from the TSC, so the fact that the values toggle
means the 2 vcpus see different TSC counts. Is there any way to make the
counts coherent as processes roam between vcpus?
david
David S. Ahern wrote:
>
> Marcelo Tosatti wrote:
>> On Tue, Jul 22, 2008 at 01:56:12PM -0600, David S. Ahern wrote:
>>> I've been running a series of tests on RHEL3, RHEL4, and RHEL5. The
>>> short of it is that all of them keep time quite well with 1 vcpu. In the
>>> case of RHEL3 and RHEL4 time is stable for *both* the uniprocessor and
>>> smp kernels, again with only 1 vcpu (there's no up/smp distinction in
>>> the kernels for RHEL5).
>>>
>>> As soon as the number of vcpus is >1, time drifts systematically with
>>> the guest *leading* the host. I see this on unloaded guests and hosts
>>> (i.e., cpu usage on the host ~<5%). The drift is averaging around
>>> 0.5%-0.6% (i.e., 5 seconds gained in the guest per 1000 seconds of real
>>> wall time).
>>>
>>> This is very reproducible. All I am doing is installing stock RHEL3.8, 4.4
>>> and 5.2, i386 versions, starting them and watching the drift with no
>>> time servers. In all of these recent cases the results are for in-kernel
>>> pit.
>> David,
>>
>> You mentioned earlier problems with ntpd syncing the guest time? Can you
>> provide more details?
>>
>
> It would lose sync often, and 'ntpq -c pe' would show a '*' indicative
> of a sync when in fact time in the guest was off by 5-10 seconds. It may
> very well be a side effect of the drift due to repeated injection
> of timer interrupts / lost interrupts.
>
> With your PIT injection patches:
>
> 1. For a stock RHEL4.4 guest, ntpd synchronized quickly and saw no need
> to adjust time after the initial startup tweak of 1.004620 sec by
> ntpdate. After 40 hours it has maintained time very well with no
> adjustments. Of course the guest is relatively idle -- it is only
> keeping time.
>
> 2. For a stock RHEL3.8 guest, I cannot get ntpd to do anything. This
> guest is running on the same host as the RHEL4 guest and using the same
> time server. This guest has been around for a few weeks and has been
> subjected to various tests -- like running with the no-kernel-pit and -tdf
> options. In light of 3. below I'll re-create this guest and see if the
> problem goes away.
>
> 3. For a RHEL3.8 guest running a Cisco product, ntpd was able to
> synchronize just fine. We are running ntpd with different arguments;
> however using the same syntax on the stock rhel3 guest did not help.
>
> As far as time updates go, over 21+ hours of uptime there have been 20 time
> resets -- adjustments ranging from -1.01 seconds to +0.75 seconds. This
> is a remarkable improvement. Before this PIT patch set I was seeing time
> resets of 3-5 seconds every 15 minutes. This is a 2 vcpu guest running a
> modest load (disk + network) that pushes cpu usage to ~25%. Point being
> that the guest is keeping time reasonably well while doing something
> useful. :-)
>
> I am planning to install 4 vcpu guests for both RHEL3 and RHEL4 today
> and again with modest loads to see how it holds up.
>
>> I find it _necessary_ to use the RR scheduling policy for any Linux
>> guest running at static 1000Hz (no dynticks), otherwise timer interrupts
>> will invariably be missed. And reinjection plus lost tick adjustment is
>> always problematic (will drift either way, depending which version of
>> Linux). With the standard batch scheduling policy _idle_ guests can wait
>> to run up to 6/7 ms in my testing (thus 6/7 lost timer events). Which
>> also means latency can be horrible.
>>
>
> Noted. I'd prefer not to start priority escalations, but if it's needed....
>
> What about the RHEL4.7 kernel running at 250 HZ? As I understand it,
> with 4.7 you can pass a command line divider to run the clock at a
> slower rate. In the past I've recompiled RHEL4 kernels to run at 250 HZ
> which was a trade-off between too fast (overhead of timer interrupts)
> and too slow (need for better scheduling latency).
>
>
> david
[-- Attachment #2: showtime.c --]
[-- Type: text/x-csrc, Size: 545 bytes --]
#include <sys/time.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <libgen.h>

/* Print gettimeofday() as sec.usec, then sleep for the requested number
 * of microseconds; repeat forever. */
int main(int argc, char *argv[])
{
    unsigned long tsleep;
    struct timeval tv;

    if (argc != 2) {
        printf("usage: %s sleeptime\n", basename(argv[0]));
        return 1;
    }

    /* sleep time in microseconds; must be non-zero */
    tsleep = atoi(argv[1]);
    if (tsleep == 0) {
        printf("usage: invalid sleeptime\n");
        return 2;
    }

    while (1) {
        if (gettimeofday(&tv, NULL) != 0)
            printf("gettimeofday failed\n");
        else
            /* zero-pad usec so that e.g. 5 usec prints as .000005 */
            printf("%ld.%06ld\n", (long)tv.tv_sec, (long)tv.tv_usec);
        usleep(tsleep);
    }

    return 0;
}