From: Dor Laor <dlaor@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
"Yang, Sheng" <sheng.yang@intel.com>,
Alexander Graf <agraf@suse.de>,
"David S. Ahern" <daahern@cisco.com>,
kvm-devel <kvm@vger.kernel.org>,
Glauber de Oliveira Costa <gcosta@redhat.com>,
Gleb Natapov <gleb@redhat.com>
Subject: Re: gettimeofday "slow" in RHEL4 guests
Date: Mon, 29 Dec 2008 18:12:23 +0200
Message-ID: <4958F6E7.9020309@redhat.com>
In-Reply-To: <4958CC9A.5050008@redhat.com>
Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> The tsc clock on older Linux 2.6 kernels compensates for lost ticks.
>> The algorithm uses the (latched) PIT count to measure the delay between
>> interrupt generation and handling, and adds that value, on the next
>> interrupt, to the TSC delta.
>>
>> Sheng investigated this problem in the discussions before in-kernel PIT
>> was merged:
>>
>> http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg13873.html
>>
>>
>> The algorithm overcompensates for lost ticks and the guest time runs
>> faster than the host's.
>>
>> There are two issues:
>>
>> 1) A bug in the in-kernel PIT which miscalculates the count value.
>>
>> 2) For the case where more than one interrupt is lost, and later
>> reinjected, the value read from PIT count is meaningless for the purpose
>> of the tsc algorithm. The count is interpreted as the delay until the
>> next interrupt, which is not the case with reinjection.
>>
>> As Sheng mentioned in the thread above, Xen pulls back the TSC value
>> when reinjecting interrupts. VMWare ESX has a notion of "virtual TSC",
>> which I believe is similar in this context.
>>
>> For KVM I believe the best immediate solution (for now) is to provide an
>> option to disable reinjection, behaving similarly to real hardware. The
>> advantage is simplicity compared to virtualizing the time sources.
>>
>> The QEMU PIT emulation has a limit on the rate of interrupt reinjection,
>> perhaps something similar should be investigated in the future.
>>
>> The following patch (which contains the bugfix for 1) and disables
>> reinjection) fixes the severe time drift on RHEL4 with "clock=tsc".
>> What I'm proposing is to make reinjection conditional on an option
>> (-kvm-pit-no-reinject or something).
>>
>> Comments or better ideas?
>>
>>
>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
>> index e665d1c..608af7b 100644
>> --- a/arch/x86/kvm/i8254.c
>> +++ b/arch/x86/kvm/i8254.c
>> @@ -201,13 +201,16 @@ static int __pit_timer_fn(struct kvm_kpit_state *ps)
>>  	if (!atomic_inc_and_test(&pt->pending))
>>  		set_bit(KVM_REQ_PENDING_TIMER, &vcpu0->requests);
>>
>> +	if (atomic_read(&pt->pending) > 1)
>> +		atomic_set(&pt->pending, 1);
>> +
>>
>
> Replace the atomic_inc() with atomic_set(, 1) instead? One less test,
> and more important, the logic is scattered less around the source.
But having only a pending bit instead of a counter will cause KVM to
drop PIT IRQs in rare high-load situations.
The option to disable reinjection is better.
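
To illustrate the difference, here is a toy user-space sketch (not the
kernel code; struct toy_pit, toy_timer_fire, toy_deliver_all and
toy_lost_ticks are names made up for this example). It assumes the timer
fires several times while the vcpu is stalled and compares a real counter
with a pending value clamped to 1:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not taken from i8254.c. */
struct toy_pit {
	int pending;        /* ticks fired but not yet delivered to the guest */
	bool clamp_to_one;  /* true: behave like a single pending bit */
};

/* Called once per timer expiry on the host side. */
static void toy_timer_fire(struct toy_pit *pit)
{
	pit->pending++;
	if (pit->clamp_to_one && pit->pending > 1)
		pit->pending = 1;  /* extra expiries are forgotten */
}

/* Called when the vcpu finally runs; returns the interrupts delivered. */
static int toy_deliver_all(struct toy_pit *pit)
{
	int delivered = pit->pending;

	pit->pending = 0;
	return delivered;
}

/*
 * Fire the timer `fires` times while the vcpu is stalled, then deliver.
 * Returns how many ticks the guest never sees.
 */
static int toy_lost_ticks(bool clamp_to_one, int fires)
{
	struct toy_pit pit = { .pending = 0, .clamp_to_one = clamp_to_one };

	assert(fires >= 0);
	for (int i = 0; i < fires; i++)
		toy_timer_fire(&pit);
	return fires - toy_deliver_all(&pit);
}
```

With five expiries during one stall, toy_lost_ticks(false, 5) gives 0 and
toy_lost_ticks(true, 5) gives 4, so the clamped variant silently drops
ticks for any guest that does not compensate on its own.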
>
>>  	if (vcpu0 && waitqueue_active(&vcpu0->wq))
>>  		wake_up_interruptible(&vcpu0->wq);
>>
>>  	hrtimer_add_expires_ns(&pt->timer, pt->period);
>>  	pt->scheduled = hrtimer_get_expires_ns(&pt->timer);
>>  	if (pt->period)
>> -		ps->channels[0].count_load_time = hrtimer_get_expires(&pt->timer);
>> +		ps->channels[0].count_load_time = ktime_get();
>>
>>  	return (pt->period == 0 ? 0 : 1);
>>  }
>>
>>
>
> I don't like the idea of punting to the user, but it looks like we don't
> have a choice. Hopefully vendors will port kvmclock to these kernels
> and release them as updates -- time simply doesn't work well with
> virtualization, especially Linux guests.
>
Apart from these 'tsc compensate' guests, on what occasions does a guest
write its TSC?
If that is the only case, we can disable reinjection once we trap TSC writes.
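
For reference, a simplified user-space model of why reinjection and
guest-side compensation add up. This is only a sketch under assumptions,
not the RHEL4 timer code: time is in abstract units, the PIT-count delay
term is ignored, and struct toy_guest, toy_guest_tick and
toy_guest_jiffies_after_stall are hypothetical names. The guest derives
the number of elapsed ticks from the TSC delta, but every interrupt it
receives still counts as at least one tick:

```c
#include <assert.h>

/* Toy guest that compensates for lost ticks from the TSC delta. */
struct toy_guest {
	long last_tsc;  /* TSC value seen at the previous timer interrupt */
	long jiffies;   /* ticks the guest believes have elapsed */
};

/* Guest timer interrupt handler: account for the time since last_tsc. */
static void toy_guest_tick(struct toy_guest *g, long now, long period)
{
	long ticks = (now - g->last_tsc) / period;

	if (ticks < 1)
		ticks = 1;  /* each interrupt is worth at least one tick */
	g->jiffies += ticks;
	g->last_tsc = now;
}

/*
 * The host stalls the vcpu for `stalled_periods` timer periods and then
 * delivers either one interrupt (no reinjection) or one per missed
 * period (reinjection), all back to back at the same TSC value.
 * Returns the guest's tick count afterwards.
 */
static long toy_guest_jiffies_after_stall(long stalled_periods, int reinject)
{
	const long period = 1000;
	struct toy_guest g = { .last_tsc = 0, .jiffies = 0 };
	long now = stalled_periods * period;
	long interrupts = reinject ? stalled_periods : 1;

	assert(stalled_periods >= 1);
	for (long i = 0; i < interrupts; i++)
		toy_guest_tick(&g, now, period);
	return g.jiffies;
}
```

After a stall of five periods, the model yields 5 ticks without
reinjection and 9 with it: the first interrupt already accounts for all
five periods, and each of the four reinjected ones adds another tick.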