From: Zachary Amsden <zamsden@redhat.com>
To: Joerg Roedel <joro@8bytes.org>
Cc: "Roedel, Joerg" <Joerg.Roedel@amd.com>, kvm <kvm@vger.kernel.org>,
Jan Kiszka <jan.kiszka@web.de>
Subject: Re: Bug in KVM clock backwards compensation
Date: Thu, 28 Apr 2011 20:00:57 -0700 [thread overview]
Message-ID: <4DBA29E9.9000905@redhat.com> (raw)
In-Reply-To: <20110428202002.GF13402@8bytes.org>
On 04/28/2011 01:20 PM, Joerg Roedel wrote:
> On Thu, Apr 28, 2011 at 11:34:44AM -0700, Zachary Amsden wrote:
>
>> On 04/28/2011 12:13 AM, Roedel, Joerg wrote:
>>
>
>>> I see it different. This code wants to check if the _guest_ tsc moves
>>> forwared (or at least not backwards). So it is fully legitimate to just
>>> do this by reading the guest-tsc and compare it to the last one the
>>> guest had.
>>>
>> That wasn't the intention when I wrote that code. It's simply there to
>> detect backwards motion of the host TSC. The guest TSC can legally go
>> backwards whenever the guest decides to change it, so checking the guest
>> TSC doesn't make sense here.
>>
> This code checks how many guest tsc cycles have passed since this vCPU
> was de-scheduled last time (and before it is running again). So since
> the vCPU hasn't run in the meantime it had no chance to change its TSC.
> Further, the other parameters like the TSC offset and the scaling
> multiplier havn't changed too, so the only variable in the guest-tsc
> calculation is the host-tsc.
> So this calculation using the guest-tsc can detect backwards going
> host-tsc as good as the old one. The benefit here is that we can feed
> consistent values into adjust_tsc_offset().
>
While true, this is more complex than the original code. The original
code here doesn't try to actually compensate for the guest TSC
difference - instead what it does is NULL any discovered host TSC delta:
if (tsc_delta < 0)
mark_tsc_unstable("KVM discovered backwards TSC");
if (check_tsc_unstable()) {
kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
vcpu->arch.tsc_catchup = 1;
}
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
Erasing that delta also erases elapsed time since the VCPU has last been
run, which isn't desirable, so it then sets tsc_catchup mode, which will
restore the proper TSC. The request here triggers code which later
updates the TSC offset again.
To avoid complexity, I think it's simplest to do the first computation
in terms of the host TSC.
>> Yes, with tsc-scaling, the machines already have stable TSCs - the above
>> test is for older hardware which could have problems, and can be
>> reverted back to the original code without worrying about switching
>> units.
>>
> This is the case pratically. But architecturally the tsc-scaling feature
> does not depend on a constant tsc, so we can make no such assumtions.
> Additionally, it may happen that Linux mis-detects an unstable tsc for
> some reason (broken BIOS, bug in the code, ...). Therefore I think it
> is dangerous to assume that this code will never run on tsc-scaling
> capable hosts. And if it does and we don't manage the tsc-offset units
> right, we may see very weird behavior.
>
I agree, it is best to handle this case - hardware can and will change -
but the TSC adjustment in terms of guest rate should be done under the
atomic protection right before entering hardware virtualized mode - here:
I left compute_guest_tsc in place to recompute time in guest units here,
even if the underlying hardware rate changes.
/*
* We may have to catch up the TSC to match elapsed wall clock
* time for two reasons, even if kvmclock is used.
* 1) CPU could have been running below the maximum TSC rate
* 2) Broken TSC compensation resets the base at each VCPU
* entry to avoid unknown leaps of TSC even when running
* again on the same CPU. This may cause apparent elapsed
* time to disappear, and the guest to stand still or run
* very slowly.
*/
if (vcpu->tsc_catchup) {
u64 tsc = compute_guest_tsc(v, kernel_ns);
if (tsc > tsc_timestamp) {
kvm_x86_ops->adjust_tsc_offset(v, tsc -
tsc_timestamp);
tsc_timestamp = tsc;
}
}
So yeah, the code is getting pretty complex but we'd like to avoid that
as much as possible - so I would prefer to have the hardware backwards
compensation separate from the guest rate computation by doing this:
step 1) remove any backwards hardware TSC delta (in hardware units)
step 2) recompute guest TSC from a stable clock (gotten from kernel_ns)
and apply adjustment (in guest units)
So it appears you can just share most of the logic of guest TSC catchup
mode.
What do you think?
Zach
next prev parent reply other threads:[~2011-04-29 3:01 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-04-28 6:59 Bug in KVM clock backwards compensation Zachary Amsden
2011-04-28 7:06 ` Jan Kiszka
2011-04-28 7:22 ` Roedel, Joerg
2011-04-28 19:06 ` Zachary Amsden
2011-04-28 22:38 ` Jan Kiszka
2011-04-28 17:48 ` Zachary Amsden
2011-04-28 7:13 ` Roedel, Joerg
2011-04-28 18:34 ` Zachary Amsden
2011-04-28 20:20 ` Joerg Roedel
2011-04-29 3:00 ` Zachary Amsden [this message]
2011-04-29 8:40 ` Joerg Roedel
2011-04-29 18:17 ` Zachary Amsden
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4DBA29E9.9000905@redhat.com \
--to=zamsden@redhat.com \
--cc=Joerg.Roedel@amd.com \
--cc=jan.kiszka@web.de \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).