From: Zachary Amsden <zamsden@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: KVM <kvm@vger.kernel.org>, Marcelo Tosatti <mtosatti@redhat.com>,
Glauber Costa <glommer@redhat.com>,
Linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 09/18] Robust TSC compensation
Date: Mon, 19 Jul 2010 10:39:53 -1000 [thread overview]
Message-ID: <4C44B819.2030203@redhat.com> (raw)
In-Reply-To: <4C431546.9030906@redhat.com>
On 07/18/2010 04:52 AM, Avi Kivity wrote:
> On 07/13/2010 05:25 AM, Zachary Amsden wrote:
>> Make the match of TSC find TSC writes that are close to each other
>> instead of perfectly identical; this allows the compensator to also
>> work in migration / suspend scenarios.
>>
>
> What scenario exactly?
After migration, qemu will write back MSRs, including TSC to the VCPUs.
They won't have exactly matching values, because they get read out at
different times (actually, because the TSC for the VCPUs never stops,
they can have wildly different times if there was some host overload /
swap / suspend event).
When restarting the CPUs, qemu will try to write back the TSC and then
we end up desynchronizing the system.
It's an ugly problem, and this is an ugly solution.
Better would be to "stop" the VCPUs (requires some kernel
synchronization to determine TSC stop point), or to simply take the
maximum TSC in qemu and write that to all of the CPUs (this assumes the
guest wants to have TSCs in sync at all).
Both methods have to assume small deltas in TSC are unintentional
effects in order to correctly resynchronize.
>
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -926,21 +926,27 @@ void guest_write_tsc(struct kvm_vcpu *vcpu, u64
>> data)
>> struct kvm *kvm = vcpu->kvm;
>> u64 offset, ns, elapsed;
>> struct timespec ts;
>> + s64 sdiff;
>>
>> spin_lock(&kvm->arch.tsc_write_lock);
>> offset = data - native_read_tsc();
>> ns = get_kernel_ns();
>> elapsed = ns - kvm->arch.last_tsc_nsec;
>> + sdiff = data - kvm->arch.last_tsc_write;
>> + if (sdiff< 0)
>> + sdiff = -sdiff;
>>
>> /*
>> - * Special case: identical write to TSC within 5 seconds of
>> + * Special case: close write to TSC within 5 seconds of
>> * another CPU is interpreted as an attempt to synchronize
>> - * (the 5 seconds is to accomodate host load / swapping).
>> + * The 5 seconds is to accomodate host load / swapping as
>> + * well as any reset of TSC during the boot process.
>> *
>> * In that case, for a reliable TSC, we can match TSC offsets,
>> - * or make a best guest using kernel_ns value.
>> + * or make a best guest using elapsed value.
>> */
>> - if (data == kvm->arch.last_tsc_write&& elapsed< 5ULL *
>> NSEC_PER_SEC) {
>> + if (sdiff< nsec_to_cycles(5ULL * NSEC_PER_SEC)&&
>> + elapsed< 5ULL * NSEC_PER_SEC) {
>> if (!check_tsc_unstable()) {
>> offset = kvm->arch.last_tsc_offset;
>> pr_debug("kvm: matched tsc offset for %llu\n", data);
>
> Don't we have to adjust offset to the required different between tsc?
> Or do we assume, that if the guest wrote close enough values, it is
> trying to cleverly compensate for IPI latency?
>
No, we have to assume that any small (small being defined as < 5 second)
difference is unintentional. It's not perfect and is certainly error
prone (without one of the two assists from qemu that I mention above).
I think qemu should probably take the maximum TSC and apply it to all VCPUs.
Zach
next prev parent reply other threads:[~2010-07-19 20:39 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-13 2:25 KVM timekeeping fixes, V2 Zachary Amsden
2010-07-13 2:25 ` [PATCH 01/18] Make TSC offset writes non-preemptible Zachary Amsden
2010-07-13 21:33 ` Rik van Riel
2010-07-18 14:28 ` Avi Kivity
2010-07-18 14:30 ` Avi Kivity
2010-07-13 2:25 ` [PATCH 02/18] Fix SVM VMCB reset Zachary Amsden
2010-07-13 21:37 ` Rik van Riel
2010-07-13 2:25 ` [PATCH 03/18] TSC reset compensation Zachary Amsden
2010-07-13 22:11 ` Rik van Riel
2010-07-18 14:34 ` Avi Kivity
2010-07-19 20:01 ` Zachary Amsden
2010-07-13 2:25 ` [PATCH 04/18] Make cpu_tsc_khz updates use local CPU Zachary Amsden
2010-07-14 14:41 ` Rik van Riel
2010-07-18 14:45 ` Avi Kivity
2010-07-19 20:06 ` Zachary Amsden
2010-07-20 8:53 ` Avi Kivity
2010-07-20 21:57 ` Zachary Amsden
2010-07-13 2:25 ` [PATCH 05/18] Warn about unstable TSC Zachary Amsden
2010-07-14 15:02 ` Rik van Riel
2010-07-18 14:47 ` Avi Kivity
2010-07-13 2:25 ` [PATCH 06/18] Unify TSC logic Zachary Amsden
2010-07-14 15:53 ` Rik van Riel
2010-07-13 2:25 ` [PATCH 07/18] Fix deep C-state TSC desynchronization Zachary Amsden
2010-07-14 16:14 ` Rik van Riel
2010-07-13 2:25 ` [PATCH 08/18] Add helper functions for time computation Zachary Amsden
2010-07-14 19:02 ` Rik van Riel
2010-07-13 2:25 ` [PATCH 09/18] Robust TSC compensation Zachary Amsden
2010-07-13 20:34 ` Marcelo Tosatti
2010-07-13 21:15 ` Zachary Amsden
2010-07-13 21:42 ` David S. Ahern
2010-07-13 21:45 ` Zachary Amsden
2010-07-13 23:32 ` Zachary Amsden
2010-07-14 22:33 ` Rik van Riel
2010-07-18 14:52 ` Avi Kivity
2010-07-19 20:39 ` Zachary Amsden [this message]
2010-07-13 2:25 ` [PATCH 10/18] Keep SMP VMs more in sync on unstable TSC Zachary Amsden
2010-07-15 2:13 ` Rik van Riel
2010-07-13 2:25 ` [PATCH 11/18] Perform hardware_enable in CPU_STARTING callback Zachary Amsden
2010-07-15 2:15 ` Rik van Riel
2010-07-13 2:25 ` [PATCH 12/18] Add clock sync request to hardware enable Zachary Amsden
2010-07-15 2:32 ` Rik van Riel
2010-07-13 2:25 ` [PATCH 13/18] Move scale_delta into common header Zachary Amsden
2010-07-15 2:35 ` Rik van Riel
2010-07-13 2:25 ` [PATCH 14/18] Fix a possible backwards warp of kvmclock Zachary Amsden
2010-07-15 2:37 ` Rik van Riel
2010-07-13 2:25 ` [PATCH 15/18] Implement getnsboottime kernel API Zachary Amsden
2010-07-15 2:41 ` Rik van Riel
2010-07-18 15:07 ` Avi Kivity
2010-07-13 2:25 ` [PATCH 16/18] Use getnsboottime in KVM Zachary Amsden
2010-07-15 2:43 ` Rik van Riel
2010-07-13 2:25 ` [PATCH 17/18] Indicate reliable TSC in kvmclock Zachary Amsden
2010-07-15 2:44 ` Rik van Riel
2010-07-13 2:25 ` [PATCH 18/18] Add timekeeping documentation Zachary Amsden
2010-07-14 7:16 ` Takuya Yoshikawa
2010-07-14 20:28 ` Zachary Amsden
2010-07-16 13:19 ` KVM timekeeping fixes, V2 Joerg Roedel
2010-07-16 17:20 ` Zachary Amsden
2010-07-16 19:26 ` Joerg Roedel
2010-07-18 14:22 ` Avi Kivity
2010-07-18 15:08 ` Avi Kivity
2010-07-19 8:11 ` Zachary Amsden
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4C44B819.2030203@redhat.com \
--to=zamsden@redhat.com \
--cc=avi@redhat.com \
--cc=glommer@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtosatti@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).